Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-30 Thread Jason Gunthorpe
On Wed, Sep 30, 2009 at 11:44:56AM +0200, Ingo Molnar wrote: OK. It would be nice to tie into something more general, but I think I agree -- perf counters are missing the filtering and the no lost events that ummunotify does have. [...] Performance events filtering is being worked

Re: [ofa-general] This list expires... tomorrow?

2009-09-29 Thread Jason Gunthorpe
On Tue, Sep 29, 2009 at 01:25:13PM -0400, Jeff Squyres wrote: Plus, won't it be harder on the spam controls on the new list to recognize a variety of old names for this list? Many spam controls happen at the SMTP session level, forwarding messages will defeat that. Jason

Re: [ofa-general] help install ofed 1.4 on Centos 5.2

2009-09-28 Thread Jason Gunthorpe
On Mon, Sep 28, 2009 at 10:15:08AM -0400, Brian J. Murrell wrote: This is a problem we run into with Lustre somewhat frequently. The issue is that deploying OFED 1.5 (i.e. beta software) in a production environment is completely unacceptable, yet leaving one's systems open to kernel

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-28 Thread Jason Gunthorpe
On Mon, Sep 28, 2009 at 10:49:23PM +0200, Pavel Machek wrote: I don't remember seeing discussion of this on lkml. Yes it is in -next... e.g. http://lkml.org/lkml/2009/7/31/197 and followups, or search for v2 and earlier patches. Well... it seems a little overspecialized. Just

Re: [ofa-general] Problem while running ib tests

2009-09-23 Thread Jason Gunthorpe
On Wed, Sep 23, 2009 at 12:01:02PM +0300, Moni Shoua wrote: If I try to run any IB bandwidth test or latency test it ends up with the warning Conflicting CPU frequency values detected: 2394.00 != 1596.00. One more option to solve it (besides the -F) is to disable the power saving of the

Re: [ofa-general] Problem while running ib tests

2009-09-23 Thread Jason Gunthorpe
On Wed, Sep 23, 2009 at 08:32:20AM -0700, Ira Weiny wrote: On Wed, 23 Sep 2009 09:11:49 -0600 Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Sep 23, 2009 at 12:01:02PM +0300, Moni Shoua wrote: If I try to run any IB bandwidth test or latency test it ends up

Re: [ofa-general] Problem while running ib tests

2009-09-23 Thread Jason Gunthorpe
On Wed, Sep 23, 2009 at 06:43:12PM +0300, Moni Shoua wrote: In the man page for clock_gettime() I see that CLOCK_PROCESS_CPUTIME_ID may be more accurate for (high) performance measurements. What do you think? The CPUTIME counters are of the 'time spent running code' variety, they do not time
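The distinction drawn in the reply above can be checked with a small experiment. What follows is only a sketch, not code from the thread: the helper names are made up, and it assumes a POSIX system that provides both CLOCK_MONOTONIC and CLOCK_PROCESS_CPUTIME_ID. It sleeps for a while and reports how far each clock advanced; since a sleeping process is not running code, the CPU-time clock should barely move while the monotonic clock covers the whole sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Read the given clock and return its value in nanoseconds (0 on failure). */
static int64_t sketch_now_ns(clockid_t clock)
{
    struct timespec ts;

    if (clock_gettime(clock, &ts) != 0)
        return 0;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: sleep for sleep_ms milliseconds and report how far
 * the wall-style clock (CLOCK_MONOTONIC) and the CPU-time clock
 * (CLOCK_PROCESS_CPUTIME_ID) advanced over that interval.
 */
static void sketch_compare_clocks(long sleep_ms, int64_t *wall_ns,
                                  int64_t *cpu_ns)
{
    struct timespec req;
    int64_t wall0, cpu0;

    assert(sleep_ms >= 0 && wall_ns != NULL && cpu_ns != NULL);
    req.tv_sec = sleep_ms / 1000;
    req.tv_nsec = (sleep_ms % 1000) * 1000000L;

    wall0 = sketch_now_ns(CLOCK_MONOTONIC);
    cpu0 = sketch_now_ns(CLOCK_PROCESS_CPUTIME_ID);

    /* If a signal interrupts the sleep, continue with the remaining time. */
    while (nanosleep(&req, &req) != 0 && errno == EINTR)
        ;

    *wall_ns = sketch_now_ns(CLOCK_MONOTONIC) - wall0;
    *cpu_ns = sketch_now_ns(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
}
```

Calling sketch_compare_clocks(50, &wall, &cpu) should give a wall value of roughly 50 ms and a much smaller cpu value, which is why a CPU-time clock would under-report a test that spends its time waiting on the HCA rather than executing instructions.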

Re: [ofa-general] OFED interfering with Ethernet

2009-09-20 Thread Jason Gunthorpe
. Hmm, double weird.. bonding related perhaps? -- Jason Gunthorpe jguntho...@obsidianresearch.com(780)4406067x832 Chief Technology Officer, Obsidian Research Corp Edmonton, Canada ___ general mailing list general@lists.openfabrics.org http

Re: [ofa-general] OFED interfering with Ethernet

2009-09-19 Thread Jason Gunthorpe
On Sat, Sep 19, 2009 at 11:57:01PM +0200, Sebastian Kalcher wrote: If I do an openib stop and restart iperf everything returns to normal. The behavior is reproducible on different nodes with the same constellation. I tried OFED 1.4 and 1.5beta (kernel 2.6.24). What completely confuses

Re: [ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)

2009-09-18 Thread Jason Gunthorpe
On Fri, Sep 18, 2009 at 03:22:22PM -0700, Ira Weiny wrote: main() { foo = libibnetdisc_setup(); libibnetdisc_discover_all(foo,res); // Do interesting things with res. } That is the current use case. However I can see use cases where discover is called periodically to get a

Re: [ofw] Re: [ofa-general] [RFC] 3/5: IB ACM: libibacm

2009-09-18 Thread Jason Gunthorpe
On Fri, Sep 18, 2009 at 03:28:48PM -0700, Ira Weiny wrote: One is for static defines CL_NTOH and the other is for variables at run time. I found this code in Linux. That's gross, and is exactly why you don't do this yourself... bswap64/32/16 do this all automatically. ntohl also do it and are
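As an illustration of leaning on the standard helpers instead of hand-written swap macros, here is a minimal sketch of a 64-bit network-to-host conversion built only on ntohl(). The names sketch_ntohll and sketch_ntohll_selfcheck are made up for this example and are not part of libibacm or any OFED header. Because it decodes the bytes as they sit in memory, it needs no endianness #ifdef. It only covers the run-time case; it does not address constant initializers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: convert a 64-bit value from network (big-endian)
 * byte order to host byte order using only ntohl().
 */
static uint64_t sketch_ntohll(uint64_t net)
{
    uint32_t words[2];

    /* words[0] holds the first four wire bytes, i.e. the high half. */
    memcpy(words, &net, sizeof(words));
    return ((uint64_t)ntohl(words[0]) << 32) | ntohl(words[1]);
}

/* Self-check: wire bytes 01 02 .. 08 must decode to 0x0102030405060708. */
static void sketch_ntohll_selfcheck(void)
{
    const unsigned char wire[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    uint64_t net;

    memcpy(&net, wire, sizeof(net));
    assert(sketch_ntohll(net) == 0x0102030405060708ULL);
}
```

The same function behaves correctly on big-endian hosts, where ntohl() is a no-op and the two halves are simply recombined.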

Re: [ofw] Re: [ofa-general] [RFC] 3/5: IB ACM: libibacm

2009-09-18 Thread Jason Gunthorpe
On Fri, Sep 18, 2009 at 05:05:39PM -0700, Ira Weiny wrote: I'd say just use the ntohl, ntohs, and bswap64 macros directly and Windows can provide headers with whatever it needs instead. They are already doing this.. I agree but ntohl etc do not seem to work for the macros which are

Re: [ofw] Re: [ofa-general] [RFC] 3/5: IB ACM: libibacm

2009-09-17 Thread Jason Gunthorpe
On Thu, Sep 17, 2009 at 02:40:50PM -0700, Ira Weiny wrote: I'm not sure this is a good idea. ibutils (ibis and ibmgtsim) wants ib_types.h but does not want libibumad. Would a separate library be a better solution then? I would prefer that as well. Please no more libraries, there are too

Re: [ofw] Re: [ofa-general] [RFC] 3/5: IB ACM: libibacm

2009-09-17 Thread Jason Gunthorpe
On Thu, Sep 17, 2009 at 04:26:44PM -0700, Sean Hefty wrote: libibcm needs to learn how to do PR queries, it should have a good PR query API since libibcm is pretty useless without being able to do PR queries.. PR queries don't work - regardless of what the API looks like or where it

Re: [ofw] Re: [ofa-general] [RFC] 3/5: IB ACM: libibacm

2009-09-17 Thread Jason Gunthorpe
On Thu, Sep 17, 2009 at 04:47:17PM -0700, Sean Hefty wrote: PR queries work fine, I don't understand your comment. MPI does not use PR queries because it does not scale. Not all the world is MPI. Your new acm stuff still does PR queries. Anyone using libibverbs multicast needs to do PR

Re: [ofa-general] Distributing MSI-X interrupts over multiple cores / CPUs ?

2009-09-15 Thread Jason Gunthorpe
On Tue, Sep 15, 2009 at 07:04:13PM -0400, Richard Frank wrote: We are running a 2.6.18 kernel.. with OFED 1.3.1 and OFED 1.4.2 on multiple different Intel based hardware platforms. Well, a quick glance at the MSI-X code in 2.6.18 shows the trouble, I think: unsigned int dest_cpu =

[ofa-general] Re: [PATCH mthca] Update function prototypes to match ibverbs

2009-09-14 Thread Jason Gunthorpe
On Mon, Sep 14, 2009 at 11:23:27AM -0700, Roland Dreier wrote: thanks, applied both this and mlx4 version. seems that libcxgb3, libipathverbs and libnes would want similar treatment? Hmm, yes.. Say, have you thought much about bringing libibverbs, libmlx4, libmthca, libcxgb3, libnes and

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-11 Thread Jason Gunthorpe
On Thu, Sep 10, 2009 at 11:22:20PM -0700, Roland Dreier wrote: As I said, it does mean that MPI can invalidate cached registrations for COWed memory, which might be useful in case a parent forks and then touches memory it used to use for RDMA, but I think that's the easier part of the

Re: [ofa-general] Fedora 10 OFED support plans

2009-09-10 Thread Jason Gunthorpe
On Thu, Sep 10, 2009 at 01:43:10PM -0500, Jeremy Enos wrote: So I accepted that I'd have to move Fedora version to get OFED support... and I was ok with that. However, I see now that FC12 is not released, and won't be until November. I have tested FC11, and it doesn't work w/ the OFED 1.5

Re: [ofa-general] Fedora 10 OFED support plans

2009-09-10 Thread Jason Gunthorpe
On Thu, Sep 10, 2009 at 03:00:56PM -0500, Jeremy Enos wrote: Fails w/ ofa_kernel like the others have... I didn't test excluding this rpm with FC11, but the others also fail elsewhere w/ this rpm excluded- so I'm guessing FC11 would as well. I included the output (and last 50 lines of log) in

Re: [ofa-general] [PATCH] IPoIB: check multicast address format (V2)

2009-09-01 Thread Jason Gunthorpe
Check that the format of the multicast link address is correct before taking it from dev->mc_list to priv->multicast_list. This way we never try to send a bogus address to the SA, and prevents badness from erroneous 'ip maddr addr add', broken bonding drivers, or whatever. Signed-off-by: Jason

[ofa-general] [PATCH] Remove duplicated umad_get_mad.3 from Makefile.am

2009-08-28 Thread Jason Gunthorpe
Fixes builds on FC11. Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com --- libibumad/Makefile.am |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/libibumad/Makefile.am b/libibumad/Makefile.am index 50222df..27c6ff2 100644 --- a/libibumad/Makefile.am +++ b

Re: [ofa-general] [PATCH] IPoIB: check multicast address format

2009-08-26 Thread Jason Gunthorpe
, it has the down-side-effect of e.g. losing routes already set for the bond while adding the underlying IPoIB devices, so if Jason's patch is enough Is this true? That is pretty ugly, but probably manageable..

Re: [ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)

2009-08-26 Thread Jason Gunthorpe
On Wed, Aug 26, 2009 at 04:40:26PM -0700, Ira Weiny wrote: Of course! :-) But first I would like to mention some numbers from the prototype code I have. When running on a small fabric the additional overhead of thread creation actually slows down the scan. :-( It seems strange to me to

[ofa-general] Re: [PATCHv2 RESEND] IB/IPoIB: Don't let a bad muticast address in the join list stop subsequent joins

2009-08-24 Thread Jason Gunthorpe
On Mon, Aug 24, 2009 at 04:51:03PM +0300, Moni Shoua wrote: http://lists.openfabrics.org/pipermail/general/2009-July/060496.html The discussion in the link above didn't end with a decision. You were asking about a way to inject illegal mcast addresses from userspace to ib_ipoib and Jason

Re: [ofa-general] Re: [PATCHv2 RESEND] IB/IPoIB: Don't let a bad muticast address in the join list stop subsequent joins

2009-08-24 Thread Jason Gunthorpe
On Mon, Aug 24, 2009 at 07:48:51PM +0300, Yossi Etigin wrote: Are you suggesting to sort the list each time we have add/remove a new entry, or search for the correct location to insert the new entry? I'm afraid that would add too much complexity and be inefficient (in O() terms). 1) This is

[ofa-general] [PATCH] IPoIB: check multicast address format

2009-08-20 Thread Jason Gunthorpe
Check that the format of the multicast link address is correct before taking it from dev->mc_list to priv->multicast_list. This way we never try to send a bogus address to the SA, and prevents badness from erroneous 'ip maddr addr add', broken bonding drivers, or whatever. Signed-off-by: Jason

Re: [ofa-general] Setting the rate in Infiniband.

2009-08-06 Thread Jason Gunthorpe
On Wed, Aug 05, 2009 at 08:03:04PM -0400, Ashwath Narasimhan wrote: The reason why I need such small rates is because I interface the Infiniband HCA to an FPGA via an Infiniband physical link. Imagine the FPGA as a simple repeater that simply forwards the infiniband signals to the Target

Re: [ofa-general] [PATCH] ib_send_bw -b can hang due to too few CQ entries

2009-08-06 Thread Jason Gunthorpe
On Thu, Aug 06, 2009 at 12:48:10PM -0700, Ralph Campbell wrote: When ib_send_bw is run in bi-directional mode (-b), it doesn't create enough completion queue entries for both the send *and* the receive completions. Thus, CQ entries are lost due to the queue being full and the test can hang.

Re: [ofa-general] [PATCHv4 10/10] mlx4: Add RDMAoE support - allow interfaces to correspond to each other

2009-08-05 Thread Jason Gunthorpe
On Wed, Aug 05, 2009 at 11:30:23AM +0300, Eli Cohen wrote: for setting the GID table of a port has been added. Currently, each IB port has a single GID entry in its table and that GID entry equals the link local IPv6 address. FWIW, I like this approach, and mapping to/from this GID to the

Re: [ofa-general] [PATCH] ipoib: refresh path when remote lid changes

2009-08-03 Thread Jason Gunthorpe
On Mon, Aug 03, 2009 at 08:10:07PM +0300, Yossi Etigin wrote: We have customers with large fabrics and different machines/operating systems, where the LID does not always stay the same. They are experiencing loss of IPoIB connectivity. The patch above solved that. Besides, according to the

[ofa-general] [PATCH] Do not use enum object types for bitfields

2009-07-30 Thread Jason Gunthorpe
and sparc also appear compatible with this choice. Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com --- include/infiniband/driver.h |8 include/infiniband/verbs.h | 28 ++-- man/ibv_modify_qp.3 |2 +- man/ibv_modify_srq.3

Re: [ofa-general] [PATCH] ipoib: refresh path when remote lid changes

2009-07-27 Thread Jason Gunthorpe
On Mon, Jul 27, 2009 at 08:11:42PM +0300, Yossi Etigin wrote: If the LID of an ipoib neighbour changes without a SM event on the local node, IPoIB will keep caching the invalid path until the device is flushed. The patch below will remove the path for every incoming ARP packet where the

Re: [ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches

2009-07-23 Thread Jason Gunthorpe
On Thu, Jul 23, 2009 at 03:53:07PM +0300, Yevgeny Kliteynik wrote: I would be very surprised if any implementation had a significant overhead for the actual set operation compared to the packet handling path. Certainly in our products the incremental cost of a set vs processing a DR is

[ofa-general] [PATCH ibverbs] Do not use enum object types for bitfields

2009-07-23 Thread Jason Gunthorpe
-by: Jason Gunthorpe jguntho...@obsidianresearch.com --- include/infiniband/driver.h |8 include/infiniband/verbs.h | 28 ++-- src/cmd.c | 10 +- src/compat-1_0.c| 18 +- src/verbs.c |8

[ofa-general] [PATCH mlx4] Update function prototypes to match ibverbs

2009-07-23 Thread Jason Gunthorpe
Replace certain enums with int. Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com --- src/mlx4.h |8 src/verbs.c |8 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/mlx4.h b/src/mlx4.h index 0c658cf..4445998 100644 --- a/src/mlx4.h +++ b

[ofa-general] [PATCH mthca] Update function prototypes to match ibverbs

2009-07-23 Thread Jason Gunthorpe
Replace certain enums with int. Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com --- src/mthca.h |8 src/verbs.c | 10 +- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/mthca.h b/src/mthca.h index 9a2e362..bd1e7a2 100644 --- a/src/mthca.h

Re: [ofa-general] [PATCH ibverbs] Do not use enum object types for bitfields

2009-07-23 Thread Jason Gunthorpe
On Thu, Jul 23, 2009 at 02:09:11PM -0700, Roland Dreier wrote: Thanks... I guess we can take a stab at doing this and see if anyone breaks because of it. I do notice that this patch doesn't update the man pages and therefore leaves them out of sync with the actual code. And that makes me

Re: [ofa-general] [PATCH] Allow paths to the device specific library to be absolute

2009-07-23 Thread Jason Gunthorpe
On Thu, Jul 23, 2009 at 02:17:17PM -0700, Roland Dreier wrote: I don't have a debian logon, but I do have an ia64 machine: Ah, in the good old days every DD had one of those :) BTW I'm not a dd -- just have a sponsor for my packages. Maybe someday I'll try to become a dd but for now

Re: [ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches

2009-07-23 Thread Jason Gunthorpe
On Thu, Jul 23, 2009 at 05:43:38PM -0400, Hal Rosenstock wrote: The proposed algorithm is less aggressive in terms of sending DR SMPs than the current one which seems to work well enough based on field experience. If the concern with this approach is breaking things over the current approach,

Re: [ofa-general] Infiniband and Ubuntu

2009-07-23 Thread Jason Gunthorpe
On Thu, Jul 23, 2009 at 05:27:23PM -0400, Mark J. Pearrow wrote: The short question is is there a howto for getting Infiniband and Ubuntu working? I've done this and other similar things, it isn't too hard.. I like to use 2.6.30 these days for ConnectX, this is the base kernel for OFED 1.5

Re: [ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches

2009-07-22 Thread Jason Gunthorpe
On Wed, Jul 22, 2009 at 03:40:50PM -0400, Hal Rosenstock wrote: Doing this without also using LID routing to the target switch is just going to overload the SMAs in the intermediate switches with too many DR SMPs. The processing time of LR (LID routing) v. DR forwarding (direct routed)

Re: [ofa-general] [PATCH] Allow paths to the device specific library to be absolute

2009-07-22 Thread Jason Gunthorpe
On Wed, Jul 22, 2009 at 11:55:53AM -0700, Roland Dreier wrote: Didn't reply to this before, sorry. But yes I am interested. As far as using enum for bit flags, what is the C++ idiom for that? Well, it isn't just C++, it applies to C too - but gcc isn't as sticky with the errors. Basically,

Re: [ofa-general] [PATCH] Allow paths to the device specific library to be absolute

2009-07-22 Thread Jason Gunthorpe
On Wed, Jul 22, 2009 at 03:05:48PM -0700, Roland Dreier wrote: I'd like to see a result for ia64 and ppc64.. Roland do you have a Debian machine logon? Could you check this on merulo.debian.org (or merkel)? Unfortunately I don't think pescetti is a ppc64 :| I don't have a debian

Re: [ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches

2009-07-22 Thread Jason Gunthorpe
On Wed, Jul 22, 2009 at 08:28:25PM -0400, Hal Rosenstock wrote: But you overload the switch the SM is connected to with processing N*limit DR SMPs rather than just 'limit' SMPs. That is what concerns me. As I said, the current algorithm is worse as it sends N*no limit DR SMPs (where no

Re: [ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches

2009-07-21 Thread Jason Gunthorpe
On Tue, Jul 21, 2009 at 02:03:12PM -0400, Hal Rosenstock wrote: Currently, MADs are pipelined to a single switch at a time which effectively serializes these requests due to processing at the SMA. This patch pipelines (stripes) them across the switches first before proceeding with successive

[ofa-general] [PATCH] Clarify the syntax of the hop_weights_file

2009-07-20 Thread Jason Gunthorpe
- GUID is a port guid, and is specified in hex with 0x prefix - Lines with # are comments - Weights are simplex not duplex. --- opensm/man/opensm.8.in | 13 + 1 files changed, 9 insertions(+), 4 deletions(-) I met some people who were trying to use this and we had to look through

Re: [ofa-general] [PATCH] add pkgconfig support to ibverbs library

2009-07-20 Thread Jason Gunthorpe
On Mon, Jul 20, 2009 at 11:17:29AM -0700, Steven Dake wrote: The attached patch adds support for pkgconfig to the ibverbs library. Cloned from Roland's kernel.org git tree. Erm, am I missing something? Shouldn't your patches include some use of the .pc files for libraries downstream of

[ofa-general] Re: [PATCH ibverbs] Make the gid argument to ibv_attach_mcast and ibv_detach_mcast const

2009-07-20 Thread Jason Gunthorpe
On Mon, Jul 20, 2009 at 10:48:55AM -0700, Roland Dreier wrote: -int ibv_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); +int ibv_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid); Seems fine to me... I can't think of any risk of this breaking

Re: [ofa-general] [PATCH] add pkgconfig support to ibverbs library

2009-07-20 Thread Jason Gunthorpe
On Mon, Jul 20, 2009 at 12:07:19PM -0700, Steven Dake wrote: On Mon, 2009-07-20 at 12:43 -0600, Jason Gunthorpe wrote: On Mon, Jul 20, 2009 at 11:17:29AM -0700, Steven Dake wrote: The attached patch adds support for pkgconfig to the ibverbs library. Cloned from Roland's kernel.org git

[ofa-general] [PATCH mthca] Remove empty stubs for detach/attach_mcast

2009-07-20 Thread Jason Gunthorpe
Just use ibv_cmd_* directly. Solves const correctness warnings due to changes in libibverbs Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com --- src/mthca.c |4 ++-- src/mthca.h |2 -- src/verbs.c | 10 -- 3 files changed, 2 insertions(+), 14 deletions(-) diff

[ofa-general] [PATCH mlx4] Remove empty stubs for detach/attach_mcast

2009-07-20 Thread Jason Gunthorpe
Just use ibv_cmd_* directly. Solves const correctness warnings due to changes in libibverbs Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com --- src/mlx4.c |4 ++-- src/mlx4.h |2 -- src/verbs.c | 10 -- 3 files changed, 2 insertions(+), 14 deletions(-) diff

Re: [ofa-general] [PATCH] add pkgconfig support to ibverbs library

2009-07-20 Thread Jason Gunthorpe
On Mon, Jul 20, 2009 at 04:12:34PM -0700, Steven Dake wrote: The librdmacm maintainer would have to decide if it is worth it for him to use pkgconfig functionality in his configure scripts. For other downstream projects that link against librdmacm or libibverbs that follow the upstream

Re: [ofa-general] [PATCH] add pkgconfig support to ibverbs library

2009-07-20 Thread Jason Gunthorpe
On Mon, Jul 20, 2009 at 07:02:45PM -0700, Steven Dake wrote: have git locations for libibumad and libibmad? Hmm, these are part of opensm git://git.openfabrics.org/~sashak/management.git Jason

Re: [ofa-general] [PATCH] IB/IPoIB: Don't let a bad muticast address in the join list stop subsequent joins

2009-07-19 Thread Jason Gunthorpe
On Sun, Jul 19, 2009 at 11:35:25AM +0300, Or Gerlitz wrote: Jason Gunthorpe wrote: Is there any way userspace can inject a bogus multicast address? Can you do it with netlink? ip maddr add address ... dev ib0 aren't the permissions needed for this being the same as for those

[ofa-general] [PATCH ibverbs] Make the gid argument to ibv_attach_mcast and ibv_detach_mcast const

2009-07-18 Thread Jason Gunthorpe
This constness flows through to the driver call struct and into the drivers and back into ibv_cmd_attach_mcast/ibv_cmd_detach_mcast. Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com --- include/infiniband/driver.h |4 ++-- include/infiniband/verbs.h |8 man

Re: [ofa-general] [PATCH] Allow paths to the device specific library to be absolute

2009-07-17 Thread Jason Gunthorpe
C++ or'ing enum members results in a type that is the enum base integral type, not the enum itself. g++ produces a warning by default. Very annoying :) Are you interested in patches for these too?
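A rough sketch of what this means for an API that takes bit flags, using made-up names (sketch_attr_mask and friends are illustrative only, not the real libibverbs enums): the flag values stay in an enum, but the parameter that receives an or'ed combination is declared as a plain int.

```c
#include <assert.h>

/* Hypothetical flag names for illustration; not the real libibverbs enums. */
enum sketch_attr_mask {
    SKETCH_ATTR_STATE = 1 << 0,
    SKETCH_ATTR_PORT  = 1 << 1,
    SKETCH_ATTR_PKEY  = 1 << 2
};

/*
 * The mask parameter is declared as int rather than enum sketch_attr_mask.
 * A combination such as STATE | PKEY (value 5) is not one of the
 * enumerators, and under C++ rules the or'ed expression has an integral
 * type, so an enum-typed parameter would need a cast at every call site.
 */
static int sketch_count_attrs(int attr_mask)
{
    int count = 0;

    assert(attr_mask >= 0);
    /* Count the set bits, i.e. how many attributes were requested. */
    for (; attr_mask != 0; attr_mask >>= 1)
        count += attr_mask & 1;
    return count;
}
```

With this signature a call like sketch_count_attrs(SKETCH_ATTR_STATE | SKETCH_ATTR_PKEY) should compile cleanly as both C and C++.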

Re: [ofa-general] [PATCH] IB/IPoIB: Don't let a bad muticast address in the join list stop subsequent joins

2009-07-17 Thread Jason Gunthorpe
On Wed, Jul 15, 2009 at 09:01:05AM -0700, Roland Dreier wrote: I took your advice and sent a patch to bonding to fix the issue there (to which I am waiting for comment) but I still think the patch for IPoIB is still needed. Without it, IPoIB is exposed to a DoS attack by a module

[ofa-general] SDP and stock kernel gets BUG?

2009-07-14 Thread Jason Gunthorpe
Hey all, I'm trying to use SDP with stock 2.6.30.1 plus the 'drivers/infiniband/ulp/sdp' directory from ofa_kernel-1.5-ofed20090713.src.rpm and I get this BUG ON: BUG: scheduling while atomic: ib_cm/0/4209/0x0003 Modules linked in: ib_sdp w83793 hwmon_vid rdma_ucm rdma_cm iw_cm ib_addr

Re: [ofa-general] SDP and stock kernel gets BUG?

2009-07-14 Thread Jason Gunthorpe
On Tue, Jul 14, 2009 at 04:14:59PM +0300, Amir Vadai wrote: Hi, I will post a fix soon. Thanks Amir! BTW - we are testing SDP here and trying to track down a performance regression - using the 2.6.30.1 combined with OFED-1.5 SDP performs poorly while 2.6.27.10 combined with OFED-1.4 SDP

[ofa-general] [PATCH] Allow paths to the device specific library to be absolute

2009-07-14 Thread Jason Gunthorpe
If the driver line starts with a / then no lib prefix is applied and the full path is passed to dlopen. This lets a completely contained installation exist that relies on RPATH for the binaries and this mechanism for the drivers. Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com
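The rule described in this patch can be sketched as a small name-building helper. This is not the code from the patch: the function name is made up, and the "lib%s.so" pattern is a placeholder assumption, since the real library may well use a different prefix or suffix. The only behaviour taken from the description is that a driver name starting with / is handed to dlopen() unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the name to hand to dlopen() for one driver
 * line. Names starting with '/' are treated as absolute paths and copied
 * as-is; anything else gets the (assumed) "lib" prefix and ".so" suffix.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int sketch_driver_soname(const char *name, char *buf, size_t len)
{
    int absolute;
    size_t need;

    assert(name != NULL && buf != NULL);
    absolute = (name[0] == '/');

    /* "lib" plus ".so" adds 6 characters; +1 for the terminating NUL. */
    need = strlen(name) + (absolute ? 0 : 6) + 1;
    if (need > len)
        return -1;

    if (absolute)
        snprintf(buf, len, "%s", name);
    else
        snprintf(buf, len, "lib%s.so", name);
    return 0;
}
```

Under these assumptions a driver line naming mlx4 would become libmlx4.so and be resolved through the normal library search path, while a line such as /opt/ofa/lib/libmlx4.so (an invented path) would be passed through untouched, which is what lets a self-contained installation avoid the system directories.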

Re: [ofa-general] patch to ib_addr for sending arps

2009-07-13 Thread Jason Gunthorpe
On Sun, Jul 12, 2009 at 08:38:38PM -0700, leo.tomi...@oracle.com wrote: Associating the device with the source IP seems to be the correct thing to do in general, but I initially avoided it in favor of source based routing rules/tables since Linux does not do this by default. Source based

Re: [ofa-general] patch to ib_addr for sending arps

2009-07-13 Thread Jason Gunthorpe
On Mon, Jul 13, 2009 at 01:35:08PM -0700, leo.tomi...@oracle.com wrote: Right, there should only be one route lookup call. And the send_arp should match what TCP/UDP are doing, I'm pretty sure they don't use neigh_event_send like ib_addr is, or if they do, they are not using

[ofa-general] Re: [PATCH] rdma_cm: Add debugfs entries to monitor rdma_cm connections

2009-07-08 Thread Jason Gunthorpe
On Wed, Jul 08, 2009 at 12:56:40PM +0300, Or Gerlitz wrote: Jason Gunthorpe wrote: Well, that's the rub isn't it? debugfs is not 'production ready' (by definition) so why spend time on it? to allow debugging, diagnosing problems Why the resistance to doing a proper job and solving

[ofa-general] Re: [PATCH] rdma_cm: Add debugfs entries to monitor rdma_cm connections

2009-07-07 Thread Jason Gunthorpe
On Tue, Jul 07, 2009 at 01:12:54PM +0300, Or Gerlitz wrote: Steve Wise wrote: I agree that this is useful for debugging. Roland, Sean, I agree with Steve and Moni. Today there's no way to know what rdma-cm connections/sessions are open now and with patch there is a way, so debugfs support

Re: [ofa-general] [infiniband-diags] [PATCH] [5/5] libibnetdisc cleanup patches

2009-07-07 Thread Jason Gunthorpe
On Tue, Jul 07, 2009 at 03:25:11PM -0700, Al Chu wrote: Use IBPANIC consistently in libibnetdisc, in particular, since IBPANIC calls exit, there's no use in returning a value after an error. Calling abort/exit from within a general use lib on error is quite unfriendly as well.. Jason

[ofa-general] Re: [PATCH v3] libibmad: Handle MAD redirection

2009-07-01 Thread Jason Gunthorpe
On Wed, Jul 01, 2009 at 09:59:41AM -0400, Hal Rosenstock wrote: +static int redirect_port(ib_portid_t *port, uint8_t *mad) +{ + port->lid = mad_get_field(mad, 64, IB_CPI_REDIRECT_LID_F); + if (!port->lid) { + IBWARN("GID-based redirection is not

[ofa-general] Re: [PATCH v3] libibmad: Handle MAD redirection

2009-07-01 Thread Jason Gunthorpe
On Wed, Jul 01, 2009 at 11:54:13AM -0400, Hal Rosenstock wrote: I think it depends on the interpretation of If redirection is not being performed, this shall be set to zero. in the RedirectGID description as to whether it is referring to redirection in general or just GID redirection.

[ofa-general] Re: [PATCH v3] libibmad: Handle MAD redirection

2009-07-01 Thread Jason Gunthorpe
On Wed, Jul 01, 2009 at 03:39:01PM -0400, Hal Rosenstock wrote: Clearly the only sane way this can work is if the GID is always filled in for the redirection case. Why is that ? Why must the redirector provide GRH info when it's not required for subnet local cases ? Because the

Re: [ofa-general] Re: [ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-30 Thread Jason Gunthorpe
On Tue, Jun 30, 2009 at 02:04:03PM +0200, Joachim Fenkes wrote: On Tuesday 30 June 2009 00:01, Hal Rosenstock wrote: On Mon, Jun 29, 2009 at 8:10 AM, Joachim Fenkesfen...@de.ibm.com wrote: Previously, libibmad reacted to GSI MAD responses with a redirect status by throwing an error. IBM

Re: [ofa-general] Re: [ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-30 Thread Jason Gunthorpe
* not be used when the LID is returned, but they still must be set.

Re: [ofa-general] Re: [PATCH v2 RESEND] rdma_cm: Add debugfs entries to monitor rdma_cm connections

2009-06-24 Thread Jason Gunthorpe
On Wed, Jun 24, 2009 at 06:55:43PM +0300, Moni Shoua wrote: I believe that Jason and I still disagree but... Jason suggests that I implement this feature with netlink. This approach might have an advantage but if I understand it right this approach requires a patch also to some user

Re: [ofa-general] [PATCH 2/9] ib_core: kernel API for GID --MAC translations

2009-06-17 Thread Jason Gunthorpe
On Wed, Jun 17, 2009 at 11:41:28AM +0300, Liran Liss wrote: Why not just use IP to MAC calls? Or use the MAC as the GUID? We do use standard OS services to map the IP addresses (that were encoded in the GID) to MACs. GIDs encode IP addresses rather than MACs to enable users to use the

Re: [ofa-general] [PATCH 2/9] ib_core: kernel API for GID --MAC translations

2009-06-17 Thread Jason Gunthorpe
On Wed, Jun 17, 2009 at 11:20:26AM -0700, Roland Dreier wrote: Hum, This is a very tricky subject. Co-mingling the IB GID address space and the IPv6 address space like this is not really something that was envisioned from the IBA side. Doesn't the IB spec say that an IB GID *is* an

Re: [ofa-general] [PATCH 2/9] ib_core: kernel API for GID --MAC translations

2009-06-17 Thread Jason Gunthorpe
On Wed, Jun 17, 2009 at 11:38:43AM -0700, Roland Dreier wrote: It is like an IPv6 address but it was expressly envisioned to be a separate space. The IBA authors copied many of the conventions from IPv6 for numbering this new space, like link local, and multicast prefixes, but it was

Re: [ofa-general] [PATCH 0/9] RDMAoE - RDMA over Ethernet

2009-06-16 Thread Jason Gunthorpe
On Tue, Jun 16, 2009 at 09:32:25AM -0700, Sean Hefty wrote: RDMA over Ethernet (RDMAoE) allows running the IB transport protocol over Ethernet, providing IB capabilities for Ethernet fabrics. The packets are standard Ethernet frames with an Ethertype, an IB GRH, unmodified IB transport

Re: [ewg] Re: [ofa-general] [PATCH 0/9] RDMAoE - RDMA over Ethernet

2009-06-16 Thread Jason Gunthorpe
On Tue, Jun 16, 2009 at 02:07:03PM -0700, Paul Grun wrote: If I might chime in here...I've been working to actively squash the expression 'IBoE' or any variation that includes IB in the name. The reason is because the InfiniBand Architecture is defined as a cohesive solution that includes

Re: [ofa-general] [PATCH] libibverbs: Add RDMAoE support

2009-06-15 Thread Jason Gunthorpe
On Mon, Jun 15, 2009 at 04:42:36PM +0300, Eli Cohen wrote: +void str2gid(char *grh, union ibv_gid *gid) +{ + char tmp; + + tmp = grh[8]; + grh[8] = 0; + gid->dwords[0] = htonl(strtoul(grh, NULL, 16)); + grh[8] = tmp; + + tmp = grh[16]; + grh[16] = 0; +

Re: [ofa-general] RE: [ewg] [PATCH 0/9] RDMAoE - RDMA over Ethernet -- some procedural questions

2009-06-15 Thread Jason Gunthorpe
On Mon, Jun 15, 2009 at 06:54:49PM -0700, Ryan, Jim wrote: Recall the bylaws of OFA requires that any ULPs that OFA supports will be produced by some recognized standards organization. No such organization was known to be associated with the Mellanox proposal. Is this being submitted as

Re: [ofa-general] Re: When is the next planned release of libmlx4?

2009-06-13 Thread Jason Gunthorpe
On Fri, Jun 12, 2009 at 09:24:42PM -0700, Roland Dreier wrote: I'm not sure what the chip's expectation is for the actual bus transfers in this area, but I think you are right to be concerned about atomicity, even when transferring based on longs. The chip docs seem to suggest that

Re: [ofa-general] Re: When is the next planned release of libmlx4?

2009-06-12 Thread Jason Gunthorpe
On Fri, Jun 12, 2009 at 09:52:15AM -0700, John Gyllenhaal wrote: Valgrind replaces the libc memcpy call with a simple version that copies a byte at a time (in order). If libmlx4 is not built with --with-valgrind, valgrind considers each write an invalid write and spends a very long time

Re: [ofa-general] IPoIB CM mode and packet drops

2009-06-05 Thread Jason Gunthorpe
On Fri, Jun 05, 2009 at 11:29:58AM -0400, Hal Rosenstock wrote: I've seen some comments about this occurring prior to path MTU discovery. Is the issue that some IP services don't utilize path MTU discovery? This is the standard problem with mixed MTU segments, it happens on ethernet too.

Re: [ofa-general] RE: [ewg] RFC: Do we wish to take MPI out of OFED?

2009-06-05 Thread Jason Gunthorpe
On Fri, Jun 05, 2009 at 09:44:09AM -0400, Jeff Squyres wrote: 3. As Doug described, packaging MPI and OFED together actually makes it *harder* for distros. Remember that RHEL and SUSE don't end up using any of the OFED packaging; they essentially use the individual SRPMs. I would almost

Re: [ofa-general] Memory registration redux

2009-05-26 Thread Jason Gunthorpe
On Tue, May 26, 2009 at 04:13:08PM -0700, Roland Dreier wrote: Or, ignore the overlapping problem, and use your original technique, slightly modified: - Userspace registers a counter with the kernel. Kernel pins the page, sets up mmu notifiers and increments the

Re: [ofa-general] RE: [PATCH] core/mthca: Distinguish multiple IB cards in /proc/interrupts

2009-05-21 Thread Jason Gunthorpe
On Thu, May 21, 2009 at 06:23:17PM -0500, Arputham Benjamin wrote: I already suggested adding MSI-X vector information to /sys/devices/... to match the existing irq file there. That would allow userspace to figure out which interrupt belonged where. Jason's idea of adding the PCI device

Re: [ofa-general] RE: [PATCH] core/mthca: Distinguish multiple IB cards in /proc/interrupts

2009-05-20 Thread Jason Gunthorpe
On Wed, May 20, 2009 at 06:54:05PM -0500, Arputham Benjamin wrote: I was thinking that we can fix /proc/interrupts issue for case#1 first and worry about #2 later because the design to fix /proc/interrupts for mlx4 case is going to be different and independent just as the driver design is

Re: [ofa-general] Re: [RFC] OpenSM and IPv6 Scalability Proposal

2009-05-11 Thread Jason Gunthorpe
On Sat, May 09, 2009 at 01:32:06PM +0300, Eli Dorfman wrote: On Fri, May 8, 2009 at 4:57 PM, Hal Rosenstock hal.rosenst...@gmail.com wrote: On Wed, May 6, 2009 at 6:24 AM, Slava Strebkov sla...@voltaire.com wrote: In addition to the original proposal we suggest allocating special MLID

Re: [ofa-general] Memory registration redux

2009-05-11 Thread Jason Gunthorpe
On Mon, May 11, 2009 at 02:23:58PM -0700, Caitlin Bestler wrote: On Thu, May 7, 2009 at 3:48 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: Right, I was only thinking of a new driver call that was along the lines of update_mr_pages() that just updates the HCA's mapping

Re: [ofa-general] Memory registration redux

2009-05-07 Thread Jason Gunthorpe
On Thu, May 07, 2009 at 02:46:55PM -0700, Roland Dreier wrote: Using register/unregister exposes a race for the original case you brought up - but that race is completely unfixable without hardware support. At least it now becomes a hw specific race that can be printk'd and someday

Re: [ofa-general] Re: [PATCH v2] rdma_cm: Add debugfs entries to monitor rdma_cm connections

2009-05-06 Thread Jason Gunthorpe
On Wed, May 06, 2009 at 07:06:52PM +0300, Moni Shoua wrote: Really the thinking should be 'I want to make lsof work usefully' not 'I want some random and different hack to let me see something'. And yes, that is harder. But the IB stack is now at the point where these small hard things are

Re: [ofa-general] Memory registration redux

2009-05-06 Thread Jason Gunthorpe
On Wed, May 06, 2009 at 01:10:47PM -0700, Roland Dreier wrote: By the way, what's the desired behavior of the cache if a process registers, say, address range 0x1000 ... 0x3fff, and then the same process registers address range 0x2000 ... 0x2fff (with all the same permissions, etc)? The
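The relationship between the two ranges in the question above (0x1000 ... 0x3fff and 0x2000 ... 0x2fff) can be made concrete with a small sketch. It assumes a hypothetical cache entry that stores an inclusive [start, end] address range. The struct and function names are illustrative and come from neither the thread nor any verbs API, and the sketch says nothing about what the cache should do once it detects the overlap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache entry: an inclusive virtual address range. */
struct reg_range {
    uintptr_t start;
    uintptr_t end; /* inclusive */
};

/* True if `inner` lies entirely within `outer`. */
static bool range_contains(const struct reg_range *outer,
                           const struct reg_range *inner)
{
    return outer->start <= inner->start && inner->end <= outer->end;
}

/* True if the two ranges share at least one address. */
static bool range_overlaps(const struct reg_range *a,
                           const struct reg_range *b)
{
    return a->start <= b->end && b->start <= a->end;
}
```

With the addresses from the question, the second registration is fully contained in the first, so a cache keyed on exact ranges would miss it while one keyed on containment would not.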

Re: [ofa-general] Memory registration redux

2009-05-06 Thread Jason Gunthorpe
On Wed, May 06, 2009 at 02:56:25PM -0700, Roland Dreier wrote: Yuk, doesn't this problem pretty much doom this method entirely? You can't tear down the entire registration of 0x1000 ... 0x3fff if the app does something to change 0x2000 .. 0x2fff because it may have active RDMAs going

Re: [ofa-general] Memory registration redux

2009-05-06 Thread Jason Gunthorpe
On Wed, May 06, 2009 at 03:39:54PM -0700, Roland Dreier wrote: Well, this conceptually doesn't seem hard. Go through all the pages in the MR, if any have changed then pin the new page and replace the pages physical address in the HCA's page table. Once done, synchronize with the

Re: [ofa-general] Re: New proposal for memory management

2009-05-01 Thread Jason Gunthorpe
On Fri, May 01, 2009 at 07:56:48AM -0400, Jeff Squyres wrote: On Apr 30, 2009, at 6:22 PM, Jason Gunthorpe wrote: After reading all the postings, I think my idea to fix the verbs API to not, essentially, corrupt an existing registration when the virtual address space changes is the best bet

Re: [ofa-general] New proposal for memory management

2009-05-01 Thread Jason Gunthorpe
On Fri, May 01, 2009 at 09:25:33AM -0400, Tom Talpey wrote: Completely agree. I will add that enterprise network programmers are going to reject registration caching as well, because it introduces vulnerabilities into the data path - silent data corruption. For example, storage won't tolerate

Re: [ofa-general] Re: [PATCH v2] rdma_cm: Add debugfs entries to monitor rdma_cm connections

2009-05-01 Thread Jason Gunthorpe
On Thu, Apr 30, 2009 at 04:27:05PM +0300, Or Gerlitz wrote: Jason Gunthorpe wrote: including a PID is not best, you should include enough information to figure out the pid(s) from proc/xx/fd, and vice versa. maybe it's not the best solution but it seems to me good enough Well, we have

Re: [ofa-general] Re: New proposal for memory management

2009-04-30 Thread Jason Gunthorpe
On Thu, Apr 30, 2009 at 02:24:47PM -0400, Tom Talpey wrote: At 06:11 PM 4/29/2009, Barrett, Brian W wrote: On 4/29/09 15:55 , Jason Gunthorpe jguntho...@obsidianresearch.com wrote: The problem is that MPI needs to be aware of the application doing the free() and unregister or flush its MR

Re: [ofa-general] Re: New proposal for memory management

2009-04-30 Thread Jason Gunthorpe
On Thu, Apr 30, 2009 at 09:52:32AM -0400, Jeff Squyres wrote: I think Jason is the only one who is remaining at least somewhat on-topic here. Thanks, but I have no stake in this, it is just interesting :) After reading all the postings, I think my idea to fix the verbs API to not,

Re: [ofa-general] Re: New proposal for memory management

2009-04-29 Thread Jason Gunthorpe
On Wed, Apr 29, 2009 at 08:15:57AM -0400, Jeff Squyres wrote: On Apr 29, 2009, at 12:03 AM, Jason Gunthorpe wrote: I've often wondered, wouldn't it just be fine for MPI if the entire process address space is kept pinned, registered and consistent with the HCA? The process would opt

Re: [ofa-general] Re: New proposal for memory management

2009-04-29 Thread Jason Gunthorpe
a huge overcommit. It would be very interesting to see /proc/PID/smaps information for a running MPI job to compute how many unallocated pages are present in a job. -- Jason Gunthorpe jguntho...@obsidianresearch.com(780)4406067x832 Chief Technology Officer, Obsidian Research Corp
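One way to approximate the /proc/PID/smaps measurement suggested above is to sum the per-mapping "Size:" and "Rss:" fields and take the difference. The sketch below works on smaps-formatted text already read into memory. The function name is hypothetical, and treating Size minus Rss as the count of "unallocated" memory is an assumption about what the message has in mind, not something the thread states.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum the "Size:" and "Rss:" fields (both reported
 * in kB) over smaps-formatted text and return Size - Rss in kB, i.e. the
 * mapped address space that is not currently resident.
 */
static long smaps_nonresident_kb(const char *text)
{
    long size_kb = 0, rss_kb = 0;
    const char *line = text;

    while (line && *line) {
        long val;

        /* Only lines that begin with the exact field name match, so
         * "KernelPageSize:" and "Pss:" are skipped. */
        if (sscanf(line, "Size: %ld kB", &val) == 1)
            size_kb += val;
        else if (sscanf(line, "Rss: %ld kB", &val) == 1)
            rss_kb += val;

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return size_kb - rss_kb;
}
```

Dividing the result by the page size in kB would give a page count. Reading the file for a live process is left out to keep the sketch short.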
