[ofa-general] Re: [RFC V3 PATCH 4/5] rdma/cma: implement RDMA_CM_EVENT_NETDEV_CHANGE notification
Sean Hefty wrote:
> +static void cma_ndev_work_handler(struct work_struct *_work)
> +{
> +	struct cma_ndev_work *work = container_of(_work, struct cma_ndev_work, work);
> +	struct rdma_id_private *id_priv = work->id;
> +	int destroy = 0;
> +
> +	mutex_lock(&id_priv->handler_mutex);
> +
> +	if (id_priv->id.event_handler(&id_priv->id, &work->event)) {

How do we know that the user hasn't tried to destroy the id from another
callback? We need some sort of state check here.

Correct, will be fixed.

> +		cma_exch(id_priv, CMA_DESTROYING);
> +		destroy = 1;
> +	}
> +
> +	cma_enable_remove(id_priv);

I didn't see the matching cma_disable_remove() call.

As you can see also in patch 3/5, places in the code which originally did
--not-- call cma_disable_remove() but rather just did
atomic_inc(&conn_id->dev_remove) were simply replaced with
mutex_lock(&id_priv->handler_mutex). This is because cma_disable_remove()
does two things: 1) it does the state validation, and 2) it locks the
handler_mutex; so places in the code which don't need the state validation
don't call it... a bit dirty.

> +static int cma_netdev_align_id(struct net_device *ndev, struct rdma_id_private *id_priv)
> +{

nit - the function name isn't clear to me. Maybe something like
cma_netdev_change_handler()? Although I'm not sure that a netdev change is
what the user is really interested in. What they really want to know is
whether the IP address mapping/resolution changed; the netdev is hidden
from the user.

OK, I will see how to improve the name.

Maybe call this RDMA_CM_EVENT_ADDR_CHANGE?
Let me think about it.

> +static int cma_netdev_callback(struct notifier_block *self, unsigned long event,
> +			       void *ctx)
> +{
> +	struct net_device *ndev = (struct net_device *)ctx;
> +	struct cma_device *cma_dev;
> +	struct rdma_id_private *id_priv;
> +	int ret = NOTIFY_DONE;
> +
> +	if (dev_net(ndev) != &init_net)
> +		return NOTIFY_DONE;
> +
> +	if (event != NETDEV_BONDING_FAILOVER)
> +		return NOTIFY_DONE;
> +
> +	if (!(ndev->flags & IFF_MASTER) || !(ndev->priv_flags & IFF_BONDING))
> +		return NOTIFY_DONE;
> +
> +	mutex_lock(&lock);
> +	list_for_each_entry(cma_dev, &dev_list, list)

It seems like we just need to find the cma_dev that has the current mapping.

Correct. So I can take the lock, find the device in the list, increment its
refcount and release the lock. To that end I would have to save in the cma
device structure the names of the network devices which are associated with
it; I can't use a comparison of the pdev etc. pointers to the dma device,
since some network devices (e.g. bonding / vlan interfaces in Ethernet)
have a NULL pdev (they are virtual devices). Later I can scan this device's
ID list, but I must do it under the lock in order not to race with the
device removal code which removes IDs from this list in
cma_process_remove(), correct?

Or.

___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
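The failover filter discussed above can be modeled in isolation. This is a self-contained sketch, not the kernel code: the flag and event values are copied from that era's `<linux/if.h>` / `<linux/notifier.h>` headers and redefined locally, so treat them as illustrative.

```c
#include <assert.h>

/* Values as in the 2.6-era headers; redefined here for a standalone sketch. */
#define IFF_MASTER              0x400   /* bonding master interface (dev->flags) */
#define IFF_BONDING             0x20    /* bonding master/slave (dev->priv_flags) */
#define NETDEV_BONDING_FAILOVER 0x000C  /* notifier event of interest */

/* Model of the filter at the top of cma_netdev_callback(): only a
 * bonding-failover event on a bonding master interface is acted upon. */
static int cma_event_is_interesting(unsigned long event,
				    unsigned int flags, unsigned int priv_flags)
{
	if (event != NETDEV_BONDING_FAILOVER)
		return 0;
	if (!(flags & IFF_MASTER) || !(priv_flags & IFF_BONDING))
		return 0;
	return 1;
}
```

Anything failing this filter gets NOTIFY_DONE without touching the cma device list at all, which keeps the notifier cheap for the common (non-bonding) case.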
[ofa-general] Re: [RFC PATCH 3/5] rdma/cma: simplify locking needed for serialization of callbacks
Sean Hefty wrote:
> @@ -359,7 +358,7 @@ static int cma_disable_remove(struct rdm
> 	spin_lock_irqsave(&id_priv->lock, flags);
> 	if (id_priv->state == state) {
> -		atomic_inc(&id_priv->dev_remove);
> +		mutex_lock(&id_priv->handler_mutex);

This just tried to acquire a mutex while holding a spinlock.

I see. So can taking this spin lock be avoided here? I understand that the
spin lock came to protect the state check, correct?

Or.
Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
At 11:33 AM 5/27/2008, Tom Tucker wrote:

> So I think from an NFSRDMA coding perspective it's a wash...

Just to be clear, you're talking about the NFS/RDMA server. However, it's
pretty much a wash on the client, for different reasons.

> When posting the WR, we check the fastreg capabilities bit + transport
> type bit: if fastreg is true -- post FastReg; if iWARP (or with a cap
> bit read-with-inv-flag) post rdma read w/ invalidate ...
>
> For iWARP's case, this means rdma-read-w-inv, plus rdma-send-w-inv, etc...

Maybe I'm confused, but I don't understand this. iWARP RDMA Read requests
don't support remote invalidate. At least, the table in RFC 5040 (p. 22)
doesn't:

  RDMA     | Message     | Tagged | STag  | Queue  | Invalidate | Message Length
  Message  | Type        | Flag   | and   | Number | STag       | Communicated
  OpCode   |             |        | TO    |        |            | between DDP
           |             |        |       |        |            | and RDMAP
  ---------+-------------+--------+-------+--------+------------+---------------
  0000b    | RDMA Write  | 1      | Valid | N/A    | N/A        | Yes
  0001b    | RDMA Read   | 0      | N/A   | 1      | N/A        | Yes
           | Request     |        |       |        |            |
  0010b    | RDMA Read   | 1      | Valid | N/A    | N/A        | Yes
           | Response    |        |       |        |            |
  0011b    | Send        | 0      | N/A   | 0      | N/A        | Yes
  0100b    | Send with   | 0      | N/A   | 0      | Valid      | Yes
           | Invalidate  |        |       |        |            |
  0101b    | Send with   | 0      | N/A   | 0      | N/A        | Yes
           | SE          |        |       |        |            |
  0110b    | Send with   | 0      | N/A   | 0      | Valid      | Yes
           | SE and      |        |       |        |            |
           | Invalidate  |        |       |        |            |
  0111b    | Terminate   | 0      | N/A   | 2      | N/A        | Yes
  1000b to |       Reserved -- Not Specified
  1111b    |

I want to take this opportunity to also mention that the RPC/RDMA
client-server exchange does not support remote-invalidate currently.
Because of the multiple STags supported by the rpcrdma chunking header, and
because the client needs to verify that the STags were in fact invalidated,
there is significant overhead, and the jury is still out on the benefit. In
fact, I suspect it's a loss at the client.

Tom (Talpey).
Re: [ofa-general] OpenSM?
Charles,

Here at LLNL we have been running OpenSM for some time. Thus far we are
very happy with its performance. Our largest cluster is 1152 nodes, and
OpenSM can bring it up (not counting boot time) in less than a minute.

Here are some details. We are running v3.1.10 of OpenSM with some minor
modifications (mostly patches which have been submitted upstream and
accepted by Sasha but are not yet in a release). Our clusters are all
fat-tree topologies. We have a node which is more or less dedicated to
running OpenSM. We have some other monitoring software running on it, but
OpenSM can utilize the CPU/memory if it needs to.

A) On our large clusters this node is a 4-socket, dual-core (8 cores
total) Opteron running at 2.4 GHz with 16 GB of memory. I don't believe
OpenSM needs this much, but the nodes were all built the same, so this is
what it got.

B) On one of our smaller clusters (128 nodes) OpenSM is running on a
dual-socket, single-core (2 cores) 2.4 GHz Opteron node with 2 GB of
memory. We have not seen any issues with this cluster and OpenSM.

We run with the up/down algorithm; ftree has not panned out for us yet. I
can't say how that would compare to the Cisco algorithms. In short, OpenSM
should work just fine on your cluster.

Hope this helps,
Ira

On Tue, 27 May 2008 11:15:14 -0400 Charles Taylor [EMAIL PROTECTED] wrote:

We have a 400 node IB cluster. We are running an embedded SM in failover
mode on our TS270/Cisco 7008 core switches. Lately we have been seeing
problems with LID assignment when rebooting nodes (see log messages
below). It is also taking far too long for LIDs to be assigned, as it
takes on the order of minutes for the ports to transition to ACTIVE. This
seems like a bug to us, and we are considering switching to OpenSM on a
host. I'm wondering about experience with running OpenSM for medium to
large clusters (fat-tree) and what resources (memory/CPU) we should plan
on for the host node.
Thanks,
Charlie Taylor
UF HPC Center

May 27 14:14:10 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]: ** NEW SWEEP
May 27 14:14:10 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover the subnet
May 27 14:14:13 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM OUT_OF_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:02:c9:02:00:21:4b:59
May 27 14:14:13 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:256]: An existing IB node GUID 00:02:c9:02:00:21:4b:59 LID 194 was removed
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM DELETE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:21:4b:59
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1503]: Topology changed
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by discovering removed ports
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events require sweep
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]: ** NEW SWEEP
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover the subnet
May 27 14:16:28 topspin-270sc ib_sm.x[812]: [ib_sm_discovery.c:1009]: no routing required for port guid 00:02:c9:02:00:21:4b:59, lid 194
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1503]: Topology changed
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by discovering new ports
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by multicast membership change
May 27 14:16:30 topspin-270sc ib_sm.x[812]: [ib_sm_assign.c:588]: Force port to go down due to LID conflict, node - GUID=00:02:c9:02:00:21:4b:58, port=1
May 27 14:18:42 topspin-270sc ib_sm.x[819]: [ib_sm_bringup.c:562]: Program port state, node=00:02:c9:02:00:21:4b:58, port=16, current state 2, neighbor node=00:02:c9:02:00:21:4b:58, port=1, current state 2
May 27 14:18:42 topspin-270sc ib_sm.x[819]: [ib_sm_bringup.c:733]: Failed to negotiate MTU, op_vl for node=00:02:c9:02:00:21:4b:58, port=1, mad status 0x1c
May 27 14:18:42 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM IN_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:02:c9:02:00:21:4b:59
May 27 14:18:42 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:144]: A new IB node 00:02:c9:02:00:21:4b:59 was discovered and assigned LID 0
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events require sweep
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]: ** NEW SWEEP
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover the subnet
May 27 14:18:46 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology change
May 27 14:18:46 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by previous GET/SET operation
[ofa-general] Re: [PATCH] saquery: --smkey command line option
On 04:33 Tue 27 May, Hal Rosenstock wrote:

Maybe yes, but could you be more specific? Store the SM_Key in a read-only
file on the client side? Treating smkey the way su treats a password,
rather than as a command line parameter, is another alternative.

Ok, let's do it as '--smkey X' and then saquery will ask for a value, just
like su does. Good?

I'm not proposing to expose SM_Key, just added an option where this key
could be specified.

How is that not exposing it?

Because of (1) and (2) below.

Sasha

-- Hal

So:
1) this is *optional*,
2) there are no suggestions about how the right value should be determined.

Sasha
[ofa-general] Re: [PATCH] saquery: --smkey command line option
On 04:29 Tue 27 May, Hal Rosenstock wrote:

On Fri, 2008-05-23 at 05:52 -0700, Hal Rosenstock wrote:

Following your logic we would need to disable typing root passwords too.

That's taking it too far. Root passwords are at least hidden when typing.
At least hide the key from plain sight when typing, like su does.

There are a lot of tools where a password can be specified as clear text
on the command line (wget, smbclient, etc.) - it is the user's
responsibility to keep his sensitive data safe.

Sasha
[ofa-general] mthca MR attrs userspace change
My kernel started complaining to me recently:

  ib_mthca 0000:02:00.0: Process 'tgtd' did not pass in MR attrs.
  ib_mthca 0000:02:00.0: Update libmthca to fix this.

It comes from commit cb9fbc5c37b69ac584e61d449cfd590f5ae1f90d (IB: expand
ib_umem_get() prototype) and the fix in
baaad380c0aa955f7d62e846467316c94067f1a5 (IB/mthca: Avoid changing
userspace ABI to handle DMA write barrier attribute).

Nice that everything still works with old userspace, but where is the
latest libmthca these days? The one at kernel.org still has ABI_VERSION 1:

http://git.kernel.org/?p=libs/infiniband/libmthca.git;a=blob;f=src/mthca-abi.h;h=2557274e4cbd9f36df2be42379644d31b4ff5da3;hb=HEAD

By the way, Roland, your efforts at Fedora packaging are certainly
appreciated here. If the new libmthca just showed up in updates to F-9,
that would be most convenient.

-- Pete
[ofa-general] Re: mthca MR attrs userspace change
> ib_mthca 0000:02:00.0: Process 'tgtd' did not pass in MR attrs.
> ib_mthca 0000:02:00.0: Update libmthca to fix this.

Heh, thanks for the reminder. Will fix libmthca to handle this properly
today.

> By the way, Roland, your efforts at Fedora packaging are certainly
> appreciated here. If the new libmthca just showed up in updates to F-9,
> that would be most convenient.

It will, although the Fedora process takes a while to grind through ;)

- R.
[ofa-general] RE: [RFC PATCH 3/5] rdma/cma: simplify locking needed for serialization of callbacks
> > 	spin_lock_irqsave(&id_priv->lock, flags);
> > 	if (id_priv->state == state) {
> > -		atomic_inc(&id_priv->dev_remove);
> > +		mutex_lock(&id_priv->handler_mutex);
>
> This just tried to acquire a mutex while holding a spinlock.
>
> I see. So can taking this spin lock be avoided here? I understand that
> the spin lock came to protect the state check, correct?

I think we should just remove cma_disable_remove() and cma_enable_remove(),
and instead call mutex_lock/unlock directly in their places. Where
cma_disable_remove() is called, add in appropriate state checks after
acquiring the mutex.

- Sean
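Sean's suggestion can be modeled in userspace with pthreads: take the handler mutex first, then check the state while holding it, instead of checking the state under a spinlock and trying to acquire a (sleeping) mutex inside it. This is a sketch of the pattern only; the struct and function names are illustrative, not the rdma_cm identifiers.

```c
#include <pthread.h>

enum cma_state { CMA_IDLE, CMA_CONNECT, CMA_DESTROYING };

struct id_priv {
	pthread_mutex_t handler_mutex;
	enum cma_state state;
};

/* Lock-then-check replacement for cma_disable_remove(): returns 0 with the
 * mutex held if the id is in the expected state, or -1 with the mutex
 * released if it is not (e.g. the user destroyed the id meanwhile). */
static int handler_lock_check(struct id_priv *id, enum cma_state expected)
{
	pthread_mutex_lock(&id->handler_mutex);
	if (id->state != expected) {
		pthread_mutex_unlock(&id->handler_mutex);
		return -1;
	}
	return 0;
}
```

Because the state is only ever read under the same mutex that callbacks hold, the separate spinlock around the state check becomes unnecessary in this scheme.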
Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
On Tue, 2008-05-27 at 10:33 -0500, Tom Tucker wrote:
On Mon, 2008-05-26 at 16:02 -0700, Roland Dreier wrote:

The invalidate-local-stag part of a read is just a local sink-side
operation (i.e. no wire protocol change from a read). It's not like
processing an ingress send-with-inv. It is really functionally like a read
followed immediately by a fenced invalidate-local, but it doesn't stall
the pipe. So the device has to remember that the read is a
with-inv-local-stag, and invalidate the stag after the read response is
placed and before the WCE is reaped by the application.

Yes, understood. My point was just that in IB, at least in theory, one
could just use an L_Key that doesn't have any remote permissions in the
scatter list of an RDMA read, while in iWARP, the STag used to place an
RDMA read response has to have remote write permission. So RDMA read with
invalidate makes sense for iWARP, because it gives a race-free way to
allow an STag to be invalidated immediately after an RDMA read response is
placed, while in IB it's simpler just to never give remote access at all.

So I think from an NFSRDMA coding perspective it's a wash... When creating
the local data sink, we need to check the transport type: if it's IB --
only local access; if it's iWARP -- local + remote access. When posting
the WR, we check the fastreg capabilities bit + transport type bit:

	if fastreg is true
		post FastReg
		if iWARP (or with a cap bit read-with-inv-flag)
			post rdma read w/ invalidate
		else /* IB */
			post rdma read
			post invalidate
		fi
	else
		... today's logic
	fi

Steve pointed out a good optimization here. Instead of fencing the RDMA
READ here in advance of the INVALIDATE, we should post the INVALIDATE when
the READ WR completes. This will avoid stalling the SQ. Since IB doesn't
put the LKEY on the wire, there's no security issue to close. We need to
keep a bunch of fastreg MRs around anyway for concurrent RPCs. Thoughts?

Tom

I make the observation, however, that the transport type is now overloaded
with a set of required verbs. For iWARP's case, this means
rdma-read-w-inv, plus rdma-send-w-inv, etc... This also means that new
transport types will inherit one or the other set of verbs (IB or iWARP).

Tom

- R.
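The core branch in the posting decision Tom describes — which read variant to use — reduces to one predicate. This is a standalone model with illustrative names, not kernel identifiers.

```c
/* Model of the read-path choice from the NFS/RDMA client discussion:
 * use rdma-read-with-invalidate on iWARP (where the sink STag must have
 * remote write permission and so should be invalidated race-free), or on
 * any device advertising a read-with-inv capability bit; plain read plus
 * a separate invalidate otherwise (IB). */
enum transport { TRANSPORT_IB, TRANSPORT_IWARP };

static int use_read_with_inv(enum transport t, int has_read_w_inv_cap)
{
	return t == TRANSPORT_IWARP || has_read_w_inv_cap;
}
```

On the IB side of this branch, Steve's optimization applies: post the invalidate when the read completes rather than fencing it behind the read, so the SQ never stalls.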
Re: [ofa-general] Re: mthca MR attrs userspace change
Here's the patch I plan to add to match the kernel, in case someone wants
to check it over:

diff --git a/src/mthca-abi.h b/src/mthca-abi.h
index 2557274..7e47d70 100644
--- a/src/mthca-abi.h
+++ b/src/mthca-abi.h
@@ -36,7 +36,8 @@

 #include <infiniband/kern-abi.h>

-#define MTHCA_UVERBS_ABI_VERSION	1
+#define MTHCA_UVERBS_MIN_ABI_VERSION	1
+#define MTHCA_UVERBS_MAX_ABI_VERSION	2

 struct mthca_alloc_ucontext_resp {
 	struct ibv_get_context_resp	ibv_resp;
@@ -50,6 +51,17 @@ struct mthca_alloc_pd_resp {
 	__u32				reserved;
 };

+struct mthca_reg_mr {
+	struct ibv_reg_mr		ibv_cmd;
+/*
+ * Mark the memory region with a DMA attribute that causes
+ * in-flight DMA to be flushed when the region is written to:
+ */
+#define MTHCA_MR_DMASYNC	0x1
+	__u32				mr_attrs;
+	__u32				reserved;
+};
+
 struct mthca_create_cq {
 	struct ibv_create_cq		ibv_cmd;
 	__u32				lkey;
diff --git a/src/mthca.c b/src/mthca.c
index e00c4ee..dd95636 100644
--- a/src/mthca.c
+++ b/src/mthca.c
@@ -282,9 +282,11 @@ static struct ibv_device *mthca_driver_init(const char *uverbs_sys_path,
 		return NULL;

 found:
-	if (abi_version > MTHCA_UVERBS_ABI_VERSION) {
-		fprintf(stderr, PFX "Fatal: ABI version %d of %s is too new (expected %d)\n",
-			abi_version, uverbs_sys_path, MTHCA_UVERBS_ABI_VERSION);
+	if (abi_version > MTHCA_UVERBS_MAX_ABI_VERSION ||
+	    abi_version < MTHCA_UVERBS_MIN_ABI_VERSION) {
+		fprintf(stderr, PFX "Fatal: ABI version %d of %s is not in supported range %d-%d\n",
+			abi_version, uverbs_sys_path, MTHCA_UVERBS_MIN_ABI_VERSION,
+			MTHCA_UVERBS_MAX_ABI_VERSION);
 		return NULL;
 	}
diff --git a/src/verbs.c b/src/verbs.c
index 6c9b53a..3d273d4 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -117,12 +117,21 @@ int mthca_free_pd(struct ibv_pd *pd)

 static struct ibv_mr *__mthca_reg_mr(struct ibv_pd *pd, void *addr,
 				     size_t length, uint64_t hca_va,
-				     enum ibv_access_flags access)
+				     enum ibv_access_flags access,
+				     int dma_sync)
 {
 	struct ibv_mr *mr;
-	struct ibv_reg_mr cmd;
+	struct mthca_reg_mr cmd;
 	int ret;

+	/*
+	 * Old kernels just ignore the extra data we pass in with the
+	 * reg_mr command structure, so there's no need to add an ABI
+	 * version check here.
+	 */
+	cmd.mr_attrs = dma_sync ? MTHCA_MR_DMASYNC : 0;
+	cmd.reserved = 0;
+
 	mr = malloc(sizeof *mr);
 	if (!mr)
 		return NULL;
@@ -132,11 +141,11 @@ static struct ibv_mr *__mthca_reg_mr(struct ibv_pd *pd, void *addr,
 		struct ibv_reg_mr_resp resp;

 		ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, mr,
-				     &cmd, sizeof cmd, &resp, sizeof resp);
+				     &cmd.ibv_cmd, sizeof cmd, &resp, sizeof resp);
 	}
 #else
 	ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, mr,
-			     &cmd, sizeof cmd);
+			     &cmd.ibv_cmd, sizeof cmd);
 #endif
 	if (ret) {
 		free(mr);
@@ -149,7 +158,7 @@ static struct ibv_mr *__mthca_reg_mr(struct ibv_pd *pd, void *addr,
 struct ibv_mr *mthca_reg_mr(struct ibv_pd *pd, void *addr,
 			    size_t length, enum ibv_access_flags access)
 {
-	return __mthca_reg_mr(pd, addr, length, (uintptr_t) addr, access);
+	return __mthca_reg_mr(pd, addr, length, (uintptr_t) addr, access, 0);
 }

 int mthca_dereg_mr(struct ibv_mr *mr)
@@ -202,7 +211,7 @@ struct ibv_cq *mthca_create_cq(struct ibv_context *context, int cqe,

 	cq->mr = __mthca_reg_mr(to_mctx(context)->pd, cq->buf.buf,
 				cqe * MTHCA_CQ_ENTRY_SIZE,
-				0, IBV_ACCESS_LOCAL_WRITE);
+				0, IBV_ACCESS_LOCAL_WRITE, 1);
 	if (!cq->mr)
 		goto err_buf;
@@ -297,7 +306,7 @@ int mthca_resize_cq(struct ibv_cq *ibcq, int cqe)
 	mr = __mthca_reg_mr(to_mctx(ibcq->context)->pd, buf.buf,
 			    cqe * MTHCA_CQ_ENTRY_SIZE,
-			    0, IBV_ACCESS_LOCAL_WRITE);
+			    0, IBV_ACCESS_LOCAL_WRITE, 1);
 	if (!mr) {
 		mthca_free_buf(&buf);
 		ret = ENOMEM;
@@ -405,7 +414,7 @@ struct ibv_srq *mthca_create_srq(struct ibv_pd *pd,
 	if (mthca_alloc_srq_buf(pd, &attr->attr, srq))
 		goto err;

-	srq->mr = __mthca_reg_mr(pd, srq->buf.buf, srq->buf_size, 0, 0);
+	srq->mr = __mthca_reg_mr(pd, srq->buf.buf, srq->buf_size, 0, 0, 0);
 	if (!srq->mr)
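The acceptance test added in the patch above is easy to get backwards, so here is a standalone model of it: accept any kernel ABI version inside the closed range [MIN, MAX]. The constants mirror the patch; the function name is illustrative.

```c
#define MTHCA_UVERBS_MIN_ABI_VERSION 1
#define MTHCA_UVERBS_MAX_ABI_VERSION 2

/* Model of the range check in mthca_driver_init(): reject versions that
 * are either too new (kernel ahead of library) or too old. */
static int abi_version_supported(int abi_version)
{
	return abi_version >= MTHCA_UVERBS_MIN_ABI_VERSION &&
	       abi_version <= MTHCA_UVERBS_MAX_ABI_VERSION;
}
```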
Re: [ofa-general] Re: [PATCH v2 12/13] QLogic VNIC: Driver Kconfig and Makefile.
Makes sense. We will get rid of this CONFIG option. Apart from this, are
there any other changes you would like to see in the patch series?

I have not reviewed the latest in detail, but I think we are at least
pretty close to something ready to merge.

- R.
[ofa-general] Re: mthca MR attrs userspace change
> Nice that everything still works with old userspace, but where is the
> latest libmthca these days? The one at kernel.org still has ABI_VERSION 1:

Actually this tricked me... the kernel ABI didn't get bumped, so there was
no reason to bump the libmthca ABI. I'll actually use this slightly
simpler patch:

diff --git a/src/mthca-abi.h b/src/mthca-abi.h
index 2557274..4fbd98b 100644
--- a/src/mthca-abi.h
+++ b/src/mthca-abi.h
@@ -50,6 +50,17 @@ struct mthca_alloc_pd_resp {
 	__u32				reserved;
 };

+struct mthca_reg_mr {
+	struct ibv_reg_mr		ibv_cmd;
+/*
+ * Mark the memory region with a DMA attribute that causes
+ * in-flight DMA to be flushed when the region is written to:
+ */
+#define MTHCA_MR_DMASYNC	0x1
+	__u32				mr_attrs;
+	__u32				reserved;
+};
+
 struct mthca_create_cq {
 	struct ibv_create_cq		ibv_cmd;
 	__u32				lkey;
diff --git a/src/verbs.c b/src/verbs.c
index 6c9b53a..def0f30 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -117,12 +117,22 @@ int mthca_free_pd(struct ibv_pd *pd)

 static struct ibv_mr *__mthca_reg_mr(struct ibv_pd *pd, void *addr,
 				     size_t length, uint64_t hca_va,
-				     enum ibv_access_flags access)
+				     enum ibv_access_flags access,
+				     int dma_sync)
 {
 	struct ibv_mr *mr;
-	struct ibv_reg_mr cmd;
+	struct mthca_reg_mr cmd;
 	int ret;

+	/*
+	 * Old kernels just ignore the extra data we pass in with the
+	 * reg_mr command structure, so there's no need to add an ABI
+	 * version check here (and indeed the kernel ABI was not
+	 * incremented due to this change).
+	 */
+	cmd.mr_attrs = dma_sync ? MTHCA_MR_DMASYNC : 0;
+	cmd.reserved = 0;
+
 	mr = malloc(sizeof *mr);
 	if (!mr)
 		return NULL;
@@ -132,11 +142,11 @@ static struct ibv_mr *__mthca_reg_mr(struct ibv_pd *pd, void *addr,
 		struct ibv_reg_mr_resp resp;

 		ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, mr,
-				     &cmd, sizeof cmd, &resp, sizeof resp);
+				     &cmd.ibv_cmd, sizeof cmd, &resp, sizeof resp);
 	}
 #else
 	ret = ibv_cmd_reg_mr(pd, addr, length, hca_va, access, mr,
-			     &cmd, sizeof cmd);
+			     &cmd.ibv_cmd, sizeof cmd);
 #endif
 	if (ret) {
 		free(mr);
@@ -149,7 +159,7 @@ static struct ibv_mr *__mthca_reg_mr(struct ibv_pd *pd, void *addr,
 struct ibv_mr *mthca_reg_mr(struct ibv_pd *pd, void *addr,
 			    size_t length, enum ibv_access_flags access)
 {
-	return __mthca_reg_mr(pd, addr, length, (uintptr_t) addr, access);
+	return __mthca_reg_mr(pd, addr, length, (uintptr_t) addr, access, 0);
 }

 int mthca_dereg_mr(struct ibv_mr *mr)
@@ -202,7 +212,7 @@ struct ibv_cq *mthca_create_cq(struct ibv_context *context, int cqe,

 	cq->mr = __mthca_reg_mr(to_mctx(context)->pd, cq->buf.buf,
 				cqe * MTHCA_CQ_ENTRY_SIZE,
-				0, IBV_ACCESS_LOCAL_WRITE);
+				0, IBV_ACCESS_LOCAL_WRITE, 1);
 	if (!cq->mr)
 		goto err_buf;
@@ -297,7 +307,7 @@ int mthca_resize_cq(struct ibv_cq *ibcq, int cqe)
 	mr = __mthca_reg_mr(to_mctx(ibcq->context)->pd, buf.buf,
 			    cqe * MTHCA_CQ_ENTRY_SIZE,
-			    0, IBV_ACCESS_LOCAL_WRITE);
+			    0, IBV_ACCESS_LOCAL_WRITE, 1);
 	if (!mr) {
 		mthca_free_buf(&buf);
 		ret = ENOMEM;
@@ -405,7 +415,7 @@ struct ibv_srq *mthca_create_srq(struct ibv_pd *pd,
 	if (mthca_alloc_srq_buf(pd, &attr->attr, srq))
 		goto err;

-	srq->mr = __mthca_reg_mr(pd, srq->buf.buf, srq->buf_size, 0, 0);
+	srq->mr = __mthca_reg_mr(pd, srq->buf.buf, srq->buf_size, 0, 0, 0);
 	if (!srq->mr)
 		goto err_free;
@@ -525,7 +535,7 @@ struct ibv_qp *mthca_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr)
 	    pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE))
 		goto err_free;

-	qp->mr = __mthca_reg_mr(pd, qp->buf.buf, qp->buf_size, 0, 0);
+	qp->mr = __mthca_reg_mr(pd, qp->buf.buf, qp->buf_size, 0, 0, 0);
 	if (!qp->mr)
 		goto err_free;
[ofa-general] [PATCH RFC v4 0/2] RDMA/Core: MEM_MGT_EXTENSIONS support
The following patch series proposes:

- The API and core changes needed to implement the IB BMME and iWARP
  equivalent memory extensions.
- cxgb3 support.

Changes since version 3:
- better comments to the ib_alloc_fast_reg_page_list() function to
  explicitly state the page list is owned by the device until the fast_reg
  WR completes _and_ that the page_list can be modified by the device.
- cxgb3 - when allocating a page list, set max_page_list_len.
- page_size -> page_shift in the fast_reg union of the ib_send_wr struct.
- key support via ib_update_fast_reg_key()

Changes since version 2:
- added device attribute max_fast_reg_page_list_len
- added cxgb3 patch

Changes since version 1:
- ib_alloc_mr() -> ib_alloc_fast_reg_mr()
- pbl_depth -> max_page_list_len
- page_list_len -> max_page_list_len where it makes sense
- int -> unsigned int where needed
- fbo -> first_byte_offset
- added page size and page_list_len to the fast_reg union in ib_send_wr
- rearranged the work request fast_reg union of ib_send_wr to pack it
- dropped the remove_access parameter from ib_alloc_fast_reg_mr()
- IB_DEVICE_MM_EXTENSIONS -> IB_DEVICE_MEM_MGT_EXTENSIONS
- compiled

Steve.
[ofa-general] [PATCH RFC v4 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
Support for the IB BMME and iWARP equivalent memory extensions to
non-shared memory regions. This includes:

- allocation of an ib_mr for use in fast register work requests
- device-specific alloc/free of physical buffer lists for use in fast
  register work requests. This allows devices to allocate this memory as
  needed (like via dma_alloc_coherent).
- fast register memory region work request
- invalidate local memory region work request
- read with invalidate local memory region work request (iWARP only)

Design details:

- New device capability flag added: IB_DEVICE_MEM_MGT_EXTENSIONS indicates
  device support for this feature.
- New send WR opcode IB_WR_FAST_REG_MR used to issue a fast_reg request.
- New send WR opcode IB_WR_INVALIDATE_MR used to invalidate a fast_reg mr.
- New API function, ib_alloc_fast_reg_mr(), used to allocate fast_reg
  memory regions.
- New API function, ib_alloc_fast_reg_page_list(), to allocate
  device-specific page lists.
- New API function, ib_free_fast_reg_page_list(), to free said page lists.
- New API function, ib_update_fast_reg_key(), to allow the key portion of
  the R_Key and L_Key of a fast_reg MR to be updated. Applications call
  this if desired before posting the IB_WR_FAST_REG_MR.

Usage Model:

- MR allocated with ib_alloc_fast_reg_mr()
- Page lists allocated via ib_alloc_fast_reg_page_list()
- MR R_Key/L_Key key field updated with ib_update_fast_reg_key()
- MR made VALID and bound to a specific page list via
  ib_post_send(IB_WR_FAST_REG_MR)
- MR made INVALID via ib_post_send(IB_WR_INVALIDATE_MR)
- MR deallocated with ib_dereg_mr()
- page lists deallocated via ib_free_fast_reg_page_list()

Applications can allocate a fast_reg MR once, and then repeatedly bind the
MR to different physical memory SGLs via posting work requests to the SQ.
For each outstanding mr-to-pbl binding in the SQ pipe, a
fast_reg_page_list needs to be allocated. Thus pipelining can be achieved
while still allowing device-specific page_list processing.
The 4B fast_reg rkey or stag is composed of a 3B index and a 1B key. The
application can change the key each time it fast-registers, thus allowing
more control over the peer's use of the rkey (i.e. it can effectively be
changed each time the rkey is rebound to a page list).

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---
 drivers/infiniband/core/verbs.c |   46 ++++++++++++++
 include/rdma/ib_verbs.h         |   76 ++++++++++++++++++++++
 2 files changed, 122 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0504208..0a334b4 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -755,6 +755,52 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);

+struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
+{
+	struct ib_mr *mr;
+
+	if (!pd->device->alloc_fast_reg_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->alloc_fast_reg_mr(pd, max_page_list_len);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_alloc_fast_reg_mr);
+
+struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(
+	struct ib_device *device, int max_page_list_len)
+{
+	struct ib_fast_reg_page_list *page_list;
+
+	if (!device->alloc_fast_reg_page_list)
+		return ERR_PTR(-ENOSYS);
+
+	page_list = device->alloc_fast_reg_page_list(device, max_page_list_len);
+
+	if (!IS_ERR(page_list)) {
+		page_list->device = device;
+		page_list->max_page_list_len = max_page_list_len;
+	}
+
+	return page_list;
+}
+EXPORT_SYMBOL(ib_alloc_fast_reg_page_list);
+
+void ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list)
+{
+	page_list->device->free_fast_reg_page_list(page_list);
+}
+EXPORT_SYMBOL(ib_free_fast_reg_page_list);
+
 /* Memory windows */

 struct ib_mw *ib_alloc_mw(struct ib_pd *pd)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 911a661..ede0c80 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -106,6 +106,7 @@ enum ib_device_cap_flags {
 	IB_DEVICE_UD_IP_CSUM		= (1<<18),
 	IB_DEVICE_UD_TSO		= (1<<19),
 	IB_DEVICE_SEND_W_INV		= (1<<21),
+	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<22),
 };

 enum ib_atomic_cap {
@@ -151,6 +152,7 @@ struct ib_device_attr {
 	int			max_srq;
 	int			max_srq_wr;
 	int			max_srq_sge;
+	unsigned int		max_fast_reg_page_list_len;
 	u16			max_pkeys;
 	u8			local_ca_ack_delay;
 };
@@ -414,6 +416,8 @@ enum ib_wc_opcode {
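The 3B-index / 1B-key split described in the cover text above amounts to simple bit manipulation. This sketch shows what updating the key portion of an rkey means arithmetically; it is not the kernel's ib_update_fast_reg_key() implementation, and the function name here is illustrative.

```c
#include <stdint.h>

/* Keep the device-owned 3-byte index (upper 24 bits), replace the
 * application-owned 1-byte key (low 8 bits). The application can pick a
 * fresh key before each re-registration so a stale peer rkey no longer
 * matches after the MR is rebound to a new page list. */
static uint32_t fast_reg_update_key(uint32_t rkey, uint8_t newkey)
{
	return (rkey & 0xffffff00u) | newkey;
}
```

Because only the low byte changes, a peer caching the old rkey has at most a 1-in-256 chance of accidentally matching the new value, which is the "more control over the peer's use of the rkey" the cover text refers to.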
[ofa-general] [PATCH RFC v4 2/2] RDMA/cxgb3: MEM_MGT_EXTENSIONS support
- set the IB_DEVICE_MEM_MGT_EXTENSIONS capability bit.
- set the max_fast_reg_page_list_len device attribute.
- add iwch_alloc_fast_reg_mr function.
- add iwch_alloc_fastreg_pbl
- add iwch_free_fastreg_pbl
- adjust the WQ depth for kernel mode work queues to account for fastreg
  possibly taking 2 WR slots.
- add fastreg_mr work request support.
- add invalidate_mr work request support.
- add send_with_inv and send_with_se_inv work request support.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---
 drivers/infiniband/hw/cxgb3/cxio_hal.c      |   13 ++-
 drivers/infiniband/hw/cxgb3/cxio_hal.h      |    1 +
 drivers/infiniband/hw/cxgb3/cxio_wr.h       |   51 ++++++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c |   78 ++++++++-
 drivers/infiniband/hw/cxgb3/iwch_qp.c       |  123 +++++++++++++
 5 files changed, 216 insertions(+), 50 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 3f441fc..6315c77 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -145,7 +145,7 @@ static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
 	}
 	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
 	memset(wqe, 0, sizeof(*wqe));
-	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 0, qpid, 7);
+	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 0, qpid, 7, 3);
 	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
 	sge_cmd = qpid << 8 | 3;
 	wqe->sge_cmd = cpu_to_be64(sge_cmd);
@@ -558,7 +558,7 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p)
 	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
 	memset(wqe, 0, sizeof(*wqe));
 	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 0,
-		       T3_CTL_QP_TID, 7);
+		       T3_CTL_QP_TID, 7, 3);
 	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
 	sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3;
 	wqe->sge_cmd = cpu_to_be64(sge_cmd);
@@ -674,7 +674,7 @@ static int cxio_hal_ctrl_qp_write_mem(struct cxio_rdev *rdev_p, u32 addr,
 		build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_BP, flag,
 			       Q_GENBIT(rdev_p->ctrl_qp.wptr,
					T3_CTRL_QP_SIZE_LOG2), T3_CTRL_QP_ID,
-			       wr_len);
+			       wr_len, 3);
 		if (flag == T3_COMPLETION_FLAG)
 			ring_doorbell(rdev_p->ctrl_qp.doorbell, T3_CTRL_QP_ID);
 		len -= 96;
@@ -816,6 +816,13 @@ int cxio_deallocate_window(struct cxio_rdev *rdev_p, u32 stag)
 			     0, 0);
 }

+int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid)
+{
+	*stag = T3_STAG_UNSET;
+	return __cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
+			     0, 0, 0ULL, 0, 0, 0, 0);
+}
+
 int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr)
 {
 	struct t3_rdma_init_wr *wqe;
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h
index 6e128f6..e7659f6 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h
@@ -165,6 +165,7 @@ int cxio_reregister_phys_mem(struct cxio_rdev *rdev, u32 *stag, u32 pdid,
 int cxio_dereg_mem(struct cxio_rdev *rdev, u32 stag, u32 pbl_size,
 		   u32 pbl_addr);
 int cxio_allocate_window(struct cxio_rdev *rdev, u32 *stag, u32 pdid);
+int cxio_allocate_stag(struct cxio_rdev *rdev, u32 *stag, u32 pdid);
 int cxio_deallocate_window(struct cxio_rdev *rdev, u32 stag);
 int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
 void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h
index f1a25a8..2a24962 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_wr.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h
@@ -72,7 +72,8 @@ enum t3_wr_opcode {
 	T3_WR_BIND = FW_WROPCODE_RI_BIND_MW,
 	T3_WR_RCV = FW_WROPCODE_RI_RECEIVE,
 	T3_WR_INIT = FW_WROPCODE_RI_RDMA_INIT,
-	T3_WR_QP_MOD = FW_WROPCODE_RI_MODIFY_QP
+	T3_WR_QP_MOD = FW_WROPCODE_RI_MODIFY_QP,
+	T3_WR_FASTREG = FW_WROPCODE_RI_FASTREGISTER_MR
 } __attribute__ ((packed));

 enum t3_rdma_opcode {
@@ -89,7 +90,8 @@ enum t3_rdma_opcode {
 	T3_FAST_REGISTER,
 	T3_LOCAL_INV,
 	T3_QP_MOD,
-	T3_BYPASS
+	T3_BYPASS,
+	T3_RDMA_READ_REQ_WITH_INV,
 } __attribute__ ((packed));

 static inline enum t3_rdma_opcode wr2opcode(enum t3_wr_opcode wrop)
@@ -170,11 +172,46 @@ struct t3_send_wr {
 	struct t3_sge sgl[T3_MAX_SGE];	/* 4+ */
 };

+#define T3_MAX_FASTREG_DEPTH 18
+
+struct t3_fastreg_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+	__be32 page_type_perms;
Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
On Tue, 2008-05-27 at 12:39 -0400, Talpey, Thomas wrote: At 11:33 AM 5/27/2008, Tom Tucker wrote: So I think from an NFSRDMA coding perspective it's a wash... Just to be clear, you're talking about the NFS/RDMA server. However, it's pretty much a wash on the client, for different reasons. Tom: What client side memory registration strategy do you recommend if the default on the server side is fastreg? On the performance side we are limited by the min size of the read/write-chunk element. If the client still gives the server a 4k chunk, the performance benefit (fewer PDU on the wire) goes away. Tom

When posting the WR, we check the fastreg capabilities bit + transport type bit: if fastreg is true -- post FastReg; if iWARP (or with a cap bit read-with-inv-flag), post rdma read w/ invalidate ... For iWARP's case, this means rdma-read-w-inv, plus rdma-send-w-inv, etc...

Maybe I'm confused, but I don't understand this. iWARP RDMA Read requests don't support remote invalidate. At least, the table in RFC5040 (p.22) doesn't:

 Opcode | Message Type         | Tagged | STag   | Queue  | Invalidate | Message Length
        |                      | Flag   | and TO | Number | STag       | Communicated between
        |                      |        |        |        |            | DDP and RDMAP
--------+----------------------+--------+--------+--------+------------+----------------------
 0000b  | RDMA Write           | 1      | Valid  | N/A    | N/A        | Yes
 0001b  | RDMA Read Request    | 0      | N/A    | 1      | N/A        | Yes
 0010b  | RDMA Read Response   | 1      | Valid  | N/A    | N/A        | Yes
 0011b  | Send                 | 0      | N/A    | 0      | N/A        | Yes
 0100b  | Send with Invalidate | 0      | N/A    | 0      | Valid      | Yes
 0101b  | Send with SE         | 0      | N/A    | 0      | N/A        | Yes
 0110b  | Send with SE and     | 0      | N/A    | 0      | Valid      | Yes
        | Invalidate           |        |        |        |            |
 0111b  | Terminate            | 0      | N/A    | 2      | N/A        | Yes
 1000b to 1111b | Reserved     | Not Specified

I want to take this opportunity to also mention that the RPC/RDMA client-server exchange does not support remote-invalidate currently. Because of the multiple stags supported by the rpcrdma chunking header, and because the client needs to verify that the stags were in fact invalidated, there is significant overhead, and the jury is out on that benefit. In fact, I suspect it's a lose at the client. Tom (Talpey).

___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
Tom Tucker wrote: On Tue, 2008-05-27 at 12:39 -0400, Talpey, Thomas wrote: At 11:33 AM 5/27/2008, Tom Tucker wrote: So I think from an NFSRDMA coding perspective it's a wash... Just to be clear, you're talking about the NFS/RDMA server. However, it's pretty much a wash on the client, for different reasons. Tom: What client side memory registration strategy do you recommend if the default on the server side is fastreg? On the performance side we are limited by the min size of the read/write-chunk element. If the client still gives the server a 4k chunk, the performance benefit (fewer PDU on the wire) goes away. Tom

I would hope that dma_mr usage will be replaced with fast_reg on both the client and the server.

[...]

___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
At 02:58 PM 5/27/2008, Tom Tucker wrote: On Tue, 2008-05-27 at 12:39 -0400, Talpey, Thomas wrote: At 11:33 AM 5/27/2008, Tom Tucker wrote: So I think from an NFSRDMA coding perspective it's a wash... Just to be clear, you're talking about the NFS/RDMA server. However, it's pretty much a wash on the client, for different reasons. Tom: What client side memory registration strategy do you recommend if the default on the server side is fastreg? Whatever is fastest and safest. Given that the client and server won't necessarily be using the same hardware, nor the same kernel for that matter, I don't think we can or should legislate it. That said, I am hopeful that fastreg does turn out to be fast and therefore will become the only logical choice for the NFS/RDMA Linux client. But the future Linux client is only one such system. I cannot speak for others. Tom. On the performance side we are limited by the min size of the read/write-chunk element. If the client still gives the server a 4k chunk, the performance benefit (fewer PDU on the wire) goes away. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: mthca MR attrs userspace change
On Tue, May 27, 2008 at 11:33:55AM -0700, Roland Dreier wrote: ... Actually this tricked me... the kernel ABI didn't get bumped so there was no reason to bump the libmthca ABI. I'll actually use this slightly simpler patch: I was wondering about that... FWIW, the patch (which I deleted below) looked good to me. -- Arthur ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core:MEM_MGT_EXTENSIONS support
At 02:40 PM 5/27/2008, Steve Wise wrote: Talpey, Thomas wrote: At 12:58 PM 5/27/2008, Felix Marti wrote: RDMA Read with Local Invalidate does not affect the wire. The 'must invalidate' state is kept in the RNIC that issues the RDMA Read Request... Aha, okay that was not clear to me. What information does the RNIC use to line up the arrival of the RDMA Read response with the must invalidate state? The rnic already tracks outstanding read requests. It now also will track the local stag to invalidate when the read completes. Ah - okay, so the stag that actually gets invalidated was provided with the RDMA Read request posting, and is not necessarily the stag that arrived in the peer's RDMA Read response. That helps. What happens if the upper layer gives up and invalidates the stag itself, and the peer's RDMA Read response arrives later? Nothing bad, I assume, and the peer's response is denied? Also, how does the RNIC signal whether the invalidation actually occurred, so the upper layer can defend itself from attack? The stag is guaranteed to be in the invalid state by the time the app reaps the read-inv-local work completion... Ok, given my correct understanding of the source of the stag above. Tom. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: mthca MR attrs userspace change
On Tue, May 27, 2008 at 03:18:25PM -0400, Pete Wyckoff wrote: ... Only CQ changes need dmasync, apparently. Right. -- Arthur ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core:MEM_MGT_EXTENSIONS support
Talpey, Thomas wrote: At 02:40 PM 5/27/2008, Steve Wise wrote: Talpey, Thomas wrote: At 12:58 PM 5/27/2008, Felix Marti wrote: RDMA Read with Local Invalidate does not affect the wire. The 'must invalidate' state is kept in the RNIC that issues the RDMA Read Request... Aha, okay that was not clear to me. What information does the RNIC use to line up the arrival of the RDMA Read response with the must invalidate state? The rnic already tracks outstanding read requests. It now also will track the local stag to invalidate when the read completes. Ah - okay, so the stag that actually gets invalidated was provided with the RDMA Read request posting, and is not necessarily the stag that arrived in the peer's RDMA Read response. That helps. What happens if the upper layer gives up and invalidates the stag itself, and the peer's RDMA Read response arrives later? Nothing bad, I assume, and the peer's response is denied? It behaves just like any other tagged message arriving and the target stag is invalid. The connection is torn down via an RDMAP TERMINATE... Also, how does the RNIC signal whether the invalidation actually occurred, so the upper layer can defend itself from attack? The stag is guaranteed to be in the invalid state by the time the app reaps the read-inv-local work completion... Ok, given my correct understanding of the source of the stag above. Tom. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH v2] IB/ipoib: copy small SKBs in CM mode
thanks, applied for 2.6.27...

+	/*
+	 * we rely on this condition when copying small skbs and we
+	 * pass ownership of the first fragment only.
+	 */
+	if (SKB_TSHOLD > IPOIB_CM_HEAD_SIZE) {
+		printk("%s: SKB_TSHOLD(%d) must not be larger than %d\n",
+		       THIS_MODULE->name, SKB_TSHOLD, IPOIB_CM_HEAD_SIZE);
+		return -EINVAL;
+	}

I changed this to a build bug, to avoid waiting until runtime to notice this problem:

+	/*
+	 * When copying small received packets, we only copy from the
+	 * linear data part of the SKB, so we rely on this condition.
+	 */
+	BUILD_BUG_ON(IPOIB_CM_COPYBREAK > IPOIB_CM_HEAD_SIZE);

___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core:MEM_MGT_EXTENSIONS support
Talpey, Thomas wrote: At 03:59 PM 5/27/2008, Steve Wise wrote: Talpey, Thomas wrote: What happens if the upper layer gives up and invalidates the stag itself, and the peer's RDMA Read response arrives later? Nothing bad, I assume, and the peer's response is denied? It behaves just like any other tagged message arriving and the target stag is invalid. The connection is torn down via an RDMAP TERMINATE... I was wondering more about the dangling stag reference that the original work request carried. Normally, it would reference the still-valid stag, but if that stag was torn down (causing the invalidation to point to nothing), or worse, re-bound (causing it to point at something else!), then it's a possible issue? Sorry to seem paranoid here. Storage is pretty sensitive to silent data corruption avenues. Because they always find a way to happen. These WRs allow the user to shoot himself in the foot. You're describing one such case... Steve. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] [ANNOUNCE] libmthca 1.0.5 released
libmthca is a userspace driver for Mellanox InfiniBand HCAs. It is a plug-in module for libibverbs that allows programs to use Mellanox hardware directly from userspace. A new stable release, libmthca 1.0.5, is available from http://www.openfabrics.org/downloads/mlx4/libmthca-1.0.5.tar.gz with sha1sum a68b1de47d320546c7bcc92bfa9c482f7d74fac1 /data/home/roland/libmthca-1.0.5.tar.gz I also tagged the 1.0.5 release of libmthca and pushed it out to my git tree on kernel.org: git://git.kernel.org/pub/scm/libs/infiniband/libmthca.git (the name of the tag is libmthca-1.0.5). Builds for the Ubuntu 7.10 and 8.04 releases will be available by adding the lines (replacing hardy by gutsy if needed) deb http://ppa.launchpad.net/roland-digitalvampire/ubuntu hardy main deb-src http://ppa.launchpad.net/roland-digitalvampire/ubuntu hardy main to your /etc/sources.list file, and updated Debian and Fedora packages will work their way into the main archives. This release fixes several bugs and adds support for the new kernel interface for mr attrs (which will be shipped in 2.6.26). The complete list of changes since 1.0.4 is: Eli Cohen (3): Ensure an receive WQEs are in memory before linking to chain Remove checks for srq-first_free 0 IB/ib_mthca: Pre-link receive WQEs in Tavor mode Jack Morgenstein (1): Clear context struct at allocation time Michael S. 
Tsirkin (2): Fix posting 255 recv WRs for Tavor Set cleaned CQEs back to HW ownership when cleaning CQ Roland Dreier (21): Fix paths in Debian install files for libibverbs 1.1 Update Debian changelog debian/rules: Remove DEB_DH_STRIP_ARGS Fix handling of send CQE with error for QPs connected to SRQ Add missing wmb() in mthca_tavor_post_send() Remove deprecated ${Source-Version} from debian/control Clean up NVALGRIND comment in config.h.in Fix Valgrind annotations so they can actually be built Remove ibv_driver_init from linker version script Fix spec file License: tag Mark driver file in sysconfdir with %config Update Debian policy version to 3.7.3 Fix Valgrind false positives in mthca_create_cq() and mthca_create_srq() Add debian/watch file Update Debian build to avoid setting RPATH Change openib.org URLs to openfabrics.org URLs Fix CQ cleanup when QP is destroyed Update libmthca to handle new kernel ABI Include spec file changes from Fedora CVS Remove %config tag from mthca.driver file Roll libmthca-1.0.5 release ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] OpenSM?
Charles, Ira Weiny wrote: Charles, Here at LLNL we have been running OpenSM for some time. Thus far we are very happy with its performance. Our largest cluster is 1152 nodes and OpenSM can bring it up (not counting boot time) in less than a minute. OpenSM is successfully running on some large clusters with 4-5K nodes. It takes about 2-3 minutes to bring up such clusters. Here are some details. We are running v3.1.10 of OpenSM with some minor modifications (mostly patches which have been submitted upstream and been accepted by Sasha but are not yet in a release.) Our clusters are all Fat-tree topologies. We have a node which is more or less dedicated to running OpenSM. We have some other monitoring software running on it, but OpenSM can utilize the CPU/Memory if it needs to. A) On our large clusters this node is a 4 socket, dual core (8 cores total) Opteron running at 2.4Gig with 16Gig of memory. I don't believe OpenSM needs this much but the nodes were built all the same so this is what it got. B) On one of our smaller clusters (128 nodes) OpenSM is running on a dual socket, single core (2 core) 2.4Gig Opteron node with 2Gig of memory. We have not seen any issues with this cluster and OpenSM. We run with the up/down algorithm, ftree has not panned out for us yet. I can't say how that would compare to the Cisco algorithms. If the cluster topology is fat-tree, then there are ftree and up/down routing. Ftree would be a good choice if you need LMC=0 (plus if the topology complies with certain fat-tree rules). For any other tree, or for LMC > 0, up/down should work. -- Yevgeny In short OpenSM should work just fine on your cluster. Hope this helps, Ira On Tue, 27 May 2008 11:15:14 -0400 Charles Taylor [EMAIL PROTECTED] wrote: We have a 400 node IB cluster. We are running an embedded SM in failover mode on our TS270/Cisco7008 core switches. Lately we have been seeing problems with LID assignment when rebooting nodes (see log messages below).
It is also taking far too long for LIDS to be assigned as it takes on the order of minutes for the ports to transition to ACTIVE. This seems like a bug to us and we are considering switching to OpenSM on a host. I'm wondering about experience with running OpenSM for medium to large clusters (Fat Tree) and what resources (memory/cpu) we should plan on for the host node. Thanks, Charlie Taylor UF HPC Center May 27 14:14:10 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]: ** NEW SWEEP May 27 14:14:10 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover the subnet May 27 14:14:13 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM OUT_OF_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:02:c9:02:00:21:4b:59 May 27 14:14:13 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:256]: An existing IB node GUID 00:02:c9:02:00:21:4b:59 LID 194 was removed May 27 14:14:14 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM DELETE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:21:4b:59 May 27 14:14:14 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1503]: Topology changed May 27 14:14:14 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by discovering removed ports May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events require sweep May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]: ** NEW SWEEP May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover the subnet May 27 14:16:28 topspin-270sc ib_sm.x[812]: [ib_sm_discovery.c:1009]: no routing required for port guid 00:02:c9:02:00:21:4b:59, lid 194 May 27 14:16:30 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1503]: Topology changed May 27 14:16:30 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by discovering new ports May 27 14:16:30 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by multicast membership change May 27 14:16:30 topspin-270sc ib_sm.x[812]: [ib_sm_assign.c:588]: Force port to go down due to LID conflict, node - 
GUID=00:02:c9:02:00:21:4b:58, port=1 May 27 14:18:42 topspin-270sc ib_sm.x[819]: [ib_sm_bringup.c:562]: Program port state, node=00:02:c9:02:00:21:4b:58, port= 16, current state 2, neighbor node=00:02:c9:02:00:21:4b:58, port= 1, current state 2 May 27 14:18:42 topspin-270sc ib_sm.x[819]: [ib_sm_bringup.c:733]: Failed to negotiate MTU, op_vl for node=00:02:c9:02:00:21:4b:58, port= 1, mad status 0x1c May 27 14:18:42 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM IN_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:02:c9:02:00:21:4b:59 May 27 14:18:42 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:144]: A new IB node 00:02:c9:02:00:21:4b:59 was discovered and assigned LID 0 May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events require sweep May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
[ofa-general] Re: mthca MR attrs userspace change
libmthca-1.0.5 is now in Fedora 9 proposed updates -- http://koji.fedoraproject.org/koji/buildinfo?buildID=50682 I'm not sure exactly how it makes it into real F-9. - R. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] MLX HCA: CQ request notification for multiple completions not implemented?
Thanks for your reply. I'm using one CQ for all the WRs. Do you know why there's no ARM-N support in the MLX drivers? My concern is performance. The overhead of the software poll_cq loop is quite significant if there are multiple small blocks of data to be transferred on both the sender and receiver sides. For instance, on the sender, the data I have are 1k, 1k, 2k, 1k..., and on the receiver side, the data sizes and blocks are the same, 1k, 1k, 2k, 1k. Do you have a good solution for this kind of problem? Best, Yicheng

Dotan Barak [EMAIL PROTECTED] 05/23/2008 01:27 PM To Yicheng Jia [EMAIL PROTECTED] cc general@lists.openfabrics.org Subject Re: [ofa-general] MLX HCA: CQ request notification for multiple completions not implemented? Hi. Yicheng Jia wrote: Hi Folks, I'm trying to use CQ Event notification for multiple completions (ARM_N) according to the Mellanox Lx III user manual for scatter/gather RDMA. However I couldn't find it in the current MLX driver. It seems to me that only ARM_NEXT and ARM_SOLICIT are implemented. So if there are multiple work requests, I have to use poll_cq to synchronously wait until all the requests are done, is that correct? Is there a way to do asynchronous multiple sends by subscribing for an ARM_N event? You are right: the low-level drivers for Mellanox devices don't support ARM-N (the feature is supported by the devices, but it wasn't implemented in the low-level drivers). You are right, in order to read all of the completions you need to use poll_cq. By the way: do you have to create a completion for every WR? (If you are using one QP, this will maybe solve your problem.) Dotan _ Scanned by IBM Email Security Management Services powered by MessageLabs. For more information please visit http://www.ers.ibm.com _ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC v4 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
enum ib_send_flags {
@@ -676,6 +683,19 @@ struct ib_send_wr {
 			u16	pkey_index;	/* valid for GSI only */
 			u8	port_num;	/* valid for DR SMPs on switch only */
 		} ud;
+		struct {
+			u64				iova_start;
+			struct ib_mr			*mr;
+			struct ib_fast_reg_page_list	*page_list;
+			unsigned int			page_shift;
+			unsigned int			page_list_len;
+			unsigned int			first_byte_offset;
+			u32				length;
+			int				access_flags;
+		} fast_reg;
+		struct {
+			struct ib_mr	*mr;
+		} local_inv;
 	} wr;
 };

Ok, while writing a test case for all this jazz, I see that passing the struct ib_mr pointer to both IB_WR_FAST_REGISTER_MR and IB_WR_INVALIDATE_MR is perhaps bad. Consider a chain of WRs: an INVALIDATE_MR linked to a FAST_REGISTER_MR and passed to the provider via a single ib_post_send() call. You can't do that if you want to bump the key value between the invalidate and the fast_reg with the new key, which is probably what apps want to do. You are forced, under this proposed API, to post the two WRs separately and call ib_update_fast_reg_key() in between the ib_post_send() calls. Perhaps we should just pass in a u32 rkey for both WRs instead of the mr pointer? Then the code could put the old rkey in the invalidate WR and the newly updated rkey in the fast_reg WR, chain the two together, and do a single post. I think this is the way to go: change the fast_reg and local_inv unions to take a u32 rkey instead of a struct ib_mr *mr. Thoughts? Steve. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] [PATCH RFC v4 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
Perhaps we should just pass in a u32 rkey for both WRs instead of the mr pointer? Then the code could put the old rkey in the invalidate WR, and the newly updated rkey in the fast_reg WR and chain the two together and do a single post. Makes sense to me. The only thing I would worry about would be if some device needs the actual mr struct pointer to post the work request, but mlx4 and I guess cxgb3 don't at least and I don't see a good reason why another device would. Let's go for it. - R. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: [PATCH v2 08/13] QLogic VNIC: sysfs interface implementation for the driver
+ssize_t vnic_create_primary(struct device *dev,
+			    struct device_attribute *dev_attr, const char *buf,
+			    size_t count)
+ssize_t vnic_create_secondary(struct device *dev,
+			      struct device_attribute *dev_attr,
+			      const char *buf, size_t count)
+ssize_t vnic_delete(struct device *dev, struct device_attribute *dev_attr,
+		    const char *buf, size_t count)

These are all only referenced from a sysfs attribute defined in the same file, so they can be made static (and don't need extern declarations in a header file). ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[ofa-general] Re: [PATCH v2 03/13] QLogic VNIC: Implementation of communication protocol with EVIC/VEx
+void viport_disconnect(struct viport *viport)
+{
+	VIPORT_FUNCTION("viport_disconnect()\n");
+	viport->disconnect = 1;
+	viport_failure(viport);
+	wait_event(viport->disconnect_queue, viport->disconnect == 0);
+}
+
+void viport_free(struct viport *viport)
+{
+	VIPORT_FUNCTION("viport_free()\n");
+	viport_disconnect(viport);	/* NOTE: this can sleep */

There are no other calls to viport_disconnect() that I can see, so it can be made static (and the declaration in vnic_viport.h can be dropped). In fact, given how small the function is and the fact that it has only a single call site, it might be easier just to merge it into viport_free(). But that's a matter of taste. - R. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general