[ofa-general] ofa_1_4_kernel 20090223-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18-8.el5
Passed on ppc64 with linux-2.6.19

Failed:
___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ofa-general] Race condition in core/sysfs.c (kernel panic) when unloading the driver
On Monday 23 February 2009 06:40, Roland Dreier wrote:
> Oh I see... we leave the sysfs stuff around way too long, since we want to use it for tracking the lifetime of our class device. the patch below fixes things for me here... there's still room for substantial cleanup but I think this gets the crashes fixed at least:

I'm not sure that it does. This does not make sysfs access atomic wrt module unloading. I think an app can still lose its timeslice while inside the sysfs access, and module unload can still occur while the app is waiting for a new time slice (although the code pages will not be removed as yet -- see below).

While the module code pages will still be available, what prevents module cleanup from deleting all the module's resources? In this case, the app will succeed in invoking the low-level driver (its code is still loaded), but may cause an Oops when that low-level driver code attempts to access low-level driver data structures (which have been freed).

What about the patch I just submitted?
http://lists.openfabrics.org/pipermail/general/2009-February/057565.html
([ofa-general] [PATCH] ib_core: avoid race condition between sysfs access and low-level module unload)

- Jack
[ofa-general] Too many calls to mlx4_CLOSE_PORT()?
Roland,

browsing the code, I see that mlx4_CLOSE_PORT() gets called from, seemingly, too many places. I would expect it to get called only from __mlx4_ib_modify_qp() when QP0 gets closed, but mlx4_ib_remove() calls it too even though it is soon to be called by __mlx4_ib_modify_qp() due to destroying the MAD QP. It also gets called from mlx4_remove_one() even though by the time this function gets called, the port is already closed. Is there a reason for that?
[ofa-general] Re: [PATCH 2.6.30] RDMA/cxgb3: Handle EEH events for active connections.
Roland Dreier wrote:
>> -	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
>> +	return (iwch_cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
>
> minor but the parens around the function call are totally unnecessary. If we're touching the line anyway may as well leave them off.

Sure.

>> +static int iwch_post_qp_fatal(int id, void *p, void *data)
>> +{
>> +	struct ib_event event;
>> +	struct iwch_qp *qhp = p;
>> +
>> +	event.event = IB_EVENT_DEVICE_FATAL;
>> +	event.device = qhp->ibqp.device;
>> +	event.element.qp = &qhp->ibqp;
>> +	BUG_ON(qhp->rhp != data);
>> +	BUG_ON(qhp->wq.qpid != id);
>> +	if (qhp->ibqp.event_handler) {
>> +		PDBG("%s posting DEVICE_FATAL for qpid %u\n",
>> +		     __func__, qhp->wq.qpid);
>> +		(*qhp->ibqp.event_handler)(&event, qhp->ibqp.qp_context);
>
> This doesn't match the IB driver behavior (or the IB spec) -- the DEVICE_FATAL event is unaffiliated and delivered for the adapter as a whole. QP events are supposed to be for events connected to a single QP, not the whole adapter failing.

I'll change this to QP_FATAL then.

> BTW I don't think you need the * here, do you? Would be easier to read to just call it like
>
>	qhp->ibqp.event_handler(&event, qhp->ibqp.qp_context)

Ok.

>> +int iwch_l2t_send(struct t3cdev *tdev, struct sk_buff *skb, struct l2t_entry *l2e)
>> +{
>> +	int error=0;
>> +	struct cxio_rdev *rdev;
>> +
>> +	rdev = (struct cxio_rdev *)tdev->ulp;
>> +	if (rdev->flags) {
>
> Might be nice to wrap this rdev->flags test up in a trivial inline function (eg iwch_eeh_set() or something like that) in case other things get put into those flags later.

Agreed.

>> +		kfree_skb(skb);
>> +		return -EIO;
>> +	}
>> +	error = l2t_send(tdev, skb, l2e);
>> +	if (error)
>> +		kfree_skb(skb);
>> +	return error;
>> +}
>
> The kfree_skb() calls here change behavior -- eg you have the change:
>
>> -	l2t_send(ep->com.tdev, skb, ep->l2t);
>> -	return 0;
>> +	return iwch_l2t_send(ep->com.tdev, skb, ep->l2t);
>
> and now if l2t_send() returns an error the skb is freed, where before it wasn't.

In looking at the l2t_send code, it doesn't free on failure, so I believe this was a memory leak in the existing error path.

> Also I'm wondering why you want these wrappers in iw_cxgb3 -- would it not make more sense for the cxgb3 l2t_send() to check the eeh state and always behave appropriately? Or is it more complicated than that?

Maybe. Divy, what do you think?

Steve.

> - R.
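The ownership question discussed in this review (who frees the buffer when the send fails) can be modelled in user space. The following is only a minimal sketch under stated assumptions: `struct dev`, `struct buf`, `raw_send()`, `checked_send()`, `dev_fatal_error()` and the `live_bufs` counter are hypothetical stand-ins, not the cxgb3 or iw_cxgb3 API. `raw_send()` is assumed to behave as the thread describes `l2t_send()`: it consumes the buffer on success and leaves it with the caller on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical flag bit, playing the role of a "fatal error" device flag. */
#define DEV_ERROR_FATAL 1u

struct dev { unsigned flags; int fail_send; };
struct buf { int payload; };

/* Number of buffers currently allocated, so a leak shows up as non-zero. */
static int live_bufs;

static struct buf *buf_alloc(void)
{
    struct buf *b = malloc(sizeof(*b));
    if (b)
        live_bufs++;
    return b;
}

static void buf_free(struct buf *b)
{
    if (b) {
        live_bufs--;
        free(b);
    }
}

/* Trivial inline test, as suggested in the review, so callers do not
 * look at the raw flags word directly. */
static inline int dev_fatal_error(const struct dev *d)
{
    return (d->flags & DEV_ERROR_FATAL) != 0;
}

/* Assumed lower-layer send: consumes the buffer on success, but does NOT
 * free it on failure (the caller still owns it). */
static int raw_send(struct dev *d, struct buf *b)
{
    if (d->fail_send)
        return -ENOMEM;
    buf_free(b);
    return 0;
}

/* Wrapper: refuse to send while the device is in a fatal state, and free
 * the buffer on every error path so the caller never has to. */
static int checked_send(struct dev *d, struct buf *b)
{
    int err;

    if (dev_fatal_error(d)) {
        buf_free(b);
        return -EIO;
    }
    err = raw_send(d, b);
    if (err)
        buf_free(b);
    return err;
}
```

With this contract a caller can simply `return checked_send(dev, b);` and the buffer is released whether the send succeeds, fails, or is refused.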
[ofa-general] el5.3 backport of 1.4(.0)
hi all,

i am preparing an upgrade from SL5.2 to SL5.3 (which are EL5 clones). one thing we would also like to look at is switching from OFED 1.3.2 to OFED 1.4. and one thing i noticed is that the necessary 5.3 backport fixes only exist in the current 1.4.1 daily snapshots.

did anyone already try to backport the el5.3 backport fixes from 1.4.1 to 1.4.0?

many thanks,

stijn
[ofa-general] OFED (EWG) meeting agenda for today (Feb 23)
Hi All,

Due to something unexpected I cannot attend the meeting today :-(
I sent a mail to Gopal asking him to replace me but got no response yet. If he can't, maybe Woody or Betsy can.

In any case - these are the items that should be covered:

a. OFED 1.4.1 release:
   1. SLES 11 - backport progress - Jeff Becker
   2. Open MPI 1.3.1 - Jeff Squyres
   3. RDS with iWARP support - Steve Wise
   4. NFS/RDMA backports - at least to RH 5.2/3 - Steve Wise
   5. Critical bugs:
      1287  maj  RHEL  ja...@mellanox.co.il    IPoIB datagram mode initial packet loss
      1516  cri  RHEL  andy.gro...@oracle.com  Kernel panic on RHAS4.x loading RDS

   Note: There is a 1.4.1 release number in bugzilla - please change the bug release number to 1.4.1 if you wish it to be fixed for OFED 1.4.1

b. Open discussion

Tziporet
[ofa-general] RE: OFED (EWG) meeting agenda for today (Feb 23)
Betsy can't make it today. I will be covering for her. Worst case, I will cover the items that you listed.

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tziporet Koren
Sent: Monday, February 23, 2009 11:11 AM
To: e...@lists.openfabrics.org
Cc: general@lists.openfabrics.org
Subject: [ewg] OFED (EWG) meeting agenda for today (Feb 23)

Hi All,

Due to unexpected thing I cannot attend the meeting today :-(
I sent a mail to Gopal asking him to replace me but got no respond yet. If he can't maybe Woody or Betsy can

In any case - these are the items that should be covered:

a. OFED 1.4.1 release:
   1. SLES 11 - backport progress - Jeff Becker
   2. Open MPI 1.3.1 - Jeff Squyres
   3. RDS with iWARP support - Steve Wise
   4. NFS/RDMA backports - at least to RH 5.2/3 - Steve Wise
   5. Critical bugs:
      1287  maj  RHEL  ja...@mellanox.co.il    IPoIB datagram mode initial packet loss
      1516  cri  RHEL  andy.gro...@oracle.com  Kernel panic on RHAS4.x loading RDS

   Note: There is 1.4.1 release number in bugzilla - please change bug release number to 1.4.1 if you wish it to be fixed for OFED 1.4.1

b. Open discussion

Tziporet
[ofa-general] Re: [ewg] RE: OFED (EWG) meeting agenda for today (Feb 23)
John Russo wrote:
> Betsy can't make it today. I will be covering for her. Worst case, I will cover the items that you listed.

Many thanks

Tziporet
[ofa-general] Re: NFSRDMA connectathon prelim. testing status,
Vu:

What memory registration model are you using?

Vu Pham wrote:
> Hi Tom,
>
> I have both nfsrdma client and server on 2.6.29-rc5 kernel, nfs-utils-1.1.4. I'm using both Infinihost III (ib_mthca) and ConnectX (mlx4_ib) HCAs
>
> I have seen several problems during my testing at NFS Connectathon 2009
>
> 1. When I used ConnectX (mlx4_ib) HCAs on both client and server, the client can not mount. Talking to Tom Talpey and scanning the code, I saw that xprtrdma module is using ib_reg_phys_mr() and mlx4_ib verbs provider does not have the implementation for this verb. If I have client on mlx4_ib and server on ib_mthca, I hit the following crash because of bad error handling in xprtrdma (see file attached - mlx4_mount_problem.log)
>
> Because of this problem, I use InfiniHost III (ib_mthca) for all of my tests at Connectathon
>
> 2. Testing Linux nfsrdma client against both Linux and OpenSolaris nfsrdma servers, I hit the process hung problem during the connectathon's lock test (seeing sync_page_1.log and sync_page_2.log attached files). I can only reproduce it when I ran connectathon more than 500 iterations (-N 1000)
> I can NOT reproduce the problem with nfs client/server over IPoIB
>
> 3. Testing openSolaris nfsrdma client against linux nfsrdma server, I hit the following BUG_ON() right away(see file attached - svcrdma_send.log)
>
> thanks,
> -vu
[ofa-general] [PATCH] opensm/osm_subnet: fix crash in qos string config parameters reloading
This fixes double free() crash in qos string config parameters reloading. Assuming that qos parameters can be specified using config file only we will always keep this in sync with options copy loaded from file.

Signed-off-by: Sasha Khapyorsky <sas...@voltaire.com>
---

On 09:40 Mon 23 Feb , Eli Dorfman (Voltaire) wrote:
> Command Line Arguments:
>  Log File: /var/log/opensm.log
> -
> OpenSM 3.3.0_c4d9bcf
> [snip...]
> Using default GUID 0x2c9020022f019
> Loading Cached Option:qos_vlarb_high = 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
> *** glibc detected *** ./sbin/opensm: double free or corruption (!prev): 0x1bd932b0 ***
>
> This happens because qos string parameter is freed separately in subn_init_qos_options() and its mirror pointer in file config copy still refer already not allocated memory.

Thanks for finding this. The patch should fix the issue.

Sasha

 opensm/opensm/osm_subnet.c |   29 ++--
 1 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 01478be..b3100a4 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -640,7 +640,7 @@ static void subn_set_default_qos_options(IN osm_qos_options_t * opt)
 	opt->sl2vl = OSM_DEFAULT_QOS_SL2VL;
 }
 
-static void subn_init_qos_options(IN osm_qos_options_t * opt)
+static void subn_init_qos_options(osm_qos_options_t *opt, osm_qos_options_t *f)
 {
 	opt->max_vls = 0;
 	opt->high_limit = -1;
@@ -653,6 +653,8 @@ static void subn_init_qos_options(IN osm_qos_options_t * opt)
 	if (opt->sl2vl)
 		free(opt->sl2vl);
 	opt->sl2vl = NULL;
+	if (f)
+		memcpy(f, opt, sizeof(*f));
 }
 
 /**
@@ -743,11 +745,11 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt)
 	p_opt->no_clients_rereg = FALSE;
 	p_opt->prefix_routes_file = strdup(OSM_DEFAULT_PREFIX_ROUTES_FILE);
 	p_opt->consolidate_ipv6_snm_req = FALSE;
-	subn_init_qos_options(&p_opt->qos_options);
-	subn_init_qos_options(&p_opt->qos_ca_options);
-	subn_init_qos_options(&p_opt->qos_sw0_options);
-	subn_init_qos_options(&p_opt->qos_swe_options);
-	subn_init_qos_options(&p_opt->qos_rtr_options);
+	subn_init_qos_options(&p_opt->qos_options, NULL);
+	subn_init_qos_options(&p_opt->qos_ca_options, NULL);
+	subn_init_qos_options(&p_opt->qos_sw0_options, NULL);
+	subn_init_qos_options(&p_opt->qos_swe_options, NULL);
+	subn_init_qos_options(&p_opt->qos_rtr_options, NULL);
 }
 
 /**
@@ -1192,11 +1194,16 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn)
 		return -1;
 	}
 
-	subn_init_qos_options(&p_opts->qos_options);
-	subn_init_qos_options(&p_opts->qos_ca_options);
-	subn_init_qos_options(&p_opts->qos_sw0_options);
-	subn_init_qos_options(&p_opts->qos_swe_options);
-	subn_init_qos_options(&p_opts->qos_rtr_options);
+	subn_init_qos_options(&p_opts->qos_options,
+			      &p_opts->file_opts->qos_options);
+	subn_init_qos_options(&p_opts->qos_ca_options,
+			      &p_opts->file_opts->qos_ca_options);
+	subn_init_qos_options(&p_opts->qos_sw0_options,
+			      &p_opts->file_opts->qos_sw0_options);
+	subn_init_qos_options(&p_opts->qos_swe_options,
+			      &p_opts->file_opts->qos_swe_options);
+	subn_init_qos_options(&p_opts->qos_rtr_options,
+			      &p_opts->file_opts->qos_rtr_options);
 
 	while (fgets(line, 1023, opts_file) != NULL) {
 		/* get the first token */
-- 
1.6.1.2.319.gbd9e
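The stale mirror pointer described in this patch can be illustrated outside OpenSM. The following is a reduced sketch, not the OpenSM code: `struct qos_opts`, `qos_set()` and `qos_reset()` are hypothetical names, and the assumption is that the live options and the file-config copy end up holding the same heap pointer. Resetting the live copy and then copying it over the mirror, as the patch does with memcpy(), leaves no dangling pointer behind for a later free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced option block with a single string parameter. */
struct qos_opts { char *vlarb_high; };

/* Store a copy of val; live and mirror share the same allocation, which is
 * the situation assumed here to lead to the double free. */
static int qos_set(struct qos_opts *live, struct qos_opts *mirror,
                   const char *val)
{
    char *copy = malloc(strlen(val) + 1);

    if (!copy)
        return -1;
    strcpy(copy, val);
    live->vlarb_high = copy;
    mirror->vlarb_high = copy;
    return 0;
}

/* Reset the live options and, when a mirror is given, overwrite it with the
 * reset state so it cannot keep a pointer to freed memory. */
static void qos_reset(struct qos_opts *live, struct qos_opts *mirror)
{
    free(live->vlarb_high);
    live->vlarb_high = NULL;
    if (mirror)
        memcpy(mirror, live, sizeof(*mirror));
}
```

After `qos_reset(&live, &mirror)` both pointers are NULL, so freeing through the mirror later is harmless. Calling `qos_reset(&live, NULL)` instead would leave `mirror.vlarb_high` dangling, which is the failure mode the patch addresses.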
[ofa-general] [PATCH] opensm/main.c: remove enable_stack_dump() call
The enable_stack_dump() symbol was defined in the already removed libibcommon. There is still a conditional (under #ifdef _DEBUG_) call to this function in opensm/main.c, which breaks opensm linkage when --enable-debug is configured. Remove it.

Signed-off-by: Sasha Khapyorsky <sas...@voltaire.com>
---
 opensm/opensm/main.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index e22c2c4..47fd658 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -596,9 +596,6 @@ int main(int argc, char *argv[])
 		       osm_is_debug(), cl_is_debug());
 		exit(1);
 	}
-#if defined (_DEBUG_) && defined (OSM_VENDOR_INTF_OPENIB)
-	enable_stack_dump(1);
-#endif
 
 	printf("-\n");
 	printf("%s\n", OSM_VERSION);
-- 
1.6.1.2.319.gbd9e
[ofa-general] Re: NFSRDMA connectathon prelim. testing status,
Tom,

> Vu: What memory registration model are you using?

It is 6 (when the connection/mount established)

> Vu Pham wrote:
>> Hi Tom,
>>
>> I have both nfsrdma client and server on 2.6.29-rc5 kernel, nfs-utils-1.1.4. I'm using both Infinihost III (ib_mthca) and ConnectX (mlx4_ib) HCAs
>>
>> I have seen several problems during my testing at NFS Connectathon 2009
>>
>> 1. When I used ConnectX (mlx4_ib) HCAs on both client and server, the client can not mount. Talking to Tom Talpey and scanning the code, I saw that xprtrdma module is using ib_reg_phys_mr() and mlx4_ib verbs provider does not have the implementation for this verb. If I have client on mlx4_ib and server on ib_mthca, I hit the following crash because of bad error handling in xprtrdma (see file attached - mlx4_mount_problem.log)
>>
>> Because of this problem, I use InfiniHost III (ib_mthca) for all of my tests at Connectathon
>>
>> 2. Testing Linux nfsrdma client against both Linux and OpenSolaris nfsrdma servers, I hit the process hung problem during the connectathon's lock test (seeing sync_page_1.log and sync_page_2.log attached files). I can only reproduce it when I ran connectathon more than 500 iterations (-N 1000)
>> I can NOT reproduce the problem with nfs client/server over IPoIB
>>
>> 3. Testing openSolaris nfsrdma client against linux nfsrdma server, I hit the following BUG_ON() right away(see file attached - svcrdma_send.log)
>>
>> thanks,
>> -vu
[ofa-general] Re: NFSRDMA connectathon prelim. testing status,
At 01:03 PM 2/23/2009, Vu Pham wrote:
> Tom,
>
>> Vu: What memory registration model are you using?
>
> It is 6 (when the connection/mount established)

i.e. all physical (get_dma_mr). Long chunk lists due to discontiguous physical pages. We'll try with ConnectX and frmr's later today here at Connectathon. This will reduce the chunk lists to roughly three entries (head, pages, tail).

With the two assertions disabled, we're again passing all general and special tests from the OpenSolaris client, btw. :-)

Tom.

>> Vu Pham wrote:
>>> Hi Tom,
>>>
>>> I have both nfsrdma client and server on 2.6.29-rc5 kernel, nfs-utils-1.1.4. I'm using both Infinihost III (ib_mthca) and ConnectX (mlx4_ib) HCAs
>>>
>>> I have seen several problems during my testing at NFS Connectathon 2009
>>>
>>> 1. When I used ConnectX (mlx4_ib) HCAs on both client and server, the client can not mount. Talking to Tom Talpey and scanning the code, I saw that xprtrdma module is using ib_reg_phys_mr() and mlx4_ib verbs provider does not have the implementation for this verb. If I have client on mlx4_ib and server on ib_mthca, I hit the following crash because of bad error handling in xprtrdma (see file attached - mlx4_mount_problem.log)
>>>
>>> Because of this problem, I use InfiniHost III (ib_mthca) for all of my tests at Connectathon
>>>
>>> 2. Testing Linux nfsrdma client against both Linux and OpenSolaris nfsrdma servers, I hit the process hung problem during the connectathon's lock test (seeing sync_page_1.log and sync_page_2.log attached files). I can only reproduce it when I ran connectathon more than 500 iterations (-N 1000)
>>> I can NOT reproduce the problem with nfs client/server over IPoIB
>>>
>>> 3. Testing openSolaris nfsrdma client against linux nfsrdma server, I hit the following BUG_ON() right away(see file attached - svcrdma_send.log)
>>>
>>> thanks,
>>> -vu
Re: [ofa-general] build warnings on rhel4 U6
On Mon, 2009-02-23 at 09:28 +0200, Jack Morgenstein wrote:
> In the backport spinlock.h file, try the following:
>
> #ifndef assert_spin_locked
> #define assert_spin_locked(lock) do { (void)(lock); } while(0)
> #endif

Indeed. That would be a solution for the end-user but that doesn't help us as a third-party software developer (i.e. being restricted to building our software with GA releases of OFED -- so that our release doesn't turn into a patching nightmare for our end-users).

Indeed, this probably should have been a BZ filing as my goal was equally as much to alert somebody to the problem to ensure future releases don't have the same problem.

Cheers and many thanks for the input.

b.
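For readers unfamiliar with this kind of backport shim, here is a small user-space sketch of how the guarded fallback behaves. The macro is the one quoted above; `struct fake_lock` and `bump_locked()` are hypothetical and exist only so the example compiles outside a kernel tree. When the build environment already provides assert_spin_locked, the `#ifndef` guard leaves that definition alone; otherwise the fallback evaluates its argument and discards it.

```c
#include <assert.h>

/* Fallback from the thread: only defined when the environment does not
 * already provide assert_spin_locked. */
#ifndef assert_spin_locked
#define assert_spin_locked(lock) do { (void)(lock); } while (0)
#endif

/* Hypothetical stand-in for spinlock_t, for illustration only. */
struct fake_lock { int held; };

/* Code written against newer kernels can keep its assertion call; with the
 * fallback it compiles to a no-op that still references 'lock'. */
static int bump_locked(struct fake_lock *lock, int *counter)
{
    assert_spin_locked(lock);
    return ++(*counter);
}
```

The `do { ... } while (0)` wrapper keeps the macro usable as a single statement, for example in an unbraced `if` body.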
Re: [ofa-general] Re: NFSRDMA connectathon prelim. testing status,
Tom,

>> What memory registration model are you using?
>
> It is 6 (when the connection/mount established)

>> Vu Pham wrote:
>>> 2. Testing Linux nfsrdma client against both Linux and OpenSolaris nfsrdma servers, I hit the process hung problem during the connectathon's lock test (seeing sync_page_1.log and sync_page_2.log attached files). I can only reproduce it when I ran connectathon more than 500 iterations (-N 1000)
>>> I can NOT reproduce the problem with nfs client/server over IPoIB

With mem_reg=4, I cannot reproduce this problem (running against both OpenSolaris and Linux servers).

>>> 3. Testing openSolaris nfsrdma client against linux nfsrdma server, I hit the following BUG_ON() right away(see file attached - svcrdma_send.log)

After disabling the two BUG_ON()s, we can run the test multiple times without any problem so far.

-vu
[ofa-general] [PATCH v2] RDMA/cxgb3: Handle EEH events for active connections.
- wrapper calls into cxgb3 and fail them if we're in the middle of an eeh event. - correctly unwind and release endpoint and other resources when we are in an EEH event. - post QP_FATAL event on all active QPs when cxgb3 notifies iw_cxgb3 of a fatal error. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb3/cxio_hal.c | 10 ++-- drivers/infiniband/hw/cxgb3/cxio_hal.h |6 ++ drivers/infiniband/hw/cxgb3/iwch.c | 26 + drivers/infiniband/hw/cxgb3/iwch.h |5 ++ drivers/infiniband/hw/cxgb3/iwch_cm.c | 90 +++- drivers/infiniband/hw/cxgb3/iwch_qp.c |4 + 6 files changed, 107 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index eeae5f5..1db88dd 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -152,7 +152,7 @@ static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid) sge_cmd = qpid 8 | 3; wqe-sge_cmd = cpu_to_be64(sge_cmd); skb-priority = CPL_PRIORITY_CONTROL; - return (cxgb3_ofld_send(rdev_p-t3cdev_p, skb)); + return iwch_cxgb3_ofld_send(rdev_p-t3cdev_p, skb); } int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) @@ -571,7 +571,7 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) (unsigned long long) rdev_p-ctrl_qp.dma_addr, rdev_p-ctrl_qp.workq, 1 T3_CTRL_QP_SIZE_LOG2); skb-priority = CPL_PRIORITY_CONTROL; - return (cxgb3_ofld_send(rdev_p-t3cdev_p, skb)); + return iwch_cxgb3_ofld_send(rdev_p-t3cdev_p, skb); err: kfree_skb(skb); return err; @@ -701,7 +701,7 @@ static int __cxio_tpt_op(struct cxio_rdev *rdev_p, u32 reset_tpt_entry, u32 stag_idx; u32 wptr; - if (rdev_p-flags) + if (cxio_fatal_error(rdev_p)) return -EIO; stag_state = stag_state 0; @@ -858,7 +858,7 @@ int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr) wqe-qp_dma_size = cpu_to_be32(attr-qp_dma_size); wqe-irs = cpu_to_be32(attr-irs); skb-priority = 0; /* 0=ToeQ; 1=CtrlQ */ - return 
(cxgb3_ofld_send(rdev_p-t3cdev_p, skb)); + return iwch_cxgb3_ofld_send(rdev_p-t3cdev_p, skb); } void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb) @@ -1024,9 +1024,9 @@ void cxio_rdev_close(struct cxio_rdev *rdev_p) cxio_hal_pblpool_destroy(rdev_p); cxio_hal_rqtpool_destroy(rdev_p); list_del(rdev_p-entry); - rdev_p-t3cdev_p-ulp = NULL; cxio_hal_destroy_ctrl_qp(rdev_p); cxio_hal_destroy_resource(rdev_p-rscp); + rdev_p-t3cdev_p-ulp = NULL; } } diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h index 9ed65b0..2fd5d03 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.h +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h @@ -112,6 +112,11 @@ struct cxio_rdev { #defineCXIO_ERROR_FATAL1 }; +static inline int cxio_fatal_error(struct cxio_rdev *rdev_p) +{ + return (rdev_p-flags CXIO_ERROR_FATAL); +} + static inline int cxio_num_stags(struct cxio_rdev *rdev_p) { return min((int)T3_MAX_NUM_STAG, (int)((rdev_p-rnic_info.tpt_top - rdev_p-rnic_info.tpt_base) 5)); @@ -185,6 +190,7 @@ void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count); void cxio_flush_hw_cq(struct t3_cq *cq); int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe, u8 *cqe_flushed, u64 *cookie, u32 *credit); +int iwch_cxgb3_ofld_send(struct t3cdev *tdev, struct sk_buff *skb); #define MOD iw_cxgb3: #define PDBG(fmt, args...) 
pr_debug(MOD fmt, ## args) diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c index 37a4fc2..3548861 100644 --- a/drivers/infiniband/hw/cxgb3/iwch.c +++ b/drivers/infiniband/hw/cxgb3/iwch.c @@ -162,15 +162,37 @@ static void close_rnic_dev(struct t3cdev *tdev) mutex_unlock(dev_mutex); } +static int iwch_post_qp_fatal(int id, void *p, void *data) +{ + struct ib_event event; + struct iwch_qp *qhp = p; + + event.event = IB_EVENT_QP_FATAL; + event.device = qhp-ibqp.device; + event.element.qp = qhp-ibqp; + BUG_ON(qhp-rhp != data); + BUG_ON(qhp-wq.qpid != id); + if (qhp-ibqp.event_handler) { + PDBG(%s posting QP_FATAL for qpid %u\n, + __func__, qhp-wq.qpid); + (*qhp-ibqp.event_handler)(event, qhp-ibqp.qp_context); + } + return 0; +} + static void iwch_err_handler(struct t3cdev *tdev, u32 status, u32 error) { struct cxio_rdev *rdev = tdev-ulp; + struct iwch_dev *rnicp = rdev_to_iwch_dev(rdev); - if (status == OFFLOAD_STATUS_DOWN) + if (status == OFFLOAD_STATUS_DOWN) {
[ofa-general] [PATCH] [ib-diag] saquery: add support for WinOF
A lot of type casting with include fix-ups. Luckily, because the macro CHECK_AND_SET_VAL() was added, I could add type casts into the macro and avoid sprinkling even more throughout the code. Signed-off-by: Sean Hefty sean.he...@intel.com --- infiniband-diags/src/saquery.c | 80 ++-- 1 files changed, 44 insertions(+), 36 deletions(-) diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c index 9726d22..9d5f475 100644 --- a/infiniband-diags/src/saquery.c +++ b/infiniband-diags/src/saquery.c @@ -37,20 +37,25 @@ * */ +#if HAVE_CONFIG_H +# include config.h +#endif /* HAVE_CONFIG_H */ + #include unistd.h #include stdio.h #include arpa/inet.h #include ctype.h #include string.h #include errno.h +#include assert.h #define _GNU_SOURCE #include getopt.h #include infiniband/umad.h #include infiniband/mad.h -#include infiniband/iba/ib_types.h -#include infiniband/complib/cl_nodenamemap.h +#include iba/ib_types.h +#include complib/cl_nodenamemap.h #include ibdiag_common.h @@ -170,7 +175,7 @@ recv_mad: if (ibdebug 1) xdump(stdout, SA Response:\n, mad, len); - method = mad_get_field(mad, 0, IB_MAD_METHOD_F); + method = (uint8_t) mad_get_field(mad, 0, IB_MAD_METHOD_F); offset = mad_get_field(mad, 0, IB_SA_ATTROFFS_F); result.status = mad_get_field(mad, 0, IB_MAD_STATUS_F); result.p_result_madw = mad; @@ -189,12 +194,12 @@ recv_mad: static void *get_query_rec(void *mad, unsigned i) { int offset = mad_get_field(mad, 0, IB_SA_ATTROFFS_F); - return mad + IB_SA_DATA_OFFS + i * (offset 3); + return (char *) mad + IB_SA_DATA_OFFS + i * (offset 3); } static unsigned valid_gid(ib_gid_t *gid) { - ib_gid_t zero_gid = { }; + ib_gid_t zero_gid = { 0 }; return memcmp(zero_gid, gid, sizeof(*gid)); } @@ -442,7 +447,7 @@ static void dump_multicast_member_record(void *data) char gid_str2[INET6_ADDRSTRLEN]; ib_member_rec_t *p_mcmr = data; uint16_t mlid = cl_ntoh16(p_mcmr-mlid); - int i = 0; + unsigned i = 0; char *node_name = unknown; /* go through the node records searching for 
a port guid which matches @@ -758,7 +763,7 @@ static void dump_one_mft_record(void *data) static void dump_results(struct query_res *r, void (*dump_func) (void *)) { - int i; + unsigned i; for (i = 0; i r-result_cnt; i++) { void *data = get_query_rec(r-p_result_madw, i); dump_func(data); @@ -768,7 +773,7 @@ static void dump_results(struct query_res *r, void (*dump_func) (void *)) static void return_mad(void) { if (result.p_result_madw) { - free(result.p_result_madw - umad_size()); + free((char *) result.p_result_madw - umad_size()); result.p_result_madw = NULL; } } @@ -839,7 +844,8 @@ get_lid_from_name(bind_handle_t h, const char *name, uint16_t* lid) { ib_node_record_t *node_record = NULL; ib_node_info_t *p_ni = NULL; - int i = 0, ret; + unsigned i; + int ret; ret = get_all_records(h, IB_SA_ATTR_NODERECORD, 0); if (ret) @@ -869,7 +875,7 @@ static uint16_t get_lid(bind_handle_t h, const char *name) if (isalpha(name[0])) assert(get_lid_from_name(h, name, rc_lid) == IB_SUCCESS); else - rc_lid = atoi(name); + rc_lid = (uint16_t) atoi(name); if (rc_lid == 0) fprintf(stderr, Failed to find lid for \%s\\n, name); return rc_lid; @@ -917,8 +923,8 @@ static int parse_lid_and_ports(bind_handle_t h, #define cl_hton8(x) (x) #define CHECK_AND_SET_VAL(val, size, comp_with, target, name, mask) \ - if (val comp_with) { \ - target = cl_hton##size(val); \ + if ((uint##size##_t) val (uint##size##_t) comp_with) { \ + target = cl_hton##size((uint##size##_t) val); \ comp_mask |= IB_##name##_COMPMASK_##mask; \ } @@ -951,7 +957,8 @@ static int get_issm_records(bind_handle_t h, ib_net32_t capability_mask) static int print_node_records(bind_handle_t h) { - int i = 0, ret; + unsigned i; + int ret; ret = get_all_records(h, IB_SA_ATTR_NODERECORD, 0); if (ret) @@ -1027,7 +1034,7 @@ static int query_path_records(const struct query_cmd *q, bind_handle_t h, CHECK_AND_SET_VAL(p-dlid, 16, 0, pr.dlid, PR, DLID); CHECK_AND_SET_VAL(p-hop_limit, 32, -1, pr.hop_flow_raw, PR, HOPLIMIT); 
CHECK_AND_SET_VAL(p-flow_label, 8, 0, flow, PR, FLOWLABEL); - pr.hop_flow_raw |= cl_hton32(flow 8); + pr.hop_flow_raw |= (uint8_t) cl_hton32(flow 8); CHECK_AND_SET_VAL(p-tclass, 8, 0, pr.tclass, PR, TCLASS); CHECK_AND_SET_VAL(p-reversible, 8, -1, reversible, PR, REVERSIBLE); CHECK_AND_SET_VAL(p-numb_path, 8, -1, pr.num_path, PR,
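Two of the portability changes in this patch follow from ISO C rules: arithmetic on a `void *` is a GNU extension, and an empty initializer `{ }` is not valid C before C23. The sketch below shows both fixes in isolation. It is an illustration only: `DATA_OFFS`, `get_rec()`, `struct gid` and `gid_is_nonzero()` are hypothetical names that stand in for the saquery code, and the record layout is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header size, standing in for a data offset constant. */
#define DATA_OFFS 4u

/* Return a pointer to record i in a buffer of fixed-size records.
 * Casting to char * first makes the byte arithmetic valid ISO C. */
static void *get_rec(void *base, unsigned i, unsigned rec_size)
{
    return (char *)base + DATA_OFFS + (size_t)i * rec_size;
}

/* Hypothetical 16-byte identifier, for illustration only. */
struct gid { uint8_t raw[16]; };

/* Compare against an all-zero value; "{ { 0 } }" is accepted by pre-C23
 * compilers, unlike an empty "{ }". */
static int gid_is_nonzero(const struct gid *g)
{
    struct gid zero = { { 0 } };

    return memcmp(&zero, g, sizeof(*g)) != 0;
}
```

For example, `get_rec(buf, 2, 8)` yields `buf + 20` (4 header bytes plus two 8-byte records), independent of compiler extensions.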
[ofa-general] Bandwidth of performance with multirail IB
I have implemented a uDAPL program to measure the bandwidth on IB with multirail connections. The HCAs used in the cluster are Mellanox ConnectX HCAs. Each HCA has two ports. The program uses both ports on each node of the cluster to build multirail IB connections. The peak bandwidth I can get is ~ 1.3 GB/s (not bi-directional), which is almost the same as with single rail connections.

Does anyone have similar experience?
Re: [ofa-general] Race condition in core/sysfs.c (kernel panic) when unloading the driver
On Monday 23 February 2009 20:31, Roland Dreier wrote:
>> I'm not sure that it does. This does not make sysfs access atomic wrt module unloading. I think an app can still lose its timeslice while inside the sysfs access, and module unload can still occur while the app is waiting for a new time slice (although the code pages will not be removed as yet -- see below).
>
> Not sure I follow... the low-level driver must handle requests until ib_unregister_device() returns, and with the change I proposed, ib_unregister_device() will not return until all sysfs files are gone (and no open file handles remain).
>
>> What about the patch I just submitted?
>
> I'd rather not add a superfluous mutex that adds complexity when a simpler solution is available.

You're right, your solution does work.

I was just concerned that the unregister-sysfs calls would simply prevent new accessors from seeing the files, but would return before the file reference count reached zero (thus allowing low-level driver cleanup while current accessors were still in progress).

I checked, and this does not happen. As you mention in your answer, the unregister-sysfs calls do not return while someone still has an open file handle on these files.

- Jack