[ewg] RE: OFED 1.4.2 requires rebuild of kernel modules on each node
Sending this to the EWG OpenFabrics list, since this seems to be an OFED build/installation issue rather than a general code problem.

One thing you might try: instead of copying the entire build directory and re-running ./install.pl -c ofed.conf on each system, build on one node, then copy just the binary RPMS directory and the uninstall script to the other nodes. Then run the uninstall script and install the RPMs manually, e.g.:

    ./uninstall.sh
    cd ./RPMS/redhat-release-/x86_64
    rpm -i *

This method has worked for me in the past.

woody

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Bryan
Sent: Thursday, October 22, 2009 11:22 AM
To: linux-r...@vger.kernel.org
Subject: OFED 1.4.2 requires rebuild of kernel modules on each node

I was referred to this list by the general mailing list on OFED. I am emailing from my personal address since Lotus Notes insists that anything it sends has to contain some portion of HTML.

This problem was observed on Red Hat Enterprise Linux 5 update 3. I searched the list but did not see anything immediately applicable. We've seen similar issues in the past, where we were able to modify the script to handle an RPM that didn't match the expected naming scheme, but nothing stood out when looking at the scripts for this version.

Copied from an internal bug reporting tool:

On installing OFED 1.4.2, the tarball was extracted. In the directory the code was extracted to, ./install.pl was run and all components of OFED were built and installed with the default settings. This directory was then copied to another node, and ./install.pl -c ofed.conf was run. Previously this would just install the already built components, but with OFED 1.4.2 the kernel RPM gets rebuilt when this is done. This means that the build tools have to be present on each node, and that deployment of OFED takes longer.

Bryan Reese
--
bre...@us.ibm.com
e1350 Linux Cluster Test Engineer

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
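Editor's sketch: the copy-the-RPMs workaround suggested in the reply could be scripted per node roughly as follows. This is a minimal dry-run sketch, not part of OFED. It only builds the command strings and runs nothing. The use of ssh/scp, the host name, the destination directory and the function name are all assumptions for illustration. The RPM subdirectory is passed in by the caller, because the path in the mail is abbreviated and depends on the distribution.

```python
import shlex


def deployment_plan(node, build_dir, rpm_subdir, dest="/tmp/ofed"):
    """Return the shell commands (as strings) to install prebuilt OFED RPMs on one node.

    Nothing is executed here; the caller can review the plan or run it.
    ssh/scp as the transport is an assumption, not something the thread specifies.
    """
    q = shlex.quote
    uninstall = f"{build_dir}/uninstall.sh"
    rpms = f"{build_dir}/RPMS"
    return [
        # Create the (hypothetical) staging directory on the target node.
        f"ssh {q(node)} mkdir -p {q(dest)}",
        # Copy only the uninstall script and the binary RPMs, not the whole build tree.
        f"scp -r {q(uninstall)} {q(rpms)} {q(node + ':' + dest + '/')}",
        # Remove any previous OFED installation.
        f"ssh {q(node)} {q('cd ' + dest + ' && ./uninstall.sh')}",
        # Install the prebuilt RPMs, as in the mail ("rpm -i *").
        f"ssh {q(node)} {q('cd ' + dest + '/RPMS/' + rpm_subdir + ' && rpm -i *')}",
    ]


if __name__ == "__main__":
    # "node02" and "DISTRO/x86_64" are placeholders; substitute real values.
    for cmd in deployment_plan("node02", "/root/OFED-1.4.2", "DISTRO/x86_64"):
        print(cmd)
```

Printing the plan first makes it easy to check the paths before anything touches a node. Whether the build tools can then be left off the other nodes depends on the RPMs matching each node's kernel, which the thread does not settle.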
[ewg] RE: [PATCH] link-local address fix for rdma_resolve_addr
>For ipv6 I ran what I described previously. What I do need to do is add
>the option to rping to specify a source address and run it with various
>addresses. Any help you can give defining what exactly needs to be tested
>would be appreciated.

You can also test with ucmatose to verify ipv4 still works. Use the -b option to bind to a specific address.

- Sean
[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr
On Wed, 2009-10-21 at 17:08 -0600, Jason Gunthorpe wrote:
> This looks exactly like what I was thinking of - have you tested this?

Yes, I did do some testing, but that brings up a good question: I am not sure I know everything that should be tested. I am running rping with different destination addresses (and scoping). On the ipv4 side:

    rping -c -a
    rping -c -a

For ipv6 I ran what I described previously. What I do need to do is add the option to rping to specify a source address and run it with various addresses. Any help you can give defining what exactly needs to be tested would be appreciated.

>
> If it is OK, I'd make it the first in the series.
>
> There were two things I was not sure about in my example.
> 1) Is 'init_net.loopback_dev' the correct reference for the loop device? Or
>    is it something like dev_net(rt->idev->dev)->loopback_dev ?
>
>    I'm sensing it may be the latter, but can't investigate right now
>    Donno much about this new namespace stuff really

I think you may be correct; I will look at that more closely. I did explicitly verify the test worked in both cases.

>
> 2) Was rt->idev->dev the right choice for the ipv4 case? Or is it
>    rt->u.dst.dev ?
>
>    The TCP case kinda looks like
>    int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>        tmp = ip_route_connect(&rt, nexthop, inet->saddr,
>                               RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
>                               IPPROTO_TCP,
>                               inet->sport, usin->sin_port, sk, 1);
>        sk_setup_caps(sk, &rt->u.dst);
>
>    void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
>        __sk_dst_set(sk, dst);
>
>    And all later things key off the sk_get_dst. So I'm thinking
>    that u.dst.dev might be correct.
>
>    I have no idea what the difference is though (can't look too hard
>    right now)
>
> The main other fixup I see is to remove
>     ret = cma_bind_addr(id, src_addr, dst_addr);
>
> From rdma_resolve_addr and rely on the routing lookup in
> addr_resolve_remote called by addr_resolve_ip to setup the bind device
> from the routing lookup. (This is what I mentioned in my last email)
>
> Which then lets you fixup the checking and handling of the
> sin6_scopeid on the source address - and fixes the main other routing
> difference against the TCP stack.
>
> Thanks for working on this!
>
> Jason

Lots of discussion :) I will go through the mails, address the comments and post the entire series of patches. Thanks for all your input.

Dave.
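Editor's sketch: one way to organize the destination addresses for the rping runs discussed above is to separate the cases where a scope is required. IPv6 link-local addresses (fe80::/10) are only unique per link, so the stack cannot choose an outgoing device unless a scope id accompanies the address. The helper below is a hypothetical illustration using only the Python standard library. The function name and the interface name ib0 are assumptions, and the sketch does not call rping or any RDMA API.

```python
import ipaddress


def needs_scope_id(dest):
    """Return True if dest is an IPv6 link-local address given without a scope.

    dest may carry a scope suffix in the usual "address%interface" form.
    """
    addr, _, scope = dest.partition("%")
    ip = ipaddress.ip_address(addr)
    return ip.version == 6 and ip.is_link_local and not scope


if __name__ == "__main__":
    # Example destinations; "ib0" is a placeholder interface name.
    for dest in ["192.168.0.1", "2001:db8::1", "fe80::1%ib0", "fe80::1"]:
        status = "scope id required" if needs_scope_id(dest) else "ok as given"
        print(f"{dest:16} {status}")
```

A test matrix built this way would cover at least IPv4, global IPv6, scoped link-local, and unscoped link-local destinations, with the last expected to be rejected or resolved only when a bound source address supplies the device.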
[ewg] RE: Agenda for EWG/OFED meeting on next Monday
I also did a quick test of the OFED-1.5-RDMA stack with Intel MPI on EL5.3, x86_64 and Itanium. I was able to get both to run OK, although on Itanium the startup script still tries to load the MLX4 driver, which fails to load. If I disable that, it all seems to work fine.

woody

From: Woodruff, Robert J
Sent: Monday, October 19, 2009 10:13 AM
To: 'Tziporet Koren'; ewg@lists.openfabrics.org
Subject: RE: Agenda for EWG/OFED meeting on next Monday

For my team, we have been testing the following on small clusters, 16 nodes or less.

OS:
- RHEL 5.3 and 5.4

Arch:
- X86_64, ia64

ULPs:
- OpenSM, Intel MPI over IPoIB, Intel MPI over uDAPL, ibutils and management tools

IHVs:
- Mellanox, mthca and mlx4
- Intel (NetEffect) iWarp

For uDAPL, we are testing the latest package on a cluster of 338 nodes with Intel MPI, but that cluster is still running the older base OFED.

From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tziporet Koren
Sent: Monday, October 19, 2009 8:31 AM
To: Tziporet Koren; ewg@lists.openfabrics.org
Subject: [ewg] RE: Agenda for EWG/OFED meeting on next Monday

Mellanox testing for OFED 1.5
=============================

Mellanox tests the OFED-RDMA package on most systems, and OFED on only a few machines.
We test all Mellanox HCAs, with the main focus on ConnectX and ConnectX-2 with QDR.

OS:
- RHEL4: up6, up7, up8
- RHEL5: up2, up3, up4
- SLES10 SP2
- SLES10 SP3 (not started)
- SLES11
- OEL5 up2
- CentOS5: up2, up3
- Kernel.org: 2.6.29, 2.6.30

Arch:
- X64
- x86_64
- ppc64
- ia64 - partial testing only

ULPs:
- mvapich
- Open MPI
- IPoIB (with bonding too)
- SDP
- SRP
- RDS
- NFS/RDMA
- Performance tests

Management:
- OpenSM on the host
- Management utilities
- ibutils
[ewg] ofa_1_5_kernel 20091022-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
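Editor's sketch: a report like the one above can be summarized per architecture with a few lines of Python. This is a minimal sketch, assuming only the "Passed on ARCH with linux-VERSION" line format visible in the report. The function name is hypothetical, and the format of "Failed" entries is not shown in this mail, so they are deliberately not parsed.

```python
import re
from collections import defaultdict

# Matches entries such as "Passed on x86_64 with linux-2.6.18-128.el5".
PASSED_RE = re.compile(r"Passed on (\S+) with linux-(\S+)")


def passed_by_arch(report):
    """Group the kernel versions of all 'Passed' entries by architecture."""
    result = defaultdict(list)
    for arch, kernel in PASSED_RE.findall(report):
        result[arch].append(kernel)
    return dict(result)


if __name__ == "__main__":
    # A short excerpt of the report; works on one-line or multi-line text.
    sample = (
        "Passed on i686 with linux-2.6.18 "
        "Passed on x86_64 with linux-2.6.18-128.el5 "
        "Passed on ia64 with linux-2.6.23 "
        "Passed on x86_64 with linux-2.6.9-78.ELsmp"
    )
    for arch, kernels in passed_by_arch(sample).items():
        print(f"{arch}: {len(kernels)} kernel(s): {', '.join(kernels)}")
```

Because the pattern uses findall on the whole text, it works equally well when the archive has collapsed the report onto a single line.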