[ewg] ofa_1_5_kernel 20100208-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
Build failed on x86_64 with linux-2.6.18-164.el5
Log:
/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1832: warning: assignment from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c: In function 'iscsi_transport_init':
/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1935: warning: passing argument 3 of 'netlink_kernel_create' from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1949: error: implicit declaration of function 'netlink_kernel_release'
make[3]: *** [/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.18-164.el5'
make: *** [kernel] Error 2

--
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] Teleconf NEXT TUESDAY: Feb 16
WHAT: Need to pick a time for an EWG teleconf next Tues, Feb 16, 2010 (Mon, Feb 15 is a holiday for some in the US).

WHY: A bunch of important issues came up on the EWG call today that require discussion in the near term (rather than just waiting 2 weeks for the next EWG teleconf).

HOW: Please visit http://doodle.com/iicpg5scd9hk23nu before Wednesday and fill out your availability for an EWG teleconf. Around 6pm US Eastern on Wed, Feb 10, 2010, I will pick the time with maximum availability overlap, set up a WebEx for that time, and send an Outlook invite to the EWG list.

Thanks!

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [ewg] [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices
Sean Hefty wrote:

IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.

I disagree, but what does it matter? So, we add a 'software' loopback that uses 127.0.0.1. Openmpi still wouldn't work.

I guess that's true. I will commit to get the fix in openmpi asap.

If we don't care if the fix is in the kernel or user space, then we could add a 'disable-loopback-support' build option to librdmacm, which can fail any attempt to bind to a loopback address.

I'd rather see it removed from the 2.6.33 kernel before it ships, and then we fix openmpi, and then re-submit 127.0.0.1 support once openmpi publishes a release with its fix.

See my other email that submits a potential commit to remove 127.0.0.1 support for 2.6.33.

Steve.
Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
Sorry -- I missed many of these mails today due to mail filtering (don't ask).

FWIW:

- I'm not opposed to adding LOOPBACK checks into OMPI to avoid this problem (I'm waiting for a patch, actually). I'm just saying that we're not going to get a release out immediately with this fix. Our next release was scheduled to be 1.4.2, and it is still at least several weeks away. So allowing this in 2.6.33 would be Bad because a) we know it breaks OMPI, and b) OMPI can't get a release out immediately to fix the issue.

- There are customers who are using RDMA CM with IB (e.g., Sandia with their Mesh/IB routing stuff).

- I see the following in rdma_bind_addr(3):

  DESCRIPTION
      Associates a source address with an rdma_cm_id. The address may be wildcarded. If binding to a specific local address, the rdma_cm_id will also be bound to a local RDMA device.

  What RDMA device is bound to when you use 127.0.0.1? I'm not 100% sure, but I think that this might be where we got the rationale that we didn't need additional LOOPBACK tests in OMPI... (if anyone else agrees with this interpretation, then it's at least one argument that allowing binding to LOOPBACK devices *is* a change in semantics, and therefore should be treated extremely carefully)

On Feb 8, 2010, at 4:16 PM, Steve Wise wrote:

Sean Hefty wrote:

IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.

I disagree, but what does it matter? So, we add a 'software' loopback that uses 127.0.0.1. Openmpi still wouldn't work.

I guess that's true. I will commit to get the fix in openmpi asap.

If we don't care if the fix is in the kernel or user space, then we could add a 'disable-loopback-support' build option to librdmacm, which can fail any attempt to bind to a loopback address.

I'd rather see it removed from the 2.6.33 kernel before it ships, and then we fix openmpi, and then re-submit 127.0.0.1 support once openmpi publishes a release with its fix.

See my other email that submits a potential commit to remove 127.0.0.1 support for 2.6.33.

Steve.

--
Jeff Squyres
jsquy...@cisco.com
[ewg] [PATCH resend] rdma/cma: Disallow binding rdma endpoints to 127.0.0.1.
Here is an untested patch that I think removes support for binding to 127.0.0.1. Sean, will this work? If we agree to do this for 2.6.33, then I'll build/test this and resubmit.

rdma/cma: Disallow binding rdma endpoints to 127.0.0.1.

Currently this functionality breaks openmpi. Once openmpi is fixed to correctly ignore 127.0.0.1 as a valid external rdma address, we can re-enable this functionality.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 drivers/infiniband/core/cma.c | 16 ++--
 1 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..cd3d351 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -628,19 +628,9 @@ static inline int cma_zero_addr(struct sockaddr *addr)
 	}
 }
 
-static inline int cma_loopback_addr(struct sockaddr *addr)
-{
-	if (addr->sa_family == AF_INET)
-		return ipv4_is_loopback(
-			((struct sockaddr_in *) addr)->sin_addr.s_addr);
-	else
-		return ipv6_addr_loopback(
-			&((struct sockaddr_in6 *) addr)->sin6_addr);
-}
-
 static inline int cma_any_addr(struct sockaddr *addr)
 {
-	return cma_zero_addr(addr) || cma_loopback_addr(addr);
+	return cma_zero_addr(addr);
 }
 
 static inline __be16 cma_port(struct sockaddr *addr)
@@ -2115,9 +2105,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err1;
 
-	if (cma_loopback_addr(addr)) {
-		ret = cma_bind_loopback(id_priv);
-	} else if (!cma_zero_addr(addr)) {
+	if (!cma_zero_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
 			goto err1;
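For anyone wanting to experiment with the same address classification from user space (for example, to filter candidate addresses before calling rdma_bind_addr()), the check that the kernel's cma_loopback_addr() helper performs can be approximated with standard socket headers. This is only a sketch under stated assumptions: the function names sockaddr_is_loopback and ipv4_string_is_loopback are hypothetical, this is not the kernel code, and it is not what Open MPI actually does. Unlike the kernel helper, it returns 0 for address families other than AF_INET and AF_INET6.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical user-space analogue of the kernel's cma_loopback_addr().
 * Returns 1 for an IPv4 address in 127.0.0.0/8 or the IPv6 address ::1,
 * and 0 for anything else (including unknown address families).
 */
int sockaddr_is_loopback(const struct sockaddr *addr)
{
    if (addr->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)addr;
        /* 127.0.0.0/8: the top octet is 127 once in host byte order. */
        return (ntohl(sin->sin_addr.s_addr) & 0xff000000u) == 0x7f000000u;
    }
    if (addr->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)addr;
        return IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr) ? 1 : 0;
    }
    return 0;
}

/*
 * Convenience wrapper for illustration: parse a dotted-quad IPv4 string
 * and classify it. Returns -1 if the string is not a valid IPv4 address.
 */
int ipv4_string_is_loopback(const char *text)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    if (inet_pton(AF_INET, text, &sin.sin_addr) != 1)
        return -1;
    return sockaddr_is_loopback((const struct sockaddr *)&sin);
}
```

A caller could run each local interface address through sockaddr_is_loopback() and skip the ones that match, which is the kind of user-space check discussed elsewhere in this thread.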
Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
On Mon, Feb 08, 2010 at 04:56:23PM -0500, Jeff Squyres wrote:

DESCRIPTION
    Associates a source address with an rdma_cm_id. The address may be wildcarded. If binding to a specific local address, the rdma_cm_id will also be bound to a local RDMA device.

What RDMA device is bound to when you use 127.0.0.1? I'm not 100% sure, but I think that this might be where we got the rationale that we didn't need additional LOOPBACK tests in OMPI... (if anyone else agrees with this interpretation, then it's at least one argument that allowing binding to LOOPBACK devices *is* a change in semantics, and therefore should be treated extremely carefully)

This statement is trying to say that if a source address is given then the rdma_cm_id will be bound to a device. Designating which APIs bind the device is important for the API user: once the device is bound, you can allocate resources against it. It doesn't matter which device is picked; that is up to the kernel. For instance, if the same IP is assigned to multiple RDMA devices, then the kernel will pick one.

Jason
Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
On Feb 8, 2010, at 5:13 PM, Sean Hefty wrote:

Are you certain that rdma_bind_addr does NOT work with 127.0.0.1, and that this is now the problem? It does appear to work on OFED 1.4 and on 2.6.26 based on ucmatose. Is the problem really with rdma_bind_addr succeeding, or with rdma_connect, which now works, or rdma_bind_addr now assigning a device?

On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to bind to 127.0.0.1 per the email I sent Friday:

http://www.spinics.net/lists/linux-rdma/msg02568.html

I have not checked any other combinations; Steve was saying that he saw rdma_bind_addr() succeeding on his machines with OFED 1.5.1rcwhatever (I don't recall the OS he said he was using).

--
Jeff Squyres
jsquy...@cisco.com
Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
Sean, can you try openmpi? It fails for me, and yet ucmatose succeeds. I don't understand the difference yet...

I believe the issue is that rdma_bind_addr succeeds (returns 0), but no device is assigned to the rdma_cm_id (verbs field is NULL). This was a change from commit 6f8372b69c3198e06cecb1df2cb9682d0c55e657:

    The defined behavior of rdma_bind_addr is to associate an RDMA device with an rdma_cm_id, as long as the user specified a non-zero address. (i.e., they weren't just trying to reserve a port) Currently, if the loopback address is passed to rdma_bind_addr, no device is associated with the rdma_cm_id. Fix this.

There are two places where rdma_bind_addr() is called in the openmpi source code (based on a tarball download of the trunk). One is btl_openib_iwarp.c:

	rc = rdma_bind_addr(cm_id, ipaddr);
	if (rc || !cm_id->verbs) {
		rc = OMPI_SUCCESS;
		goto out3;
	}

The other is btl_openib_connect_rdmacm.c, but that deals with listening. I can't quickly determine if btl_openib_iwarp.c is usually used for IB or not.

So, to fully keep the behavior of 2.6.32, rdma_bind_addr for 127.0.0.1 should succeed, but not assign a device. I think this was the change from commit ..c55e657 that changed the behavior:

@@ -2089,7 +2096,9 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND))
 		return -EINVAL;
 
-	if (!cma_any_addr(addr)) {
+	if (cma_loopback_addr(addr)) {
+		ret = cma_bind_loopback(id_priv);
+	} else if (!cma_zero_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
 			goto err1;

I'll see if reverting this gives the desired(?) behavior.

- Sean
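To make the hypothesis in the message above concrete, here is a small toy model of how the two rdma_bind_addr() variants would interact with the `rc || !cm_id->verbs` check quoted from btl_openib_iwarp.c. Everything in it is an assumption made for illustration: the names addr_class, bind_result, model_bind and caller_skips_address are hypothetical, every bind is assumed to succeed, and the only thing tracked is whether a device ends up attached. It models the hypothesis as stated in the thread, not verified kernel or Open MPI behavior.

```c
#include <stdbool.h>

/* Address classes distinguished by the cma.c hunks quoted in this thread. */
enum addr_class { ADDR_ZERO, ADDR_LOOPBACK, ADDR_SPECIFIC };

/* Toy stand-in for the two observable results of rdma_bind_addr(). */
struct bind_result {
    int rc;            /* 0 on success */
    bool device_bound; /* models cm_id->verbs != NULL */
};

/*
 * Hypothetical model, not kernel code. Assumes every bind succeeds.
 * loopback_binds_device == false models the behavior before the commit
 * discussed above (loopback treated like a wildcard, no device attached);
 * true models the behavior after it (cma_bind_loopback() attaches one).
 */
struct bind_result model_bind(enum addr_class cls, bool loopback_binds_device)
{
    struct bind_result r = { 0, false };

    switch (cls) {
    case ADDR_ZERO:
        /* Wildcard: only a port is reserved, no device is attached. */
        break;
    case ADDR_LOOPBACK:
        r.device_bound = loopback_binds_device;
        break;
    case ADDR_SPECIFIC:
        /* Assumed to translate to a device successfully. */
        r.device_bound = true;
        break;
    }
    return r;
}

/* Mirrors the quoted caller check: skip on error or when no device is bound. */
bool caller_skips_address(struct bind_result r)
{
    return r.rc != 0 || !r.device_bound;
}
```

Under this model, a caller using that check skips 127.0.0.1 with the older behavior and starts treating it as a usable RDMA address with the newer one, which is consistent with the behavior change suspected here.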
Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
Steve Wise wrote:

Sean, can you try openmpi? It fails for me, and yet ucmatose succeeds. I don't understand the difference yet...

Sean Hefty wrote:

On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to bind to 127.0.0.1 per the email I sent Friday: http://www.spinics.net/lists/linux-rdma/msg02568.html

This is what I see over IB on 2.6.26, with a couple extra prints added to cmatose:

cst-lin1:/home/mshefty/librdmacm# examples/ucmatose -b 127.0.0.1
cmatose: starting server
src addr 0x17f
rdma_bind_addr: 0

so we're missing something else.

Hi Steve,

I am attempting to duplicate the problem that you reported with today's OFED build (on Sles11, if that matters). I have rarely used openMPI, so suggestions would be helpful. Here is what I see:

elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring
--
mpirun was unable to launch the specified application as it could not find an executable:

Executable: ring
Node: elm3b199

while attempting to start process rank 0.
--
elm3b199:/usr/lib #

Incidentally, tvflash did not build (this is a ppc64 machine).

Thanks
Pradeep
Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
On Feb 8, 2010, at 7:30 PM, Pradeep Satyanarayana wrote:

elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring
--
mpirun was unable to launch the specified application as it could not find an executable:

Executable: ring
Node: elm3b199

while attempting to start process rank 0.
--
elm3b199:/usr/lib #

Is there an executable named ring either in your $PATH or in /usr/lib? Open MPI is telling you it can't find an executable named ring.

--
Jeff Squyres
jsquy...@cisco.com
Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
Jeff Squyres wrote:

On Feb 8, 2010, at 7:30 PM, Pradeep Satyanarayana wrote:

elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring
--
mpirun was unable to launch the specified application as it could not find an executable:

Executable: ring
Node: elm3b199

while attempting to start process rank 0.
--
elm3b199:/usr/lib #

Is there an executable named ring either in your $PATH or in /usr/lib? Open MPI is telling you it can't find an executable named ring.

Hi Jeff,

No, there is none. I got this command from one of the mails in the thread. What should I use instead?

Thanks
Pradeep
Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
On Feb 8, 2010, at 7:50 PM, Pradeep Satyanarayana wrote:

No, there is none. I got this command from one of the mails in the thread. What should I use instead?

You need to compile and run an MPI program. ring is a typical test program that sends a message around in a ring. I think that OFED installs those test apps somewhere, but I don't recall where offhand.

ring_c.c is attached. Compile it with:

mpicc ring_c.c -o ring

(you might need the full path to mpicc if it's not in your path?)

A better mpirun command line would be:

/usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --host HOSTNAME1,HOSTNAME2 \
    --mca btl openib,sm,self --mca btl_openib_cpc_include rdmacm ring

Put in your own HOSTNAME1 and HOSTNAME2 values. You'll also need to ensure that both Open MPI and ring are available on both nodes (preferably in the same filesystem locations on both nodes, for simplicity) and that you can ssh from one node to the other without being prompted for a password or passphrase.

This will run a 2-process MPI job across the two nodes, passing a message between the two processes a few times before quitting. The various --mca parameters on this mpirun command line ensure that you are definitely using the OpenFabrics verbs support and force the use of RDMA CM.

--
Jeff Squyres
jsquy...@cisco.com

ring_c.c
Description: Binary data
Re: [ewg] Build error on Itanium on today's OFED-1.5.1 daily build
Vlad,
Please take care.

Thanks,
Tziporet

-----Original Message-----
From: Woodruff, Robert J [mailto:robert.j.woodr...@intel.com]
Sent: Tuesday, February 09, 2010 12:42 AM
To: ewg@lists.openfabrics.org
Cc: Tziporet Koren; Vladimir Sokolovsky
Subject: Build error on Itanium on today's OFED-1.5.1 daily build

I am getting the following build error on today's daily build on Itanium. I do not see the same error on x86_64.

Processing files: kernel-ib-1.5.1-2.6.18_128.el5
error: File not found: /var/tmp/OFED/usr/bin/ibdev2netdev
Processing files: kernel-ib-devel-1.5.1-2.6.18_128.el5
Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) = 3.0.4-1 rpmlib(PayloadFilesHavePrefix) = 4.0-1
Requires(pre): /bin/sh
Requires(post): /bin/sh
Requires(postun): /bin/sh
Requires: /bin/bash kernel-ib

RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    File not found: /var/tmp/OFED/usr/bin/ibdev2netdev