[ewg] ofa_1_5_kernel 20100208-0200 daily build status

2010-02-08 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
Build failed on x86_64 with linux-2.6.18-164.el5
Log:
/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1832:
 warning: assignment from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:
 In function 'iscsi_transport_init':
/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1935:
 warning: passing argument 3 of 'netlink_kernel_create' from incompatible 
pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1949:
 error: implicit declaration of function 'netlink_kernel_release'
make[3]: *** 
[/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.o]
 Error 1
make[2]: *** 
[/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi]
 Error 2
make[1]: *** 
[_module_/home/vlad/tmp/ofa_1_5_kernel-20100208-0200_linux-2.6.18-164.el5_x86_64_check]
 Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.18-164.el5'
make: *** [kernel] Error 2
--
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Teleconf NEXT TUESDAY: Feb 16

2010-02-08 Thread Jeff Squyres
WHAT: Need to pick a time for an EWG teleconf next Tues, Feb 16, 2010 (Mon, Feb 
15 is a holiday for some in the US).

WHY: A bunch of important issues came up on the EWG call today that require 
discussion in the near term (rather than just waiting 2 weeks for the next EWG 
teleconf).

HOW: Please visit http://doodle.com/iicpg5scd9hk23nu before Wednesday and fill 
out your availability for an EWG teleconf.

Around 6pm US Eastern on Wed, Feb 10, 2010, I will pick the time with the most 
availability overlap, set up a WebEx for that time, and send an Outlook invite 
to the EWG list.

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [ewg] [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Steve Wise

Sean Hefty wrote:
>> IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.
>
> I disagree, but what does it matter?  So, we add a 'software' loopback that
> uses 127.0.0.1.  Openmpi still wouldn't work.

I guess that's true.

>> I will commit to get the fix in openmpi asap.
>
> If we don't care if the fix is in the kernel or user space, then we could add
> a 'disable-loopback-support' build option to librdmacm, which can fail any
> attempt to bind to a loopback address.

I'd rather see it removed from the 2.6.33 kernel before it ships, and then 
we fix openmpi, and then re-submit 127.0.0.1 support once openmpi 
publishes a release with its fix.  See my other email that submits a 
potential commit to remove 127.0.0.1 support for 2.6.33. 

Steve.


Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
Sorry -- I missed many of these mails today due to mail filtering (don't ask).

FWIW:

- I'm not opposed to adding LOOPBACK checks into OMPI to avoid this problem 
(I'm waiting for a patch, actually).  I'm just saying that we're not going to 
get a release out immediately with this fix.  Our next release was scheduled to 
be 1.4.2, and it is still at least several weeks away.  So allowing this in 
2.6.33 would be Bad because a) we know it breaks OMPI, and b) OMPI can't get a 
release out immediately to fix the issue.

- There are customers who are using RDMA CM with IB (e.g., Sandia with their 
Mesh/IB routing stuff).

- I see the following in rdma_bind_addr(3):

-
DESCRIPTION
   Associates a source address with an rdma_cm_id.  The  address  may  be
   wildcarded.   If  binding  to a specific local address, the rdma_cm_id
   will also be bound to a local RDMA device.
-

What RDMA device is bound to when you use 127.0.0.1?  I'm not 100% sure, but I 
think that this might be where we got the rationale that we didn't need 
additional LOOPBACK tests in OMPI...  (if anyone else agrees with this 
interpretation, then it's at least one argument that allowing binding to 
LOOPBACK devices *is* a change in semantics, and therefore should be treated 
extremely carefully)


On Feb 8, 2010, at 4:16 PM, Steve Wise wrote:

> Sean Hefty wrote:
> >> IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.
> >
> > I disagree, but what does it matter?  So, we add a 'software' loopback that
> > uses 127.0.0.1.  Openmpi still wouldn't work.
>
> I guess that's true.
>
> >> I will commit to get the fix in openmpi asap.
> >
> > If we don't care if the fix is in the kernel or user space, then we could
> > add a 'disable-loopback-support' build option to librdmacm, which can fail
> > any attempt to bind to a loopback address.
>
> I'd rather see it removed from the 2.6.33 kernel before it ships, and then
> we fix openmpi, and then re-submit 127.0.0.1 support once openmpi
> publishes a release with its fix.  See my other email that submits a
> potential commit to remove 127.0.0.1 support for 2.6.33.
>
> Steve.


-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



[ewg] [PATCH resend] rdma/cma: Disallow binding rdma endpoints to 127.0.0.1.

2010-02-08 Thread Steve Wise
Here is an untested patch that I think removes support for binding
to 127.0.0.1.  Sean, will this work?

If we agree to do this for 2.6.33, then I'll build/test this and resubmit.



rdma/cma: Disallow binding rdma endpoints to 127.0.0.1.

Currently this functionality breaks openmpi.  Once openmpi is fixed to
correctly ignore 127.0.0.1 as a valid external rdma address, we can
re-enable this functionality.

Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/infiniband/core/cma.c |   16 ++--
 1 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..cd3d351 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -628,19 +628,9 @@ static inline int cma_zero_addr(struct sockaddr *addr)
 	}
 }
 
-static inline int cma_loopback_addr(struct sockaddr *addr)
-{
-	if (addr->sa_family == AF_INET)
-		return ipv4_is_loopback(
-			((struct sockaddr_in *) addr)->sin_addr.s_addr);
-	else
-		return ipv6_addr_loopback(
-			&((struct sockaddr_in6 *) addr)->sin6_addr);
-}
-
 static inline int cma_any_addr(struct sockaddr *addr)
 {
-	return cma_zero_addr(addr) || cma_loopback_addr(addr);
+	return cma_zero_addr(addr);
 }
 
 static inline __be16 cma_port(struct sockaddr *addr)
@@ -2115,9 +2105,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err1;
 
-	if (cma_loopback_addr(addr)) {
-		ret = cma_bind_loopback(id_priv);
-	} else if (!cma_zero_addr(addr)) {
+	if (!cma_zero_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
 			goto err1;



Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jason Gunthorpe
On Mon, Feb 08, 2010 at 04:56:23PM -0500, Jeff Squyres wrote:

> DESCRIPTION
>    Associates a source address with an rdma_cm_id.  The  address  may  be
>    wildcarded.   If  binding  to a specific local address, the rdma_cm_id
>    will also be bound to a local RDMA device.
>
> What RDMA device is bound to when you use 127.0.0.1?  I'm not 100%
> sure, but I think that this might be where we got the rationale that
> we didn't need additional LOOPBACK tests in OMPI...  (if anyone else
> agrees with this interpretation, then it's at least one argument
> that allowing binding to LOOPBACK devices *is* a change in
> semantics, and therefore should be treated extremely carefully)

This statement is trying to say that if a source address is given then
the rdma_cm_id will be bound to a device. Designating which APIs bind
the device is important for the API user: once the device is bound, you
can allocate resources against it.

It doesn't matter which device is picked; that is up to the kernel.
For instance, if the same IP is assigned to multiple RDMA devices
then the kernel will pick one.

Jason


Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 5:13 PM, Sean Hefty wrote:

> Are you certain that rdma_bind_addr does NOT work with 127.0.0.1, and that
> this is now the problem?
>
> It does appear to work on OFED 1.4 and on 2.6.26 based on ucmatose.  Is the
> problem really with rdma_bind_addr succeeding, or with rdma_connect, which now
> works, or rdma_bind_addr now assigning a device?

On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to 
bind to 127.0.0.1 per the email I sent Friday:

http://www.spinics.net/lists/linux-rdma/msg02568.html

I have not checked any other combinations; Steve was saying that he saw 
rdma_bind_addr() succeeding on his machines with OFED 1.5.1rcwhatever (I don't 
recall the OS he said he was using).

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Sean Hefty
> Sean, can you try openmpi?  It fails for me, and yet ucmatose succeeds.
> I don't understand the difference yet...

I believe the issue is that rdma_bind_addr succeeds (returns 0), but no device
is assigned to the rdma_cm_id (verbs field is NULL).

This was a change from commit 6f8372b69c3198e06cecb1df2cb9682d0c55e657:

  The defined behavior of rdma_bind_addr is to associate an RDMA
  device with an rdma_cm_id, as long as the user specified a non-
  zero address.  (ie they weren't just trying to reserve a port)
  Currently, if the loopback address is passed to rdma_bind_addr,
  no device is associated with the rdma_cm_id.  Fix this.

There are two places where rdma_bind_addr() is called in the openmpi source code
(based on a tarball download of the trunk).  One is btl_openib_iwarp.c:

  rc = rdma_bind_addr(cm_id, ipaddr);
  if (rc || !cm_id->verbs) {
      rc = OMPI_SUCCESS;
      goto out3;
  }

The other is btl_openib_connect_rdmacm.c, but that deals with listening.  I
can't quickly determine if btl_openib_iwarp.c is usually used for IB or not.

So, to fully keep the behavior of 2.6.32, rdma_bind_addr for 127.0.0.1 should
succeed, but not assign a device.  I think this was the change from commit
..c55e657 that changed the behavior:

@@ -2089,7 +2096,9 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND))
 		return -EINVAL;
 
-	if (!cma_any_addr(addr)) {
+	if (cma_loopback_addr(addr)) {
+		ret = cma_bind_loopback(id_priv);
+	} else if (!cma_zero_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
 			goto err1;

I'll see if reverting this gives the desired(?) behavior.
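
As a rough user-space model of the hunk above (a sketch, not kernel code), the
branch taken by rdma_bind_addr before and after the commit can be written as
two small functions. The enum and function names are made up for
illustration, and the "before" case assumes that cma_any_addr() in 2.6.32
already treated a loopback address like the zero address, which is what the
behavior described above implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three paths through rdma_bind_addr. */
enum bind_path {
    BIND_PORT_ONLY,       /* no device lookup; no device is assigned */
    BIND_LOOPBACK_DEVICE, /* cma_bind_loopback() path: a device is assigned */
    BIND_TRANSLATE_IP     /* rdma_translate_ip() path: device found via the IP */
};

/* Model of 2.6.32: "if (!cma_any_addr(addr))" guards the IP translation. */
enum bind_path bind_path_before(bool is_zero, bool is_loopback)
{
    assert(!(is_zero && is_loopback)); /* an address cannot be both */
    bool any_addr = is_zero || is_loopback;
    return any_addr ? BIND_PORT_ONLY : BIND_TRANSLATE_IP;
}

/* Model after the commit: loopback gets its own device-binding branch. */
enum bind_path bind_path_after(bool is_zero, bool is_loopback)
{
    assert(!(is_zero && is_loopback));
    if (is_loopback)
        return BIND_LOOPBACK_DEVICE;
    if (!is_zero)
        return BIND_TRANSLATE_IP;
    return BIND_PORT_ONLY;
}
```

In this model the only input whose path changes is the loopback address:
bind_path_before(false, true) gives BIND_PORT_ONLY, while
bind_path_after(false, true) gives BIND_LOOPBACK_DEVICE. That matches a check
on the verbs field behaving differently on the two kernels.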

- Sean



Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Pradeep Satyanarayana
Steve Wise wrote:
> Sean, can you try openmpi?  It fails for me, and yet ucmatose succeeds.
> I don't understand the difference yet...
>
>
> Sean Hefty wrote:
>>> On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting
>>> to bind to 127.0.0.1 per the email I sent Friday:
>>>
>>> http://www.spinics.net/lists/linux-rdma/msg02568.html
>>
>> This is what I see over IB on 2.6.26, with a couple extra prints added to
>> cmatose:
>>
>> cst-lin1:/home/mshefty/librdmacm# examples/ucmatose -b 127.0.0.1
>> cmatose: starting server
>> src addr 0x17f
>> rdma_bind_addr: 0
>>
>> so we're missing something else.


Hi Steve,

I am attempting to duplicate the problem that you reported with today's OFED 
build (on SLES 11, if that matters). I have rarely used Open MPI, so 
suggestions would be helpful. Here is what I see:

elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode --mca 
btl_openib_cpc_include rdmacm ring
--
mpirun was unable to launch the specified application as it could not find an 
executable:

Executable: ring
Node: elm3b199

while attempting to start process rank 0.
--
elm3b199:/usr/lib #

Incidentally tvflash did not build (this is a ppc64 machine).

Thanks
Pradeep



Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 7:30 PM, Pradeep Satyanarayana wrote:

> elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode 
> --mca btl_openib_cpc_include rdmacm ring
> --
> mpirun was unable to launch the specified application as it could not find an 
> executable:
> 
> Executable: ring
> Node: elm3b199
> 
> while attempting to start process rank 0.
> --
> elm3b199:/usr/lib #

Is there an executable named ring either in your $PATH or in /usr/lib?

Open MPI is telling you it can't find an executable named ring.

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Pradeep Satyanarayana
Jeff Squyres wrote:
> On Feb 8, 2010, at 7:30 PM, Pradeep Satyanarayana wrote:
>
>> elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode 
>> --mca btl_openib_cpc_include rdmacm ring
>> --
>> mpirun was unable to launch the specified application as it could not find 
>> an executable:
>>
>> Executable: ring
>> Node: elm3b199
>>
>> while attempting to start process rank 0.
>> --
>> elm3b199:/usr/lib #
>
> Is there an executable named ring either in your $PATH or in /usr/lib?
>
> Open MPI is telling you it can't find an executable named ring.

Hi Jeff,

No, there is none. I got this command from one of the mails in the thread. What 
should I use instead?

Thanks
Pradeep




Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 7:50 PM, Pradeep Satyanarayana wrote:

> No, there is none. I got this command from one of the mails in the thread. 
> What should I use instead?

You need to compile and run an MPI program.  ring is a typical test program 
that sends a message around in a ring.  I think that OFED installs those test 
apps somewhere, but I don't recall where offhand.

ring_c.c is attached.  Compile it with:

mpicc ring_c.c -o ring

(you might need the full path to mpicc if it's not in your path?)

A better mpirun command line would be:

/usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --host HOSTNAME1,HOSTNAME2 \
--mca btl openib,sm,self --mca btl_openib_cpc_include rdmacm ring

Put in your own HOSTNAME1 and HOSTNAME2 values.  You'll also need to ensure 
that both Open MPI and ring are available on both nodes (preferably in the 
same filesystem locations on both nodes, for simplicity) and that you can ssh 
from one node to the other without being prompted for a password or 
passphrase.

This will run a 2-process MPI job across the two nodes, passing a message 
between the two processes a few times before quitting.

The various --mca parameters on this mpirun command line ensure that you are 
definitely using the OpenFabrics verbs support and forcing the use of RDMA CM.

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


ring_c.c
Description: Binary data

Re: [ewg] Build error on Itanium on today's OFED-1.5.1 daily build

2010-02-08 Thread Tziporet Koren
Vlad

Please take care

Thanks
Tziporet

-----Original Message-----
From: Woodruff, Robert J [mailto:robert.j.woodr...@intel.com] 
Sent: Tuesday, February 09, 2010 12:42 AM
To: ewg@lists.openfabrics.org
Cc: Tziporet Koren; Vladimir Sokolovsky
Subject: Build error on Itanium on today's OFED-1.5.1 daily build


I am getting the following build error on today's daily build
on Itanium. I do not see the same error on x86_64.



Processing files: kernel-ib-1.5.1-2.6.18_128.el5
error: File not found: /var/tmp/OFED/usr/bin/ibdev2netdev
Processing files: kernel-ib-devel-1.5.1-2.6.18_128.el5
Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) = 3.0.4-1 
rpmlib(PayloadFilesHavePrefix) = 4.0-1
Requires(pre): /bin/sh
Requires(post): /bin/sh
Requires(postun): /bin/sh
Requires: /bin/bash kernel-ib


RPM build errors:
user vlad does not exist - using root
group vlad does not exist - using root
user vlad does not exist - using root
group vlad does not exist - using root
File not found: /var/tmp/OFED/usr/bin/ibdev2netdev