Re: [ewg] OFED-1.5.1 failure over iWarp

2010-02-04 Thread Or Gerlitz
Sean Hefty wrote:
 If I look at what's there today, we're trying to find some way to match the
 net_device src_dev_addr with some sort of address associated with an 
 ib_device.
 In the case of actual IB, the net_device src_dev_addr contains the SGID, which
 provides the mapping.

 
 Steve, can you please clarify the iWarp case for me?  For iWarp, doesn't the
 src_dev_addr contain the MAC?  So the 'GID' reported for an iWarp device is
 really just the MAC.  Is this correct?


 If this is the case, then couldn't rocee (I hate that name) report its MAC as
 one of its GIDs?  This would ensure that the mapping between net_device and
 ib_device was correct.

Sean, AFAIK, reporting the MAC as one of the GIDs was part of the IBoE (feel
free not to use names which you don't like) design presented a couple of
times, wasn't it, Eli, Liran?

Or.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] ofa_1_5_kernel 20100204-0200 daily build status

2010-02-04 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
Build failed on x86_64 with linux-2.6.18-164.el5
Log:
/home/vlad/tmp/ofa_1_5_kernel-20100204-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1832: warning: assignment from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100204-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c: In function 'iscsi_transport_init':
/home/vlad/tmp/ofa_1_5_kernel-20100204-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1935: warning: passing argument 3 of 'netlink_kernel_create' from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100204-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1949: error: implicit declaration of function 'netlink_kernel_release'
make[3]: *** [/home/vlad/tmp/ofa_1_5_kernel-20100204-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_5_kernel-20100204-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_5_kernel-20100204-0200_linux-2.6.18-164.el5_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.18-164.el5'
make: *** [kernel] Error 2


[ewg] OFED 1.5.1 major critical bugs

2010-02-04 Thread Tziporet Koren
Hi

We wish to build OFED 1.5.1 RC1 next Monday.
I would like an update on the status of the major OFED 1.5.1 bugs.

Owners - please update the status, or change the priority if you think it's too high.

Thanks
Tziporet

Severity: blo = blocker, cri = critical, maj = major

ID    Sev  Owner                        Summary
1812  blo  am...@mellanox.co.il         SDP fails during init_qp
           https://bugs.openfabrics.org/show_bug.cgi?id=1812
1912  blo  am...@mellanox.co.il         SDP doesn't work on PPC64 SLES11
           https://bugs.openfabrics.org/show_bug.cgi?id=1912
1896  blo  e...@mellanox.co.il          Error with /etc/init.d/openibd start
           https://bugs.openfabrics.org/show_bug.cgi?id=1896
1884  blo  sw...@opengridcomputing.com  [OFED-1.5]: Unable to ping between S310-BT card if they a...
           https://bugs.openfabrics.org/show_bug.cgi?id=1884
1800  cri  am...@mellanox.co.il         iperf sdp on ppc cause to client machine to dead lock
           https://bugs.openfabrics.org/show_bug.cgi?id=1800
1915  cri  mo...@voltaire.com           Bonding: Removing vlan of bond1 (eth) cause to kernel panic
           https://bugs.openfabrics.org/show_bug.cgi?id=1915
1840  cri  t...@opengridcomputing.com   Some NFS large transfers stall
           https://bugs.openfabrics.org/show_bug.cgi?id=1840
1894  maj  al...@voltaire.com           ibv_reg_mr() fails to register a memory region allocated ...
           https://bugs.openfabrics.org/show_bug.cgi?id=1894
1899  maj  am...@mellanox.co.il         Getting timer related oops when running sdp tests on RHEL...
           https://bugs.openfabrics.org/show_bug.cgi?id=1899
1887  maj  joh...@georgex.org           qperf doesn't support operation between DDR and QDR servers
           https://bugs.openfabrics.org/show_bug.cgi?id=1887
1890  maj  pa...@mellanox.co.il         Applications built for OFED 1.4.2 or earlier will not run...
           https://bugs.openfabrics.org/show_bug.cgi?id=1890
1885  maj  sw...@opengridcomputing.com  [OFED-1.5]- Unable to set speed to 1Gbps on S310-BT card ...
           https://bugs.openfabrics.org/show_bug.cgi?id=1885
1789  maj  t...@opengridcomputing.com   OFED-1.5 kernel panic observed while running iozone  con...
           https://bugs.openfabrics.org/show_bug.cgi?id=1789
1851  maj  t...@opengridcomputing.com   Crash when running fstress with a large number of threads
           https://bugs.openfabrics.org/show_bug.cgi?id=1851




Re: [ewg] OFED 1.5.1 major critical bugs

2010-02-04 Thread Steve Wise
NOTE: Please don't build RC1 until you resolve the iwarp cma regression...

 1884 https://bugs.openfabrics.org/show_bug.cgi?id=1884 blo
 
 sw...@opengridcomputing.com
 
 [OFED-1.5]: Unable to ping between S310-BT card if they a...

Under investigation now.

 1885 https://bugs.openfabrics.org/show_bug.cgi?id=1885 maj
 
 sw...@opengridcomputing.com
 
 [OFED-1.5]- Unable to set speed to 1Gbps on S310-BT card ...

Closed this as a dup of 1884.


Steve.


Re: [ewg] OFED-1.5.1 failure over iWarp

2010-02-04 Thread Steve Wise
Never mind.  I see you already committed the change.  I just pulled the 
latest and rping works over iwarp.

Thanks,

Steve.


Steve Wise wrote:
 Hey Eli,

 This patch doesn't apply.

 If you give me one that applies and builds against RH5.3, I'll test it.

 Thanks,

 Steve.


 Eli Cohen wrote:
   
 Oops, you're right.

 Please try this one:

 commit 483fe703b03b1db99fa4a968fc3a918aa43f856f
 Author: Eli Cohen e...@mellanox.co.il
 Date:   Wed Feb 3 13:10:14 2010 +0200

 CMA: Fix iWarp failures to bind to a device
 
 rdma_addr_get_sgid() relies on dev_addr->transport to retrieve the correct
 GID based on the hardware address. However, when called from
 cma_acquire_dev(), the transport field is not yet valid. The solution is to
 avoid calling rdma_addr_get_sgid() from cma_acquire_dev() and find the
 device based on its GID: for Ethernet, first assume it is rocee and search
 the GID table; if it is not found, generate the GID by copying it from the
 hardware address.
 
 Signed-off-by: Eli Cohen e...@mellanox.co.il

 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index a2d5aad..3c5c59f 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -348,15 +348,29 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
  	union ib_gid gid;
  	int ret = -ENODEV;
  
 -	rdma_addr_get_sgid(dev_addr, &gid);
 +	if (dev_addr->dev_type != ARPHRD_INFINIBAND) {
 +		rocee_addr_get_sgid(dev_addr, &gid);
 +		list_for_each_entry(cma_dev, &dev_list, list) {
 +			ret = ib_find_cached_gid(cma_dev->device, &gid,
 +						 &id_priv->id.port_num, NULL);
 +			if (!ret)
 +				goto out;
 +		}
 +	}
 +
 +	memcpy(&gid, dev_addr->src_dev_addr +
 +	       rdma_addr_gid_offset(dev_addr), sizeof gid);
  	list_for_each_entry(cma_dev, &dev_list, list) {
  		ret = ib_find_cached_gid(cma_dev->device, &gid,
  					 &id_priv->id.port_num, NULL);
 -		if (!ret) {
 -			cma_attach_to_dev(id_priv, cma_dev);
 +		if (!ret)
  			break;
 -		}
  	}
 +
 +out:
 +	if (!ret)
 +		cma_attach_to_dev(id_priv, cma_dev);
 +
  	return ret;
  }
  

   
 
 		memcpy(&gid, dev_addr->src_dev_addr +
 		       rdma_addr_gid_offset(dev_addr), sizeof gid);
 		list_for_each_entry(cma_dev, &dev_list, list) {
 			ret = ib_find_cached_gid(cma_dev->device, &gid,
 						 &id_priv->id.port_num, NULL);
 			if (!ret)
 				break;
 		}
 	}

 	if (!ret)
 		cma_attach_to_dev(id_priv, cma_dev);

 	return ret;
 }
 



 Eli Cohen wrote:
 
   
 On Wed, Feb 03, 2010 at 09:20:05AM -0600, Steve Wise wrote:
   
 
 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index a2d5aad..76dce2b 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -348,15 +348,28 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
  	union ib_gid gid;
  	int ret = -ENODEV;
  
 -	rdma_addr_get_sgid(dev_addr, &gid);
 -	list_for_each_entry(cma_dev, &dev_list, list) {
 -		ret = ib_find_cached_gid(cma_dev->device, &gid,
 -					 &id_priv->id.port_num, NULL);
 -		if (!ret) {
 -			cma_attach_to_dev(id_priv, cma_dev);
 -			break;
 +	if (dev_addr->dev_type != ARPHRD_INFINIBAND) {
 +		rocee_addr_get_sgid(dev_addr, &gid);
 +		list_for_each_entry(cma_dev, &dev_list, list) {
 +			ret = ib_find_cached_gid(cma_dev->device, &gid,
 +						 &id_priv->id.port_num, NULL);
 +			if (!ret)
 +				break;
 +		}
 
 The above if statement is true for iwarp devices, so this patch is
 just wrong.  rocee_addr_get_sgid() should only be used for ROCEE
 interfaces, correct?
 
 No, the idea is this: for non-ARPHRD_INFINIBAND devices (e.g. rocee or
 iwarp) I first assume it is rocee, get the rocee GID, and check whether this
 GID appears in any device's GID table. If the MAC address belongs to a
 rocee device, it will be found; if it belongs to an iwarp device, it
 won't be found. In the latter case I build the GID the way it was built
 before the rocee patches and search again.
 
 +	} else {
 +		memcpy(&gid, dev_addr->src_dev_addr +
 +		       rdma_addr_gid_offset(dev_addr), sizeof gid);
 +		list_for_each_entry(cma_dev, &dev_list, list) {
 +

Re: [ewg] OFED 1.5.1 major critical bugs

2010-02-04 Thread Tziporet Koren
On 2/4/2010 5:28 PM, Steve Wise wrote:
 NOTE: Please don't build RC1 until you resolve the iwarp cma 
 regression...
Sure - I assume Eli will solve this today with you

Tziporet


Re: [ewg] OFED 1.5.1 major critical bugs

2010-02-04 Thread Steve Wise
Tziporet Koren wrote:
 On 2/4/2010 5:28 PM, Steve Wise wrote:
 NOTE: Please don't build RC1 until you resolve the iwarp cma 
 regression...
 Sure - I assume Eli will solve this today with you

 Tziporet

I just sent email about this.  It has been resolved.  I just pulled the 
latest tree and rping works on iwarp...

Thanks!

Steve.



Re: [ewg] OFED-1.5.1 failure over iWarp

2010-02-04 Thread Eli Cohen
On Thu, Feb 04, 2010 at 09:46:58AM -0600, Steve Wise wrote:
 Never mind.  I see you already committed the change.  I just pulled
 the latest and rping works over iwarp.
 

Thanks for checking this.


Re: [ewg] OFED 1.5.1 major critical bugs

2010-02-04 Thread Amir Vadai
1899 - fixed

Rest - PPC bugs - in progress

- Amir


On 02/04/2010 04:44 PM, Tziporet Koren wrote:
 Hi

 We wish to build OFED 1.5.1 RC1 next Monday.
 I would like an update on the status of the major OFED 1.5.1 bugs.

 Owners - please update the status, or change the priority if you think it's too high.

 Thanks
 Tziporet



Re: [ewg] OFED-1.5.1 failure over iWarp

2010-02-04 Thread Sean Hefty
 We should work to get this 'correct' when merging upstream.

 Following the spirit of the current code, it is probably cma_acquire_dev()'s
 job to fill in the missing ibdev type information after matching the netdev to
 an ibdev.

This makes sense to me.

 P.S. - I really wish that we had a cleaner way to match an ibdev to a netdev
 without overloading the gid table entries.
 Basically, it should be the job of the entity that created the netdev to make
 this association, and stuff a pointer in the netdev.

Do you have a specific idea here?  So far, we've tried to keep the mapping the
responsibility of the rdma_cm module.  With rocee, we may need to re-architect
the solution and have the ib_device driver make this association.  Even if it's
unlikely, we need to make sure that we don't make the wrong match.





[ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Steve Wise
I just opened 1918.  The latest ofed-1.5.1 rdma-cm is allowing binds to
127.0.0.1.  This is a no-no for devices that don't support hw loopback...

OpenMPI uses rdma_bind_addr() to figure out which IP addresses are valid
for which IB devices.  This logic is now broken.  Regardless of whether
OpenMPI should use another method for determining which IP addresses
belong to which interfaces, we should probably rethink whether we're
breaking rdma-cm semantics in a bad way on a point release.


Steve.


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Steve Wise
The more I think about this, the more I conclude the rdma-cm is just 
broken.  There's no way to determine an RDMA device from 127.0.0.1, so 
how can bind succeed?




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Steve Wise
Sean Hefty wrote:
 OpenMPI uses rdma_bind_addr() to figure out which IP addresses are valid
 for which IB devices.  This logic is now broken.  Regardless of whether
 OpenMPI should use another method for determining which IP addresses
 belong to which interfaces, we should probably rethink whether we're
 breaking rdma-cm semantics in a bad way on a point release.

 The changes to the rdma_cm have been merged upstream.  These were fixes
 specifically to enable using the loopback address with RDMA devices.

 At first thought, we can extend enum ib_device_cap_flags to indicate whether
 a device supports loopback.  The rdma_cm could then skip over devices that
 lack it when dealing with a loopback address.

 - Sean

But how can you determine _which_ rdma device should be used if an app
binds to 127.0.0.1?  I think this is busted...




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Sean Hefty
 But how can you determine _which_ rdma device should be used if an app
 binds to 127.0.0.1?  I think this is busted...

The code just picks the first rdma device available.  To me, this is preferable
to simply disallowing the loopback device from working at all.  I personally
use it all the time, so I don't have to figure out the IP address of the
system that I'm trying to test on.

Loopback support has always been in the rdma_cm and was intended to work; it
just didn't work very well...



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Sean Hefty
 Well then the rdma-cm needs to know which devices support hw loopback.
 Cuz on a T3-only system, no hwloop...

The problem sounds like it's more than just whether 127.0.0.1 is usable.  That
check may fix openmpi, but it sounds more like the app needs to know whether the
device can actually support loopback, regardless of what addresses are used.  Is
this correct?

What would openmpi do if there were two addresses assigned to the T3 device?
Does openmpi simply bypass RDMA for all connections on the local machine?

Basically, I'm not sure that this is *just* an rdma_cm issue.  Although it
definitely appears that some sort of change needs to be made to the rdma_cm.

- Sean



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Sean Hefty
 This solution would work.  Will you code it up?

I can do that.  I just want to make sure that we address the full scope of the
problem.

- Sean



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Steve Wise
Sean Hefty wrote:
 Well then the rdma-cm needs to know which devices support hw loopback.
 Cuz on a T3-only system, no hwloop...
 

 The problem sounds like it's more than just whether 127.0.0.1 is usable.  That
 check may fix openmpi, but it sounds more like the app needs to know whether the
 device can actually support loopback, regardless of what addresses are used.  Is
 this correct?

 What would openmpi do if there were two addresses assigned to the T3 device?
   

It would use them and might even create two connections.

 Does openmpi simply bypass RDMA for all connections on the local machine?

   

OpenMPI can be run to use hw loopback if it's available.  For T3
clusters, OMPI is run in a mode that uses shared memory for intra-node
communications.


 Basically, I'm not sure that this is *just* an rdma_cm issue.  Although it
 definitely appears that some sort of change needs to be made to the rdma_cm.

   

I think the OpenMPI rdmacm code needs to skip 127.0.0.1, in this 
particular case.  Prior to ofed-1.5.1, however, the bind would fail and 
thus OpenMPI would not advertise 127.0.0.1 to its peer.  I will work to 
get that change done.

But let's also add a device attribute so the rdmacm can know whether a device
supports loopback.  Clearly, if the rdma-cm allows loopback binds to a T3
device, loopback connections will fail at connect time.

Hey Roland, are you ok with a device attribute to indicate hw-loopback 
support?


Steve.




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Roland Dreier
  Is this only an iwarp issue?  IE do all IB devices support hw
  loopback?  And will all future devices support it (IE is it an IBTA
  requirement)?

I do think IBA requires loopback to work.  Can't quote chapter & verse
off the top of my head.
-- 
Roland Dreier rola...@cisco.com
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Paul Grun
I can.  Chapter 17 verse 3.1

17.3.1 Loopback
An HCA shall be able to internally loopback a packet sent to itself. That is,
the verbs layer can specify a packet to be delivered to the same port (possibly
a different QP though). The packet shall be delivered without the
packet appearing on the port's physical link. This loopback shall be able
to function without requiring the presence of an external switch.
InfiniBand does not reserve a special LID value to indicate loopback. Instead,
the DLID (and DGID if present) of a loopback packet should be the
LID (and GID) of the port on which the packet was emitted. For loopback
packets, a channel adapter implementation may ignore other path information,
such as MTU, that is not otherwise needed for the receive buffer
or for the completion queue as specified in section 11.4.2.1 Poll for Completion
on page 629.

-Original Message-
From: linux-rdma-ow...@vger.kernel.org
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Roland Dreier
Sent: Thursday, February 04, 2010 3:51 PM
To: Steve Wise
Cc: Sean Hefty; linux-rdma; OpenFabrics EWG; Jeff Squyres
Subject: Re: bug 1918 - openmpi broken due to rdma-cm changes

  Is this only an iwarp issue?  IE do all IB devices support hw
  loopback?  And will all future devices support it (IE is it an IBTA
  requirement)?

I do think IBA requires loopback to work.  Can't quote chapter & verse
off the top of my head.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

