[ewg] RE: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-27 Thread David J. Wilder

On Thu, 2009-10-22 at 15:02 -0700, Sean Hefty wrote:
 For ipv6 I ran what I described previously.  What I do need to do is add
 the option to rping to specify a source address and run it with various
 address.  Any help you can give defining what exactly needs to be tested
 would be appreciated.
 
 You can also test with ucmatose to verify ipv4 still works.  Use the -b option
 to bind to a specific address.
 
 - Sean 
 

Sean
This patch adds ipv6 support to ucmatose.

Signed-off-by: David Wilder dwil...@us.ibm.com

--
 examples/cmatose.c |   31 +++
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/examples/cmatose.c b/examples/cmatose.c
index 8c12347..481a6d0 100644
--- a/examples/cmatose.c
+++ b/examples/cmatose.c
@@ -516,15 +516,15 @@ static int get_addr(char *dst, struct sockaddr_in *addr)
return ret;
}
 
-   if (res-ai_family != PF_INET) {
+   if (res-ai_family == PF_INET)
+   memcpy(addr, res-ai_addr, sizeof(struct sockaddr_in));
+   else if (res-ai_family == PF_INET6)
+   memcpy(addr, res-ai_addr, sizeof(struct sockaddr_in6));
+   else
ret = -1;
-   goto out;
-   }
 
-   *addr = *(struct sockaddr_in *) res-ai_addr;
-out:
-   freeaddrinfo(res);
-   return ret;
+freeaddrinfo(res);
+return ret;
 }
 
 static int run_server(void)
@@ -543,11 +543,18 @@ static int run_server(void)
ret = get_addr(src_addr, test.src_in);
if (ret)
goto out;
-   } else
+   if(test.src_in.sin_family == AF_INET)
+   ((struct sockaddr_in *) test.src_in)-sin_port = port;
+   else
+   ((struct sockaddr_in6 *) test.src_in)-sin6_port=port;
+   
+   } else {
test.src_in.sin_family = PF_INET;
+   test.src_in.sin_port = port;
+   }
+
+   ret = rdma_bind_addr(listen_id, (struct sockaddr *)test.src_in);
 
-   test.src_in.sin_port = port;
-   ret = rdma_bind_addr(listen_id, test.src_addr);
if (ret) {
perror(cmatose: bind address failed);
goto out;
@@ -628,8 +635,8 @@ static int run_client(void)
printf(cmatose: connecting\n);
for (i = 0; i  connections; i++) {
ret = rdma_resolve_addr(test.nodes[i].cma_id,
-   src_addr ? test.src_addr : NULL,
-   test.dst_addr, 2000);
+   src_addr ? (struct sockaddr *)test.src_in : NULL,
+   (struct sockaddr *)test.dst_in, 2000);
if (ret) {
perror(cmatose: failure getting addr);
connect_error();


___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-22 Thread David J. Wilder

On Wed, 2009-10-21 at 17:08 -0600, Jason Gunthorpe wrote:


 This looks exactly like what I was thinking of - have you tested this?

Yes I did do some testing, but that brings up a good question.  I am not
sure I know what all should be tested?  I am running rping with
different destination address (and scoping).  On the ipv4 side:
rping -c -a ip-of-my-ib0-interface
rping -c -a ip-of-remote-nodes-ib0-interface

For ipv6 I ran what I described previously.  What I do need to do is add
the option to rping to specify a source address and run it with various
address.  Any help you can give defining what exactly needs to be tested
would be appreciated. 
 
 
 If it is OK, I'd make it the first in the series.
 
 There were two things I was not sure about in my example.
  1) Is 'init_net.loopback_dev' the correct reference for the loop device? Or
 is it something like dev_net(rt-idev-dev)-loopback_dev ?
 
 I'm sensing it may be the latter, but can't investigate right now
 Donno much about this new namespace stuff really

I think you may be correct I will look at that closer.  I did explicitly
verify the test worked in both cases.
 
 
  2) Was rt-idev-dev the right choice for the ipv4 case? Or is it
 rt-u.dst.dev ?
 
 The TCP case kinda looks like
 int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 tmp = ip_route_connect(rt, nexthop, inet-saddr,
RT_CONN_FLAGS(sk), sk-sk_bound_dev_if,
IPPROTO_TCP,
inet-sport, usin-sin_port, sk, 1);
 sk_setup_caps(sk, rt-u.dst);
 
 void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 __sk_dst_set(sk, dst);
 
 And all later things key off the sk_get_dst. So I'm thinking
 that u.dst.dev might be correct.
 
 I have no idea what the difference is though (can't look too hard
 right now)
 
 The main other fixup I see is to remove
 ret = cma_bind_addr(id, src_addr, dst_addr);
 
 From rdma_resolve_addr and rely on the routing lookup in
 addr_resolve_remote called by addr_resolve_ip to setup the bind device
 from the routing lookup. (This is what I mentioned in my last email)
 
 Which then lets you fixup the checking and handling of the
 sin6_scopeid on the source address - and fixes the main other routing
 difference against the TCP stack.
 
 Thanks for working on this!
 
 Jason

Lots of discussion :) I will go through the mails, address the comments
and post the entire series of patches.

Thanks for all your input.

Dave. 

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] RE: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-22 Thread Sean Hefty
For ipv6 I ran what I described previously.  What I do need to do is add
the option to rping to specify a source address and run it with various
address.  Any help you can give defining what exactly needs to be tested
would be appreciated.

You can also test with ucmatose to verify ipv4 still works.  Use the -b option
to bind to a specific address.

- Sean 

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-21 Thread Jason Gunthorpe
On Wed, Oct 21, 2009 at 03:30:29PM -0700, David J. Wilder wrote:

 addr6_resolve_remote(). The second patch fixes the bugs in
 addr_resolve_local().  The 3d patch I am posting now for discussion.
 
 This patch, as Jason's suggested, moves the function of
 addr_resolve_local() into addr4_resolve_remote() and
 addr6_resolve_remote().  It eliminates the need for
 addr_resolve_local().

This looks exactly like what I was thinking of - have you tested this?

If it is OK, I'd make it the first in the series.

There were two things I was not sure about in my example.
 1) Is 'init_net.loopback_dev' the correct reference for the loop device? Or
is it something like dev_net(rt-idev-dev)-loopback_dev ?

I'm sensing it may be the latter, but can't investigate right now
Donno much about this new namespace stuff really

 2) Was rt-idev-dev the right choice for the ipv4 case? Or is it
rt-u.dst.dev ?

The TCP case kinda looks like
int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
tmp = ip_route_connect(rt, nexthop, inet-saddr,
   RT_CONN_FLAGS(sk), sk-sk_bound_dev_if,
   IPPROTO_TCP,
   inet-sport, usin-sin_port, sk, 1);
sk_setup_caps(sk, rt-u.dst);

void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
__sk_dst_set(sk, dst);

And all later things key off the sk_get_dst. So I'm thinking
that u.dst.dev might be correct.

I have no idea what the difference is though (can't look too hard
right now)

The main other fixup I see is to remove
ret = cma_bind_addr(id, src_addr, dst_addr);

From rdma_resolve_addr and rely on the routing lookup in
addr_resolve_remote called by addr_resolve_ip to setup the bind device
from the routing lookup. (This is what I mentioned in my last email)

Which then lets you fixup the checking and handling of the
sin6_scopeid on the source address - and fixes the main other routing
difference against the TCP stack.

Thanks for working on this!

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] RE: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-21 Thread Sean Hefty
This patch,  as Jason's suggested,  moves the function of addr_resolve_local()
into addr4_resolve_remote()
and addr6_resolve_remote().  It eliminates the need for addr_resolve_local().

One quick comment, remove '_remote' from function names:
addr4_resolve_remote, addr6_resolve_remote, and addr_resolve_remote

 drivers/infiniband/core/addr.c |   99 +++
 1 files changed, 18 insertions(+), 81 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..f7a5861 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -233,6 +233,14 @@ static int addr4_resolve_remote(struct sockaddr_in
*src_in,
   if (ret)
   goto out;

+  if (rt-idev-dev == init_net.loopback_dev){
+  ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
+  if (!ret)
+  memcpy(addr-dst_dev_addr, addr-src_dev_addr,
+  MAX_ADDR_LEN);
+  goto put;
+  }

This doesn't end up doing the same thing as what resolve_local did.  It only
matches up with the 'else if' portion below: 
 
-static int addr_resolve_local(struct sockaddr *src_in,
-struct sockaddr *dst_in,
-struct rdma_dev_addr *addr)
-{
-  struct net_device *dev;
-  int ret;
-
-  switch (dst_in-sa_family) {
-  case AF_INET:
-  {
-  __be32 src_ip = ((struct sockaddr_in *)
src_in)-sin_addr.s_addr;
-  __be32 dst_ip = ((struct sockaddr_in *)
dst_in)-sin_addr.s_addr;
-
-  dev = ip_dev_find(init_net, dst_ip);
-  if (!dev)
-  return -EADDRNOTAVAIL;
-
-  if (ipv4_is_zeronet(src_ip)) {
-  src_in-sa_family = dst_in-sa_family;
-  ((struct sockaddr_in *) src_in)-sin_addr.s_addr =
dst_ip;
-  ret = rdma_copy_addr(addr, dev, dev-dev_addr);
-  } else if (ipv4_is_loopback(src_ip)) {
-  ret = rdma_translate_ip(dst_in, addr);
-  if (!ret)
-  memcpy(addr-dst_dev_addr, dev-dev_addr,
MAX_ADDR_LEN);
-  } else {
-  ret = rdma_translate_ip(src_in, addr);
-  if (!ret)
-  memcpy(addr-dst_dev_addr, dev-dev_addr,
MAX_ADDR_LEN);
-  }

We need to handle the case where the source address is not given and provide
one.  We also need to handle the case where the source address is given, but may
not match the destination address, or even use the same RDMA device.

I didn't look at the ipv6 changes yet.

- Sean

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-21 Thread Jason Gunthorpe
On Wed, Oct 21, 2009 at 04:17:01PM -0700, Sean Hefty wrote:

 This doesn't end up doing the same thing as what resolve_local did.  It only
 matches up with the 'else if' portion below: 

It does all the behaviors, it passes the input into a routing lookup
- though it is true the source needs to be passed into the routing
lookup, which I think David had in another patch. Dave; I guess you
need to include that here now.

So, first case:

 -if (ipv4_is_zeronet(src_ip)) {

Looks like:
$ ip route get 10.0.0.11
local 10.0.0.11 dev lo  src 10.0.0.11 
cache local  mtu 16436 advmss 16396 hoplimit 64

trips through the if == loopback and does
'rdma_translate_ip(10.0.0.11)'

Same as the old code (little different path, but seems to give the
same result)

Second case:

 -} else if (ipv4_is_loopback(src_ip)) {

Looks like:
$ ip route get 10.0.0.11 from 127.0.0.1
local 10.0.0.11 from 127.0.0.1 dev lo 
cache local  mtu 16436 advmss 16396 hoplimit 64

so trips through the if == loopback and does rdma_translate_ip(10.0.0.11)

Same as the old code

Third case:
 -} else {

$ ip route get 10.0.0.11 from 10.0.0.11
local 10.0.0.11 from 10.0.0.11 dev lo 
cache local  mtu 16436 advmss 16396 hoplimit 64

Again, does rdma_translate_ip(10.0.0.11)

And the weird case is different:

$ ip route get 10.0.0.11 from 192.168.122.1
local 10.0.0.11 from 192.168.122.1 dev lo 
cache local  mtu 16436 advmss 16396 hoplimit 64

(192.168.122.1 is bound to a different device on my system than
10.0.0.11)

The new case trips the if == loopback and does
rdma_translate_ip(10.0.0.11)

The old case does rdma_translate_ip(192.168.122.1)

I don't think this is a significant difference, both behaviors are
reasonable choices and the code/complexity savings are worth it, IMHO.
[Alternatively, I suppose one could call rdma_translate_ip(rt-rt_src)
and if that fails then do rdma_translate_ip(dst_in) but why bother?]

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-21 Thread Jason Gunthorpe
On Wed, Oct 21, 2009 at 05:36:39PM -0600, Jason Gunthorpe wrote:

  -  if (ipv4_is_zeronet(src_ip)) {
 
 Looks like:
 $ ip route get 10.0.0.11
 local 10.0.0.11 dev lo  src 10.0.0.11 
 cache local  mtu 16436 advmss 16396 hoplimit 64
 
 trips through the if == loopback and does
 'rdma_translate_ip(10.0.0.11)'
 
 Same as the old code (little different path, but seems to give the
 same result)

Oops, there is a little woopsie here:

Dave:

+   if (rt-idev-dev == init_net.loopback_dev){
+   ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
+   if (!ret)
+   memcpy(addr-dst_dev_addr, addr-src_dev_addr,
+   MAX_ADDR_LEN);
+   goto put;
+   }
+

The 'goto put' will skip over the source address assignment step.

Maybe move this:

if (!src_ip) {
src_in-sin_family = dst_in-sin_family;
src_in-sin_addr.s_addr = rt-rt_src;
}

Up above your if (rt-idev)

And similarly for v6.

Also add a src_in = rt-rt_src assignment for v6.

I'd also remove the test for 0 address, just do it unconditionally
(IIRC routing table always returns src if src is not 0)

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] RE: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-21 Thread Sean Hefty
$ ip route get 10.0.0.11 from 192.168.122.1
local 10.0.0.11 from 192.168.122.1 dev lo
cache local  mtu 16436 advmss 16396 hoplimit 64

(192.168.122.1 is bound to a different device on my system than
10.0.0.11)

The new case trips the if == loopback and does
rdma_translate_ip(10.0.0.11)

The old case does rdma_translate_ip(192.168.122.1)

I don't think this is a significant difference, both behaviors are
reasonable choices and the code/complexity savings are worth it, IMHO.
[Alternatively, I suppose one could call rdma_translate_ip(rt-rt_src)
and if that fails then do rdma_translate_ip(dst_in) but why bother?]

I agree this usage case is weird, but I'm still not sure about the patch.  If
the destination service is listening on 10.0.0.11, then the listen is bound to a
different gid than the source gid that we're trying to connect from.  The
src_dev_addr and dst_dev_addr need to reflect this, so that a PR gives us a path
that routes the CM REQ correctly.

- Sean 

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] RE: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-21 Thread Sean Hefty
The local loopback case uses PRs?

Yes - the rdma cm makes no distinction when resolve route is called.  It does a
PR query.

Even so, it still seems OK to me:

Path:
 addr4_resolve_remote
  $ ip route get 10.0.0.11 from 192.168.122.1
local 10.0.0.11 from 192.168.122.1 dev lo
  srcIP = 192.168.122.1
  rdma_translate_ip(dst_ip = 10.0.0.11)
   rdma_copy_addr(eth0);
src_dev_addr = eth0.dev_addr  (ie GID of 10.0.0.11)
  memcpy(dst_dev_addr = src_dev_addr) (ie GID of 10.0.0.11)

So everthing is bound to the GID of 10.0.0.11 which matches the listen
of 10.0.0.11, which seems OK.

The source could have called rdma_bind_addr(192.168.122.1) prior to calling
rdma_resolve_addr().  (DAPL does this.)  This would have returned a different
RDMA device than binding to 10.0.0.11.  The client app could have allocated
resources on that device, but the CM REQ will carry the gid/lid of the other
device.  The endpoints won't be able to communicate.

Yes, it's weird, and may not be optimal, but if a source address is explicitly
given, then its mapping to a specific RDMA device should be honored.

- Sean

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-21 Thread Jason Gunthorpe
On Wed, Oct 21, 2009 at 05:40:30PM -0700, Sean Hefty wrote:

 Even so, it still seems OK to me:
 
 Path:
  addr4_resolve_remote
   $ ip route get 10.0.0.11 from 192.168.122.1
 local 10.0.0.11 from 192.168.122.1 dev lo
   srcIP = 192.168.122.1
   rdma_translate_ip(dst_ip = 10.0.0.11)
rdma_copy_addr(eth0);
 src_dev_addr = eth0.dev_addr  (ie GID of 10.0.0.11)
   memcpy(dst_dev_addr = src_dev_addr) (ie GID of 10.0.0.11)
 
 So everthing is bound to the GID of 10.0.0.11 which matches the listen
 of 10.0.0.11, which seems OK.
 
 The source could have called rdma_bind_addr(192.168.122.1) prior to calling
 rdma_resolve_addr().  (DAPL does this.)  This would have returned a different
 RDMA device than binding to 10.0.0.11.  The client app could have allocated
 resources on that device, but the CM REQ will carry the gid/lid of the other
 device.  The endpoints won't be able to communicate.

That is very difficult to fit into the semantics the IP routing
model uses :( And it looks like an API problem in DAPL :(

So, I see now, you are proposing that in this case the connection
attempt to be routed through the network and not looped back..  I
actually have a big problem with that, ignoring a 'lo' entry in a
routing table is very much not IP like and not a good idea. That
should be respected..

I guess I'd much rather see that one situation return EHOSTUNREACH or
something.

But, I suppose you are going to tell me that Intel MPI uses DAPL to
loopback connect to other processes on the same node, and relies on
this? :( :( :(

Sigh. Anyhow, lets not get side tracked. It seems to me, the easy way
out for David's approach is to simply check if the device is already
bound via rdma_bind() and if so force it to that device no matter what
the routing table lookup returns. Can you suggest a reliable way to
make that check?

[What happens now if I do this:
 rdma_bind(10.0.0.11)
 rdma_resolve_addr(src = 192.168.122.1 dst = 10.0.0.11)
Does the cma_bind path check that it is already bound and give out an
error? too late for me to check]

Once the cma_bind for rdma_resolve_addr is moved into the
addr_resolve_remote function then people using the API without calling
bind on the client path will get sane IP-like behavior.

 Yes, it's weird, and may not be optimal, but if a source address is
 explicitly given, then its mapping to a specific RDMA device should
 be honored.

Remember, on Linux the IP is *not* attached to a device, it is part of
the host itself. So the idea that a source address somehow specifies a
RDMA device does not fit into the Linux IP networking model.

Unfortunately the definition of rdma_bind kinda bakes this mismatched
model into the API :(

Truth be told, to fit the Linux IP model, the RDMA CM should have
provided exactly only two ways to bind a cm_id to a specific device -
rdma_accept and rdma_resolve_addr.

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] RE: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-21 Thread Sean Hefty
That is very difficult to fit into the semantics the IP routing
model uses :( And it looks like an API problem in DAPL :(

This depends on if your view is that the rdma cm is trying to match IP routing,
trying to use IP addresses as convenient names for RDMA ports, or something in
between that may lean more one way or the other depending on the device type.
IMO, the primary reason for using IP addressing over IB is more for convenience,
than compliance.

So, I see now, you are proposing that in this case the connection
attempt to be routed through the network and not looped back..  I
actually have a big problem with that, ignoring a 'lo' entry in a
routing table is very much not IP like and not a good idea. That
should be respected..

I hesitate comparing RDMA versus IP too closely.

Sigh. Anyhow, lets not get side tracked. It seems to me, the easy way
out for David's approach is to simply check if the device is already
bound via rdma_bind() and if so force it to that device no matter what
the routing table lookup returns. Can you suggest a reliable way to
make that check?

I'm not entirely sure of the best way to test this.  Checking the src_dev_addr
is my first thought.

[What happens now if I do this:
 rdma_bind(10.0.0.11)
 rdma_resolve_addr(src = 192.168.122.1 dst = 10.0.0.11)
Does the cma_bind path check that it is already bound and give out an
error? too late for me to check]

rdma_resolve_addr only calls bind if the rdma id is not already bound.  The
src_addr simply gets ignored in this case, and the bound address is used
instead.

Once the cma_bind for rdma_resolve_addr is moved into the
addr_resolve_remote function then people using the API without calling
bind on the client path will get sane IP-like behavior.

I like the approach, I'm just concerned about the case where the app has
requested a specific source address, which today means that a specific RDMA
device should be used.

Remember, on Linux the IP is *not* attached to a device, it is part of
the host itself. So the idea that a source address somehow specifies a
RDMA device does not fit into the Linux IP networking model.

I would say we're extending the linux networking model to incorporate this
concept, while staying in our imposed little RDMA sand box world.  Even iWarp
has these issues.

Truth be told, to fit the Linux IP model, the RDMA CM should have
provided exactly only two ways to bind a cm_id to a specific device -
rdma_accept and rdma_resolve_addr.

I think this is more restrictive than things need to be.

- Sean

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-19 Thread Jason Gunthorpe
On Mon, Oct 19, 2009 at 03:47:09PM -0700, David J. Wilder wrote:

 +++ b/drivers/infiniband/core/addr.c
 @@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct sockaddr_in6 
 *src_in,
   fl.nl_u.ip6_u.daddr = dst_in-sin6_addr;
   fl.nl_u.ip6_u.saddr = src_in-sin6_addr;
  
 + if (ipv6_addr_type(src_in-sin6_addr)  IPV6_ADDR_LINKLOCAL) {
 + if (!src_in-sin6_scope_id)
 + return -EINVAL;
 + fl.oif = src_in-sin6_scope_id;
 + }

Seeing it all together like this make it clear this test needs to move
up the call chain and test the sockaddr passed from userspace, not
the one created by addr_resolve_local. Probably somewhere along the
rdma_resolve_addr - cma_bind_addr - rmda_bind_addr -
rdma_translate_ip path. Maybe rdma_translate_ip should use and check
the scope as a temporary hack?

BTW, while researching the above comment, I'm not certain your last
patch is at all correct:

commit 85f20b39fd44310a163a9b33708fea57f08a4e40
RDMA/addr: Fix resolution of local IPv6 addresses

This patch allows a local IPv6 address to be resolved by rdma_cm.

To reproduce the problem:

 $ rping -s -v -a ::0  
 $ rping -c -v -a IPv6 address local to this system
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in,
 
for_each_netdev(init_net, dev)
if (ipv6_chk_addr(init_net,
- ((struct sockaddr_in6 *) 
addr)-sin6_addr,
+ ((struct sockaddr_in6 *) 
dst_in)-sin6_addr,
  dev, 1))
break;

I can believe it fixes the case you describe (ie loopback) but
matching the *dest* IP against the local interface's IP list cannot
possibly be right.

The primary problem is that for_each_netdev/ipv6_chk_addr is NOT the
same as ip_dev_find. ip_dev_find is a routing lookup, ipv6_chk_addr
compares the local address list. Not at all the same. I don't see a
route lookup helper for ipv6, so you have to code full flowi lookup.

With your change I expect ipv6 is 100% broken now for non loop cases?

Regards,
Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] RE: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-19 Thread Sean Hefty
@@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in,

for_each_netdev(init_net, dev)
if (ipv6_chk_addr(init_net,
- ((struct sockaddr_in6 *) addr)-
sin6_addr,
+ ((struct sockaddr_in6 *) dst_in)-
sin6_addr,
  dev, 1))
break;

I can believe it fixes the case you describe (ie loopback) but
matching the *dest* IP against the local interface's IP list cannot
possibly be right.

The intent is to see if the destination address is local.  A source address may
not be given.

- Sean

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-19 Thread Jason Gunthorpe
On Mon, Oct 19, 2009 at 04:47:59PM -0700, Sean Hefty wrote:
 @@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in,
 
 for_each_netdev(init_net, dev)
 if (ipv6_chk_addr(init_net,
 - ((struct sockaddr_in6 *) addr)-
 sin6_addr,
 + ((struct sockaddr_in6 *) dst_in)-
 sin6_addr,
   dev, 1))
 break;
 
 I can believe it fixes the case you describe (ie loopback) but
 matching the *dest* IP against the local interface's IP list cannot
 possibly be right.
 
 The intent is to see if the destination address is local.  A source
 address may not be given.

Well, that makes more sense, but it still pretty strange to match the
IP list like that, the proper thing is to query RT6_TABLE_LOCAL, like
the IPv4 case does.

Anyhow, couldn't the whole addr_resolve_local routine be replaced with
something like this in addr_resolve_remote:
 if (rt-idev == init_net-loopback_dev)
rdma_translate_ip(rt-rt_src, dev_addr, NULL);

for IPv4 and similar for IPv6?

That does query the proper RT_TABLEs to determine if the IP is local
and then we get the searching and ip_dev_find only for the case where
the address is definitely looped back. Much closer to how the IP stack
works normally.

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-16 Thread David J. Wilder

On Thu, 2009-10-15 at 15:32 -0600, Jason Gunthorpe wrote:

 
 The actual fixing to the code is not hard, remove rdma_translate_ip,
 addr_resolve_local, split addr_resolve_remote into a part to resolve
 the route and a part that does the arp/nd. Make the route resolve part
 work almost exactly like addr4_resolve_remote (noting that the v6
 version is wrong, since is doesn't respect unset source addres,
 another bug). Call rdma_copy_addr based on the rt-idev-dev (or
 should it be odev??). Do the ARP.
 
 The pain is in retesting everything :|
 
 Jason

There is the wrinkle in this plan.  If you pass ip6_route_output() your
own link-local address (scoped or not) it returns a neighbor entry bound
to the loop back device.

$ ip route get fe80::202:c903:1:28ed oif ib0
local fe80::202:c903:1:28ed from :: via :: dev lo  table local  proto
none  src fe80::202:c903:1:28ed  metric 0  mtu 16436 advmss 16376
hoplimit 4294967295

Thats not going to work as lo has no RDMA support.  I suspect this is
why addr_resolve_local() was added in the first place :)

I will post a patch with our first solution and include the fix to
addr_resolve_local(). 

 Dave.


___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-16 Thread Jason Gunthorpe
On Fri, Oct 16, 2009 at 11:54:39AM -0700, David J. Wilder wrote:

 There is the wrinkle in this plan.  If you pass ip6_route_output() your
 own link-local address (scoped or not) it returns a neighbor entry bound
 to the loop back device.

Well, that really is correct. The RDMA CM could try to do something
smart if lo is returned from the route lookup - ie choose the first
RDMA device bound to that IP or something.

There are other problems with the loop back case, like the Linux stack
doesn't generate neighbor entries for its own IP addresses.

The loopback case is far less important than the other policy routing
features anyhow..

 I will post a patch with our first solution and include the fix to
 addr_resolve_local(). 

May as well, a little bit of progress is better than none at all. 

Also note that I observed that the IPv6 code path never sets the
source address from the routing table if IN6_ADDR_ANY is used as
source, this is certainly wrong and could cause other problems down
the line, probably more so for iwarp than IB though..

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-15 Thread David J. Wilder
On Wed, 2009-10-14 at 13:09 Jason Gunthorpe wrote:

 So, it tries to match the source addr to the addrs bound to the
 device, which is wrong - that isn't how the ip stack works.

 You can patch this up a little bit by fixing up addr_resolve_local to
 set sin6_scope_ip.

I found the bug in addr_resolve_local().  (more comments below)

--- addr.c.1759 2009-10-13 15:57:48.0 -0500
+++ addr.c.ip_local 2009-10-15 14:03:50.0 -0500
@@ -390,14 +390,17 @@ static int addr_resolve_local(struct soc
case AF_INET6:
{
struct in6_addr *a;
+   int found = 0;
 
for_each_netdev(init_net, dev)
if (ipv6_chk_addr(init_net,
  ((struct sockaddr_in6 *) 
dst_in)-sin6_addr,
- dev, 1))
+ dev, 1)){
+   found = 1;
break;
+   }
 
-   if (!dev)
+   if (!found)
return -EADDRNOTAVAIL;
 
a = ((struct sockaddr_in6 *) src_in)-sin6_addr;
@@ -406,6 +409,8 @@ static int addr_resolve_local(struct soc
src_in-sa_family = dst_in-sa_family;
((struct sockaddr_in6 *) src_in)-sin6_addr =
((struct sockaddr_in6 *) dst_in)-sin6_addr;
+   ((struct sockaddr_in6 *) src_in)-sin6_scope_id =
+((struct sockaddr_in6 *) 
dst_in)-sin6_scope_id;
ret = rdma_copy_addr(addr, dev, dev-dev_addr);
} else if (ipv6_addr_loopback(a)) {
ret = rdma_translate_ip(dst_in, addr);


 But really the correct thing to do is to remove addr_resolve_local and
 place the source address into the struct flowi and use the result of
 the route lookup to bind to the source device, and set the source
 address if it is unset.

Sorry I don't get it..
Are you saying that ip6_route_output() will resolve the address even if
it is a link-local address bound to my own interface? Therefor
addr_resolve_local() is not needed.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-15 Thread Jason Gunthorpe
On Thu, Oct 15, 2009 at 12:27:21PM -0700, David J. Wilder wrote:
 On Wed, 2009-10-14 at 13:09 Jason Gunthorpe wrote:
 
  So, it tries to match the source addr to the addrs bound to the
  device, which is wrong - that isn't how the ip stack works.
 
  You can patch this up a little bit by fixing up addr_resolve_local to
  set sin6_scope_ip.
 
 I found the bug in addr_resolve_local().  (more comments below)

Yes, that is the hacky work around I was mentioning..

  But really the correct thing to do is to remove addr_resolve_local and
  place the source address into the struct flowi and use the result of
  the route lookup to bind to the source device, and set the source
  address if it is unset.
 
 Sorry I don't get it..
 Are you saying that ip6_route_output() will resolve the address even if
 it is a link-local address bound to my own interface? Therefor
 addr_resolve_local() is not needed.

Yes, and more. In Linux the routing table takes as input the source
(optional), device (optional) and destination address and returns as
output the device to use.

To determine the device to bind to you ask the routing table what
device to use for all the route information you have.

For example:

$ ip route get fe80::c2d from fe80::213:72ff:fe29:e65d oif eth0
fe80::c2d via fe80::c2d dev eth0  src fe80::213:72ff:fe29:e65d  metric 0 
cache  mtu 1500 advmss 1440 hoplimit 4294967295

$ ip route get fe80::c2d oif eth0
fe80::c2d via fe80::c2d dev eth0  src fe80::213:72ff:fe29:e65d  metric 0 
cache  mtu 1500 advmss 1440 hoplimit 4294967295

You can see in both cases the routing table returns a 'src'
entry. 'src' is the address to bind to if no bind address was specified.

When doing link local addresess the sin6_scope_id should sets the
'oif' key in the routing lookup, which will result in the correct src
address and output device being selected by the routing algorithm. For
instance on my machine here, I have two interfaces:

$ ip route get fe80::c2d oif virbr0 
fe80::c2d via fe80::c2d dev virbr0  src fe80::2c5d:c4ff:feb8:1ce5  metric 0 
cache  mtu 1500 advmss 1440 hoplimit 4294967295

As you can see it is returning the link local address for virbr0 as
the source.

So the algorithm in RDMA CM should look like this:
 - If src is specified then set the bind local address to src
   [if src is link local then it must specify sin6_scope_id, and
   sin6_scope_id becomes the oif input to the route lookup]
 - If dst is link local then its sin6_scope_id is the oif to the route
   lookup (and must match src, as we did last go round)
 - Src (or 'any'), dst and device (or 'any') are passed to the route
   lookup
 - The RDMA CM ID is bound to the device returned by the route lookup
 - If the src address was not specified then the connection source IP
   is set to the 'src' value from the route lookup.

This is why addr_resolve_local/rdma_translate_ip is not needed, that
entire entire function is done by the routing table code.

You can see why this becomes important when it is combined with policy
routing, for instance consider this example:

$ ip rule
32765:  from 10.0.0.4 lookup dnat
$ ip route show table dnat
default via 10.0.0.1 dev eth1
$ ip route get 10.0.0.100
10.0.0.100 dev eth0  src 10.0.0.2 
$ ip route get 10.0.0.100 from 10.0.0.4
10.0.0.100 from 10.0.0.4 via 10.0.0.1 dev eth1
cache  mtu 1500 advmss 1460 hoplimit 64

The two results are radically different and dependant on the source
address. (10.0.0.4 could be attached to eth0, and eth1!)

The actual fixing to the code is not hard, remove rdma_translate_ip,
addr_resolve_local, split addr_resolve_remote into a part to resolve
the route and a part that does the arp/nd. Make the route resolve part
work almost exactly like addr4_resolve_remote (noting that the v6
version is wrong, since is doesn't respect unset source addres,
another bug). Call rdma_copy_addr based on the rt-idev-dev (or
should it be odev??). Do the ARP.

The pain is in retesting everything :|

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-14 Thread David J. Wilder
Sean-
This patch should fix the behavior of rdma_resolve_addr  when using
link-local addressing.
  
On Tue, 2009-10-13 at 17:12 -0600, Jason Gunthorpe wrote:
 On Tue, Oct 13, 2009 at 03:09:40PM -0700, David J. Wilder wrote:
  Here is a patch to addr6_resolve_remote() to correctly handle link-local 
  address.
  It should cover all the conditions Jason described. 
 
 Looks pretty good to me, definitely on the right track.
 
 Hmm..
 
 Actually, upon comparing to tcp_ipv6.c, I'd say one more behavior is
 necessary. The code in tcp_ipv6 allows the destination to not specify
 a scope id if an interface has already been set. Looks like the two
 ways to set an interface ID are to use bind() or SO_BINDTODEVICE..
 
 Specifying a source address to RDMA CM is similar to bind(), so if the
 source address is link local it must specify a sin6_scope_id and the
 dest address can specify 0, or the same value.

Jason - 
This new patch should closely emulate tcp_ipv6.c. when both source and
destination scope_ids are given with link-local address. 

With this patch rping works as expected.

Link-local with scope:
# /usr/bin/rping -c -a fe80::202:c903:1:1925%ib0

Link-local w/out scope:
# /usr/bin/rping -c  -a fe80::202:c903:1:1925
rdma_resolve_addr error -1

Other ipv6 address:
# /usr/bin/rping -c -a 2001:db8:1234::2

(server side started with rping -s -P -v -a ::0)

Signed-off-by: David Wilder dwil...@us.ibm.com
-
--- drivers/infiniband/core/addr.c.1759 2009-10-13 15:57:48.0 -0500
+++ drivers/infiniband/core/addr.c  2009-10-14 10:47:56.0 -0500
@@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct s
fl.nl_u.ip6_u.daddr = dst_in-sin6_addr;
fl.nl_u.ip6_u.saddr = src_in-sin6_addr;
 
+   if (ipv6_addr_type(dst_in-sin6_addr)  IPV6_ADDR_LINKLOCAL){
+   // Link-local address require an interface to be specified.
+   if (!(dst_in-sin6_scope_id||src_in-sin6_scope_id))
+   return -EINVAL;
+
+   // If src and dst interfaces are supplied they must match.
+   if ( (dst_in-sin6_scope_id  src_in-sin6_scope_id) 
+   (src_in-sin6_scope_id != dst_in-sin6_scope_id) )
+   return -EINVAL;
+   if ( dst_in-sin6_scope_id )
+   fl.oif = dst_in-sin6_scope_id;
+   else
+   fl.oif = src_in-sin6_scope_id;
+   }
+
dst = ip6_route_output(init_net, NULL, fl);
if (!dst)
return ret;


___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-14 Thread Jason Gunthorpe
On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote:

 This new patch should closely emulate tcp_ipv6.c. when both source and
 destination scope_ids are given with link-local address. 

Maybe like this:

fl.oif = 0;

if (ipv6_addr_type(src_in-sin6_addr)  IPV6_ADDR_LINKLOCAL) {
if (!src_in-sin6_scope_id)
return -EINVAL;
fl.oif = src_in-sin6scope_id;
}
if (ipv6_addr_type(dst_in-sin6_addr)  IPV6_ADDR_LINKLOCAL){
if (dst_in.sin6_scope_id) {
if (fl.oif  fl.oif != dst_in.sin6_scope_id)
return -EINVAL;
fl.oif = dst_in.sin6_scope_id;
}
if (!fl.oif)
return -EINVAL;
}


src and dest have to be tested seperately, the TCPv6 versions don't
require them to be both link local.

Don't forget to run checkpatch ;)

Regards,
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-14 Thread David J. Wilder

On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote:
 On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote:
 
  This new patch should closely emulate tcp_ipv6.c. when both source and
  destination scope_ids are given with link-local address. 
 
 Maybe like this:
 
 fl.oif = 0;
 
   if (ipv6_addr_type(src_in-sin6_addr)  IPV6_ADDR_LINKLOCAL) {
   if (!src_in-sin6_scope_id)
 return -EINVAL;
 fl.oif = src_in-sin6scope_id;
 }
   if (ipv6_addr_type(dst_in-sin6_addr)  IPV6_ADDR_LINKLOCAL){
 if (dst_in.sin6_scope_id) {
 if (fl.oif  fl.oif != dst_in.sin6_scope_id)
   return -EINVAL;
 fl.oif = dst_in.sin6_scope_id;
 }
 if (!fl.oif)
   return -EINVAL;
   }
 
 
 src and dest have to be tested seperately, the TCPv6 versions don't
 require them to be both link local.
 
 Don't forget to run checkpatch ;)

This looks good. Once concern, it may be obtuse, but if both the src and
dst are link-local addresses should only one need to be scoped?  This
patch will required the src to always be scoped when using link local.  

Dave...

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-14 Thread Jason Gunthorpe
On Wed, Oct 14, 2009 at 10:30:05AM -0700, David J. Wilder wrote:

 This looks good. Once concern, it may be obtuse, but if both the src and
 dst are link-local addresses should only one need to be scoped?  This
 patch will required the src to always be scoped when using link local.  

The TCPv6 code requires the src to be scoped (or SO_BINDTODEVICE to be
used prior). If src is scoped then it is optional to scope dest, but
if dest is scoped then it must match. I intended to capture that logic..

Can you give a quick test and send a git format-patch/checkpatch'd
patch to Roland, with your signed off line, etc. You can put a
Reviewed-by line from me as well if you like.

Thanks,
Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-14 Thread David J. Wilder

On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote:
 On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote:
 
  This new patch should closely emulate tcp_ipv6.c. when both source and
  destination scope_ids are given with link-local address. 
 
 Maybe like this:
 
 fl.oif = 0;
 
   if (ipv6_addr_type(src_in-sin6_addr)  IPV6_ADDR_LINKLOCAL) {
   if (!src_in-sin6_scope_id)
 return -EINVAL;
 fl.oif = src_in-sin6scope_id;
 }
   if (ipv6_addr_type(dst_in-sin6_addr)  IPV6_ADDR_LINKLOCAL){
 if (dst_in.sin6_scope_id) {
 if (fl.oif  fl.oif != dst_in.sin6_scope_id)
   return -EINVAL;
 fl.oif = dst_in.sin6_scope_id;
 }
 if (!fl.oif)
   return -EINVAL;
   }
 
 
 src and dest have to be tested seperately, the TCPv6 versions don't
 require them to be both link local.

Hum.. this change is not working with rping. The src address has been
filled in already with a link-local address and no scope_id so it is
returning -EINVAL.  Another bug I guess I am doing some more
debug.   

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-14 Thread Jason Gunthorpe
On Wed, Oct 14, 2009 at 12:33:17PM -0700, David J. Wilder wrote:
 
 On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote:
  On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote:
  
   This new patch should closely emulate tcp_ipv6.c. when both source and
   destination scope_ids are given with link-local address. 
  
  Maybe like this:
  
  fl.oif = 0;
  
  if (ipv6_addr_type(src_in-sin6_addr)  IPV6_ADDR_LINKLOCAL) {
  if (!src_in-sin6_scope_id)
  return -EINVAL;
  fl.oif = src_in-sin6scope_id;
  }
  if (ipv6_addr_type(dst_in-sin6_addr)  IPV6_ADDR_LINKLOCAL){
  if (dst_in.sin6_scope_id) {
  if (fl.oif  fl.oif != dst_in.sin6_scope_id)
  return -EINVAL;
  fl.oif = dst_in.sin6_scope_id;
  }
  if (!fl.oif)
  return -EINVAL;
  }
  
  
  src and dest have to be tested seperately, the TCPv6 versions don't
  require them to be both link local.
 
 Hum.. this change is not working with rping. The src address has been
 filled in already with a link-local address and no scope_id so it is
 returning -EINVAL.  Another bug I guess I am doing some more
 debug.   

Ah feh, you are running into that other bug..

http://lists.openfabrics.org/pipermail/general/2009-July/060612.html

Basically it does this:

int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
  struct sockaddr *dst_addr, int timeout_ms)
ret = rdma_resolve_ip(addr_client, (struct sockaddr *) 
id-route.addr.src_addr,
  dst_addr, id-route.addr.dev_addr,
  timeout_ms, addr_handler,
  id_priv);
int rdma_resolve_ip(struct rdma_addr_client *client,
struct sockaddr *src_addr, struct sockaddr *dst_addr,
struct rdma_dev_addr *addr, int timeout_ms,
void (*callback)(int status, struct sockaddr *src_addr,
 struct rdma_dev_addr *addr, void *context),
void *context)
req-status = addr_resolve_local(src_in, dst_in, addr);

static int addr_resolve_local(struct sockaddr *src_in,
  struct sockaddr *dst_in,
  struct rdma_dev_addr *addr)
dev = ip_dev_find(init_net, dst_ip);
[..]
   for_each_netdev(init_net, dev)
if (ipv6_chk_addr(init_net,
  ((struct sockaddr_in6 *) 
addr)-sin6_addr,
  dev, 1))
break;

So, it tries to match the source addr to the addrs bound to the
device, which is wrong - that isn't how the ip stack works.

You can patch this up a little bit by fixing up addr_resolve_local to
set sin6_scope_ip.

But really the correct thing to do is to remove addr_resolve_local and
place the source address into the struct flowi and use the result of
the route lookup to bind to the source device, and set the source
address if it is unset.

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] link-local address fix for rdma_resolve_addr

2009-10-13 Thread Jason Gunthorpe
On Tue, Oct 13, 2009 at 03:09:40PM -0700, David J. Wilder wrote:
 Here is a patch to addr6_resolve_remote() to correctly handle link-local 
 address.
 It should cover all the conditions Jason described. 

Looks pretty good to me, definitely on the right track.

Hmm..

Actually, upon comparing to tcp_ipv6.c, I'd say one more behavior is
necessary. The code in tcp_ipv6 allows the destination to not specify
a scope id if an interface has already been set. Looks like the two
ways to set an interface ID are to use bind() or SO_BINDTODEVICE..

Specifying a source address to RDMA CM is similar to bind(), so if the
source address is link local it must specify a sin6_scope_id and the
dest address can specify 0, or the same value.

How af_inet6 handles bind():

if (addr_type  IPV6_ADDR_LINKLOCAL) {
if (addr_len = sizeof(struct sockaddr_in6) 
addr-sin6_scope_id) {
/* Override any existing binding, if 
another one
 * is supplied by user.
 */
sk-sk_bound_dev_if = 
addr-sin6_scope_id;
}

/* Binding to link-local address requires an 
interface */
if (!sk-sk_bound_dev_if) {
err = -EINVAL;
goto out;
}
dev = dev_get_by_index(net, 
sk-sk_bound_dev_if);
if (!dev) {
err = -ENODEV;
goto out;
}
}

How tcp_ipv6 checks the destination address:
if (addr_typeIPV6_ADDR_LINKLOCAL) {
if (addr_len = sizeof(struct sockaddr_in6) 
usin-sin6_scope_id) {
/* If interface is set while binding, indices
 * must coincide.
 */
if (sk-sk_bound_dev_if 
sk-sk_bound_dev_if != usin-sin6_scope_id)
return -EINVAL;

sk-sk_bound_dev_if = usin-sin6_scope_id;
}

/* Connect to link-local address requires an interface */
if (!sk-sk_bound_dev_if)
return -EINVAL;
}

Jason
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg