Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-10 Thread Sean Hefty
The hack to use a socket and bind it to claim the port was just for 
demonstrating the idea.  The correct solution, IMO, is to enhance the 
core low level 4-tuple allocation services to be more generic (e.g., not 
be tied to a struct sock).  Then the host tcp stack and the host rdma 
stack can allocate TCP/iWARP ports/4tuples from this common exported 
service and share the port space.  This allocation service could also be 
used by other deep adapters like iSCSI adapters if needed.


Since iWarp runs on top of TCP, the port space is really the same. 
FWIW, I agree that this proposal is the correct solution to support iWarp.


- Sean
___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-10 Thread David Miller
From: Sean Hefty [EMAIL PROTECTED]
Date: Wed, 10 Oct 2007 14:01:07 -0700

  The hack to use a socket and bind it to claim the port was just for 
  demonstrating the idea.  The correct solution, IMO, is to enhance the 
  core low level 4-tuple allocation services to be more generic (e.g., not 
  be tied to a struct sock).  Then the host tcp stack and the host rdma 
  stack can allocate TCP/iWARP ports/4tuples from this common exported 
  service and share the port space.  This allocation service could also be 
  used by other deep adapters like iSCSI adapters if needed.
 
 Since iWarp runs on top of TCP, the port space is really the same. 
 FWIW, I agree that this proposal is the correct solution to support iWarp.

But you can be sure it's not going to happen, sorry.

It would mean that we'd need to export the entire TCP socket table so
then when iWARP connections are created you can search to make sure
there is not an existing full 4-tuple that is the same.

It is not just about local TCP ports.
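To make the local-port versus 4-tuple distinction concrete, here is a minimal Python sketch. It is a toy model, not kernel code, and every name in it is hypothetical: a connection table keyed by the full 4-tuple. Several connections can legitimately share one local port as long as their remote endpoints differ, so anything creating connections outside the host stack would need a view of the whole table, not just a list of taken local ports.

```python
from typing import NamedTuple


class FourTuple(NamedTuple):
    """One TCP connection, identified by both of its endpoints."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class ConnectionTable:
    """Toy stand-in for a host TCP connection table (hypothetical)."""

    def __init__(self) -> None:
        self._conns: set[FourTuple] = set()

    def local_port_in_use(self, ip: str, port: int) -> bool:
        # Port-only view: true if ANY connection uses this local endpoint.
        return any(c.local_ip == ip and c.local_port == port for c in self._conns)

    def add(self, conn: FourTuple) -> bool:
        # Actual uniqueness rule: only an identical full 4-tuple collides.
        if conn in self._conns:
            return False
        self._conns.add(conn)
        return True


table = ConnectionTable()
table.add(FourTuple("10.0.0.1", 80, "10.0.0.2", 40000))
# Port 80 is "in use", yet a connection from another client is still legal...
print(table.local_port_in_use("10.0.0.1", 80))                  # True
print(table.add(FourTuple("10.0.0.1", 80, "10.0.0.3", 40000)))  # True
# ...while an exact duplicate of an existing 4-tuple is not.
print(table.add(FourTuple("10.0.0.1", 80, "10.0.0.2", 40000)))  # False
```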

iWARP needs to live in its separate little container and not
contaminate the rest of the networking; this is the deal.  Any
suggested such change which breaks that deal will be NACK'd by all of
the core networking developers.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-09 Thread James Lentini

On Mon, 8 Oct 2007, Steve Wise wrote:

 The correct solution, IMO, is to enhance the core low level 4-tuple 
 allocation services to be more generic (e.g., not be tied to a struct 
 sock).  Then the host tcp stack and the host rdma stack can allocate 
 TCP/iWARP ports/4tuples from this common exported service and share 
 the port space.  This allocation service could also be used by other 
 deep adapters like iSCSI adapters if needed.

As a developer of an RDMA ULP, NFS-RDMA, I like this approach because 
it will simplify the configuration of an RDMA device and the services 
that use it.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-08 Thread Steve Wise



David Miller wrote:

From: Sean Hefty [EMAIL PROTECTED]
Date: Thu, 09 Aug 2007 14:40:16 -0700


Steve Wise wrote:

Any more comments?
Does anyone have ideas on how to reserve the port space without using a 
struct socket?


How about we just remove the RDMA stack altogether?  I am not at all
kidding.  If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable.  We were
told all along that if RDMA went into the tree none of this kind of
stuff would be an issue.

These are exactly the kinds of problems that people like myself
were dreading.  These subsystems have no business using the TCP port
space of the Linux software stack, absolutely none.

After TCP port reservation, what's next?  It seems an at least
bi-monthly event that the RDMA folks need to put their fingers
into something else in the normal networking stack.  No more.

I will NACK any patch that opens up sockets to eat up ports or
anything stupid like that.


Hey Dave,

The hack to use a socket and bind it to claim the port was just for 
demonstrating the idea.  The correct solution, IMO, is to enhance the 
core low level 4-tuple allocation services to be more generic (e.g., not 
be tied to a struct sock).  Then the host tcp stack and the host rdma 
stack can allocate TCP/iWARP ports/4tuples from this common exported 
service and share the port space.  This allocation service could also be 
used by other deep adapters like iSCSI adapters if needed.


Will you NAK such a solution if I go implement it and submit for review? 
 The dual ip subnet solution really sux, and I'm trying one more time 
to see if you will entertain the common port space solution, if done 
correctly.


Thanks,

Steve.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-28 Thread Roland Dreier
Sorry for the long latency, I was at the beach all last week.

   And direct data placement really does give you a factor of two at
   least, because otherwise you're stuck receiving the data in one
   buffer, looking at some of the data at least, and then figuring out
   where to copy it.  And memory bandwidth is if anything becoming more
   valuable; maybe LRO + header splitting + page remapping tricks can get
   you somewhere but as NCPUS grows then it seems the TLB shootdown cost
   of page flipping is only going to get worse.

  As Herbert has said already, people can code for this just like
  they have to code for RDMA.

No argument, you need to change the interface to take advantage of RDMA.

  There is no fundamental difference from converting an application
  to sendfile or similar.

Yes, on the transmit side, there's not much difference from sendfile
or splice, although RDMA may give a slightly nicer interface that also
gives basically the equivalent of AIO.

  The only thing this needs is a
  recvmsg_I_dont_care_where_the_data_is() call.  There are no alignment
  issues unless you are trying to push this data directly into the
  page cache.

I don't understand how this gives you the same thing as direct data
placement (DDP).  There are many situations where the sender knows
where the data has to go and if there's some way to pass that to the
receiver, so that info can be used in the receive path to put the data
in the right place, the receiver can save a copy.  This is
fundamentally the same offload that an FC HBA does -- the SCSI
midlayer queues up commands like read block A and put the data at
address X and read block B and put the data at address Y and the
HBA matches tags on incoming data to put the blocks at the right
addresses, even if block B is received before block A.
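The tag-matching idea can be illustrated with a small Python sketch. It is a toy model, not how any FC HBA or RDMA NIC is actually programmed, and all names are hypothetical: the receiver pre-posts destination buffers under tags (the "put the data at address X" step), and each arriving segment carries a tag and offset, so its payload is written straight into its final buffer regardless of arrival order, with no intermediate copy.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    tag: int        # identifies the pre-posted destination buffer
    offset: int     # byte offset within that buffer
    payload: bytes


class TaggedPlacement:
    """Toy model of direct data placement by tag (hypothetical API)."""

    def __init__(self) -> None:
        self._buffers: dict[int, bytearray] = {}

    def post(self, tag: int, size: int) -> bytearray:
        # The receiver advertises where data for `tag` must land.
        buf = bytearray(size)
        self._buffers[tag] = buf
        return buf

    def on_receive(self, seg: Segment) -> None:
        # Place the payload directly at its final location: no staging buffer.
        buf = self._buffers[seg.tag]
        end = seg.offset + len(seg.payload)
        if end > len(buf):
            raise ValueError("segment overruns posted buffer")
        buf[seg.offset:end] = seg.payload


dp = TaggedPlacement()
block_a = dp.post(tag=1, size=4)
block_b = dp.post(tag=2, size=4)
# Block B arrives before block A, and A's two halves arrive out of order.
dp.on_receive(Segment(tag=2, offset=0, payload=b"BBBB"))
dp.on_receive(Segment(tag=1, offset=2, payload=b"a2"))
dp.on_receive(Segment(tag=1, offset=0, payload=b"a1"))
print(bytes(block_a), bytes(block_b))  # b'a1a2' b'BBBB'
```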

RFC 4297 has some discussion of the various approaches, and while you
might not agree with their conclusions, it is interesting reading.

  Couple this with a card that makes sure that on a per-page basis, only
  data for a particular flow (or group of flows) will accumulate.

It seems that the NIC would also have to look into a TCP stream (and
handle out of order segments etc) to find message boundaries for this
to be equivalent to what an RDMA NIC does.

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-28 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 28 Aug 2007 12:38:07 -0700

 It seems that the NIC would also have to look into a TCP stream (and
 handle out of order segments etc) to find message boundaries for this
 to be equivalent to what an RDMA NIC does.

It would work for data that accumulates in-order, give or take a small
window, just like LRO does.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-21 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Mon, 20 Aug 2007 18:16:54 -0700

 And direct data placement really does give you a factor of two at
 least, because otherwise you're stuck receiving the data in one
 buffer, looking at some of the data at least, and then figuring out
 where to copy it.  And memory bandwidth is if anything becoming more
 valuable; maybe LRO + header splitting + page remapping tricks can get
 you somewhere but as NCPUS grows then it seems the TLB shootdown cost
 of page flipping is only going to get worse.

As Herbert has said already, people can code for this just like
they have to code for RDMA.

There is no fundamental difference from converting an application
to sendfile or similar.

The only thing this needs is a
recvmsg_I_dont_care_where_the_data_is() call.  There are no alignment
issues unless you are trying to push this data directly into the
page cache.

Couple this with a card that makes sure that on a per-page basis, only
data for a particular flow (or group of flows) will accumulate.

People already make cards that can do stuff like this, it can be done
statelessly with an on-chip dynamically maintained flow table.
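A rough Python model of the receive scheme described above follows, under stated assumptions: the card keeps a small, dynamically maintained flow table keyed by 4-tuple, appends in-order payload for each flow into that flow's own page, and, in the spirit of LRO, punts out-of-order segments (and flows it has no room to track) to the normal stack. The page size, table capacity, resync behavior, and all names are hypothetical, chosen only to keep the example small.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8   # unrealistically tiny so the behavior is easy to follow (assumption)
MAX_FLOWS = 2   # assumed capacity of the on-chip flow table

FlowKey = tuple[str, int, str, int]   # (src ip, src port, dst ip, dst port)


@dataclass
class FlowState:
    next_seq: int                                   # next in-order byte expected
    page: bytearray = field(default_factory=bytearray)


class FlowPager:
    """Toy model of per-flow page accumulation on a NIC (hypothetical)."""

    def __init__(self) -> None:
        self.table: dict[FlowKey, FlowState] = {}
        self.pages: list[tuple[FlowKey, bytes]] = []            # each page holds one flow
        self.slow_path: list[tuple[FlowKey, int, bytes]] = []   # left to the normal stack

    def _flush(self, flow: FlowKey, st: FlowState) -> None:
        # Hand a (possibly partial) page to the host and start a fresh one.
        if st.page:
            self.pages.append((flow, bytes(st.page)))
            st.page = bytearray()

    def receive(self, flow: FlowKey, seq: int, data: bytes) -> None:
        st = self.table.get(flow)
        if st is None:
            if len(self.table) >= MAX_FLOWS:
                # No room to track this flow: fall back to the normal receive path.
                self.slow_path.append((flow, seq, data))
                return
            st = self.table[flow] = FlowState(next_seq=seq)
        if seq != st.next_seq:
            # Out of order: close the current page, punt this segment to software,
            # and resynchronize just past it.
            self._flush(flow, st)
            self.slow_path.append((flow, seq, data))
            st.next_seq = seq + len(data)
            return
        st.next_seq += len(data)
        # Append in-order bytes, starting a new page whenever one fills up.
        while data:
            room = PAGE_SIZE - len(st.page)
            st.page += data[:room]
            data = data[room:]
            if len(st.page) == PAGE_SIZE:
                self._flush(flow, st)
```

Because every page only ever holds bytes from a single flow, a receiver that does not care where its data lands could be handed whole pages without a copy, which is the property the scheme relies on.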

And best yet it doesn't turn off every feature in the networking nor
bypass it for the actual protocol processing.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
   Isn't RDMA _part_ of the software net stack within Linux?

  It very much is not so.

This is just nit-picking.  You can draw the boundary of the software
net stack wherever you want, but I think Sean's point was just that
RDMA drivers already are part of Linux, and we all want them to get
better.

  When using RDMA you lose the capability to do packet shaping,
  classification, and all the other wonderful networking facilities
  you've grown to love and use over the years.

Same thing with TSO and LRO and who knows what else.  I know you're
going to make a distinction between stateless and stateful
offloads, but really it's just an arbitrary distinction between things
you like and things you don't.

  Imagine if you didn't know any of this, you purchase and begin to
  deploy a huge piece of RDMA infrastructure, you then get the mandate
  from IT that you need to add firewalling on the RDMA connections at
  the host level, and oh shit you can't?

It's ironic that you bring up firewalling.  I've had vendors of iWARP
hardware tell me they would *love* to work with the community to make
firewalling work better for RDMA connections.  But instead we get the
catch-22 of your changing arguments -- first, you won't even consider
changes that might help RDMA work better in the name of
maintainability; then you have to protect poor, ignorant users from
accidentally using RDMA because of some problem or another; and then
when someone tries to fix some of the problems you mention, it's back
to step one.

Obviously some decisions have been prejudged here, so I guess this
moves to the realm of politics.  I have plenty of interesting
technical stuff to do, so I'll leave it to the people with a horse in the
race to find ways to twist your arm.

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Fri, 17 Aug 2007 12:52:39 -0700

   When using RDMA you lose the capability to do packet shaping,
   classification, and all the other wonderful networking facilities
   you've grown to love and use over the years.
 
 Same thing with TSO and LRO and who knows what else.

Not true at all.  Full classification and filtering still is usable
with TSO and LRO.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
 When using RDMA you lose the capability to do packet shaping,
 classification, and all the other wonderful networking facilities
 you've grown to love and use over the years.
   
   Same thing with TSO and LRO and who knows what else.
  
  Not true at all.  Full classification and filtering still is usable
  with TSO and LRO.

Well, obviously with TSO and LRO the packets that the stack sends or
receives are not the same as what's on the wire.  Whether that breaks
your wonderful networking facilities or not depends on the specifics
of the particular facility I guess -- for example shaping is clearly
broken by TSO.  (And people can wonder what the packet trains TSO
creates do to congestion control on the internet, but the netdev crowd
has already decided that TSO is good and RDMA is bad)

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Fri, 17 Aug 2007 16:31:07 -0700

  When using RDMA you lose the capability to do packet shaping,
  classification, and all the other wonderful networking facilities
  you've grown to love and use over the years.

Same thing with TSO and LRO and who knows what else.
   
   Not true at all.  Full classification and filtering still is usable
   with TSO and LRO.
 
 Well, obviously with TSO and LRO the packets that the stack sends or
 receives are not the same as what's on the wire.  Whether that breaks
 your wonderful networking facilities or not depends on the specifics
 of the particular facility I guess -- for example shaping is clearly
 broken by TSO.  (And people can wonder what the packet trains TSO
 creates do to congestion control on the internet, but the netdev crowd
 has already decided that TSO is good and RDMA is bad)

This is also a series of falsehoods.  All packet filtering,
queue management, and packet scheduling facilities work perfectly
fine and as designed with both LRO and TSO.

When problems come up, they are bugs, and we fix them.

Please stop spreading this FUD about TSO and LRO.

The fact is that RDMA bypasses the whole stack so that supporting
these facilities is not even _POSSIBLE_.  With stateless offloads it
is possible to support all of these facilities, and we do.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread Tom Tucker
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote:

[...snip...]

  I think removing the RDMA stack is the wrong thing to do, and you 
  shouldn't just threaten to yank entire subsystems because you don't like 
  the technology.  Let's keep this constructive, can we?  RDMA should get 
  the respect of any other technology in Linux.  Maybe it's a niche in your 
  opinion, but come on, there's more RDMA users than say, the sparc64 
  port.  Eh?
 
 It's not about being a niche.  It's about creating a maintainable 
 software net stack that has predictable behavior.

Isn't RDMA _part_ of the software net stack within Linux? Why isn't
making RDMA stable, supportable and maintainable equally as important as
any other subsystem? 

 
 Needing to reach out of the RDMA sandbox and reserve net stack resources 
 away from itself travels a path we've consistently avoided.
 
 
  I will NACK any patch that opens up sockets to eat up ports or
  anything stupid like that.
  
  Got it.
 
 Ditto for me as well.
 
   Jeff
 
 
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread David Miller
From: Tom Tucker [EMAIL PROTECTED]
Date: Thu, 16 Aug 2007 08:43:11 -0500

 Isn't RDMA _part_ of the software net stack within Linux?

It very much is not so.

When using RDMA you lose the capability to do packet shaping,
classification, and all the other wonderful networking facilities
you've grown to love and use over the years.

I'm glad this is a surprise to you, because it illustrates the
point some of us keep trying to make about technologies like
this.

Imagine if you didn't know any of this, you purchase and begin to
deploy a huge piece of RDMA infrastructure, you then get the mandate
from IT that you need to add firewalling on the RDMA connections at
the host level, and oh shit you can't?

This is why none of us core networking developers like RDMA at all.
It's totally not integrated with the rest of the Linux stack and on
top of that it even gets in the way.  It's an aberration, an eyesore,
and a constant source of consternation.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-15 Thread Steve Wise



David Miller wrote:

From: Sean Hefty [EMAIL PROTECTED]
Date: Thu, 09 Aug 2007 14:40:16 -0700


Steve Wise wrote:

Any more comments?
Does anyone have ideas on how to reserve the port space without using a 
struct socket?


How about we just remove the RDMA stack altogether?  I am not at all
kidding.  If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable.  We were
told all along that if RDMA went into the tree none of this kind of
stuff would be an issue.


I think removing the RDMA stack is the wrong thing to do, and you 
shouldn't just threaten to yank entire subsystems because you don't like 
the technology.  Let's keep this constructive, can we?  RDMA should get 
the respect of any other technology in Linux.  Maybe it's a niche in your 
opinion, but come on, there's more RDMA users than say, the sparc64 
port.  Eh?




These are exactly the kinds of problems that people like myself
were dreading.  These subsystems have no business using the TCP port
space of the Linux software stack, absolutely none.



Ok, although IMO it's the correct solution.  But I'll propose other 
solutions below.  I ask for your feedback (and everyone's!) on these 
alternate solutions.



After TCP port reservation, what's next?  It seems an at least
bi-monthly event that the RDMA folks need to put their fingers
into something else in the normal networking stack.  No more.



The only other change requested and committed, if I recall correctly, was 
for netevents, and that enabled both Infiniband and iWARP to integrate 
with the neighbour subsystem.  I think that was a useful and needed 
change.  Prior to that, these subsystems were snooping ARP replies to 
trigger events.  That was back in 2.6.18 or 2.6.19 I think...



I will NACK any patch that opens up sockets to eat up ports or
anything stupid like that.


Got it.

Here are alternate solutions that avoid the need to share the port space:

Solution 1)

1) admins must set up an alias interface on the iwarp device for use with 
rdma.  This interface will have to be on a separate subnet from the 
interface used for TCP, and have a canonical name that indicates it's for 
rdma only, like eth2:iw or eth2:rdma.  There can be many of these per device.


2) admins make sure their sockets/tcp services don't use the interface 
configured in #1, and their rdma services do use said interface.


3) iwarp providers must translate binds to ipaddr 0.0.0.0 into the 
associated rdma-only ip addresses.  They can do this by searching 
for all aliases with the canonical name that are aliases of the TCP 
interface for their nic device.  Or: somehow not handle incoming 
connections to any address but the rdma-only addresses, and instead 
pass them up to the host stack and not offload them.


This will avoid the collisions as long as the above steps are followed.
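Step 3 might look roughly like the following Python sketch. It assumes the provider can see a list of (interface label, address) pairs; a bind to 0.0.0.0 is narrowed to the addresses whose label is an rdma-only alias (the `:iw` or `:rdma` suffixes from step 1) of the provider's own device, while a bind to a specific address is left alone, relying on the admin (step 2). The data layout and the function name are hypothetical.

```python
import ipaddress

RDMA_ALIAS_SUFFIXES = (":iw", ":rdma")   # canonical alias names from step 1


def rdma_bind_addresses(bind_addr: str, device: str,
                        interfaces: list[tuple[str, str]]) -> list[str]:
    """Return the addresses an iWARP provider should actually listen on.

    interfaces: (label, address) pairs, e.g. ("eth2:iw", "192.168.2.5").
    """
    if not ipaddress.ip_address(bind_addr).is_unspecified:
        # Specific address: trust the admin's choice (step 2).
        return [bind_addr]
    # Wildcard bind: restrict it to this device's rdma-only aliases.
    return [addr for label, addr in interfaces
            if label.startswith(device + ":")
            and label.endswith(RDMA_ALIAS_SUFFIXES)]


ifs = [("eth2", "10.0.0.5"), ("eth2:iw", "192.168.2.5"),
       ("eth2:rdma", "192.168.3.5"), ("eth3:iw", "192.168.4.5")]
print(rdma_bind_addresses("0.0.0.0", "eth2", ifs))  # ['192.168.2.5', '192.168.3.5']
```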


Solution 2)

Another possibility would be for the driver to create two net devices 
(and hence two interface names) like eth2 and iw2, and artificially 
separate the RDMA stuff that way.


These two solutions are similar in that they create an rdma-only interface.

Pros:
- is not intrusive into the core networking code
- very minimal changes needed, and only in the iwarp providers' code (the 
ones with this problem)

- makes it clear which subnets are RDMA only

Cons:
- relies on system admin to set it up correctly.
- native stack can still use this rdma-only interface and the same 
port space issue will exist.



For the record, here are possible port-sharing solutions Dave sez he'll NAK:

Solution NAK-1)

The rdma-cma just allocates a socket and binds it to reserve TCP ports.

Pros:
- minimal changes needed to implement (always a plus in my mind :)
- simple, clean, and it works (KISS)
- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:
- wastes memory
- puts a TCP socket in the CLOSED state in the pcb tables.
- Dave will NAK it :)
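For readers unfamiliar with the trick, here is a user-space Python analogue of Solution NAK-1. The RFC patch works in the kernel, binding a socket via kernel_bind(); this sketch only mirrors the idea with the standard socket module. Binding a TCP socket, without ever listening on it, is enough to make later binds of the same address and port fail, and the socket has to be held open for as long as the reservation should last, which is where the memory cost listed above comes from. The helper names are hypothetical.

```python
import errno
import socket


def reserve_tcp_port(addr: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Claim a TCP port by binding (but not listening on) a socket.

    port=0 lets the kernel pick a free port. Keep the returned socket open:
    closing it releases the reservation.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Deliberately no SO_REUSEADDR, so other binds to this port are refused.
    s.bind((addr, port))
    return s


def port_is_free(addr: str, port: int) -> bool:
    """Probe whether another TCP socket could bind addr:port right now."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((addr, port))
        return True
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        probe.close()


holder = reserve_tcp_port()
port = holder.getsockname()[1]
print(port_is_free("127.0.0.1", port))   # False while `holder` is open
holder.close()
```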

Solution NAK-2)

Create a low-level sockets-agnostic port allocation service that is 
shared by both TCP and RDMA.  This way, the rdma-cm can reserve ports in 
an efficient manner instead of doing it via kernel_bind() using a sock 
struct.


Pros:
- probably the correct solution (my opinion :) if we went down the path 
of sharing port space

- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:

- very intrusive change because the port allocation stuff is tightly 
bound to the host stack and sock struct, etc.

- Dave will NAK it :)
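Solution NAK-2 can be pictured with the following Python sketch. It is purely illustrative and assumes nothing about the kernel's real port-allocation code: one allocator owns the port space, and both the host TCP stack and the rdma-cm reserve from it by owner label, with no socket structure involved. The class, method names, owner labels, and the default ephemeral range are hypothetical. Note that it only tracks local ports; as David Miller points out later in the thread, a real shared service would also have to account for full 4-tuples.

```python
import threading
from typing import Optional


class PortSpace:
    """One TCP port space shared by several consumers (hypothetical sketch)."""

    def __init__(self, lo: int = 32768, hi: int = 60999) -> None:
        self._lo, self._hi = lo, hi           # assumed ephemeral range
        self._owners: dict[int, str] = {}     # port -> owner label
        self._lock = threading.Lock()

    def reserve(self, owner: str, port: Optional[int] = None) -> Optional[int]:
        """Reserve `port` for `owner`, or any free ephemeral port if None.

        Returns the port, or None if it is taken or the range is exhausted.
        """
        with self._lock:
            if port is not None:
                if port in self._owners:
                    return None
                self._owners[port] = owner
                return port
            for p in range(self._lo, self._hi + 1):
                if p not in self._owners:
                    self._owners[p] = owner
                    return p
            return None

    def release(self, owner: str, port: int) -> None:
        # Only the current owner may release a port.
        with self._lock:
            if self._owners.get(port) == owner:
                del self._owners[port]


space = PortSpace(lo=40000, hi=40001)
print(space.reserve("tcp", 5001))      # 5001
print(space.reserve("rdma-cm", 5001))  # None: TCP already holds it
```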


Steve.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-15 Thread Jeff Garzik

Steve Wise wrote:



David Miller wrote:

From: Sean Hefty [EMAIL PROTECTED]
Date: Thu, 09 Aug 2007 14:40:16 -0700


Steve Wise wrote:

Any more comments?
Does anyone have ideas on how to reserve the port space without using 
a struct socket?


How about we just remove the RDMA stack altogether?  I am not at all
kidding.  If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable.  We were
told all along that if RDMA went into the tree none of this kind of
stuff would be an issue.


I think removing the RDMA stack is the wrong thing to do, and you 
shouldn't just threaten to yank entire subsystems because you don't like 
the technology.  Let's keep this constructive, can we?  RDMA should get 
the respect of any other technology in Linux.  Maybe it's a niche in your 
opinion, but come on, there's more RDMA users than say, the sparc64 
port.  Eh?


It's not about being a niche.  It's about creating a maintainable 
software net stack that has predictable behavior.


Needing to reach out of the RDMA sandbox and reserve net stack resources 
away from itself travels a path we've consistently avoided.




I will NACK any patch that opens up sockets to eat up ports or
anything stupid like that.


Got it.


Ditto for me as well.

Jeff




Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-15 Thread Roland Dreier
  Needing to reach out of the RDMA sandbox and reserve net stack
  resources away from itself travels a path we've consistently avoided.

Where did the idea of an RDMA sandbox come from?  Obviously no one
disagrees with keeping things clean and maintainable, but the idea
that RDMA is a second-class citizen that doesn't get any input into
the evolution of the networking code seems kind of offensive to me.

 - R.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-09 Thread Sean Hefty

Steve Wise wrote:

Any more comments?


Does anyone have ideas on how to reserve the port space without using a 
struct socket?


- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-09 Thread David Miller
From: Sean Hefty [EMAIL PROTECTED]
Date: Thu, 09 Aug 2007 14:40:16 -0700

 Steve Wise wrote:
  Any more comments?
 
 Does anyone have ideas on how to reserve the port space without using a 
 struct socket?

How about we just remove the RDMA stack altogether?  I am not at all
kidding.  If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable.  We were
told all along that if RDMA went into the tree none of this kind of
stuff would be an issue.

These are exactly the kinds of problems that people like myself
were dreading.  These subsystems have no business using the TCP port
space of the Linux software stack, absolutely none.

After TCP port reservation, what's next?  It seems an at least
bi-monthly event that the RDMA folks need to put their fingers
into something else in the normal networking stack.  No more.

I will NACK any patch that opens up sockets to eat up ports or
anything stupid like that.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-09 Thread Sean Hefty

How about we just remove the RDMA stack altogether?  I am not at all
kidding.  If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable.  We were
told all along that if RDMA went into the tree none of this kind of
stuff would be an issue.


There are currently two RDMA solutions available.  Each solution has 
different requirements and uses the normal network stack differently. 
Infiniband uses its own transport.  iWarp runs over TCP.


We have tried to leverage the existing infrastructure where it makes sense.


After TCP port reservation, what's next?  It seems an at least
bi-monthly event that the RDMA folks need to put their fingers
into something else in the normal networking stack.  No more.


Currently, the RDMA stack uses its own port space.  This causes a 
problem for iWarp, and is what Steve is looking for a solution for.  I'm 
not an iWarp guru, so I don't know what options exist.  Can iWarp use 
its own address family?  Identify specific IP addresses for iWarp use? 
Restrict iWarp to specific port numbers?  Let the app control the 
correct operation?  I don't know.


Steve merely defined a problem and suggested a possible solution.  He's 
looking for constructive help trying to solve the problem.


- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-06 Thread Steve Wise



Sean Hefty wrote:

Lemme know how I can help.  I certainly can test any patches on my 8
node iwarp cluster.


We should probably take the idea to netdev before making any substantial changes
to the code.

- Sean


Yup.

Should I post my RFC patch and we'll go from there?

Steve.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-06 Thread Sean Hefty

Should I post my RFC patch and we'll go from there?


Sounds good to me.  Roland, do you have any opinion?

- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-04 Thread Steve Wise



Sean Hefty wrote:
If we get rid of the rdma_cm specific port spaces, do we then reduce 
the valid possible spaces to just TCP and UDP?  Or what?  In the 
sockets paradigm, the socket is explicitly bound to a protocol space 
when it's created (based on the protocol id).  Do you think we need to 
change the rdma_cm_id to have such a concept?  I.e., when you create the 
cm_id, you say your intended QP type or port space?  The current API 
lends itself to someone incorrectly choosing a port space, by the way.


Currently, the RDMA port space implies the QP type (RC or UD).  We're 
not tied to any specific protocol when we create the rdma_cm_id, since 
we don't know what type of RDMA device we'll end up using.  So, I don't 
think we want users to specify a protocol.


But should we really change the API that drastically?  Or just keep 
the port spaces and make PS_TCP share the host's port space.


I don't want to break the user space API, if it can be helped.  SDP is 
kind of a problem, in that the rdma_cm needs to distinguish between SDP 
as a user, versus someone using RDMA_PS_TCP.  SDP maps between the RDMA 
port space and real TCP port space.  I need to get some details on how 
SDP uses the rdma_cm, like whether it uses wild card port numbers.


Maybe the rdma-cm port spaces should really be IB, IWARP, or BOTH.  IB 
has its own port space, and IWARP or BOTH gets the TCP port space.


I thought about doing something like this, but I'm not sure there would 
be much use of just IB or just iWarp, when the user can specify both.  I 
even considered pushing the problem into the iWarp CM, but that seems 
like a more complex implementation with no benefit unless there are 
users of just an IB port space.


At this point, my thoughts are to take your original patch, remove the 
rdma_cm port space structures and functions, and figure out how to 
handle SDP.




Lemme know how I can help.  I certainly can test any patches on my 8 
node iwarp cluster.



Steve.



RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-04 Thread Sean Hefty
Lemme know how I can help.  I certainly can test any patches on my 8
node iwarp cluster.

We should probably take the idea to netdev before making any substantial changes
to the code.

- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-02 Thread Steve Wise

Sean Hefty wrote:
Consider NFS and NFS-RDMA.  The NFS gurus struggled with this very 
issue and concluded that the RDMA service needs to be on a separate 
port. Thus they are proposing a new netid/port number for doing RDMA 
mounts vs TCP/UDP mounts.  IMO that is the correct way to go:  RDMA 
services are different from TCP services.  They use a different
protocol on top of TCP and thus shouldn't be handled on the same TCP 
port.  So, applications that want to service Sockets and RDMA services 
concurrently would do so by listening on different ports...


This is a good point, and a different view from what I've been taking. I 
was looking at it more like trying to provide the same service over UDP 
and TCP, where you use the same port number.  I just can't come up with 
any solution that works for iWarp, and sharing the port space seems like 
the only way to fix things.


The iWARP protocols don't include a UDP based service, so it is not 
needed.  But if you're calling it a UDP port space, maybe it should be 
the host's port space?


I think it should match what's done for TCP.  IMO, there should be a 
connectionless RDMA service, along with multicast, over 
UDP/IP/Ethernet.  :)




I think the winner would really be a reliable connectionless RDMA 
service with mcast.


Yes.  The only exported interfaces into the host port allocation stuff
requires a socket struct.  I didn't want to try and tackle exporting 
the port allocation services at a lower level.  Even at the bottom 
level, I think it still assumes a socket struct...


I looked at this too at one point, and gave up as well.  I don't know 
what other assumptions are made in the stack as a result of this.  For 
example, if an app binds to an IP and port, and the IP address is 
removed and re-added, is the port still valid/reserved?




I just tried this and I believe the application is still listening/bound 
even though the address is no longer valid for the host:


[EMAIL PROTECTED] ~]# ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:E0:81:33:67:D1
  BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  Interrupt:29

[EMAIL PROTECTED] ~]# netserver -L 192.168.69.135 -p  -4
Starting netserver at port 
set_up_server could not establish a listen endpoint for  port  with 
family AF_INET

[EMAIL PROTECTED] ~]# ifconfig eth1 192.168.69.135 up
[EMAIL PROTECTED] ~]# netserver -L 192.168.69.135 -p  -4
Starting netserver at port 
Starting netserver at hostname 192.168.69.135 port  and family AF_INET
[EMAIL PROTECTED] ~]# netstat -an|grep 
tcp        0      0 192.168.69.135:      0.0.0.0:*      LISTEN

[EMAIL PROTECTED] ~]# ifconfig eth1 0.0.0.0
[EMAIL PROTECTED] ~]# netstat -an|grep 
tcp        0      0 192.168.69.135:      0.0.0.0:*      LISTEN

[EMAIL PROTECTED] ~]# ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:E0:81:33:67:D1
  inet6 addr: fe80::2e0:81ff:fe33:67d1/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 b)  TX bytes:176 (176.0 b)
  Interrupt:29

[EMAIL PROTECTED] ~]#


For iWarp, is using a struct socket essentially any different than 
transitioning an existing socket to RDMA mode?  


In the RFC patch I posted, the socket is _just_ to allow binding to a 
port/addr.  It's not used for anything else.  From the native stack's
perspective, it's a TCP socket in the CLOSED state (but bound) I guess.
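For illustration, here is a minimal user-space sketch of the same trick. It assumes Linux and the ordinary BSD socket API. The RFC patch does this inside the kernel, and `reserve_tcp_port()` is a hypothetical helper, not code from the patch. Binding a TCP socket that is never connected or listening is enough to claim the address/port in the host port space. A second bind to the same pair without SO_REUSEADDR is refused with EADDRINUSE.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: claim ip:port in the host TCP port space by binding
 * a TCP socket that is never connected and never put into LISTEN (it stays
 * in the CLOSED state).  Pass port 0 to let the kernel pick a free port.
 * On success returns the fd, which must stay open to hold the reservation,
 * and stores the bound port in *bound_port if non-NULL.  On failure
 * returns -errno. */
int reserve_tcp_port(const char *ip, uint16_t port, uint16_t *bound_port)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1) {
        close(fd);
        return -EINVAL;
    }

    /* Deliberately no SO_REUSEADDR, so a second bind to the same
     * address and port is refused with EADDRINUSE. */
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        (bound_port && getsockname(fd, (struct sockaddr *)&sin, &len) < 0)) {
        int err = errno;
        close(fd);
        return -err;
    }
    if (bound_port)
        *bound_port = ntohs(sin.sin_port);
    return fd;
}
```

Because the reserving socket never leaves the CLOSED state, closing it releases the port right away. There is no TIME_WAIT to wait out.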


You're just requiring it 
to be in a specific state.  Are there problems around doing this?  How 
much harder (technically, as opposed to politically) would it be to take 
this change a step farther and offload an active connection?


By active, do you mean in the ESTABLISHED state?



I left it all in to show the minimal changes needed to implement the 
functionality.  To keep the patch simple for initial consumption.  But 
yes, the rdma-cm really doesn't need to track the port stuff for TCP 
since the host stack does.


Okay - for final patches, I think we want to remove the rdma_cm specific 
port spaces, along with changing the API to clarify that it uses the 
same port space as TCP/UDP.


What do you mean by changing the API? Adding a new port space enum?



I haven't looked in detail at the SDP code, but I would think it 
should want the TCP port space and not its own anyway, but I'm not
sure.  What is the point of the SDP port space anyway?


The rdma_cm needs to adjust its protocol for SDP over IB.  I'm not too 
concerned with SDP, since it's not upstream yet, but I don't want to 
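break it beyond repair either.  The rdma_cm may not need to manage the
SDP port space at all, and instead rely on SDP to ensure that it
provides unique port numbers by itself.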

Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-02 Thread Sean Hefty
In the RFC patch I posted, the socket is _just_ to allow binding to a 
port/addr.  It's not used for anything else.  From the native stack's
perspective, it's a TCP socket in the CLOSED state (but bound) I guess.


For RDMA, I think we're somewhere in between binding to an address, 
versus mapping the address.  We map the address to an RDMA device, but 
also use that address in connections.  So, we do a little more than 
simply map the address to a device, but if the address migrates to 
another device, we don't follow it.


I can't really think of any issues that might be caused by this, but I'm 
not sure.  If an app is listening on an address that goes away, would a
new wildcard listen work?



By active, do you mean in the ESTABLISHED state?


Yes


What do you mean by changing the API? Adding a new port space enum?


I was thinking of replacing the rdma_cm port space enum with something 
like IPPROTO_TCP, but doing that probably doesn't matter.


- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-02 Thread Steve Wise

Sean Hefty wrote:
In the RFC patch I posted, the socket is _just_ to allow binding to a 
port/addr.  It's not used for anything else.  From the native stack's
perspective, it's a TCP socket in the CLOSED state (but bound) I guess.


For RDMA, I think we're somewhere in between binding to an address, 
versus mapping the address.  We map the address to an RDMA device, but 
also use that address in connections.  So, we do a little more than 
simply map the address to a device, but if the address migrates to 
another device, we don't follow it.


I can't really think of any issues that might be caused by this, but I'm 
not sure.  If an app is listening on an address that goes away, would a
new wildcard listen work?


no:

[EMAIL PROTECTED] ~]# ifconfig eth1 192.168.69.135 up
[EMAIL PROTECTED] ~]# netserver -L 192.168.69.135 -p  -4
Starting netserver at port 
Starting netserver at hostname 192.168.69.135 port  and family AF_INET
[EMAIL PROTECTED] ~]# ifconfig eth1 0.0.0.0 down
[EMAIL PROTECTED] ~]# netserver -L 0.0.0.0 -p  -4
Starting netserver at port 
set_up_server could not establish a listen endpoint for  port  with 
family AF_INET

[EMAIL PROTECTED] ~]#
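The same conflict can be reproduced on loopback without root. This is a sketch assuming Linux. A listener bound to a specific address holds its port, and a later wildcard (0.0.0.0) bind on that port fails with EADDRINUSE. The sketch does not model removing the address, which needs root. `bind_tcp()` and `local_port()` are hypothetical helpers.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a fresh TCP socket to ip:port (no
 * SO_REUSEADDR) and, if do_listen is nonzero, move it to LISTEN.
 * Returns the fd, or -errno on failure. */
int bind_tcp(const char *ip, uint16_t port, int do_listen)
{
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1) {
        close(fd);
        return -EINVAL;
    }
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        (do_listen && listen(fd, 16) < 0)) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

/* Hypothetical helper: the local port a bound socket holds, 0 on error. */
uint16_t local_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
        return 0;
    return ntohs(sin.sin_port);
}
```

The wildcard address overlaps every local address, so the stack treats it as colliding with the existing specific-address listener on the same port.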





By active, do you mean in the ESTABLISHED state?


Yes


Well, excluding the political issues (patents/etc) and the general 
dislike for offload/toe by the linux community, it is technically 
doable, but not trivial.  The host stack would have to keep track of 
which connections are offloaded and which ones aren't.  For instance, it 
should handle (and fail) an app trying to send() on a socket that's in 
rdma mode.  Also the transition logic for pushing an active connection to
an rdma device would be very messy.  In part, this is due to the fact 
that there's no way to freeze the connection while you're offloading it. 
 So the host stack has to deal with incoming data, or outgoing data 
during the transition and pass that stuff to the rdma driver after the 
offload is complete.  It's messy, but doable.  MS Windows supports this
exact design.  Note the iwarp HW must support this as well.  Ammasso 
doesn't.  Chelsio does.  I _think_ the other up and coming iwarp devices 
also support it because Windows supports it.
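To make that bookkeeping concrete, here is a minimal sketch of the per-connection state the host stack would need. Every name is hypothetical: none of this exists in the Linux stack, and only the receive side of the transition is modeled. send() is refused once the connection leaves host mode. Bytes that arrive mid-transition are stashed and handed to the RDMA driver when the offload completes.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ownership states for one TCP connection. */
enum ex_conn_mode { EX_MODE_HOST, EX_MODE_OFFLOADING, EX_MODE_RDMA };

#define EX_PENDING_MAX 256

struct ex_conn {
    enum ex_conn_mode mode;
    /* Bytes received while the connection is being handed to the RDMA
     * device; the peer can't be frozen, so the host must buffer them. */
    char pending[EX_PENDING_MAX];
    size_t pending_len;
};

/* Start moving a host-owned connection to the RDMA device. */
int ex_begin_offload(struct ex_conn *c)
{
    if (c->mode != EX_MODE_HOST)
        return -EINVAL;
    c->mode = EX_MODE_OFFLOADING;
    c->pending_len = 0;
    return 0;
}

/* send() path: only host-owned connections accept socket I/O. */
int ex_sock_send(struct ex_conn *c, const void *buf, size_t len)
{
    (void)buf; /* a real stack would queue buf here */
    if (c->mode != EX_MODE_HOST)
        return -EOPNOTSUPP;
    return (int)len;
}

/* Receive path: while offloading, stash the bytes for the driver.
 * Returns bytes stashed, 0 if the connection is not mid-transition
 * (the current owner processes the data as usual), or -ENOBUFS. */
int ex_host_rx(struct ex_conn *c, const void *data, size_t len)
{
    if (c->mode != EX_MODE_OFFLOADING)
        return 0;
    if (len > EX_PENDING_MAX - c->pending_len)
        return -ENOBUFS;
    memcpy(c->pending + c->pending_len, data, len);
    c->pending_len += len;
    return (int)len;
}

/* Finish the transition: copy the stashed bytes out for the RDMA driver
 * and mark the connection device-owned.  Returns bytes handed over. */
int ex_finish_offload(struct ex_conn *c, char *out, size_t out_len)
{
    size_t n = c->pending_len;

    if (c->mode != EX_MODE_OFFLOADING)
        return -EINVAL;
    if (out_len < n)
        return -ENOBUFS;
    memcpy(out, c->pending, n);
    c->pending_len = 0;
    c->mode = EX_MODE_RDMA;
    return (int)n;
}
```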


But I don't think we should consider this approach. I think we should 
minimally sync up with the native stack, like we are already doing. 
This is already done in a number of ways:


- using the host routing table to determine the local interface to be used.
- the association of a netdev device with an rdma device via the hwaddr.
- netevents that allow an iWARP device to track neighbour/next hop changes.

The port space is one more that needs to be handled for iWARP.
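As an illustration of the hwaddr association in the list above, here is a sketch of matching a netdev to an RDMA device by hardware address. The structures and `ex_rdma_dev_for()` are hypothetical simplifications, not the kernel's `struct net_device` or `struct ib_device`.

```c
#include <stddef.h>
#include <string.h>

#define EX_ALEN 6 /* Ethernet MAC address length */

/* Hypothetical minimal views of a netdev and an RDMA device. */
struct ex_netdev {
    const char *name;
    unsigned char hwaddr[EX_ALEN];
};

struct ex_rdma_dev {
    const char *name;
    unsigned char hwaddr[EX_ALEN];
};

/* Return the RDMA device whose hardware address equals the netdev's,
 * or NULL if no RDMA device is bound to that interface. */
const struct ex_rdma_dev *ex_rdma_dev_for(const struct ex_netdev *nd,
                                          const struct ex_rdma_dev *devs,
                                          size_t ndevs)
{
    for (size_t i = 0; i < ndevs; i++)
        if (memcmp(devs[i].hwaddr, nd->hwaddr, EX_ALEN) == 0)
            return &devs[i];
    return NULL;
}
```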


Steve.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-02 Thread Steve Wise

Sean Hefty wrote:



Okay - for final patches, I think we want to remove the rdma_cm specific 
port spaces, along with changing the API to clarify that it uses the 
same port space as TCP/UDP.




If we get rid of the rdma_cm specific port spaces, do we then reduce the 
 valid possible spaces to just TCP and UDP?  Or what?  In the sockets 
paradigm, the socket is explicitly bound to a protocol space when it's
created (based on the protocol id).  Do you think we need to change the 
rdma_cm_id to have such a concept?  IE when you create the cm_id, you 
say your intended QP type or port space?  The current API lends itself 
to someone incorrectly choosing a port space, by the way.


But should we really change the API that drastically?  Or just keep the 
port spaces and make PS_TCP share the host's port space.


The only applications that really need this port space are apps that run 
on iWARP only or want to support both iWARP and IB via the 
transport-independent verbs and rdma-cm.  So things like IPoIB probably 
shouldn't use or need the TCP port space.


Maybe the rdma-cm port spaces should really be IB, IWARP, or BOTH.  IB 
has its own port space, and IWARP or BOTH gets the TCP port space.
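Here is a sketch of how that proposal might route port allocation. The names are hypothetical; no such enum exists in rdma_cma.h. IB-only ids keep an rdma_cm-managed port space, while anything that may run over iWARP draws from the host TCP port space.

```c
/* Hypothetical transport selector for an rdma_cm_id, per the proposal. */
enum ex_cm_space { EX_SPACE_IB, EX_SPACE_IWARP, EX_SPACE_BOTH };

/* Which allocator owns the port number for an id. */
enum ex_port_owner { EX_OWNER_RDMA_CM, EX_OWNER_HOST_TCP };

enum ex_port_owner ex_port_owner_for(enum ex_cm_space s)
{
    switch (s) {
    case EX_SPACE_IB:
        /* IB connections never reach the host TCP stack, so the
         * rdma_cm can keep managing a private port space for them. */
        return EX_OWNER_RDMA_CM;
    case EX_SPACE_IWARP:
    case EX_SPACE_BOTH:
    default:
        /* Anything that may run over iWARP uses real TCP ports and must
         * not collide with host sockets. */
        return EX_OWNER_HOST_TCP;
    }
}
```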


thoughts?

Steve.


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-02 Thread Sean Hefty
If we get rid of the rdma_cm specific port spaces, do we then reduce the 
 valid possible spaces to just TCP and UDP?  Or what?  In the sockets 
paradigm, the socket is explicitly bound to a protocol space when it's
created (based on the protocol id).  Do you think we need to change the 
rdma_cm_id to have such a concept?  IE when you create the cm_id, you 
say your intended QP type or port space?  The current API lends itself 
to someone incorrectly choosing a port space, by the way.


Currently, the RDMA port space implies the QP type (RC or UD).  We're 
not tied to any specific protocol when we create the rdma_cm_id, since 
we don't know what type of RDMA device we'll end up using.  So, I don't 
think we want users to specify a protocol.
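To illustrate that the port space, not the device protocol, fixes the QP type, here is a sketch with hypothetical stand-in names. They mirror the rdma_cm port spaces discussed in this thread but are not the real rdma_cma.h definitions or values. The mapping is an assumption: the datagram spaces (UDP, IPoIB) imply UD, and the connected spaces (TCP, SDP) imply RC.

```c
#include <stdbool.h>

/* Hypothetical mirrors of the rdma_cm port spaces; not rdma_cma.h. */
enum ex_port_space { EX_PS_SDP, EX_PS_IPOIB, EX_PS_TCP, EX_PS_UDP };
enum ex_qp_type { EX_QPT_RC, EX_QPT_UD };

/* Datagram port spaces imply an unreliable datagram (UD) QP. */
static bool ex_is_ud_ps(enum ex_port_space ps)
{
    return ps == EX_PS_UDP || ps == EX_PS_IPOIB;
}

/* Everything else implies a reliable connected (RC) QP.  Note that no
 * IB-vs-iWARP choice appears here: the device is not known yet. */
enum ex_qp_type ex_qp_type_for(enum ex_port_space ps)
{
    return ex_is_ud_ps(ps) ? EX_QPT_UD : EX_QPT_RC;
}
```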


But should we really change the API that drastically?  Or just keep the 
port spaces and make PS_TCP share the host's port space.


I don't want to break the user space API, if it can be helped.  SDP is 
kind of a problem, in that the rdma_cm needs to distinguish between SDP 
as a user, versus someone using RDMA_PS_TCP.  SDP maps between the RDMA 
port space and real TCP port space.  I need to get some details on how 
SDP uses the rdma_cm, like whether it uses wild card port numbers.


Maybe the rdma-cm port spaces should really be IB, IWARP, or BOTH.  IB 
has its own port space, and IWARP or BOTH gets the TCP port space.


I thought about doing something like this, but I'm not sure there would 
be much use of just IB or just iWarp, when the user can specify both. 
 I even considered pushing the problem into the iWarp CM, but that 
seems like a more complex implementation with no benefit unless there 
are users of just an IB port space.


At this point, my thoughts are to take your original patch, remove the 
rdma_cm port space structures and functions, and figure out how to 
handle SDP.


- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-01 Thread Sean Hefty
Consider NFS and NFS-RDMA.  The NFS gurus struggled with this very issue 
and concluded that the RDMA service needs to be on a separate port. Thus 
they are proposing a new netid/port number for doing RDMA mounts vs 
TCP/UDP mounts.  IMO that is the correct way to go:  RDMA services are 
different from TCP services.  They use a different protocol on top of
TCP and thus shouldn't be handled on the same TCP port.  So, 
applications that want to service Sockets and RDMA services concurrently 
would do so by listening on different ports...


This is a good point, and a different view from what I've been taking. 
I was looking at it more like trying to provide the same service over 
UDP and TCP, where you use the same port number.  I just can't come up 
with any solution that works for iWarp, and sharing the port space seems 
like the only way to fix things.


The iWARP protocols don't include a UDP based service, so it is not 
needed.  But if you're calling it a UDP port space, maybe it should be 
the host's port space?


I think it should match what's done for TCP.  IMO, there should be a 
connectionless RDMA service, along with multicast, over UDP/IP/Ethernet.  :)


Yes.  The only exported interfaces into the host port allocation stuff
requires a socket struct.  I didn't want to try and tackle exporting the 
port allocation services at a lower level.  Even at the bottom level, I 
think it still assumes a socket struct...
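Here is what a socket-independent allocator could look like, as a minimal sketch only. One bitmap covers the 16-bit port space, and both the host TCP stack and the rdma_cm would reserve from it. All names are hypothetical and the ephemeral range is an assumption. The sketch also simplifies heavily compared with the real inet bind hash:

- It tracks bare ports rather than per-address bindings or 4-tuples.
- It has no locking.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EX_NPORTS 65536
#define EX_EPHEMERAL_LO 32768 /* assumed ephemeral range */
#define EX_EPHEMERAL_HI 61000

/* Hypothetical shared port map: one bit per port, no socket required. */
struct ex_port_map {
    uint8_t used[EX_NPORTS / 8];
};

static bool ex_test(const struct ex_port_map *pm, uint16_t p)
{
    return pm->used[p / 8] & (1u << (p % 8));
}

static void ex_set(struct ex_port_map *pm, uint16_t p, bool on)
{
    if (on)
        pm->used[p / 8] |= (uint8_t)(1u << (p % 8));
    else
        pm->used[p / 8] &= (uint8_t)~(1u << (p % 8));
}

void ex_port_map_init(struct ex_port_map *pm)
{
    memset(pm->used, 0, sizeof(pm->used));
    ex_set(pm, 0, true); /* port 0 means "pick one", never handed out */
}

/* Reserve a specific port (p != 0), or with p == 0 the first free port
 * in the ephemeral range.  Returns the reserved port, or 0 if the
 * requested port is taken or no ephemeral port is left. */
uint16_t ex_port_reserve(struct ex_port_map *pm, uint16_t p)
{
    if (p != 0) {
        if (ex_test(pm, p))
            return 0;
        ex_set(pm, p, true);
        return p;
    }
    for (uint32_t q = EX_EPHEMERAL_LO; q <= EX_EPHEMERAL_HI; q++) {
        if (!ex_test(pm, (uint16_t)q)) {
            ex_set(pm, (uint16_t)q, true);
            return (uint16_t)q;
        }
    }
    return 0;
}

void ex_port_release(struct ex_port_map *pm, uint16_t p)
{
    if (p != 0)
        ex_set(pm, p, false);
}
```

For example, if a host TCP bind reserves a port first, a later iWARP listen that asks for the same port gets 0 back instead of silently sharing it.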


I looked at this too at one point, and gave up as well.  I don't know 
what other assumptions are made in the stack as a result of this.  For 
example, if an app binds to an IP and port, and the IP address is 
removed and re-added, is the port still valid/reserved?


For iWarp, is using a struct socket essentially any different than 
transitioning an existing socket to RDMA mode?  You're just requiring it 
to be in a specific state.  Are there problems around doing this?  How 
much harder (technically, as opposed to politically) would it be to take 
this change a step farther and offload an active connection?


I left it all in to show the minimal changes needed to implement the 
functionality.  To keep the patch simple for initial consumption.  But 
yes, the rdma-cm really doesn't need to track the port stuff for TCP 
since the host stack does.


Okay - for final patches, I think we want to remove the rdma_cm specific 
port spaces, along with changing the API to clarify that it uses the 
same port space as TCP/UDP.


I haven't looked in detail at the SDP code, but I would think it should 
want the TCP port space and not its own anyway, but I'm not sure.  What
is the point of the SDP port space anyway?


The rdma_cm needs to adjust its protocol for SDP over IB.  I'm not too 
concerned with SDP, since it's not upstream yet, but I don't want to 
break it beyond repair either.  The rdma_cm may not need to manage the 
SDP port space at all, and instead rely on SDP to ensure that it 
provides unique port numbers by itself.


- Sean