Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
The hack to use a socket and bind it to claim the port was just for demonstrating the idea. The correct solution, IMO, is to enhance the core low-level 4-tuple allocation services to be more generic (e.g., not tied to a struct sock). Then the host TCP stack and the host RDMA stack can allocate TCP/iWARP ports/4-tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters, like iSCSI adapters, if needed. Since iWARP runs on top of TCP, the port space is really the same.

FWIW, I agree that this proposal is the correct solution to support iWARP.

- Sean

___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
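The shared allocation service Steve sketches can be modeled in a few lines. This is a hedged userspace sketch, not kernel code: all names here (PortSpace, reserve, the owner tags) are hypothetical, it keys on local (ip, port) rather than the full 4-tuple, and the real service would also need SO_REUSEADDR-style semantics. The point is only that one table, not tied to any socket object, serves both stacks:

```python
# Hypothetical sketch of the proposed common port allocation service:
# both the host TCP stack and the RDMA/iWARP stack reserve ports from
# one shared table instead of each keeping a private port space.

class PortSpace:
    """Shared TCP port space, not tied to any socket object."""

    def __init__(self):
        self._reserved = {}   # (local_ip, local_port) -> owner tag

    def reserve(self, local_ip, local_port, owner):
        """Claim a local (ip, port). Returns True on success."""
        key = (local_ip, local_port)
        # A wildcard bind on 0.0.0.0 conflicts with any bind on the port.
        wildcard = ("0.0.0.0", local_port)
        if key in self._reserved or wildcard in self._reserved:
            return False
        if local_ip == "0.0.0.0" and any(p == local_port
                                         for (_, p) in self._reserved):
            return False
        self._reserved[key] = owner
        return True

    def release(self, local_ip, local_port):
        self._reserved.pop((local_ip, local_port), None)

ports = PortSpace()
assert ports.reserve("10.0.0.1", 8000, "tcp")        # host stack claims 8000
assert not ports.reserve("10.0.0.1", 8000, "iwarp")  # iWARP sees the collision
assert ports.reserve("10.0.0.1", 8001, "iwarp")      # different port is fine
```

With something like this exported below struct sock, an iWARP listen and a host TCP listen on the same port would collide at allocation time instead of silently sharing wire traffic.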
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Sean Hefty [EMAIL PROTECTED]
Date: Wed, 10 Oct 2007 14:01:07 -0700

The hack to use a socket and bind it to claim the port was just for demonstrating the idea. The correct solution, IMO, is to enhance the core low-level 4-tuple allocation services to be more generic (e.g., not tied to a struct sock). Then the host TCP stack and the host RDMA stack can allocate TCP/iWARP ports/4-tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters, like iSCSI adapters, if needed. Since iWARP runs on top of TCP, the port space is really the same.

FWIW, I agree that this proposal is the correct solution to support iWARP.

But you can be sure it's not going to happen, sorry. It would mean that we'd need to export the entire TCP socket table so that when iWARP connections are created you can search to make sure there is not an existing full 4-tuple that is the same. It is not just about local TCP ports.

iWARP needs to live in its separate little container and not contaminate the rest of the networking; this is the deal. Any suggested change which breaks that deal will be NACK'd by all of the core networking developers.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
On Mon, 8 Oct 2007, Steve Wise wrote:

The correct solution, IMO, is to enhance the core low-level 4-tuple allocation services to be more generic (e.g., not tied to a struct sock). Then the host TCP stack and the host RDMA stack can allocate TCP/iWARP ports/4-tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters, like iSCSI adapters, if needed.

As a developer of an RDMA ULP, NFS-RDMA, I like this approach because it will simplify the configuration of an RDMA device and the services that use it.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
David Miller wrote:

From: Sean Hefty [EMAIL PROTECTED]
Date: Thu, 09 Aug 2007 14:40:16 -0700

Steve Wise wrote: Any more comments? Does anyone have ideas on how to reserve the port space without using a struct socket?

How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sandbox and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree, none of this kind of stuff would be an issue. These are exactly the kinds of problems that people like myself were dreading. These subsystems have no business using the TCP port space of the Linux software stack, absolutely none.

After TCP port reservation, what's next? It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more. I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.

Hey Dave,

The hack to use a socket and bind it to claim the port was just for demonstrating the idea. The correct solution, IMO, is to enhance the core low-level 4-tuple allocation services to be more generic (e.g., not tied to a struct sock). Then the host TCP stack and the host RDMA stack can allocate TCP/iWARP ports/4-tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters, like iSCSI adapters, if needed.

Will you NAK such a solution if I go implement it and submit it for review? The dual-IP-subnet solution really sux, and I'm trying one more time to see if you will entertain the common port space solution, if done correctly.

Thanks,

Steve.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Sorry for the long latency; I was at the beach all last week.

And direct data placement really does give you a factor of two at least, because otherwise you're stuck receiving the data in one buffer, looking at some of the data at least, and then figuring out where to copy it. And memory bandwidth is, if anything, becoming more valuable; maybe LRO + header splitting + page remapping tricks can get you somewhere, but as NCPUS grows, it seems the TLB shootdown cost of page flipping is only going to get worse.

As Herbert has said already, people can code for this just like they have to code for RDMA.

No argument; you need to change the interface to take advantage of RDMA.

There is no fundamental difference from converting an application to sendfile or similar.

Yes, on the transmit side there's not much difference from sendfile or splice, although RDMA may give a slightly nicer interface that also gives basically the equivalent of AIO.

The only thing this needs is a recvmsg_I_dont_care_where_the_data_is() call. There are no alignment issues unless you are trying to push this data directly into the page cache.

I don't understand how this gives you the same thing as direct data placement (DDP). There are many situations where the sender knows where the data has to go, and if there's some way to pass that to the receiver, so that the info can be used in the receive path to put the data in the right place, the receiver can save a copy. This is fundamentally the same offload that an FC HBA does: the SCSI midlayer queues up commands like "read block A and put the data at address X" and "read block B and put the data at address Y", and the HBA matches tags on incoming data to put the blocks at the right addresses, even if block B is received before block A.

RFC 4297 has some discussion of the various approaches, and while you might not agree with its conclusions, it is interesting reading.
Couple this with a card that makes sure that, on a per-page basis, only data for a particular flow (or group of flows) will accumulate.

It seems that the NIC would also have to look into a TCP stream (and handle out-of-order segments, etc.) to find message boundaries for this to be equivalent to what an RDMA NIC does.

- R.
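The FC HBA analogy above is the crux of DDP: the receiver pre-registers where each tagged block belongs, and incoming data is scattered by its tag regardless of arrival order, with no staging buffer and no copy. A toy model of that matching (all names hypothetical; a real RNIC or HBA does this in hardware against registered memory regions, and payloads here are assumed to fit in 4 KB slots):

```python
# Sketch of tag-based direct data placement: commands pre-register a
# destination offset per tag; arrivals are placed by tag lookup even
# when they show up out of order.

def direct_placement(commands, arrivals):
    """commands: {tag: buffer_offset}; arrivals: [(tag, payload), ...].
    Returns a flat memory image with each payload written at its
    pre-registered offset, independent of arrival order."""
    size = max(off + 4096 for off in commands.values())  # 4 KB per slot
    memory = bytearray(size)
    for tag, payload in arrivals:          # may arrive out of order
        off = commands[tag]                # tag -> pre-registered address
        memory[off:off + len(payload)] = payload
    return memory

cmds = {"A": 0, "B": 4096}                 # read A -> addr 0, B -> addr 4096
mem = direct_placement(cmds, [("B", b"bbbb"), ("A", b"aaaa")])  # B lands first
assert mem[0:4] == b"aaaa" and mem[4096:4100] == b"bbbb"
```

Roland's objection in the message above is precisely that finding the tag and message boundaries inside a TCP byte stream (with reordering) is what forces the NIC to carry per-connection state.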
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 28 Aug 2007 12:38:07 -0700

It seems that the NIC would also have to look into a TCP stream (and handle out-of-order segments, etc.) to find message boundaries for this to be equivalent to what an RDMA NIC does.

It would work for data that accumulates in order, give or take a small window, just like LRO does.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier [EMAIL PROTECTED]
Date: Mon, 20 Aug 2007 18:16:54 -0700

And direct data placement really does give you a factor of two at least, because otherwise you're stuck receiving the data in one buffer, looking at some of the data at least, and then figuring out where to copy it. And memory bandwidth is, if anything, becoming more valuable; maybe LRO + header splitting + page remapping tricks can get you somewhere, but as NCPUS grows, it seems the TLB shootdown cost of page flipping is only going to get worse.

As Herbert has said already, people can code for this just like they have to code for RDMA. There is no fundamental difference from converting an application to sendfile or similar. The only thing this needs is a recvmsg_I_dont_care_where_the_data_is() call. There are no alignment issues unless you are trying to push this data directly into the page cache.

Couple this with a card that makes sure that, on a per-page basis, only data for a particular flow (or group of flows) will accumulate. People already make cards that can do stuff like this; it can be done statelessly with an on-chip, dynamically maintained flow table. And best yet, it doesn't turn off every feature in the networking stack nor bypass it for the actual protocol processing.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Isn't RDMA _part_ of the software net stack within Linux?

It very much is not so.

This is just nit-picking. You can draw the boundary of the software net stack wherever you want, but I think Sean's point was just that RDMA drivers already are part of Linux, and we all want them to get better.

When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years.

Same thing with TSO and LRO, and who knows what else. I know you're going to make a distinction between stateless and stateful offloads, but really it's just an arbitrary distinction between things you like and things you don't.

Imagine if you didn't know any of this: you purchase and begin to deploy a huge piece of RDMA infrastructure, you then get the mandate from IT that you need to add firewalling on the RDMA connections at the host level, and oh shit, you can't?

It's ironic that you bring up firewalling. I've had vendors of iWARP hardware tell me they would *love* to work with the community to make firewalling work better for RDMA connections. But instead we get the catch-22 of your changing arguments: first, you won't even consider changes that might help RDMA work better, in the name of maintainability; then you have to protect poor, ignorant users from accidentally using RDMA because of some problem or another; and then when someone tries to fix some of the problems you mention, it's back to step one.

Obviously some decisions have been prejudged here, so I guess this moves to the realm of politics. I have plenty of interesting technical stuff to work on, so I'll leave it to the people with a horse in the race to find ways to twist your arm.

- R.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier [EMAIL PROTECTED]
Date: Fri, 17 Aug 2007 12:52:39 -0700

When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years. Same thing with TSO and LRO, and who knows what else.

Not true at all. Full classification and filtering is still usable with TSO and LRO.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years. Same thing with TSO and LRO, and who knows what else.

Not true at all. Full classification and filtering is still usable with TSO and LRO.

Well, obviously with TSO and LRO the packets that the stack sends or receives are not the same as what's on the wire. Whether that breaks your wonderful networking facilities or not depends on the specifics of the particular facility, I guess; for example, shaping is clearly broken by TSO. (And people can wonder what the packet trains TSO creates do to congestion control on the internet, but the netdev crowd has already decided that TSO is good and RDMA is bad.)

- R.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier [EMAIL PROTECTED]
Date: Fri, 17 Aug 2007 16:31:07 -0700

When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years. Same thing with TSO and LRO, and who knows what else.

Not true at all. Full classification and filtering is still usable with TSO and LRO.

Well, obviously with TSO and LRO the packets that the stack sends or receives are not the same as what's on the wire. Whether that breaks your wonderful networking facilities or not depends on the specifics of the particular facility, I guess; for example, shaping is clearly broken by TSO. (And people can wonder what the packet trains TSO creates do to congestion control on the internet, but the netdev crowd has already decided that TSO is good and RDMA is bad.)

This is also a series of falsehoods. All packet filtering, queue management, and packet scheduling facilities work perfectly fine, and as designed, with both LRO and TSO. When problems come up, they are bugs, and we fix them. Please stop spreading this FUD about TSO and LRO.

The fact is that RDMA bypasses the whole stack, so supporting these facilities is not even _POSSIBLE_. With stateless offloads it is possible to support all of these facilities, and we do.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote: [...snip...]

I think removing the RDMA stack is the wrong thing to do, and you shouldn't just threaten to yank entire subsystems because you don't like the technology. Let's keep this constructive, can we? RDMA should get the respect of any other technology in Linux. Maybe it's a niche in your opinion, but come on, there are more RDMA users than, say, the sparc64 port. Eh?

It's not about being a niche. It's about creating a maintainable software net stack that has predictable behavior.

Isn't RDMA _part_ of the software net stack within Linux? Why isn't making RDMA stable, supportable, and maintainable equally as important as any other subsystem?

Needing to reach out of the RDMA sandbox and reserve net stack resources away from itself travels a path we've consistently avoided.

I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.

Got it. Ditto for me as well.

Jeff

- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Tom Tucker [EMAIL PROTECTED]
Date: Thu, 16 Aug 2007 08:43:11 -0500

Isn't RDMA _part_ of the software net stack within Linux?

It very much is not so. When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years.

I'm glad this is a surprise to you, because it illustrates the point some of us keep trying to make about technologies like this. Imagine if you didn't know any of this: you purchase and begin to deploy a huge piece of RDMA infrastructure, you then get the mandate from IT that you need to add firewalling on the RDMA connections at the host level, and oh shit, you can't?

This is why none of us core networking developers like RDMA at all. It's totally not integrated with the rest of the Linux stack, and on top of that it even gets in the way. It's an aberration, an eyesore, and a constant source of consternation.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
David Miller wrote:

From: Sean Hefty [EMAIL PROTECTED]
Date: Thu, 09 Aug 2007 14:40:16 -0700

Steve Wise wrote: Any more comments? Does anyone have ideas on how to reserve the port space without using a struct socket?

How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sandbox and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree, none of this kind of stuff would be an issue.

I think removing the RDMA stack is the wrong thing to do, and you shouldn't just threaten to yank entire subsystems because you don't like the technology. Let's keep this constructive, can we? RDMA should get the respect of any other technology in Linux. Maybe it's a niche in your opinion, but come on, there are more RDMA users than, say, the sparc64 port. Eh?

These are exactly the kinds of problems that people like myself were dreading. These subsystems have no business using the TCP port space of the Linux software stack, absolutely none.

Ok, although IMO it's the correct solution. But I'll propose other solutions below. I ask for your feedback (and everyone's!) on these alternate solutions.

After TCP port reservation, what's next? It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more.

The only other change requested and committed, if I recall correctly, was for netevents, and that enabled both InfiniBand and iWARP to integrate with the neighbour subsystem. I think that was a useful and needed change. Prior to that, these subsystems were snooping ARP replies to trigger events. That was back in 2.6.18 or 2.6.19, I think...

I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.

Got it.
Here are alternate solutions that avoid the need to share the port space:

Solution 1)

1) Admins must set up an alias interface on the iWARP device for use with RDMA. This interface will have to be on a separate subnet from the TCP-used interface, and have a canonical name that indicates it's for RDMA only, like eth2:iw or eth2:rdma. There can be many of these per device.

2) Admins make sure their sockets/TCP services don't use the interface configured in #1, and their RDMA services do use said interface.

3) iWARP providers must translate binds to ipaddr 0.0.0.0 into binds to the associated rdma-only IP addresses. They can do this by searching for all aliases of the canonical name that are aliases of the TCP interface for their NIC device. Or: somehow not handle incoming connections to any address but the rdma-only addresses, and instead pass them up and not offload them.

This will avoid the collisions as long as the above steps are followed.

Solution 2)

Another possibility would be for the driver to create two net devices (and hence two interface names), like eth2 and iw2, and artificially separate the RDMA stuff that way.

These two solutions are similar in that they create an rdma-only interface.

Pros:
- not intrusive into the core networking code
- very minimal changes needed, and only in the iWARP providers' code, who are the ones with this problem
- makes it clear which subnets are RDMA only

Cons:
- relies on the system admin to set it up correctly
- the native stack can still use this rdma-only interface, and the same port space issue will exist

For the record, here are the possible port-sharing solutions Dave sez he'll NAK:

Solution NAK-1)

The rdma-cma just allocates a socket and binds it to reserve TCP ports.
Pros:
- minimal changes needed to implement (always a plus in my mind :)
- simple, clean, and it works (KISS)
- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:
- wastes memory
- puts a TCP socket in the CLOSED state in the pcb tables
- Dave will NAK it :)

Solution NAK-2)

Create a low-level, sockets-agnostic port allocation service that is shared by both TCP and RDMA. This way, the rdma-cm can reserve ports in an efficient manner instead of doing it via kernel_bind() using a sock struct.

Pros:
- probably the correct solution (my opinion :) if we went down the path of sharing the port space
- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:
- very intrusive change, because the port allocation stuff is tightly bound to the host stack and sock struct, etc.
- Dave will NAK it :)

Steve.
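The NAK-1 scheme can be demonstrated from userspace: a socket that is merely bound (never listening or connected) is enough to make the kernel refuse the same port to anyone else, which is the behavior the RFC patch relied on in-kernel with a struct socket per cm_id. A small sketch of that behavior (function names are illustrative, not from the patch):

```python
# Userspace analog of "NAK-1": hold a bound-but-idle socket purely as a
# port reservation. The socket never listens or connects; its existence
# alone keeps the port out of circulation.

import socket

def reserve_port(addr="127.0.0.1"):
    """Bind an ephemeral port; the returned socket *is* the reservation."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((addr, 0))            # port 0 -> kernel picks a free port
    return s, s.getsockname()[1]

holder, port = reserve_port()
# While 'holder' exists, a second bind to the same (addr, port) fails
# with EADDRINUSE, even though 'holder' is in the CLOSED state.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    probe.bind(("127.0.0.1", port))
    collided = False
except OSError:
    collided = True
probe.close()
holder.close()
assert collided
```

This is exactly the "wastes memory / leaves a CLOSED socket in the pcb tables" con listed above: the reservation works, but only by carrying a full socket around for its side effect.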
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Steve Wise wrote:

David Miller wrote:

From: Sean Hefty [EMAIL PROTECTED]
Date: Thu, 09 Aug 2007 14:40:16 -0700

Steve Wise wrote: Any more comments? Does anyone have ideas on how to reserve the port space without using a struct socket?

How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sandbox and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree, none of this kind of stuff would be an issue.

I think removing the RDMA stack is the wrong thing to do, and you shouldn't just threaten to yank entire subsystems because you don't like the technology. Let's keep this constructive, can we? RDMA should get the respect of any other technology in Linux. Maybe it's a niche in your opinion, but come on, there are more RDMA users than, say, the sparc64 port. Eh?

It's not about being a niche. It's about creating a maintainable software net stack that has predictable behavior. Needing to reach out of the RDMA sandbox and reserve net stack resources away from itself travels a path we've consistently avoided.

I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.

Got it.

Ditto for me as well.

Jeff
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Needing to reach out of the RDMA sandbox and reserve net stack resources away from itself travels a path we've consistently avoided.

Where did the idea of an RDMA sandbox come from? Obviously no one disagrees with keeping things clean and maintainable, but the idea that RDMA is a second-class citizen that doesn't get any input into the evolution of the networking code seems kind of offensive to me.

- R.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Steve Wise wrote: Any more comments?

Does anyone have ideas on how to reserve the port space without using a struct socket?

- Sean
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Sean Hefty [EMAIL PROTECTED]
Date: Thu, 09 Aug 2007 14:40:16 -0700

Steve Wise wrote: Any more comments? Does anyone have ideas on how to reserve the port space without using a struct socket?

How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sandbox and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree, none of this kind of stuff would be an issue.

These are exactly the kinds of problems that people like myself were dreading. These subsystems have no business using the TCP port space of the Linux software stack, absolutely none.

After TCP port reservation, what's next? It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more. I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sandbox and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree, none of this kind of stuff would be an issue.

There are currently two RDMA solutions available. Each solution has different requirements and uses the normal network stack differently. InfiniBand uses its own transport. iWARP runs over TCP. We have tried to leverage the existing infrastructure where it makes sense.

After TCP port reservation, what's next? It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more.

Currently, the RDMA stack uses its own port space. This causes a problem for iWARP, and it is what Steve is looking for a solution for. I'm not an iWARP guru, so I don't know what options exist. Can iWARP use its own address family? Identify specific IP addresses for iWARP use? Restrict iWARP to specific port numbers? Let the app control the correct operation? I don't know. Steve merely defined a problem and suggested a possible solution. He's looking for constructive help trying to solve the problem.

- Sean
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Sean Hefty wrote:

Lemme know how I can help. I certainly can test any patches on my 8-node iWARP cluster.

We should probably take the idea to netdev before making any substantial changes to the code.

- Sean

Yup. Should I post my RFC patch and we'll go from there?

Steve.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Should I post my RFC patch and we'll go from there?

Sounds good to me. Roland, do you have any opinion?

- Sean
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Sean Hefty wrote:

If we get rid of the rdma_cm-specific port spaces, do we then reduce the valid possible spaces to just TCP and UDP? Or what? In the sockets paradigm, the socket is explicitly bound to a protocol space when it's created (based on the protocol id). Do you think we need to change the rdma_cm_id to have such a concept? I.e., when you create the cm_id, you state your intended QP type or port space? The current API lends itself to someone incorrectly choosing a port space, by the way.

Currently, the RDMA port space implies the QP type (RC or UD). We're not tied to any specific protocol when we create the rdma_cm_id, since we don't know what type of RDMA device we'll end up using. So I don't think we want users to specify a protocol.

But should we really change the API that drastically? Or just keep the port spaces and make PS_TCP share the host's port space? I don't want to break the user space API if it can be helped.

SDP is kind of a problem, in that the rdma_cm needs to distinguish between SDP as a user versus someone using RDMA_PS_TCP. SDP maps between the RDMA port space and the real TCP port space. I need to get some details on how SDP uses the rdma_cm, like whether it uses wildcard port numbers.

Maybe the rdma_cm port spaces should really be IB, IWARP, or BOTH. IB has its own port space, and IWARP or BOTH gets the TCP port space.

I thought about doing something like this, but I'm not sure there would be much use of just IB or just iWARP when the user can specify both. I even considered pushing the problem into the iWARP CM, but that seems like a more complex implementation with no benefit unless there are users of just an IB port space. At this point, my thoughts are to take your original patch, remove the rdma_cm port space structures and functions, and figure out how to handle SDP.

Lemme know how I can help. I certainly can test any patches on my 8-node iWARP cluster.

Steve.
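As a toy illustration of the coupling Sean mentions (the port space implying the QP type) and of the IB/IWARP/BOTH alternative he floats, with all names and values hypothetical rather than the kernel's actual definitions:

```python
# Two designs from the exchange above, modeled as lookup tables.
# Today: the rdma_cm port space implies the QP type, so callers never
# state a protocol explicitly.
QP_TYPE = {"RDMA_PS_TCP": "RC", "RDMA_PS_UDP": "UD"}

def port_table_for(space):
    """Alternative scheme: which port table a space would allocate from.
    Anything that might touch iWARP draws from the host TCP table."""
    return "rdma-private" if space == "IB" else "host-tcp"

assert QP_TYPE["RDMA_PS_TCP"] == "RC"
assert port_table_for("BOTH") == "host-tcp"
assert port_table_for("IB") == "rdma-private"
```

The catch Steve raises maps directly onto this model: if users almost always want BOTH, the IB-only space carries little value, and the scheme collapses back to "PS_TCP shares the host port space."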
RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Lemme know how I can help. I certainly can test any patches on my 8-node iWARP cluster.

We should probably take the idea to netdev before making any substantial changes to the code.

- Sean
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Sean Hefty wrote:

Consider NFS and NFS-RDMA. The NFS gurus struggled with this very issue and concluded that the RDMA service needs to be on a separate port. Thus they are proposing a new netid/port number for doing RDMA mounts vs TCP/UDP mounts. IMO that is the correct way to go: RDMA services are different than TCP services. They use a different protocol on top of TCP and thus shouldn't be handled on the same TCP port. So, applications that want to service Sockets and RDMA services concurrently would do so by listening on different ports...

This is a good point, and a different view from what I've been taking. I was looking at it more like trying to provide the same service over UDP and TCP, where you use the same port number. I just can't come up with any solution that works for iWarp, and sharing the port space seems like the only way to fix things.

The iWARP protocols don't include a UDP-based service, so it is not needed.

But if you're calling it a UDP port space, maybe it should be the host's port space? I think it should match what's done for TCP.

IMO, there should be a connectionless RDMA service, along with multicast, over UDP/IP/Ethernet. :)

I think the winner would really be a reliable connectionless RDMA service with mcast.

Yes.

The only exported interfaces into the host port allocation stuff require a socket struct. I didn't want to try and tackle exporting the port allocation services at a lower level.

Even at the bottom level, I think it still assumes a socket struct... I looked at this too at one point, and gave up as well. I don't know what other assumptions are made in the stack as a result of this. For example, if an app binds to an IP and port, and the IP address is removed and re-added, is the port still valid/reserved?
I just tried this, and I believe the application is still listening/bound even though the address is no longer valid for the host:

[EMAIL PROTECTED] ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:E0:81:33:67:D1
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:29
[EMAIL PROTECTED] ~]# netserver -L 192.168.69.135 -p -4
Starting netserver at port
set_up_server could not establish a listen endpoint for port with family AF_INET
[EMAIL PROTECTED] ~]# ifconfig eth1 192.168.69.135 up
[EMAIL PROTECTED] ~]# netserver -L 192.168.69.135 -p -4
Starting netserver at port
Starting netserver at hostname 192.168.69.135 port and family AF_INET
[EMAIL PROTECTED] ~]# netstat -an | grep
tcp        0      0 192.168.69.135:       0.0.0.0:*        LISTEN
[EMAIL PROTECTED] ~]# ifconfig eth1 0.0.0.0
[EMAIL PROTECTED] ~]# netstat -an | grep
tcp        0      0 192.168.69.135:       0.0.0.0:*        LISTEN
[EMAIL PROTECTED] ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:E0:81:33:67:D1
          inet6 addr: fe80::2e0:81ff:fe33:67d1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:176 (176.0 b)
          Interrupt:29
[EMAIL PROTECTED] ~]#

For iWarp, is using a struct socket essentially any different than transitioning an existing socket to RDMA mode?

In the RFC patch I posted, the socket is _just_ to allow binding to a port/addr. It's not used for anything else. From the native stack's perspective, it's a TCP socket in the CLOSED state (but bound), I guess.

You're just requiring it to be in a specific state. Are there problems around doing this? How much harder (technically, as opposed to politically) would it be to take this change a step farther and offload an active connection?
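The "bound but CLOSED" trick discussed above has a simple userspace analogue. The sketch below (in Python, purely as an illustration of the port-claiming behaviour; the RFC patch itself works inside the kernel, not through this API) shows that a socket that is bound but never listened on or connected still excludes other binds on that port:

```python
import socket

# Illustration of "bind a socket just to claim the port": the holder socket
# never carries data; from the stack's point of view it is a bound-but-CLOSED
# TCP socket.  It still reserves the local port, so a second bind to the
# same addr/port fails with EADDRINUSE.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
port = holder.getsockname()[1]

rival = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    rival.bind(("127.0.0.1", port))
    claimed = False                # bind unexpectedly succeeded
except OSError:
    claimed = True                 # port is held by the idle socket
finally:
    rival.close()

print(claimed)                     # → True
```

This mirrors what the RFC patch does at kernel level: the port is reserved in the host TCP port space without the socket ever entering the TCP state machine.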
By active, do you mean in the ESTABLISHED state?

I left it all in to show the minimal changes needed to implement the functionality. To keep the patch simple for initial consumption. But yes, the rdma-cm really doesn't need to track the port stuff for TCP since the host stack does.

Okay - for final patches, I think we want to remove the rdma_cm specific port spaces, along with changing the API to clarify that it uses the same port space as TCP/UDP.

What do you mean by changing the API? Adding a new port space enum? I haven't looked in detail at the SDP code, but I would think it should want the TCP port space and not its own anyway, but I'm not sure. What is the point of the SDP port space anyway?

The rdma_cm needs to adjust its protocol for SDP over IB. I'm not too concerned with SDP, since it's not upstream yet, but I don't want to break it beyond repair either. The rdma_cm may not need to manage the SDP port space at all, and instead rely on SDP to ensure that it provides unique port numbers by itself.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
In the RFC patch I posted, the socket is _just_ to allow binding to a port/addr. It's not used for anything else. From the native stack's perspective, it's a TCP socket in the CLOSED state (but bound), I guess.

For RDMA, I think we're somewhere in between binding to an address versus mapping the address. We map the address to an RDMA device, but also use that address in connections. So, we do a little more than simply map the address to a device, but if the address migrates to another device, we don't follow it. I can't really think of any issues that might be caused by this, but I'm not sure. If an app is listening on an address that goes away, would a new wildcard listen work?

By active, do you mean in the ESTABLISHED state?

Yes.

What do you mean by changing the API? Adding a new port space enum?

I was thinking of replacing the rdma_cm port space enum with something like IPPROTO_TCP, but doing that probably doesn't matter.

- Sean
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Sean Hefty wrote:

In the RFC patch I posted, the socket is _just_ to allow binding to a port/addr. It's not used for anything else. From the native stack's perspective, it's a TCP socket in the CLOSED state (but bound), I guess.

For RDMA, I think we're somewhere in between binding to an address versus mapping the address. We map the address to an RDMA device, but also use that address in connections. So, we do a little more than simply map the address to a device, but if the address migrates to another device, we don't follow it. I can't really think of any issues that might be caused by this, but I'm not sure. If an app is listening on an address that goes away, would a new wildcard listen work?

No:

[EMAIL PROTECTED] ~]# ifconfig eth1 192.168.69.135 up
[EMAIL PROTECTED] ~]# netserver -L 192.168.69.135 -p -4
Starting netserver at port
Starting netserver at hostname 192.168.69.135 port and family AF_INET
[EMAIL PROTECTED] ~]# ifconfig eth1 0.0.0.0 down
[EMAIL PROTECTED] ~]# netserver -L 0.0.0.0 -p -4
Starting netserver at port
set_up_server could not establish a listen endpoint for port with family AF_INET
[EMAIL PROTECTED] ~]#

By active, do you mean in the ESTABLISHED state?

Yes.

Well, excluding the political issues (patents/etc.) and the general dislike for offload/TOE by the Linux community, it is technically doable, but not trivial. The host stack would have to keep track of which connections are offloaded and which ones aren't. For instance, it should handle (and fail) an app trying to send() on a socket that's in RDMA mode. Also, the transition logic for pushing an active connection to an RDMA device would be very messy. In part, this is due to the fact that there's no way to freeze the connection while you're offloading it. So the host stack has to deal with incoming or outgoing data during the transition and pass that stuff to the RDMA driver after the offload is complete. It's messy, but doable. MS Windows supports this exact design.
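The failed wildcard listen in the transcript above can be reproduced with plain sockets. This Python sketch (a userspace illustration only, unrelated to the iWARP code) shows that while a socket stays bound to a specific address, a wildcard bind on the same port is refused:

```python
import socket

# While one socket is bound/listening on a specific local address, a
# wildcard bind (0.0.0.0) on the same port fails with EADDRINUSE --
# mirroring the failed "netserver -L 0.0.0.0" above.
specific = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
specific.bind(("127.0.0.1", 0))    # port 0: pick any free port
port = specific.getsockname()[1]
specific.listen(1)

wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    wildcard.bind(("0.0.0.0", port))
    blocked = False                # wildcard bind unexpectedly succeeded
except OSError:
    blocked = True                 # specific bind blocks the wildcard
finally:
    wildcard.close()

print(blocked)                     # → True
```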
Note the iWARP HW must support this as well. Ammasso doesn't. Chelsio does. I _think_ the other up-and-coming iWARP devices also support it because Windows supports it. But I don't think we should consider this approach.

I think we should minimally sync up with the native stack, like we are already doing. This is already done in a number of ways:

- using the host routing table to determine the local interface to be used.
- the association of a netdev device with an rdma device via the hwaddr.
- netevents that allow iWARP devices to track neighbour/next-hop changes.

The port space is one more that needs to be handled for iWARP.

Steve.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Sean Hefty wrote:

Okay - for final patches, I think we want to remove the rdma_cm specific port spaces, along with changing the API to clarify that it uses the same port space as TCP/UDP.

If we get rid of the rdma_cm specific port spaces, do we then reduce the valid possible spaces to just TCP and UDP? Or what? In the sockets paradigm, the socket is explicitly bound to a protocol space when it's created (based on the protocol id). Do you think we need to change the rdma_cm_id to have such a concept? IE when you create the cm_id, you say your intended QP type or port space? The current API lends itself to someone incorrectly choosing a port space, by the way.

But should we really change the API that drastically? Or just keep the port spaces and make PS_TCP share the host's port space. The only applications that really need this port space are apps that run on iWARP only or want to support both iWARP and IB via the transport-independent verbs and rdma-cm. So things like IPoIB probably shouldn't use or need the TCP port space.

Maybe the rdma-cm port spaces should really be IB, IWARP, or BOTH. IB has its own port space, and IWARP or BOTH gets the TCP port space.

Thoughts?

Steve.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
If we get rid of the rdma_cm specific port spaces, do we then reduce the valid possible spaces to just TCP and UDP? Or what? In the sockets paradigm, the socket is explicitly bound to a protocol space when it's created (based on the protocol id). Do you think we need to change the rdma_cm_id to have such a concept? IE when you create the cm_id, you say your intended QP type or port space? The current API lends itself to someone incorrectly choosing a port space, by the way.

Currently, the RDMA port space implies the QP type (RC or UD). We're not tied to any specific protocol when we create the rdma_cm_id, since we don't know what type of RDMA device we'll end up using. So, I don't think we want users to specify a protocol.

But should we really change the API that drastically? Or just keep the port spaces and make PS_TCP share the host's port space.

I don't want to break the user space API, if it can be helped. SDP is kind of a problem, in that the rdma_cm needs to distinguish between SDP as a user, versus someone using RDMA_PS_TCP. SDP maps between the RDMA port space and the real TCP port space. I need to get some details on how SDP uses the rdma_cm, like whether it uses wildcard port numbers.

Maybe the rdma-cm port spaces should really be IB, IWARP, or BOTH. IB has its own port space, and IWARP or BOTH gets the TCP port space.

I thought about doing something like this, but I'm not sure there would be much use of just IB or just iWarp, when the user can specify both. I even considered pushing the problem into the iWarp CM, but that seems like a more complex implementation with no benefit unless there are users of just an IB port space. At this point, my thoughts are to take your original patch, remove the rdma_cm port space structures and functions, and figure out how to handle SDP.
- Sean
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Consider NFS and NFS-RDMA. The NFS gurus struggled with this very issue and concluded that the RDMA service needs to be on a separate port. Thus they are proposing a new netid/port number for doing RDMA mounts vs TCP/UDP mounts. IMO that is the correct way to go: RDMA services are different than TCP services. They use a different protocol on top of TCP and thus shouldn't be handled on the same TCP port. So, applications that want to service Sockets and RDMA services concurrently would do so by listening on different ports...

This is a good point, and a different view from what I've been taking. I was looking at it more like trying to provide the same service over UDP and TCP, where you use the same port number. I just can't come up with any solution that works for iWarp, and sharing the port space seems like the only way to fix things.

The iWARP protocols don't include a UDP-based service, so it is not needed.

But if you're calling it a UDP port space, maybe it should be the host's port space? I think it should match what's done for TCP. IMO, there should be a connectionless RDMA service, along with multicast, over UDP/IP/Ethernet. :)

Yes.

The only exported interfaces into the host port allocation stuff require a socket struct. I didn't want to try and tackle exporting the port allocation services at a lower level.

Even at the bottom level, I think it still assumes a socket struct... I looked at this too at one point, and gave up as well. I don't know what other assumptions are made in the stack as a result of this. For example, if an app binds to an IP and port, and the IP address is removed and re-added, is the port still valid/reserved?

For iWarp, is using a struct socket essentially any different than transitioning an existing socket to RDMA mode? You're just requiring it to be in a specific state. Are there problems around doing this?
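The "same service over UDP and TCP on one port number" model mentioned above rests on the fact that the TCP and UDP port spaces are independent. A short Python sketch of that property (just an illustration of ordinary sockets behaviour, nothing rdma_cm specific):

```python
import socket

# TCP and UDP ports live in separate port spaces: binding TCP port N does
# not claim UDP port N, so one service can own the same number in both.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.bind(("127.0.0.1", 0))         # port 0: pick any free TCP port
port = tcp.getsockname()[1]

udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.bind(("127.0.0.1", port))      # succeeds: different port space
print(udp.getsockname()[1] == port)
```

This is exactly the sharing that does not exist between the host TCP stack and the rdma_cm's private PS_TCP space today, which is the gap the patch under discussion closes.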
How much harder (technically, as opposed to politically) would it be to take this change a step farther and offload an active connection?

I left it all in to show the minimal changes needed to implement the functionality. To keep the patch simple for initial consumption. But yes, the rdma-cm really doesn't need to track the port stuff for TCP since the host stack does.

Okay - for final patches, I think we want to remove the rdma_cm specific port spaces, along with changing the API to clarify that it uses the same port space as TCP/UDP.

I haven't looked in detail at the SDP code, but I would think it should want the TCP port space and not its own anyway, but I'm not sure. What is the point of the SDP port space anyway?

The rdma_cm needs to adjust its protocol for SDP over IB. I'm not too concerned with SDP, since it's not upstream yet, but I don't want to break it beyond repair either. The rdma_cm may not need to manage the SDP port space at all, and instead rely on SDP to ensure that it provides unique port numbers by itself.

- Sean