The hack to use a socket and bind it to claim the port was just for
demonstrating the idea. The correct solution, IMO, is to enhance the
core low level 4-tuple allocation services to be more generic (eg: not
be tied to a struct sock). Then the host tcp stack and the host rdma
stack can
Umm... this is a difficult situation for me to merge the changes then.
We're changing the CM retry behavior blind here. How do we know that
the MRA changes don't make the scalability issue worse?
What's currently upstream doesn't work for Intel MPI on our larger clusters.
The connection requests
Kanevsky, Arkady wrote:
Exactly,
it forces the burden on administrator.
And one will be forced to try one mount for iWARP and, if it does not
work, issue another one for TCP or UDP.
Yack!
And server will need to listen on different IP address and simple
* will not work since it will need to
The sysadmin creates for iwarp use only alias interfaces of the form
devname:iw* where devname is the native interface name (eg eth0) for the
iwarp netdev device. The alias label can be anything starting with iw.
The iw immediately after the ':' is the key used by the iw_cxgb3 driver.
I'm
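The naming rule above (labels of the form devname:iw*) can be illustrated with a small userspace sketch. The function name is_iwarp_alias is hypothetical, and this is not the check used by the iw_cxgb3 driver, which is not shown in this thread; it only demonstrates the matching rule as described.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if label has the form "<devname>:iw<anything>".
 * Hypothetical helper: it mirrors the rule described in the mail,
 * not the driver's actual implementation.
 */
bool is_iwarp_alias(const char *devname, const char *label)
{
	size_t n;

	assert(devname && label);
	n = strlen(devname);

	/* The label must start with the native interface name... */
	if (strncmp(label, devname, n) != 0)
		return false;
	/* ...followed by ':' and then the "iw" key. */
	return label[n] == ':' && strncmp(label + n + 1, "iw", 2) == 0;
}
```

With this rule, "eth0:iw0" and "eth0:iwfoo" would count as iWARP-only aliases of eth0, while "eth0" and "eth0:1" would not.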
What is the model on how client connects, say for iSCSI,
when client and server both support, iWARP and 10GbE or 1GbE,
and would like to set up the most performant connection for the ULP?
For the most performant connection, the ULP would use IB, and all these
problems go away. :)
This proposal is for
It is ok to block while holding a mutex, yes?
It's okay, I just didn't try to trace through the code to see if it ever tries
to acquire the same mutex in the thread that needs to signal the event.
- Sean
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message
If an application is calling rdma_resolve_ip() and a status of -ENODATA is
returned from addr_resolve_local/remote(), the timeout mechanism waits until
the application's timeout occurs before rechecking the address resolution
status; the application will wait until its full timeout occurs.
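The symptom described above appears when the only timer armed for a pending request is the application's own timeout. One possible way to avoid it, sketched here in userspace C, is to recheck at a short interval capped by the application's deadline. The names next_recheck and RETRY_INTERVAL_MS and the interval value are assumptions for illustration; this is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical recheck interval; the value is an assumption. */
#define RETRY_INTERVAL_MS 1000u

/*
 * Given the current time and the application's deadline (both in ms),
 * return when a pending request should next be rechecked: the earlier
 * of "now + retry interval" and the deadline itself.
 */
uint64_t next_recheck(uint64_t now_ms, uint64_t deadline_ms)
{
	uint64_t retry_at;

	assert(now_ms <= deadline_ms);
	retry_at = now_ms + RETRY_INTERVAL_MS;

	return retry_at < deadline_ms ? retry_at : deadline_ms;
}
```

Under these assumptions a request with a 5 second timeout would be rechecked after 1 second instead of after the full 5 seconds.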
+addr = kmalloc(sizeof *addr, GFP_KERNEL);
As a small nitpick: this wants to be sizeof(struct in_ifaddr)
See chapter 14 of CodingStyle document. kmalloc(sizeof *addr... is correct.
- Sean
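The idiom under discussion can be shown in userspace with calloc standing in for kmalloc. The structure and function names below are hypothetical stand-ins; the point is only that sizeof *addr tracks the pointer's type, so the allocation stays correct if that type is later changed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel structure being allocated. */
struct demo_ifaddr {
	unsigned int address;
	unsigned int mask;
};

/*
 * Allocate one zeroed object. sizeof *addr follows the type of addr,
 * so no type name has to be repeated in the allocation call.
 */
struct demo_ifaddr *demo_ifaddr_alloc(void)
{
	struct demo_ifaddr *addr;

	/* Both spellings give the same size; only one survives a type change. */
	assert(sizeof *addr == sizeof(struct demo_ifaddr));

	addr = calloc(1, sizeof *addr);
	return addr;	/* NULL on allocation failure */
}
```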
OK -- just to make sure I'm understanding what you're saying: have you
confirmed that your proposed patches actually fix the issue?
Not directly. I cannot easily test kernel patches on our larger, production
clusters. We've seen the issue with specific applications on 512 and 1024
cores, but
- My user_mad P_Key index support patch. I'll test the ioctl to
change to the new mode and merge this I guess, since Hal and Sean
have tested this out.
I can give this patch a reviewed-by: too, and I will also try to review a couple
of the pending ipoib patches.
- Sean's QoS changes.
The iWARP driver must translate all listens on address 0.0.0.0 to the
set of rdma-only IP addresses for the device in question. This prevents
incoming connect requests to the TCP IP addresses from going up the
rdma stack.
I've only given this a high level review at this point, and while the
ND, rdma address resolution fails in the presence of
dropped arp bcast packets.
Signed-off-by: Steve Wise [EMAIL PROTECTED]
Acked-by: Sean Hefty [EMAIL PROTECTED]
Roland - can you please queue this up for 2.6.24?
Just be realistic and accept that RDMA is a point in time solution,
and like any other such technology takes flexibility away from users.
All technologies are just point in time solutions. While management is
important, shouldn't the customers decide how important it is relative to their
It's not about being a niche. It's about creating a maintainable
software net stack that has predictable behavior.
Needing to reach out of the RDMA sandbox and reserve net stack resources
away from itself travels a path we've consistently avoided.
We need to ensure that we're also creating a
Steve Wise wrote:
Any more comments?
Does anyone have ideas on how to reserve the port space without using a
struct socket?
- Sean
How about we just remove the RDMA stack altogether? I am not at all
kidding. If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable. We were
told all along that if RDMA went into the tree none of this kind of
stuff would be an issue.
Steve Wise wrote:
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index d294bbc..83f84ef 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -32,6 +32,7 @@ #include <linux/mutex.h>
#include <linux/inetdevice.h>
#include <linux/workqueue.h>
Er...no. It will lose this event. Depending on the event...the carnage
varies. We'll take a look at this.
This behavior is consistent with the Infiniband CM (see
drivers/infiniband/core/cm.c function cm_recv_handler()). But I think
we should at least log an error because a lost event will
Steve Wise wrote:
+int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt)
+{
+ struct iwcm_id_private *cm_id_priv;
+ unsigned long flags;
+ int ret = 0;
+
+ cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+ /* Wait if we're currently in a connect or
Mainly nits...
Steve Wise wrote:
-static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
+int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
unsigned char *dst_dev_addr)
Might want to rename this to something like rdma_copy_addr if
Steve Wise wrote:
+/*
+ * Release a reference on cm_id. If the last reference is being removed
+ * and iw_destroy_cm_id is waiting, wake up the waiting thread.
+ */
+static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
+{
+ int ret = 0;
+
+
Roland Dreier wrote:
+struct workqueue_struct *rdma_wq;
+EXPORT_SYMBOL(rdma_wq);
Sean, I don't think I saw an answer when I asked you this before. Why
is ib_addr exporting a workqueue? Is there some sort of ordering
constraint that is forcing other modules to go through the same
workqueue
The ib_addr module depends on CONFIG_INET, because it uses symbols
like arp_tbl, which are only exported if INET is enabled.
I fixed this up by creating a new (non-user-visible) config symbol to
control when ib_addr is built -- I put the following diff on top of
your patch in my tree:
Thanks!
Provide common handling for marshalling data between userspace clients
and kernel mode Infiniband drivers.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/Makefile
linux-2.6.ib/drivers/infiniband/core
Extend matching connection requests to listens in the Infiniband CM to include
private data checks.
This allows applications to listen on the same service identifier, with private
data directing the request to the appropriate application.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff
Export ip_dev_find to allow locating a net_device given an IP address.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/net/ipv4/fib_frontend.c
linux-2.6.ib/net/ipv4/fib_frontend.c
--- linux-2.6.git/net/ipv4/fib_frontend.c
Add an address translation service that maps IP addresses to Infiniband
GID addresses using IPoIB.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/addr.c
linux-2.6.ib/drivers/infiniband/core/addr.c
Kernel component necessary to support the userspace RDMA connection management
library.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/Makefile
linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6
-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/cma.c
linux-2.6.ib/drivers/infiniband/core/cma.c
--- linux-2.6.git/drivers/infiniband/core/cma.c 1969-12-31 16:00:00.0
-0800
+++ linux-2.6.ib/drivers/infiniband
Caitlin Bestler wrote:
The term private data is intended to convey the
intent that the data is private to the application
layer and is opaque to middleware and the network.
The private data area is for the use of whatever client resides above the
Infiniband CM only. There is no assumption
Roland Dreier wrote:
+struct rdma_ucm_query_route_resp {
+ __u64 node_guid;
+ struct ib_user_path_rec ib_route[2];
+ struct sockaddr_in6 src_addr;
+ struct sockaddr_in6 dst_addr;
+ __u32 num_paths;
+ __u8 port_num;
+ __u8 reserved[3];
+};
Is there a 32-bit/64-bit compatibility
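Layout questions like this one usually come down to whether a structure has the same size and field offsets under every ABI; on 32-bit x86, for example, 64-bit integers inside structures are aligned to 4 bytes, while x86-64 aligns them to 8. The sketch below uses a hypothetical structure (not rdma_ucm_query_route_resp itself, whose embedded types are not shown here) to illustrate checking a layout at compile time with explicit padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response layout; not the rdma_ucm structure itself. */
struct demo_resp {
	uint64_t node_guid;
	uint32_t num_paths;
	uint8_t  port_num;
	uint8_t  reserved[3];	/* explicit padding up to a multiple of 8 bytes */
};

/* Compile-time checks: no implicit padding, total size a multiple of 8. */
static_assert(offsetof(struct demo_resp, num_paths) == 8, "num_paths offset");
static_assert(offsetof(struct demo_resp, port_num) == 12, "port_num offset");
static_assert(sizeof(struct demo_resp) == 16, "unexpected padding");

/* Size a userspace caller would need to provide for one response. */
size_t demo_resp_size(void)
{
	return sizeof(struct demo_resp);
}
```

If a check like this fails when the code is built for a different word size, the structure would need reordering or more explicit padding.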
Roland Dreier wrote:
On the other hand I think it would be good to let this userspace
interface cook a little more, say in -mm.
I think that this makes sense.
- Sean
Extend matching connection requests to listens in the Infiniband CM to include
private data checks.
This allows applications to listen on the same service identifier, with private
data directing the request to the appropriate application.
Signed-off-by: Sean Hefty [EMAIL PROTECTED
Add an address translation service that maps IP addresses to Infiniband
GID addresses using IPoIB.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
This should be the correct patch. The only difference between this and the
mis-post is the use of mutex_lock/unlock in place of up/down.
diff
-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/cma.c
linux-2.6.ib/drivers/infiniband/core/cma.c
--- linux-2.6.git/drivers/infiniband/core/cma.c 1969-12-31 16:00:00.0
-0800
+++ linux-2.6.ib/drivers/infiniband
Kernel component necessary to support the userspace RDMA connection management
library.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
Discussion on the list suggested giving the userspace interface more time to
develop, which seems reasonable.
diff -uprN -X linux-2.6.git/Documentation
Here's an updated version of these patches based on feedback. (The license
did not change and continues to match that of the other Infiniband code.)
Please consider for inclusion in 2.6.17.
This is just a ping for any more feedback on this patch series, so that I can
respond to any requests
The following set of patches defines a connection abstraction for Infiniband and
other RDMA devices, and
The following patch provides common handling for marshalling data between
userspace clients and kernel mode Infiniband drivers.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/Makefile
linux-2.6.ib
The following patch extends matching connection requests to listens in the
Infiniband CM to include private data.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/cm.c
linux-2.6.ib/drivers/infiniband/core
The following provides an address translation service that maps IP addresses
to Infiniband addresses (GIDs) using IPoIB.
This patch exports ip_dev_find() to locate a net_device given an IP address.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation
on behalf of clients.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/cma.c
linux-2.6.ib/drivers/infiniband/core/cma.c
--- linux-2.6.git/drivers/infiniband/core/cma.c 1969-12-31 16:00:00.0
-0800
This patch adds the kernel component to support the userspace Infiniband/RDMA
connection agent library.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/Makefile
linux-2.6.ib/drivers/infiniband/core
Grant Grundler wrote:
Is this code going to get invoked very often?
In practice, it would be invoked when matching any listen requests
originating from the CMA (RDMA connection abstraction).
hrm..I'm not sure how to translate your answer into a workload.
e.g. which netperf or netpipe test
Roland Dreier wrote:
+ UCMA_MAX_BACKLOG = 128
Is there any reason that we might want to make this a tunable? Maybe
as a module parameter that's writable in sysfs...
There's no reason not to make this tunable.
- Sean
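A tunable limit might look like the following userspace sketch, in which a plain variable stands in for whatever module parameter the kernel code would use. The names max_backlog and clamp_backlog are hypothetical, and the fallback policy is an assumption; only the default of 128 comes from the quoted patch.

```c
#include <assert.h>

/* Hypothetical tunable; the patch hard-codes UCMA_MAX_BACKLOG = 128. */
int max_backlog = 128;

/*
 * Clamp a requested listen backlog to the tunable limit. Requests that
 * are non-positive or too large fall back to the limit itself.
 */
int clamp_backlog(int requested)
{
	assert(max_backlog > 0);

	if (requested > 0 && requested < max_backlog)
		return requested;
	return max_backlog;
}
```

Raising max_backlog at run time then changes the ceiling for later listen calls without a rebuild.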
The following set of patches defines a connection abstraction for Infiniband and
other RDMA devices, and serves several purposes:
* It implements a connection protocol over Infiniband based on IP addressing.
This greatly simplifies clients wishing to establish connections over
Infiniband.
* It
The following patch provides common handling for marshalling data between
userspace clients and kernel mode Infiniband drivers.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/Makefile
linux-2.6.ib
The following patch extends matching connection requests to listens in the
Infiniband CM to include private data.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/cm.c
linux-2.6.ib/drivers/infiniband/core
The following provides an address translation service that maps IP addresses
to Infiniband addresses (GIDs) using IPoIB.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/addr.c
linux-2.6.ib/drivers
on behalf of clients.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/cma.c
linux-2.6.ib/drivers/infiniband/core/cma.c
--- linux-2.6.git/drivers/infiniband/core/cma.c 1969-12-31 16:00:00.0
-0800
This patch adds the kernel component to support the userspace Infiniband/RDMA
connection agent library.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/Makefile
linux-2.6.ib/drivers/infiniband/core
+static void cm_mask_compare_data(u8 *dst, u8 *src, u8 *mask)
static void cm_mask_compare_data(u8 *dst, const u8 *src, u8 *mask)
but I would rename it to cm_mask_copy since it doesn't really do a compare.
I'll change this. The function is masking the data to use in the comparison,
but I can
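The operation being named here, copying data through a mask so that only the masked bits take part in a later comparison, can be sketched in userspace C as follows. The function name mask_copy follows the reviewer's suggestion, and the explicit length parameter and byte-wise loop are assumptions for illustration rather than the patch's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * dst[i] = src[i] & mask[i] for len bytes. src is const as suggested
 * in the review; mask is only read as well, so it is const here too.
 */
void mask_copy(uint8_t *dst, const uint8_t *src, const uint8_t *mask,
	       size_t len)
{
	assert(dst && src && mask);

	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] & mask[i];
}
```

Two buffers masked this way can then be compared with memcmp, so bits outside the mask never influence the match.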