Ishai Rabinovitz wrote:
There is a bug in the SRP Engenio target that sends a large value as the service
timeout. (It gets 30, which means a timeout of (2^(30-8))=4195 sec.) Such a long
timeout is not reasonable, and it may leave the kernel module waiting on
wait_for_completion and may stuck a lot of
James> Unfortunately this does happen. Sean has already said he
James> can only access git trees at kernel.org.
I think he just said that he can only access git trees via http://.
I can access git://git.kernel.org or http://git.somewhere.else.
- Sean
Hal Rosenstock wrote:
Is there a possibility that there is a double deletion from a list
somewhere?
Perhaps but I don't see it. Sean ? Roland ?
I looked at this and couldn't find anything obviously wrong. I was waiting to
hear back to Michael's question about module unload being
This is correct. Note that the number of DREQ retries was changed to 15 now.
do you mean internally changed in the CM or somehow controlled from
the outside by uDAPL?
I meant the number of retries set by RDMA CM.
- Sean
___
openib-general mailing
in cma_req_handler()
so that process A will return instead of doing a rdma_destroy_id().
Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 69bb089..f383a4f 100644
--- a/drivers
Eliminate remove_list by using list_del_init instead during device removal
handling.
Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
This removes a stack variable and simplifies the code, but does not fix
any bugs. We can defer this to 2.6.20
Re-organize code relating to cma_get_net_info() and rdma_create_id() to
optimize error case handling (no need to alloc memory/etc. as part of
rdma_create_id() if input parameters are wrong).
Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
This does
Steve Wise wrote:
Why? I don't see anything wrong with the git trees that are at
www.mellanox.co.il right now.
Just trying to simplify things and centralize the technology location...
Well, for myself, I have been unable to access the git trees at mellanox. For
me to access git
their deferred
work on module unload.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
Erez, can you see if this fixes the crash problem that you're seeing?
Index: cm.c
===
--- cm.c(revision 9680)
+++ cm.c(working copy
Can you post a patch pls?
This was the patch committed to svn. I'm creating a patch set for review for
2.6.19/2.6.20 to merge the svn code upstream. I will post those patches against
the 2.6.19 code tree when they are ready.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
Index: core/cma.c
Or Gerlitz wrote:
My understanding is that without this patch the side that sends the DREQ
would do a few DREQ resends, as the first DREPs are lost and no
DREPs are sent once the id at the peer side has left the timewait state, correct?
This is correct. Note that the number of DREQ retries was
Matt Leininger wrote:
I'd add one more thing. To make the OFED release process go more
smoothly I'd like to see the maintainers for each stack component spin
out releases from time to time. Roland has been doing this with
libmthca and libibverbs. If we had the development releases for
Roland Dreier wrote:
Not to be difficult -- but I disagree. I think this statement doesn't
actually make sense, because: ** what does latest mean?? **
I think this is more a matter of whether there's a single, main development
branch somewhere, or if one even needs to exist.
Well, I think
Erez Zilber wrote:
When stopping iSER, we run 'modprobe -r ib_iser'. Then, we see an oops
(below). In order to check which module caused that oops, I replaced the
'modprobe -r' call with rmmod for each module:
rmmod ib_iser
rmmod libiscsi
rmmod scsi_transport_iscsi
rmmod rdma_cm
rmmod
Sean Hefty wrote:
The kernel APIs are fairly clear about the byte ordering, but that
documentation didn't carry up to userspace everywhere. I will update the
userspace documentation, but it may take me a few weeks to get to this.
I've added some additional comments next to structure
Sean Hefty wrote:
Currently a DREP is only sent in response to a DREQ if a connection
has been found matching the DREQ, and it is in the proper state. Once
a DREP is sent, the local connection moves into timewait. Duplicate
DREQs received while in this state result in re-sending the DREP
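The DREQ handling described in this message can be sketched as a small decision function. This is only an illustration, not the ib_cm code: the enum names, the result struct, and the choice of "established" as the one "proper state" are all assumptions made for the example.

```c
/* Hypothetical connection states, for illustration only. */
enum conn_state { CONN_NOT_FOUND, CONN_ESTABLISHED, CONN_TIMEWAIT };

/* Hypothetical actions taken when a DREQ arrives. */
enum dreq_action { DREQ_DROP, DREQ_SEND_DREP, DREQ_RESEND_DREP };

struct dreq_result {
    enum dreq_action action;     /* what to do with the DREQ */
    enum conn_state next_state;  /* state of the local connection afterwards */
};

/*
 * Behaviour as described in the message:
 *  - a DREP is sent only if a matching connection exists in the proper state,
 *    and the connection then moves into timewait;
 *  - a duplicate DREQ received while in timewait re-sends the DREP;
 *  - with no matching connection, nothing is sent.
 */
struct dreq_result handle_dreq(enum conn_state state)
{
    struct dreq_result result = { DREQ_DROP, state };

    switch (state) {
    case CONN_ESTABLISHED:
        result.action = DREQ_SEND_DREP;
        result.next_state = CONN_TIMEWAIT;
        break;
    case CONN_TIMEWAIT:
        result.action = DREQ_RESEND_DREP;
        break;
    case CONN_NOT_FOUND:
        break;
    }
    return result;
}
```

In this sketch the CONN_NOT_FOUND case is where a sender keeps retrying its DREQ without ever getting an answer.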
Krishna Kumar wrote:
Encapsulate duplicate code to common routine - avoid checking same
errors in multiple places.
I went back and forth on this, but ended up committing it, since it does
slightly simplify maintenance.
- Sean
Krishna Kumar wrote:
rdma_listen could be called from a context where id_priv->list
is not initialized. Then at a later stage, a cma_cancel_listen
does a list_del() which could oops since this element is not
on any list.
Eg, in rdma_listen(), if id->device is !NULL, it calls
cma_ib_listen()
Good catch. Thanks - committed.
- Sean
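Why an initialized list head matters can be shown with a minimal userspace stand-in for the kernel list primitives. The functions below only imitate the kernel's INIT_LIST_HEAD, list_add, list_del_init and list_empty; they are a sketch written for this example, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized node points at itself: it is on no list, but safe to delete. */
static void init_list_head(struct list_head *node)
{
    node->next = node;
    node->prev = node;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Unlink entry and leave it self-pointing again. On a node that was never
 * initialized, the two pointer writes below would follow garbage pointers,
 * which is the kind of oops the message describes.
 */
static void list_del_init(struct list_head *entry)
{
    assert(entry->next != NULL && entry->prev != NULL);
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
}

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}
```

Because the node is re-initialized on removal, deleting it a second time, or deleting it before it was ever added, is harmless in this sketch.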
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Krishna Kumar wrote:
- cma_leave_mc_groups can race with other routines updating
or reading the mclist, so use lock. Eg while doing a
rdma_destroy_id(), other processes could be looking at
this id and de-referencing mclist.
I don't think that there's an issue here.
The mc_list is only
Sean Hefty wrote:
1. Should I even be looking at event->status or does the event type tell me
everything I need to know? I've had a report that the assertion
(event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR.
It sounds like (and looks like from reading the code) that you've hit
BTW, there was a set of bugfix patches for CMA posted that didn't get acked or
nacked yet. They looked sane and I took them into ofed - could you take the
time to review please? Should I repost? It might make sense to put stability
fixes in before adding more features.
I've actually been on
Connections taking 60 sec to create is an issue.
Can you please explain how the fact that some connections are used affects
the time it takes to send the response?
This is in userspace, and IMO, an application issue. Threads using established
connections simply begin consuming all processor time.
Now that changes from the iWarp branch have been merged upstream, I wanted to
get feedback about migrating existing changes in svn upstream, or removing
features from svn. Specifically, the following features are in svn only:
* RDMA CM:
- userspace support
- multicast support
1. Should I even be looking at event->status or does the event type tell me
everything I need to know? I've had a report that the assertion
(event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR.
The event type is usually sufficient. In the case of an error, the status
should provide
Erez Zilber wrote:
In the IB spec it says in 12.7.29:
The recipient of the REQ message shall choose a local Initiator Depth that
does not exceed the Responder Resources offered in the REQ. If the recipient
of the REQ message is unwilling or unable to do so, it shall send a
REJ message to
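The quoted rule can be read as a simple bounded choice. The sketch below is one possible interpretation, not code from the CM: the function name, the REJECT_REQ marker, and the idea that the recipient has a preferred and a minimum acceptable depth are assumptions introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical marker meaning "answer the REQ with a REJ". */
#define REJECT_REQ (-1)

/*
 * Pick a local Initiator Depth that does not exceed the Responder Resources
 * offered in the REQ. If even the minimum depth the recipient can live with
 * is larger than the offer, the recipient rejects instead.
 */
int choose_initiator_depth(uint8_t offered_responder_resources,
                           uint8_t preferred_depth,
                           uint8_t minimum_depth)
{
    if (minimum_depth > offered_responder_resources)
        return REJECT_REQ;

    return preferred_depth < offered_responder_resources
               ? preferred_depth
               : offered_responder_resources;
}
```

For example, with 4 Responder Resources offered and a preferred depth of 8, this sketch settles on 4.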
Bub Thomas wrote:
Do you know of any other Verbs or CM parameter that has a different
byte order between gen1 and gen2?
I'm not really familiar with the gen1 code.
P.S.: Maybe someone should put a big “Warning” sign somewhere so that
others don’t stumble into that pit again. ;-)
The
Or Gerlitz wrote:
change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add help text
clarifying what the thing does. Adding the help text also has the side
effect of the cma config being visible when one does make menuconfig
Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
Acked-by: Sean Hefty
a query (e.g. multipath record queries),
but it also simplifies a userspace interface.
The implementation of existing SA query routines were layered on top
of the generic query interface.
Signed-off-by: Sean Hefty sean.hefty at intel.com
---
Index: include/rdma/ib_sa.h
I merge 100 patches every kernel release. If I have to spend an
extra 5 minutes creating a patch or pulling it out of svn, then I end
up burning an extra day of stupid work. If 20+ people who contribute
patches sent me clean patches, then everyone will be happier because
I'll be able to merge
increase the timewait state before a QP
can be re-used when CM messages are not lost.
An alternative is to send a DREP in response to a DREQ, even if a local
connection is not found, which is what this patch does.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
Index: cm.c
Committed to svn 9461. Roland, can you also pull into 2.6.19?
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
Or Gerlitz wrote:
Document the reject sending and modifying qp to error done in rdma_accept
Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
diff --git a/include/rdma/rdma_cm.h b/include
.
Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]
Dropping 3 packets in a row seems likely only under stress testing, so I'm not
sure that this is worthy of a change to 2.6.18 at this point (we're at rc7).
This seems fine for 19 though.
Acked-by: Sean Hefty [EMAIL PROTECTED
Krishna Kumar wrote:
Thanks for the explanation. So a list_del_init() would be the best
thing to do. Another option is to add a remove_list to rdma_id_private
by which this entry could be added to a local remove_list and traversed
without holding a lock, but it doesn't make sense to add that
Bub Thomas wrote:
Do you have a cmpost for gen1 IBGD I can use to connect from gen2 to gen1?
No - the gen1 code is really the old Topspin code. Topspin is now part of
Cisco, so they may have something.
Or is there any other trick to play here?
I don't think so. I'm pretty sure that this
Hal Rosenstock wrote:
But it only needs the MTU on each local side (once for the REQ and on
the remote side for the REP). It would mean that if the local side were
capable of larger MTU and the remote side were Tavor, that the REQ would
be REJ with MTU too large and need to be retried at a
Michael S. Tsirkin wrote:
I think we can do that without breaking IPoIB.
IPoIB needs mtu = 1K. IPoIB sets mtu selector to = 2K.
I am talking about users that do not set mtu selector.
The ipoib spec requires support for a 2k MTU, but allows support for smaller
MTUs. I agree that if the ipoib
Putting knowledge about hw quirks in all protocols is really horrible.
Agreed.
MTU should be decided by SA as part of path information.
If ULPs have specific limitations wrt MTU they should use mtu selector
in path record query.
Thinking about this more, the proper place for this does seem to
Michael S. Tsirkin wrote:
I don't really understand. The fix is a one-liner.
The problem is observed in practice, under stress.
Who *wants* systems that fall apart under stress?
My view is: is this worth delaying the release of the kernel? And I don't see
that it is at this point in the
Michael S. Tsirkin wrote:
Although, I don't like the idea of the CMA changing every path to use an MTU
of 1k.
Well, that's why it's off by default.
So, Ack?
I'd like to find a way to support a 1k MTU to tavor HCAs without making the MTU
1k to other HCAs, in case we're dealing with a
Michael S. Tsirkin wrote:
That's the default and not the minimum MTU (for IPoIB).
How isn't it? By default, IPoIB reports 2K MTU to linux.
So it will get 2K packets, and since IB switches
can not fragment packets, they will simply get dropped.
I think this is simply the difference between the
Michael S. Tsirkin wrote:
IB/cma: add rdma_establish
Make it possible for ULPs to handle RTU loss by calling
rdma_establish.
I've committed this patch to svn 9470. It still requires exporting the
rdma_establish call to userspace.
- Sean
Roland Dreier wrote:
OK, I added the following to my for-2.6.19 branch. The differences
from your patch are:
- CMA can have a static variable (good to avoid clashes with a global
'sa_client' variable name too)
- IPoIB does not use multicast module upstream, fix ipoib_multicast.c too.
Or Gerlitz wrote:
Just to make sure, do you mean that you would merge this patch
instead of the one that had the CM track local qp numbers and install a
callback for the consumer QP to catch the async event etc?
correct
Indeed the **patch** by itself is somewhat simpler, but the consumer
Krishna Kumar2 wrote:
mutex_lock(&lock);
while (!list_empty(&cma_dev->id_list)) {
	id_priv = list_entry(cma_dev->id_list.next,
			     struct rdma_id_private, list);
	if (cma_internal_listen(id_priv)) {
Bub Thomas wrote:
What I don’t understand is why the local_cm_response_timeout set to 254
instead of 20 can block IBV_WR_SEND from client to server while the
opposite direction from server to client works!?
local_cm_response_timeout is a 5-bit value. It's 4.096 x 2 ^
local_cm_response_timeout
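The timeout encoding mentioned in this reply can be worked through with a small helper. This is a sketch: the function names are made up, and the time unit is taken to be microseconds (4.096 us x 2^value), which is an assumption here because the quoted sentence is cut off before the unit.

```c
#include <assert.h>
#include <stdint.h>

/* A 5-bit field can only hold the values 0..31. */
int cm_timeout_is_valid(unsigned int value)
{
    return value <= 31;
}

/*
 * Decode the exponent into nanoseconds: 4.096 us is 4096 ns, so the
 * timeout is 4096 ns shifted left by the exponent.
 */
uint64_t cm_timeout_ns(uint8_t exponent)
{
    assert(cm_timeout_is_valid(exponent));
    return (uint64_t)4096 << exponent;
}
```

Under that assumption a value of 20 decodes to 4294967296 ns, roughly 4.3 seconds, while 254 does not fit in the field at all.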
Or Gerlitz wrote:
+ * In the case of error, a reject message is sent to the remote side and the
+ * state of the qp associated with the id is modified to error, such that any
+ * previously posted receive buffers would be flushed.
Hmm... this makes me question whether this is what it should be
Can you queue this for 2.6.19 ?
Roland, can you pull this patch in for 2.6.19? It's SVN check-in 9273.
---
Clarify that rdma_destroy_id cancels outstanding asynchronous operations on the
associated id.
Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
Signed-off-by: Sean Hefty [EMAIL PROTECTED
Michael S. Tsirkin wrote:
The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a
second call is not made to rdma_connect after the first call fails. So we're
probably safe deferring this until 2.6.19, unless someone has code which
calls rdma_connect twice.
SDP can do
This test (for now) doesn't send any join message to the SA; it only
attaches (and detaches) the QP to the multicast group.
I posted a simple multicast test program that uses the proposed libibsa
interface in:
http://openib.org/pipermail/openib-general/2006-August/025433.html
(See the program at the
Well, the idea of pushing timewait handling down into the low-level
drivers seems strange to me. I don't think any other stack or any
other OS does anything like this.
I think the Windows IB stack may do something similar.
The difficulty in doing this at a higher level is that the QP must be
Michael S. Tsirkin wrote:
As a side note, reasons for frequent loss of RTU must be investigated.
A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU
never showing up?
Seems like that. I know for sure I do accept after REP but remote side never
gets ESTABLISHED.
I
I completely understand that the existing port management services are not
exported, but functionally, they support multiple port spaces, show up in
netstat, etc... Can someone please explain to me the reluctance to use these
services in favor of replicating them?
My reluctance to use the
Roland Dreier wrote:
I haven't really read the later patches but I am planning on merging
at least the registration stuff for 2.6.19.
I'd like to commit the SA related patches soon. There have been several
e-mails recently about using IB multicast and the IB CM directly.
- Sean
Bub Thomas wrote:
with the help of your modified cmpost.c example I found out that the
byte order in the lid your query_for_path in cmpost.c is getting into
the ib_sa_path_rec is the opposite to the one reported by ibv_query_port.
The path record defines all fields in network-byte order.
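The byte-order mismatch described here comes down to one conversion. The helpers below are a hypothetical sketch using the standard ntohs/htons calls; they assume, as the discussion states, that the path record holds the LID in network byte order while ibv_query_port reports it in host byte order.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Convert a LID taken from a path record (network order) to host order. */
uint16_t path_rec_lid_to_host(uint16_t lid_network_order)
{
    return ntohs(lid_network_order);
}

/* Convert a host-order LID (e.g. from a port query) for use in a path record. */
uint16_t host_lid_to_path_rec(uint16_t lid_host_order)
{
    return htons(lid_host_order);
}

/* Compare a path record LID against a host-order LID without mixing orders. */
int lid_matches(uint16_t path_rec_lid, uint16_t port_lid)
{
    return path_rec_lid_to_host(path_rec_lid) == port_lid;
}
```

On a big-endian machine both conversions are no-ops, which is why a mismatch of this kind tends to show up only on little-endian hosts.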
Dotan Barak wrote:
The user-mode cm header files don't have the C++ stuff to identify all
the declarations as C. The verbs.h file has it and works fine if you
wanted to copy it, but all you really need is ...
Sean, please add those definitions to the libibcm header as well.
I've updated the
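For reference, the usual way a C header marks its declarations as C for C++ callers looks like the sketch below. This is the generic idiom, not the text of the libibcm or verbs.h headers, and example_cm_call is a hypothetical function used only to have something to declare.

```c
/* Open a C linkage block only when a C++ compiler reads this header. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration standing in for the library's real API. */
int example_cm_call(int arg);

#ifdef __cplusplus
}
#endif

/* Hypothetical definition so the sketch compiles on its own. */
int example_cm_call(int arg)
{
    return arg + 1;
}
```

A C compiler never defines __cplusplus, so it skips both guarded lines and sees only the plain declaration.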
Michael S. Tsirkin wrote:
cma_connect_ib leaks a struct ib_cm_id* in failure cases.
Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
This one looks like it might be good for 2.6.18. Sean?
The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a
second call is not made to
Krishna Kumar wrote:
cma_connect_ib leaks a struct ib_cm_id* in failure cases.
Thanks - committed.
- Sean
Krishna Kumar wrote:
Re-organize code relating to cma_get_net_info() and rdma_create_id() to
optimize error case handling (no need to alloc memory/etc as part of
rdma_create_id() if input parameters are wrong).
Thanks! Committed with a minor adjustment to rename 'out' label 'err'.
- Sean
Krishna Kumar wrote:
static void cma_process_remove(struct cma_device *cma_dev)
{
struct list_head remove_list;
- struct rdma_id_private *id_priv;
+ struct rdma_id_private *id_priv, *tmp;
int ret;
INIT_LIST_HEAD(&remove_list);
@@ -2344,22 +2344,20 @@ static
- CMA can have a static variable (good to avoid clashes with a global
'sa_client' variable name too)
Sounds good - that's a goof on my part.
- IPoIB does not use multicast module upstream, fix ipoib_multicast.c too.
Okay - As an FYI, I will probably submit the multicast module upstream for
Michael S. Tsirkin wrote:
Sean, did we decide what to do for upstream yet?
I would say we need something like the below for 2.6.19 too
(probably just need to update node type check).
And, I like it that this approach leaves all matters of policy
to users (such as whether move QP to RTS after
# udaddy
udaddy: starting server
librdmacm: Kernel ABI does not support requested port space.
udaddy: listen request failed
test complete
return status -93
UD QP and multicast support requires kernel ABI version 2. It appears that the
kernel version running is 1.
- Sean
Bub Thomas wrote:
Dotan,
the ibv_rc_pingpong example works for me so I can exclude the
architecture.
I never got the libibcm example compiled.
Which is your example and which architecture x86 vs. x86_64 did you
compile it for?
Can you share your libibcm example code? (if it is not the
ok, thanks for clarifying that, is cancellation allowed only for address
resolution or also for route resolving and/or CM calls? also how about
documenting this?
Cancellation is allowed for any asynchronous operation. I will pull in your
patch when I get back in the office. Thanks.
- Sean
/usr/bin/ld: warning: libibverbs.so.1, needed by
/usr/local/lib/librdmacm.so, may conflict with libibverbs.so.2
Does rdmacm use the older version of ibverbs or do I need to install
rdmacm differently?
I keep the RDMA CM updated with the latest version of verbs. There may be an
issue with the
Does this patch protect against the case where an rdma_cm_id is being
destroyed while address resolution related to the **same** id attaches
it to a device?
If yes, why would someone destroy this id? Is it legal to do so?
Yes - this protects against the user destroying the id while that same
I'll test some, but the problem hasn't reappeared since.
The patch looks right, I'd say push it for 2.6.18.
We need the following change, which applies on top of the previous patch, as
well.
Add missing synchronization around acquiring an IB device.
Signed-off-by: Sean Hefty [EMAIL PROTECTED
This closes a window where address resolution can attach an rdma_cm_id
to a device during destruction of the rdma_cm_id. This can result in
the rdma_cm_id remaining in the device list after its memory has been
freed.
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
I generated this patch off
Comma should be semi-colon
Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
Please queue for 2.6.19
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d6f99d5..bf20410 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -265,7 +265,7
Can you see if this patch helps any?
This closes a window where address resolution can attach an rdma_cm_id
to a device during destruction of the rdma_cm_id. This can result in
the rdma_cm_id remaining in the device list after its memory has been
freed.
Signed-off-by: Sean Hefty [EMAIL
Michael S. Tsirkin wrote:
Apparently, list->prev pointer in CMA id_priv structure is NULL
which causes a crash in list_del.
I note that rdma_destroy_id tests outside the mutex lock.
Could that be the problem?
The problem is not unfortunately easily reproducible.
I think I see one bug, but
Michael S. Tsirkin wrote:
I'm trying to come up with a fix for this, but I'm not convinced it's the
problem that you're seeing.
Could be what you describe leads to a memory corruption.
I believe so. If this were the cause of the crash, I would expect to see an
issue with list->prev->prev or
. In this situation, the DREQ gets dropped
repeatedly.
We will want to queue this patch for 2.6.19, if you can point Roland to your
git tree.
Acked-by: Sean Hefty [EMAIL PROTECTED]
Roland Dreier wrote:
While merging this, I uninlined rdma_node_get_transport, since I don't
think there's any reason to make it inline:
I've committed the patch to svn to sync as well.
- Sean
There are compilation errors with this patch when using gcc 4.1.0:
Hmmm... I will look into this.
- Sean
Why SEND ? In general, couldn't it be used like SET/DELETE (in addition
to being used like the GET method variants) ? Also, the SA doesn't use
the SEND method.
The latest version of the patch only allows GET or GET_TABLE for PathRecords,
ServiceRecords, and MCMemberRecords, and GET_MULTI for
Here's an idea:
how about we move the whole timewait thing to low level driver,
starting timer automatically upon QP destroy?
I've thought about this too, and I think this may end up making the most sense.
How would the driver determine how long the QP should remain in timewait, and
how would you
I handled it all myself this time, but in the future it is easier for
me if each patch is inline in a separate email. A couple of other
things that would also make my life easier:
That's not a problem. I think in the past I've just referred you to the svn
revision numbers. I was just trying to
Michael S. Tsirkin wrote:
I think offsetof is defined in stddef.h, so you must include that.
Dotan,
Can you see if adding this include works for you? I just re-tested the build
on my system, and it worked fine without it (gcc 3.3.3). Jack posted a patch for
this earlier if you need one.
-
Michael S. Tsirkin wrote:
I've thought about this too, and I think this may end up making the most
sense.
How would the driver determine how long the QP should remain in timewait,
Need to look into this - likely we can just add a call for that.
Roland?
The Intel gen1 code passed this into
Hal Rosenstock wrote:
OK. So shouldn't IBV_SA_METHOD_SEND be removed from sa_net.h ?
I was just defining the well known methods. I can remove this.
By raw access, do you mean SEND_MAD operation ?
How do those applications gain this privilege ?
The kernel module exports two files to
Jack Morgenstein wrote:
Fix compilation on SLES10:
cm.c uses offsetof, so it must include stddef.h
Thanks - committed in 9150.
- Sean
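The fix amounts to including the header that defines offsetof before using it. A minimal sketch follows; struct example_msg and its fields are hypothetical and are not the layout used in cm.c.

```c
#include <stddef.h>  /* offsetof is defined here */
#include <stdint.h>

/* Hypothetical message layout, for illustration only. */
struct example_msg {
    uint32_t local_comm_id;
    uint32_t remote_comm_id;
    uint8_t  private_data[8];
};

/* Byte offset of the private data within the message. */
size_t example_private_data_offset(void)
{
    return offsetof(struct example_msg, private_data);
}
```

Some compilers or header combinations happen to pull in stddef.h indirectly, which would explain why the build passed on one system and failed on another.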
Looked into the openIB kernel sources and found that the minor number
seems to be wrong in the README file. With a minor number 224 and the
creation like:
mknod /dev/infiniband/ucm0 c 231 224
The README file was never updated when the userspace CM added per device
handling. I've updated
Sean Hefty wrote:
How would the driver determine how long the QP should remain in timewait
The spec isn't totally clear to me on this, but here's what I can gather:
timewait = packet lifetime x 2 + remote ack delay
local_ack_timeout (in CM REQ) = packet lifetime x 2 + local ack delay
Verbs
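The two relations quoted in this message can be combined to derive the timewait period from the local_ack_timeout. The sketch below works in plain microseconds for readability; the function names are invented for the example, and treating the quantities as plain durations rather than encoded field values is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

/* timewait = packet lifetime x 2 + remote ack delay */
uint64_t timewait_us(uint64_t packet_lifetime_us, uint64_t remote_ack_delay_us)
{
    return 2 * packet_lifetime_us + remote_ack_delay_us;
}

/* local_ack_timeout = packet lifetime x 2 + local ack delay */
uint64_t local_ack_timeout_us(uint64_t packet_lifetime_us,
                              uint64_t local_ack_delay_us)
{
    return 2 * packet_lifetime_us + local_ack_delay_us;
}

/*
 * Both relations share the "packet lifetime x 2" term, so timewait can be
 * recovered from local_ack_timeout by swapping the ack delay term.
 */
uint64_t timewait_from_ack_timeout_us(uint64_t ack_timeout_us,
                                      uint64_t local_ack_delay_us,
                                      uint64_t remote_ack_delay_us)
{
    assert(ack_timeout_us >= local_ack_delay_us);
    return ack_timeout_us - local_ack_delay_us + remote_ack_delay_us;
}
```

With a packet lifetime of 100 us, a local ack delay of 30 us and a remote ack delay of 50 us, both routes give a timewait of 250 us.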
Michael S. Tsirkin wrote:
Verbs gets local_ack_timeout through qp_attr.timeout when modifying the QP to
RTS.
Isn't that RTR?
It's the transition from RTR to RTS.
So it seems we won't need any API changes. This begins to look good.
I wonder what Roland and other low level driver
Michael S. Tsirkin wrote:
Hmm. But you need timewait already after you get to RTR, right?
The active side looks fine. The passive side can enter timewait without moving
through RTS if it gets an RTU timeout. I'm not sure how much going into
timewait really helps in this case though.
If we
Michael S. Tsirkin wrote:
If we completely ignore timewait, what conditions are required to have a
problem occur?
Outstanding packets with PSNs and QP numbers coincide between the 2
connections.
Look for Stale packet in IB spec.
From what I can tell, a QP will receive an incoming packet
Michael S. Tsirkin wrote:
Apparently, list->prev pointer in CMA id_priv structure is NULL
which causes a crash in list_del.
I note that rdma_destroy_id tests outside the mutex lock.
Could that be the problem?
The problem is not unfortunately easily reproducible.
I'll see if I see a problem.
Michael S. Tsirkin wrote:
Comments appreciated.
I will look at the spec in more details, but I thought that timewait was
included as part of the life of a connection. I.e. the connection wasn't
released until it returned to idle. Also, isn't the purpose behind timewait to
prevent
Michael S. Tsirkin wrote:
IB spec, section 12.4, says:
CMs shall maintain enough connection state information to detect an
attempt
to initiate a connection on a remote QP/EEC that has not been released
from a connection with a local QP/EEC, or that is in the TimeWait
Michael S. Tsirkin wrote:
So, you must somehow detect that the remote QP is in timewait state.
I don't see any way to do this, and this is not what the CM
currently does.
Our CM tracks local QPs in timewait state, which is obviously not
what the spec intends since remote QP could be reused
Michael S. Tsirkin wrote:
Another problem that I see is that CMA currently seems to completely
mask timewait exit.
This is correct.
So there's no way to properly handle timewait on top of cma that I can see.
I don't think so, which is what brought up the problem with Arlin. (He's using
Michael S. Tsirkin wrote:
I believe communication id should be checked to detect duplicates. Right?
Can you clarify this? Check the remote comm id of an incoming REQ against a
value in timewait?
Remote QPN stale connection rule is only to avoid a case where we keep
connection in established
Well, what is an OpenFabrics driver anyway? I'm interested in
writing Linux drivers to be honest.
It's often ignored, but OpenFabrics does include Windows.
My understanding is that the requirement for lower level components is that they
must be licensed using dual GPL / BSD. This agreement
- randomize starting local comm id
Let me know if you'd prefer these in another format (such as inline).
- Sean
From d697059a6f69e19c18a50c87df20894d253d3d8f Mon Sep 17 00:00:00 2001
From: Sean Hefty [EMAIL PROTECTED]
Date: Mon, 28 Aug 2006 15:15:18 -0700
Subject: [PATCH] Randomize the starting local
Sean Hefty wrote:
Modify the libibcm API to provide better support for multi-threaded
event processing. CM devices are no longer tied to verb devices
and hidden from the user. This should allow an application to direct
events to specific threads for processing.
This patch also removes
Thomas> How does an adapter guarantee that no bridges or other
Thomas> intervening devices reorder their writes, or for that
Thomas> matter flush them to memory at all!?
That's a good point. The HCA would have to do a read to flush the
posted writes, and I'm sure it's not doing that
Michael S. Tsirkin wrote:
Maybe the librdmacm part should be merged to svn?
So librdmacm could try to read from misc, then from
/sys/class/infiniband/rdma_cm, and then assume latest.
It's good to have userspace code portable across distros ...
I can go with that.
- Sean
Michael S. Tsirkin wrote:
And even with these proposed changes, there's a race condition where the CM
can timeout a connection after data is received over it, but before this event
can be processed.
Hmm. And what happens then?
The connection is aborted by the CM. The CM sends a REJ for the