Re: [ewg] Agenda for the OFED meeting today (May 5)

2008-05-05 Thread Olaf Kirch
Hi Tziporet,

   RDS fixes for RDMA API - done

As an update, I just sent Vlad a bugfix for an RDMA-related
crash in RDS. It would be cool if that could be included in
1.3.1.

I am also currently testing three more bugfix patches: two of them
address dma_sync issues, and one reduces the latency of RDS RDMA
notifications (a process expects a notification from the kernel that
tells it when it's okay to release the RDMA buffer - the current code
tries to give a reliable status at the expense of one round trip; this
turns out to be too slow for some purposes).

It is not yet clear, however, which (if any) of these three
pending patches will make OFED 1.3.1.

Regards,
Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Re: [ofa-general] [PATCH 0/8] RDS patch set

2008-04-29 Thread Olaf Kirch
On Tuesday 29 April 2008 13:36:36 Or Gerlitz wrote:
 Olaf - I prefer that we wait a second for Roland to finish the review
 and merging of the mthca/mlx4 patches (maybe they would be split into
 two patches) before merging them into 1.3.1, such that the instance in
 OFED would be an exact copy of the one in the kernel, agree?

I don't have a very strong opinion on this either way. If Vlad
wants to merge the patch the way I submitted it, I won't argue.
If he replaces it with the patch that ends up in mainline, I'm
just as happy.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: Please send me patches to release notes for OFED 1.3

2008-02-18 Thread Olaf Kirch
Hi Tziporet,

On Monday 18 February 2008 14:53, Tziporet Koren wrote:
 rds_release_notes.txt - Olaf
 RDS_README.txt - Olaf

I'm attaching a patch for RDS_README, which replaces it with
a newer copy of the manpage. The manpage will also go into
rds-tools, I hope.

Olaf
-- 


[ewg] [PATCH] Report proper error code in [was: trying to reproduce the crash]

2008-02-04 Thread Olaf Kirch
I've been struggling with crashes in mthca_arbel_map_phys for a few
days (triggered by RDS), and I think I'm finally making some progress.

mthca_fmr_alloc does this:

	if (mthca_is_memfree(dev)) {
		err = mthca_table_get(dev, dev->mr_table.mpt_table, key);
		if (err)
			goto err_out_mpt_free;
		...
	}

	/* when we get here, err == 0 (at least for memfree cards) */
	mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
	if (IS_ERR(mr->mtt))
		goto err_out_table;

err_out_table:
	/* clean up some */
	return err;

i.e. we set mr->mtt to some ERR_PTR(-whatever), and return success.

The same problem exists when mailbox allocation fails.

I fixed this, using the patch below. Now I'm making some progress:
First, the kernel reports:

RDS/IB: ib_alloc_fmr failed (err=-12)

which is good - now we get a decent error code instead of a crash.
A little later, it complains:

ib_mthca :05:00.0: SW2HW_MPT returned status 0x0a

which doesn't sound quite as good... and things are very hosed
from that moment on; reloading ib_mthca seems to fix things, however.

Olaf
--- snip ---
From: Olaf Kirch [EMAIL PROTECTED]
Subject: Return proper error codes from mthca_fmr_alloc

If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc
would return 0 (success) no matter what. This leads to crashes a little
down the road, when we try to dereference e.g. mr->mtt, which was
really ERR_PTR(-ENOMEM).

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]
---
 drivers/infiniband/hw/mthca/mthca_mr.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
===
--- ofa_kernel-1.3.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -613,8 +613,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
 			sizeof *(mr->mem.tavor.mpt) * idx;
 
 	mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
-	if (IS_ERR(mr->mtt))
+	if (IS_ERR(mr->mtt)) {
+		err = PTR_ERR(mr->mtt);
 		goto err_out_table;
+	}
 
 	mtt_seg = mr->mtt->first_seg * MTHCA_MTT_SEG_SIZE;
 
@@ -627,8 +629,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
 	mr->mem.tavor.mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg;
 
 	mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL);
-	if (IS_ERR(mailbox))
+	if (IS_ERR(mailbox)) {
+		err = PTR_ERR(mailbox);
 		goto err_out_free_mtt;
+	}
 
 	mpt_entry = mailbox->buf;




Re: [ewg] RDS - Recovering from RDMA errors

2008-01-22 Thread Olaf Kirch
On Sunday 20 January 2008 20:57, Roland Dreier wrote:
 If you could send me some code and a recipe to get the bogus CQ
 message, that might be helpful.  Because as far as I can see, there
 shouldn't be any way for a consumer to get that message without a bug
 in the low-level driver.  It's fine if it's a whole big RDS test case,
 I just want to be able to run the test and instrument the low-level
 driver to get a better handle on what's happening.

Okay, I put my current patch queue into a git tree. It's in
the testing branch of

git://www.openfabrics.org/~okir/ofed_1_3/linux-2.6.git
git://www.openfabrics.org/~okir/ofed_1_3/rds-tools.git

In order to reproduce the problem, I usually run

while sleep 1; do
rds-stress -R -r locip -s remip -p 4000 -c -d2 -t8 -T5 -D1m
done

Within minutes, I get syslog messages saying

Timed out waiting for CQs to be drained - recv: 0 entries, send: 4 entries left

This message originates from net/rds_ib_cm.c - as a workaround, I added
a timeout of 1 second when waiting for the WQs to be drained. I usually
get those stalls after a WQE completes with status 10 (or sometimes 4).

 BTW, what kind of HCA are you using for this testing?

A pair of fairly new Mellanox cards.

Olaf


Re: [ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-21 Thread Olaf Kirch
On Friday 18 January 2008 23:12, Roland Dreier wrote:
   The corruption happened when the process that allocated the MRs went
   away in the middle of the operation. We would free the MR and invalidate
   - and expect the in flight RDMA to error out. RDS does not know who is
   doing RDMA to or from a MR at any given time.
 
 OK, I see.  Of course this error will move your QP to the error state
 and cause other in-flight operations on behalf of other processes to
 fail and need to be reissued after you reconnect.  Seems like a bit of
 a mess but I don't see a way around it if you want to multiplex direct
 access operations to multiple different processes over the same QP.

Yes, and that's the whole point of RDS. Sockets are unconnected and you
use sendto, else we'd drown in sockets. I will readily agree that this
approach, while it's fast and simple, does get us into a bit of a mess
sometimes :-)

   Is that a safe thing to do? I found the spec a little unclear on
   the ordering rules. It *seems* that RDMA writes are always fencing
   against subsequent operations, and RDMA reads will fence if we ask
   for it. But I'm not perfectly sure whether the ordering applies
   to the sending system only, or if IB also guarantees that the
   RDMA will have completed when it puts the incoming message on
   the completion queue at the consumer.
 
 I believe this is safe.  I can't point to chapter and verse in the
 spec, but operations are supposed to complete in order, so I don't
 think that the receive completion can appear before earlier responder
 operations have completed.

Okay, thanks. Much appreciated,
Olaf


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-20 Thread Olaf Kirch
On Friday 18 January 2008 21:41, Roland Dreier wrote:
 I don't follow this.  All work requests should generate a completion
 eventually, unless you do something like destroy the work queue or
 overrun a CQ.  So what part of the spec are you talking about here?

The part on affiliated asynchronous errors says WQ processing is stopped.
This also happens if we're signalling a remote access error to the
other host.

When an RDMA operation errors out because the remote side destroyed the
MR, the RDMA WQE completes with error 10 (remote access error), which is
expected. The other end sees an affiliated asynchronous error code 3
(remote access error), which is also expected.

Now, on the sending system, I'm seeing send queue entries that do not get
completed. The RDMA itself is completed in error; the subsequent SEND
is completed (error 5, flushed) as well. But one or more entries seem to
remain on the queue - at least my book-keeping says so. I double checked
the book-keeping, and it seems accurate...

All very strange.

   I tried destroying the QP first, then we know we can pick off
   any remaining WRs still allocated. That didn't work, as the card
   seems to generate interrupts even after the QP is gone. This results
   in lots of errors on the console complaining about "Completion to
   bogus CQ".
 
 Destroying a QP should immediately stop work processing, so no
 completions should be generated once the destroy QP operation
 returns.  I don't see how you get the bogus CQ message in this case --
 it certainly seems like a driver bug.  Unless you mean you are
 destroying the CQ with a QP still attached?  But that shouldn't be
 possible because the CQ's usecnt should be non-zero until all attached
 QPs are freed.  Not sure what could be going on but it sounds bad...

This may be a driver bug, yes.

   I then tried to move the QP to error state instead - this didn't
   elicit a storm of kernel messages anymore, but still I seem to get
   incoming completions.
 
 The cleanest way to destroy a QP is to move the QP to the error state,
 wait until you have seen a completion for every posted work request
 (the completions generated after the transition to the error state
 should have a flush status), and then destroy the QP.

Okay, that's what the RDS code does currently, but I get stuck waiting
for the queue to drain - it simply never drains.

Olaf


Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Olaf Kirch
On Thursday 17 January 2008 11:57, Johann George wrote:
  That's a remote invalid request error. Were you testing
  with RDMA or without?
 
 We were using the version that runs over IB.

Well, yes. But you can do that with ordinary SENDs, or you
can enable RDMA for large data blobs as well. But looking at
the qperf source, it doesn't do that.

 To run it, on one machine (the server), run it with no
 arguments.  On the other machine, run:
 
 qperf server_nodename rds_bw

Okay, will give it a try.

 If the TCP part is entirely non-working, it might be better
 to disable it for now rather than have it crash the machine.
 So far, I have never gotten it to function correctly and it
 crashes some machines almost immediately.

Let's put it this way - nobody has looked at the code for a while.
I kind of put it at the bottom of my todo list, around position 18
or so :-/

Olaf


[ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Olaf Kirch

When I hit an RDMA error (which happens quite frequently now at
rds-stress exit, thanks to the fixed MR pool flushing :) I often see the
RDS shutdown_worker getting stuck (and rmmod hangs). It's waiting for
allocated WRs to disappear. This usually works, as all WQ entries are
flushed out. But it doesn't happen when an RDMA transfer generates a
remote access error, and that seems to be intended according to the spec.

I tried destroying the QP first, then we know we can pick off
any remaining WRs still allocated. That didn't work, as the card
seems to generate interrupts even after the QP is gone. This results
in lots of errors on the console complaining about "Completion to
bogus CQ".

I then tried to move the QP to error state instead - this didn't
elicit a storm of kernel messages anymore, but still I seem to get
incoming completions.

Any other suggestions?

Olaf


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Olaf Kirch
On Thursday 17 January 2008 16:51, Dotan Barak wrote:
 Moving the QP to error state flushes all of the outstanding WRs and
 creates a completion for each WR.
 If you want to delete all of the outstanding WRs, you should move the
 QP state to reset.
 
 (Is this what you asked?)

My question was more along these lines: can I expect that all pending
WRs have been flushed when ib_modify_qp returns? At least for the error
state this does not seem to be the case - I'm still getting completions
on the receive queue from the Mellanox card after I do this.

Olaf


[ewg] Issues with fmr_pool

2008-01-16 Thread Olaf Kirch
Hi all,

I've been debugging a memory corruption in the RDS zerocopy code for
the past several days - basically, when we tear down a socket and destroy
any existing MRs, RDMA writes that are in progress continue well after
we've freed the MR and flushed the fmr_pool.

After chasing several schools of red herrings I think I understand the
problem. I believe there are two bugs in the fmr_pool code.

The first bug is this:

The fmr_pool has a per pool cleanup thread, which gets woken in two cases.
One, when there are too many FMRs on the dirty_list, and two, when the
user explicitly asked for a flush.

Now, ib_flush_fmr_pool synchronizes with the cleanup thread using two
atomic counters - one is a request serial number, which gets bumped
by ib_flush_fmr_pool, and the other is the flush serial number, which
gets incremented whenever the cleanup pool actually flushes something.
When the two are equal, we've flushed everything, and the cleanup thread
can go back to sleep.

Now the bad thing is, the two can get out of sync. When there are
too many FMRs on the dirty list, the cleanup thread will perform a
flush as well, and bump the flush serial number. The next time around
someone calls ib_flush_fmr_pool, the request serial number is incremented
and *is now equal* to the flush serial number - and nothing is flushed
at all.

The second bug (or maybe it's just a misunderstanding on my part) has
far worse consequences.

When we release a FMR using ib_fmr_pool_unmap, it will do one of two
things. If the fmr's remap_count is less than max_remaps, it will
be added to the free_list right away. If it exceeds max_remaps, it
will be added to the dirty_list.

Now when the user calls ib_flush_fmr_pool, it will only inspect the
dirty list, but leave the free_list alone. So all the while we *think*
we have invalidated all FMRs freed previously, most of them will stay
active because they're not inspected *at all*. So ib_flush_fmr_pool does
nothing 31 out of 32 times (32 is the default max_remaps value).

I will post two patches for these issues in follow-up emails. In general
however I wonder if the fmr_pool interface is really optimal. The major
concern I have is that the whole page pinning, mapping and unmapping
business is the caller's responsibility, but we don't know when the
underlying MR really goes away. So in order to be on the safe side,
the caller has to keep any pages mapped and pinned until the next
call to flush_fmr_pool. IMHO it would be very useful if there was a
callback function that lets you know that a particular MR was
zapped. I guess something like this could be engineered using the
flush_function, but that's really a very spartan interface, and requires
you to keep your deceased MRs on yet another list for later disposal.

Olaf


[ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-16 Thread Olaf Kirch
From: Olaf Kirch [EMAIL PROTECTED]
Subject: [fmr_pool] fmr_pool flush serials can get out of sync

Normally, the serial numbers for flush requests and flushes
executed should be in sync.

When we decide to flush dirty MRs because there are too many of them, we
wake up the cleanup thread and let it do its stuff.  As a side effect, it
increments pool->flush_ser, which leaves it one higher than req_ser. The
next time the user calls ib_flush_fmr_pool, it will wake up the cleanup
thread, but won't wait for the flush to complete. This can cause memory
corruption, as the user expects the flush to have taken place.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]
---
 drivers/infiniband/core/fmr_pool.c |   16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

Index: ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
===
--- ofa_kernel-1.3.orig/drivers/infiniband/core/fmr_pool.c
+++ ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
@@ -182,8 +182,7 @@ static int ib_fmr_cleanup_thread(void *p
 	struct ib_fmr_pool *pool = pool_ptr;
 
 	do {
-		if (pool->dirty_len >= pool->dirty_watermark ||
-		    atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) {
+		if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0) {
 			ib_fmr_batch_release(pool);
 
 			atomic_inc(&pool->flush_ser);
@@ -194,8 +193,7 @@ static int ib_fmr_cleanup_thread(void *p
 		}
 
 		set_current_state(TASK_INTERRUPTIBLE);
-		if (pool->dirty_len < pool->dirty_watermark &&
-		    atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 &&
+		if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) >= 0 &&
 		    !kthread_should_stop())
 			schedule();
 		__set_current_state(TASK_RUNNING);
@@ -397,6 +395,7 @@ EXPORT_SYMBOL(ib_destroy_fmr_pool);
  */
 int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
 {
+	int flush_count = atomic_read(&pool->flush_ser);
 	int serial = atomic_inc_return(&pool->req_ser);
 
 	wake_up_process(pool->thread);
@@ -405,6 +404,9 @@ int ib_flush_fmr_pool(struct ib_fmr_pool
 				     atomic_read(&pool->flush_ser) - serial >= 0))
 		return -EINTR;
 
+	flush_count = atomic_read(&pool->flush_ser) - flush_count;
+	BUG_ON(flush_count == 0);
+
 	return 0;
 }
 EXPORT_SYMBOL(ib_flush_fmr_pool);
@@ -511,8 +513,10 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr
 			list_add_tail(&fmr->list, &pool->free_list);
 		} else {
 			list_add_tail(&fmr->list, &pool->dirty_list);
-			++pool->dirty_len;
-			wake_up_process(pool->thread);
+			if (++pool->dirty_len >= pool->dirty_watermark) {
+				atomic_inc(&pool->req_ser);
+				wake_up_process(pool->thread);
+			}
 		}
 	}
 



[ewg] Re: [PATCH 2/2] fmr_pool_flush didn't flush all MRs

2008-01-16 Thread Olaf Kirch
From: Olaf Kirch [EMAIL PROTECTED]
Subject: [fmr_pool] fmr_pool_flush didn't flush all MRs

When a FMR is released via ib_fmr_pool_unmap, the FMR usually ends up
on the free_list rather than the dirty_list (because we allow a certain
number of remappings before actually requiring a flush).

However, ib_fmr_batch_release only looks at dirty_list when flushing
out old mappings. This can lead to memory corruption as the user
expects *all* old mappings to go away.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]
---
 drivers/infiniband/core/fmr_pool.c |   15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Index: ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
===
--- ofa_kernel-1.3.orig/drivers/infiniband/core/fmr_pool.c
+++ ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
@@ -139,7 +139,7 @@ static inline struct ib_pool_fmr *ib_fmr
 static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
 {
 	int                 ret;
-	struct ib_pool_fmr *fmr;
+	struct ib_pool_fmr *fmr, *next;
 	LIST_HEAD(unmap_list);
 	LIST_HEAD(fmr_list);
 
@@ -158,6 +158,19 @@ static void ib_fmr_batch_release(struct 
 #endif
 	}
 
+	/* The free_list may hold FMRs that have been put there
+	 * because they haven't reached the max_remap count. We want
+	 * to invalidate their mapping as well!
+	 */
+	list_for_each_entry_safe(fmr, next, &pool->free_list, list) {
+		if (fmr->remap_count == 0)
+			continue;
+		hlist_del_init(&fmr->cache_node);
+		fmr->remap_count = 0;
+		list_add_tail(&fmr->fmr->list, &fmr_list);
+		list_move(&fmr->list, &unmap_list);
+	}
+
 	list_splice(&pool->dirty_list, &unmap_list);
 	INIT_LIST_HEAD(&pool->dirty_list);
 	pool->dirty_len = 0;



Re: [ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-16 Thread Olaf Kirch
Hi Roland,

On Wednesday 16 January 2008 22:54, Roland Dreier wrote:
 However I'm a little puzzled about how this can lead to memory
 corruption in practice: the only thing that flushing FMRs should do is
 make memory keys that should no longer be in use anyway become
 invalid.  So the only effect of this fix should be to expose a bug in
 your ULP by having some RDMA operation complete with a protection
 error -- and you're not relying on that behavior in normal operation,
 are you?  What am I missing?

The corruption happened when the process that allocated the MRs went
away in the middle of the operation. We would free the MR and invalidate
- and expect the in flight RDMA to error out. RDS does not know who is
doing RDMA to or from a MR at any given time.

There is a second potential issue however.

When RDS performs an RDMA, the initiator will queue two work requests -
one for the actual RDMA, immediately followed by a normal SEND with
a RDS packet. When the consumer sees that RDS packet, it will
release the MR to which the RDMA was directed.

Is that a safe thing to do? I found the spec a little unclear on
the ordering rules. It *seems* that RDMA writes are always fencing
against subsequent operations, and RDMA reads will fence if we ask
for it. But I'm not perfectly sure whether the ordering applies
to the sending system only, or if IB also guarantees that the
RDMA will have completed when it puts the incoming message on
the completion queue at the consumer.

If there is no such guarantee, then we have a second potential issue
in RDS wrt RDMA and memory corruption.

Thanks,
Olaf


Re: [ewg] RDS problematic on RC2

2008-01-16 Thread Olaf Kirch
On Thursday 17 January 2008 04:15, Johann George wrote:
 We've been testing the OFED 1.3 pre-releases on a 12 node cluster here
 at UNH-IOL.  RDS seemed largely functional (other than problems we
 were aware of) on OFED 1.3 RC1.  When we installed RC2, RDS stopped
 working.

Huh, scary. It works reasonably well here, though.

 A dmesg indicates the following message repeatedly on the console:
 
 RDS/IB: completion on 10.1.1.205 had status 9, disconnecting and reconnecting

That's a remote invalid request error. Were you testing with
RDMA or without? What user application were you using for testing?

 Note that this is using RDS over IB.  Our minimal experience with the
 non-IB version of RDS was worse.  We only tried it with RC1 and it
 crashed one of the two machines almost instantly.

Yes, the TCP part of RDS isn't being looked after very much, unfortunately.

Olaf


Re: [ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

2008-01-16 Thread Olaf Kirch
On Wednesday 16 January 2008 22:54, Roland Dreier wrote:
 Thanks, good catch, and I applied this (except I removed the BUG_ON,
 since I don't think killing the system with minimal info available on
 how the counts got out of sync is that useful...)

Can you turn it into a rate-limited printk instead? I'd prefer
some indication that things are askew over memory corruption from
dangling MRs that I had thought long dead and gone.

Olaf