[EMAIL PROTECTED] wrote on Wed, 01 Nov 2006 10:25 -0600:
> We've been playing with various FlowBufferSizes on the servers as well
> as varying the stripe_size when opening/modifying files for our
> benchmarking. I ran across this error that caused our tests to hang,
> not sure where this should go, but it's reproducible with our setup. I
> was wondering if anyone could tell me if there's an obvious problem with
> this setup:
> (using mellanox ddr card)
> FlowBufferSize 16MB
> stripe_size 256KB
> 6 data servers
Sorry it took me so long to look at this. In testing here with
pvfs2-cp, using the stripe size you indicate, 6 servers with your
FlowBufferSize, 1 MD server, and 1 client, everything works, both
reads and writes. This is only Mellanox SDR, though.
> This problem occurs whenever we pick 256KB as a stripe size; however, it
> doesn't show up at 64KB, or at 1MB or more (testing 512KB right now to see
> if it occurs). We also noticed that in general using 256KB stripes causes
> weird things, like eHCA errors which bring down the server completely...
>
> (this shows up in the logs of all servers)
> [E 10:09:51.709387] Warning: openib_check_async_events:
> IBV_EVENT_QP_ACCESS_ERR.
Something tried to submit a work request when the QP was not able to
take it, like it was in a "down" state.
> dmesg returns this: (on every server)
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
Transitioning from RTS to SQD failed for some reason. The status
number is not interpretable by mere mortals.
No clue why the servers had a problem doing this. The SQD state is
used to flush all pending WRs when a connection closes, e.g. when
the server has finished communicating with the client. That
kernel-reported failure is almost certainly the root of the problem,
but I've never seen such a thing and can't figure out what went wrong.
-- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers