[EMAIL PROTECTED] wrote on Wed, 01 Nov 2006 10:25 -0600:
> We've been playing with various FlowBufferSizes on the servers as well 
> as varying the stripe_size when opening/modifying files for our 
> benchmarking.  I ran across this error that caused our tests to hang, 
> not sure where this should go, but it's reproducible with our setup.  I 
> was wondering if anyone could tell me if there's an obvious problem with 
> this setup:
> (using mellanox ddr card)
> FlowBufferSize 16MB
> stripe_size 256KB
> 6 data servers

Sorry it took me so long to look at this.  In testing here using
pvfs2-cp with the stripe size you indicate, 6 servers with your
FlowBufferSize, 1 MD server, and 1 client, everything works, for
both reads and writes.  Note this is only with Mellanox SDR, though.

> This problem occurs whenever we pick 256KB as a stripe size; however, it 
> doesn't show up at 64KB, or 1M or more (testing 512K right now to see if 
> it occurs).  We also noticed that in general using 256KB stripes causes 
> weird things, like eHCA errors which bring down the server completely...
> 
> (this shows up in the logs of all servers)
> [E 10:09:51.709387] Warning: openib_check_async_events: 
> IBV_EVENT_QP_ACCESS_ERR.

Something tried to submit a work request when the QP was not able to
accept it, e.g., because the QP was in a "down" state.

> dmesg returns this: (on every server)
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
> ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.

Transitioning from RTS to SQD failed for some reason.  The status
number is not interpretable by mere mortals.

No clue why the servers had a problem doing this.  The SQD state is
used to flush all pending WRs when a connection closes, like when
the server has finished communicating with the client.  That
kernel-reported failure is certainly the root of the problem, but
I've never seen such a thing, and can't figure out what went wrong.

                -- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers