Pete -

We've been experimenting with various FlowBufferSize settings on the servers, as well as varying the stripe_size when opening/modifying files for our benchmarking. I ran across an error that causes our tests to hang. I'm not sure where this should go, but it's reproducible with our setup. Can anyone tell me if there's an obvious problem with this configuration:
(using a Mellanox DDR card)
FlowBufferSize 16MB
stripe_size 256KB
6 data servers
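
As a rough sanity check on how those numbers relate, here is a minimal sketch. It assumes both sizes are binary units (MiB/KiB) and that a flow buffer is simply cut into whole stripe units handed out round-robin to the servers. That is a simplification for illustration, not a description of how the flow code actually schedules transfers, and the function names are made up for this example.

```python
KIB = 1024
MIB = 1024 * KIB


def stripe_units_per_buffer(flow_buffer_bytes, stripe_bytes):
    """Number of whole stripe units that fit in one flow buffer."""
    return flow_buffer_bytes // stripe_bytes


def units_per_server(total_units, num_servers):
    """Round-robin split of stripe units across servers (assumed layout)."""
    base, extra = divmod(total_units, num_servers)
    return [base + 1 if i < extra else base for i in range(num_servers)]


flow_buffer = 16 * MIB   # FlowBufferSize 16MB
stripe = 256 * KIB       # stripe_size 256KB
servers = 6              # 6 data servers

units = stripe_units_per_buffer(flow_buffer, stripe)
print(units)                             # 64
print(units_per_server(units, servers))  # [11, 11, 11, 11, 10, 10]
```

So under these assumptions one full buffer covers 64 stripe units, which do not divide evenly over 6 servers.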

This problem occurs whenever we pick 256KB as the stripe size; it doesn't show up at 64KB, or at 1MB or more (I'm testing 512KB right now to see if it occurs there). We have also noticed that, in general, 256KB stripes cause other odd behavior, such as eHCA errors that bring down the server completely.

(this shows up in the logs of all servers)
[E 10:09:51.709387] Warning: openib_check_async_events: IBV_EVENT_QP_ACCESS_ERR.
[E 10:10:22.026509] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 2028006.
[E 10:10:22.026567] fp_multiqueue_cancel: flow proto cancel called on 0x2aaaabc21d20
[E 10:10:22.026578] handle_io_error: flow proto error cleanup started on 0x2aaaabc21d20, error_code: -1610612737
[E 10:10:22.035861] handle_io_error: flow proto 0x2aaaabc21d20 canceled 1 operations, will clean up.
[E 10:10:22.036661] handle_io_error: flow proto 0x2aaaabc21d20 error cleanup finished, error_code: -1610612737
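
For what it's worth, the error_code in those lines can be split into bit fields. The sketch below assumes the PVFS2 error encoding as I recall it from pvfs2-types.h: bit 30 marks a PVFS error and bit 29 marks a non-errno error class, with the remaining low bits giving the specific value. The constant names here are my own shorthand and the layout should be checked against the headers in your tree.

```python
# Assumed bit layout (verify against pvfs2-types.h in your source tree).
ERROR_BIT = 1 << 30            # assumed: "this is a PVFS error" flag
NON_ERRNO_ERROR_BIT = 1 << 29  # assumed: "not a plain errno value" flag


def decode(code):
    """Split a (usually negative) error code into its assumed bit fields."""
    magnitude = -code if code < 0 else code
    return {
        "hex": hex(magnitude),
        "error_bit": bool(magnitude & ERROR_BIT),
        "non_errno_bit": bool(magnitude & NON_ERRNO_ERROR_BIT),
        "low_value": magnitude & ~(ERROR_BIT | NON_ERRNO_ERROR_BIT),
    }


print(decode(-1610612737))
```

Under that assumed layout the code is 0x60000001: both flag bits set and a low value of 1. If I remember the headers correctly, that is the cancellation code, which would match the "job time out: cancelling flow operation" message and suggests the error code is a consequence of the timeout rather than the root cause.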

dmesg shows this on every server:
ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.
ib_mthca 0000:01:00.0: modify QP 3->4 returned status 10.

Any ideas?

thanks
   --Kyle

--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
