Can anyone make any sense of this?
I have a feeling these are related to the hangups I'm having w/o the client interface in openib. This is built off of the latest CVS head: 6 server nodes, 1 client node, mounted via pvfs2-client over openib.

I did a `killall pvfs2-client-core`, and that's where the process exit status message comes from. This appears to have kicked the process out of the hang state it was in, and I'm able to do fs operations on it again, but it will lock up eventually (every other time I do an op on the mount?).

Any other debian+openib+pvfs2 users out there?

Log message from the client:

[E 10:19:33.127182] fp_multiqueue_cancel: flow proto cancel called on 0x10151cf0
[E 10:19:33.127283] handle_io_error: flow proto error cleanup started on 0x10151cf0, error_code: -1610612737
[E 10:19:33.127394] handle_io_error: flow proto 0x10151cf0 canceled 1 operations, will clean up.
[E 10:19:33.127420] fp_multiqueue_cancel: flow proto cancel called on 0x10152bc0
[E 10:19:33.127442] handle_io_error: flow proto error cleanup started on 0x10152bc0, error_code: -1610612737
[E 10:19:33.127529] handle_io_error: flow proto 0x10152bc0 canceled 1 operations, will clean up.
[E 10:19:33.127553] fp_multiqueue_cancel: flow proto cancel called on 0x10153328
[E 10:19:33.127576] handle_io_error: flow proto error cleanup started on 0x10153328, error_code: -1610612737
[E 10:19:33.127664] handle_io_error: flow proto 0x10153328 canceled 1 operations, will clean up.
[E 10:19:33.129177] handle_io_error: flow proto 0x10151cf0 error cleanup finished, error_code: -1610612737
[E 10:19:33.129220] handle_io_error: flow proto 0x10152bc0 error cleanup finished, error_code: -1610612737
[E 10:19:33.129254] handle_io_error: flow proto 0x10153328 error cleanup finished, error_code: -1610612737
[E 10:20:42.852432] fp_multiqueue_cancel: flow proto cancel called on 0x104a1f58
[E 10:20:42.852485] handle_io_error: flow proto error cleanup started on 0x104a1f58, error_code: -1610612737
[E 10:20:42.852602] handle_io_error: flow proto 0x104a1f58 canceled 1 operations, will clean up.
[E 10:20:42.853082] handle_io_error: flow proto 0x104a1f58 error cleanup finished, error_code: -1610612737
[E 10:21:23.104434] fp_multiqueue_cancel: flow proto cancel called on 0x104a17f0
[E 10:21:23.104487] handle_io_error: flow proto error cleanup started on 0x104a17f0, error_code: -1610612737
[E 10:21:23.104600] handle_io_error: flow proto 0x104a17f0 canceled 1 operations, will clean up.
[E 10:21:23.104665] handle_io_error: flow proto 0x104a17f0 error cleanup finished, error_code: -1610612737
[E 10:23:35.657241] pvfs2-client-core with pid 22269 exited with value 0
[E 10:26:44.473556] fp_multiqueue_cancel: flow proto cancel called on 0x10152150
[E 10:26:44.473654] handle_io_error: flow proto error cleanup started on 0x10152150, error_code: -1610612737
[E 10:26:44.473765] handle_io_error: flow proto 0x10152150 canceled 1 operations, will clean up.
[E 10:26:44.473820] handle_io_error: flow proto 0x10152150 error cleanup finished, error_code: -1610612737
[E 10:31:45.812544] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115663.
[E 10:31:45.813455] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115665.
[E 10:32:12.253541] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115695.
[E 10:32:12.253590] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115697.

Hopefully this makes sense to someone else; it's not English to me :(

   -- Kyle

--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers