[EMAIL PROTECTED] wrote on Mon, 16 Oct 2006 17:40 -0500:
> We have modified an existing application to directly call
> libpvfs2. Our pvfs2 setup has 6 servers and is setup to run pvfs2
> over OpenIB verbs. We borrowed the code more or less from pvfs2-cp.
> This seems to work and we have had several successful runs. However
> we have also had a couple of hangs on one node. The traceback for the
> hang is:
>
> #0 0x00002ab9874a34bf in poll () from /lib/libc.so.6
> #1 0x0000000001cbea67 in BMI_ib_testcontext ()
> #2 0x0000000001c8feb4 in BMI_testcontext ()
> #3 0x0000000001c99624 in PINT_thread_mgr_bmi_push ()
> #4 0x0000000001c950d3 in do_one_work_cycle_all ()
> #5 0x0000000001c95883 in job_testcontext ()
> #6 0x0000000001ca37e4 in PINT_client_state_machine_test ()
> #7 0x0000000001ca3c00 in PINT_client_wait_internal ()
> #8 0x0000000001c7df71 in PVFS_sys_io ()
[..]
>
> Eventually we timeout and die. So the first question is do you have
> any suggestions as to where to look for the cause of the hang? That
> is a write, but I have seen it fail now during a read as well (it
> died on the 12th pass through after reading the complete file 11 times).
This is where the BMI IB device goes when it has nothing better to
do. It is waiting on a completion event from the NIC to tell it
there is some action. Each poll() call lasts at most 10 ms; the
code then returns up to job_testcontext(), which goes right back
down to check the device.
Given the history with Kyle and your machines, I suspect the
network is broken and losing messages again. Make sure pairwise
netpipe-ib produces identical latency and throughput numbers for all
machines before trying to run any apps.
> We also have a problem when running on our IBM EHCAs with too
> many memory registrations. The odd part is that I am using the same
> 1MB buffer all the time so I don't see why it seems to be reregistered
> at each write.
I think Troy sent mail to the IBM guys asking why the kernel
complains about some registrations. But registration should not
happen on every write. The BMI IB code caches memory
registrations. As long as you pass in the same myBuffer, every
call after the first should hit in the registration cache.
You can run your code with the environment variables
PVFS2_DEBUGMASK=network PVFS2_DEBUGFILE=debug.out
and look for messages like
memcache_register: hit [%d] %p len %lld (via %p len %lld) ...
and compare the first pointer value to your myBuffer. A hit
is good. If you get messages that say "miss", please send me the
trace.
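The log check described above can be automated with a small scanner. This is a sketch under stated assumptions, not part of PVFS2: scan_log() and struct reg_stats are hypothetical names, and the only log detail taken from the message above is that hit lines look like "memcache_register: hit [N] PTR len ...". The exact layout of a miss line is not shown above, so the code simply treats any memcache_register line containing the word "miss" as a miss.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counters for this example. */
struct reg_stats {
    long hits;            /* memcache_register lines reporting a hit   */
    long misses;          /* memcache_register lines containing "miss" */
    long hits_on_buffer;  /* hits whose first pointer equals buf_addr  */
};

/* Classify one log line and update the counters. */
static void scan_line(const char *line, unsigned long long buf_addr,
                      struct reg_stats *st)
{
    const char *tag = strstr(line, "memcache_register:");
    const char *hit;
    const char *close;

    if (tag == NULL)
        return;                      /* unrelated debug output */

    /* Assumption: miss lines contain the word "miss" after the tag. */
    if (strstr(tag, "miss") != NULL) {
        st->misses++;
        return;
    }

    hit = strstr(tag, "hit [");
    if (hit == NULL)
        return;
    st->hits++;

    /* The first pointer follows the closing bracket of the index. */
    close = strchr(hit, ']');
    if (close != NULL && strtoull(close + 1, NULL, 16) == buf_addr)
        st->hits_on_buffer++;
}

/* Read the whole debug file and fill in the counters. */
void scan_log(FILE *fp, unsigned long long buf_addr, struct reg_stats *st)
{
    char line[1024];

    st->hits = 0;
    st->misses = 0;
    st->hits_on_buffer = 0;
    while (fgets(line, sizeof line, fp) != NULL)
        scan_line(line, buf_addr, st);
}
```

To use it, print the address of myBuffer from the application, open debug.out, and pass both to scan_log(). One miss for the first write followed only by hits on that address would match the caching behaviour described above; a miss count that grows with every write would not.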
I'll let other pvfs types think about the contiguous request
question and choices of stripe sizes. I don't know.
-- Pete