On Mon, Feb 01, 2010 at 02:04:22PM -0500, Michael Moore wrote:
> We recently saw some strange behavior in pvfs2-client-core when a server goes
> away (via segfault) and the client is unable to re-mount the filesystem. The
> pvfs2-client-core process takes up 100% of a core just spinning on
> process_vfs_requests -> PVFS_sys_testsome and subsequent calls. Full backtrace
> follows.
> 
> In looking at the code in pvfs2-client-core, it seems to assume that the
> re-mount will always succeed (around line 3579). However, I don't know that
> it's the root cause of the issue. I'll continue looking but wondered if anyone
> had ideas on this.
> This appears to be re-creatable by:
> 1) cleanly mounting and using the filesystem for some I/O,
> 2) either killing the servers or adding iptables rules to the client to
>    reject traffic to the server,
> 3) attempting I/O from the client.

I neglected to mention that pvfs2-client-core must be killed after attempting
I/O traffic to the 'failed' server, as I only saw this behavior after the client
core restarts. I'm still digging into the reason the client core segfaulted
after a failed I/O flow.
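
To illustrate the concern about the re-mount being assumed to succeed, here is
a minimal sketch of checking the result and retrying with a bounded backoff
instead of proceeding unconditionally. This is not the pvfs2-client-core code:
remount_fn, remount_with_backoff, and fake_remount are hypothetical names made
up for this example, and the real re-mount path may look quite different.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Hypothetical callback: returns 0 if the re-mount succeeded, nonzero otherwise. */
typedef int (*remount_fn)(void *ctx);

/* Sleep for the given number of milliseconds using POSIX nanosleep. */
static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/*
 * Call fn up to max_attempts times, sleeping between failed attempts.
 * The delay starts at initial_delay_ms and doubles each time, capped at
 * max_delay_ms. Returns 0 on the first success, -1 if every attempt fails,
 * so the caller can decide what to do rather than spin.
 */
static int remount_with_backoff(remount_fn fn, void *ctx, int max_attempts,
                                long initial_delay_ms, long max_delay_ms)
{
    long delay = initial_delay_ms;
    int attempt;

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (fn(ctx) == 0) {
            return 0;
        }
        fprintf(stderr, "re-mount attempt %d of %d failed\n",
                attempt, max_attempts);
        if (attempt < max_attempts) {
            sleep_ms(delay);
            delay *= 2;
            if (delay > max_delay_ms) {
                delay = max_delay_ms;
            }
        }
    }
    return -1;
}

/*
 * Stand-in for a real re-mount, for demonstration only: ctx points to an int
 * holding the number of failures left before the "server" comes back.
 */
static int fake_remount(void *ctx)
{
    int *failures_left = (int *)ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```

A caller would pass its own re-mount routine in place of fake_remount and treat
a -1 return as "the server is still unreachable", for example by failing
pending operations rather than entering the request loop.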

Michael

> 
> The operation correctly dies with connection refused, but the client begins
> to spin, taking up CPU.
> 
> (gdb) bt
> #0  0x00511402 in __kernel_vsyscall ()
> #1  0x001d8023 in poll () from /lib/libc.so.6
> #2  0x0082ebed in PINT_dev_test_unexpected (incount=5, outcount=0xbf84e4f8, 
> info_array=0x8bbbc0, max_idle_time=10) at src/io/dev/pint-dev.c:398
> #3  0x00848f50 in PINT_thread_mgr_dev_push (max_idle_time=10) at 
> src/io/job/thread-mgr.c:332
> #4  0x00844caf in do_one_work_cycle_all (idle_time_ms=10) at 
> src/io/job/job.c:5238
> #5  0x008454a1 in job_testcontext (out_id_array_p=0xbf8515f0, 
> inout_count_p=0xbf8521f4, returned_user_ptr_array=0xbf851df4, 
> out_status_array_p=0xbf84e5f0, timeout_ms=10, context_id=0) at 
> src/io/job/job.c:4273
> #6  0x00857dba in PINT_client_state_machine_testsome (op_id_array=0xbf8522a8, 
> op_count=0xbf8528c4, user_ptr_array=0xbf8527a8, error_code_array=0xbf8526a8, 
> timeout_ms=10) at src/client/sysint/client-state-machine.c:756
> #7  0x00857fb9 in PVFS_sys_testsome (op_id_array=0xbf8522a8, 
> op_count=0xbf8528c4, user_ptr_array=0xbf8527a8, error_code_array=0xbf8526a8, 
> timeout_ms=10) at src/client/sysint/client-state-machine.c:971
> #8  0x08050cd6 in process_vfs_requests () at 
> src/apps/kernel/linux/pvfs2-client-core.c:3119
> #9  0x08052658 in main (argc=10, argv=0xbf852a74) at 
> src/apps/kernel/linux/pvfs2-client-core.c:3579
> 
> Let me know if more information is needed, thanks for the input!
> 
> Michael
> _______________________________________________
> Pvfs2-developers mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers