[Pvfs2-developers] server crash on startup with millions of files

Phil Carns Tue, 20 Feb 2007 04:13:52 -0800

Hi guys,

We have run into a problem recently with a configuration that looks like this:


- x86_64 architecture
- 16 servers
- SAN based storage
- approximately 1.4 million files on PVFS

Everything works fine, except when we stop and then later restart one of the pvfs2-server daemons. At least one of them usually (but not quite always) crashes before the file system is ready to be mounted.

We captured a core file and can see that it died on this assertion in the dbpf_dspace_test() function:


dbpf-dspace.c:1371
assert(!dbpf_op_queue_empty(dbpf_completion_queue_array[context_id]));

According to the stack trace, this test() call followed a trove_dspace_iterate_handles() call within the trove_check_handle_ranges() function. This is part of the logic on startup that scans all of the handles in the storage space to update the list of available/used handles in trove-handle-mgmt.

We found that we can completely work around the problem by manually setting the coll_p->immediate_completion flag during the trove_check_handle_ranges() function. That forces the iterate_handles() function to do all of its processing up front without using a test function. There is just some sort of bad interaction when the two functions are used together.
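To make the shape of that workaround concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions: every toy_* name is hypothetical and none of this is actual PVFS2 code. The toy flag stands in for coll_p->immediate_completion and the counter stands in for the completion queue. Restoring the flag after the scan is also an assumption, not something described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are PVFS2 APIs. */
struct toy_collection {
    bool immediate_completion;  /* stands in for coll_p->immediate_completion */
    size_t queued_ops;          /* stands in for the per-context completion queue */
};

/* Returns 1 if the batch completed inline, 0 if the caller must call test. */
static int toy_iterate_handles(struct toy_collection *coll,
                               size_t *handles_seen, size_t batch)
{
    if (coll->immediate_completion) {
        *handles_seen += batch;   /* all processing done up front */
        return 1;
    }
    coll->queued_ops++;           /* deferred: completion lands in the queue */
    return 0;
}

/* Toy test function: like the real assertion, it needs a non-empty queue. */
static void toy_test(struct toy_collection *coll,
                     size_t *handles_seen, size_t batch)
{
    assert(coll->queued_ops > 0);
    coll->queued_ops--;
    *handles_seen += batch;
}

/* Startup scan with the workaround: force inline completion for the scan,
 * then restore the previous setting (restoring is an assumption). */
static size_t toy_check_handle_ranges(struct toy_collection *coll,
                                      size_t total, size_t batch)
{
    bool saved = coll->immediate_completion;
    size_t seen = 0;

    if (batch == 0)
        return 0;                 /* avoid looping forever on an empty batch */

    coll->immediate_completion = true;
    while (seen < total) {
        size_t n = (total - seen < batch) ? total - seen : batch;
        if (toy_iterate_handles(coll, &seen, n) == 0)
            toy_test(coll, &seen, n);  /* not reached while the flag is set */
    }
    coll->immediate_completion = saved;
    return seen;
}
```

In this model, calling toy_check_handle_ranges() on a collection whose flag starts out false never enters toy_test(), which is the property the workaround relies on. It says nothing about why the real queue turns up empty.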

As a side note, setting the "ImmediateCompletion" config file option does not work around the problem, because that flag does not take effect until after this assertion occurs. The set_info calls in pvfs2-server just happen to be in the wrong order. We would probably not have used this approach anyway, because we haven't fully tested the performance impact of enabling immediate completion for everything.
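The ordering issue can be pictured with another small toy model. Again, this is a hedged sketch: the toy_* names are hypothetical and this is not the pvfs2-server startup code. It only shows that a startup scan sees the option if and only if the option is applied before the scan runs.

```c
#include <stdbool.h>

/* Toy model only: these names are illustrative, not PVFS2 APIs. */
struct toy_server {
    bool immediate_completion;  /* value the storage layer currently uses */
    bool scan_saw_flag;         /* what the startup scan observed */
};

/* Stands in for a set_info-style call that applies the config option. */
static void toy_set_info_immediate(struct toy_server *s, bool value)
{
    s->immediate_completion = value;
}

/* Stands in for the startup handle scan; it records the flag it ran with. */
static void toy_startup_scan(struct toy_server *s)
{
    s->scan_saw_flag = s->immediate_completion;
}

/* Runs the toy startup in either order and reports what the scan saw. */
static bool toy_start(bool option_from_config, bool apply_option_first)
{
    struct toy_server s = { false, false };

    if (apply_option_first)
        toy_set_info_immediate(&s, option_from_config);
    toy_startup_scan(&s);
    if (!apply_option_first)
        toy_set_info_immediate(&s, option_from_config);

    return s.scan_saw_flag;
}
```

With the option enabled, toy_start(true, false) models the order described above (scan first, option applied afterwards), and the scan runs without the flag. toy_start(true, true) models the reordered case.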

Anyone have any suggestions about what the real problem is here? While the workaround is fine to keep us running for now, it seems like there is an underlying issue to be addressed.

I apologize that I don't have an exact stack dump to paste in the email, but if we need any further information from the core file I think I can still get it loaded up on another machine to look at.

Oh, and one other detail: the memory usage of the servers looks fine during startup, so this doesn't appear to be a memory leak. There is quite a bit of CPU work, but I am guessing that is just Berkeley DB keeping busy in the iteration function.


thanks,
-Phil