Hi Sam,

Thanks for the suggestions and for the insight on the error codes.

I think tomorrow I'll try to replicate the problem we saw in a simpler, single-server environment (the file system we saw this on is busy right now). That might make it easier to step through your suggestions, starting with just upgrading to a newer version. I didn't realize that those error-code changes might have an impact here.

-Phil

Sam Lang wrote:

On Feb 20, 2007, at 6:29 AM, Phil Carns wrote:

Hi guys,

We have run into a problem recently with a configuration that looks like this:

- x86_64 architecture
- 16 servers
- SAN based storage
- approximately 1.4 million files on PVFS

Everything works fine, except when we stop and then later restart one of the pvfs2-server daemons. At least one of them usually (but not quite always) crashes before the file system is ready to be mounted.

We captured a core file and can see that it died on this assertion in the dbpf_dspace_test() function:

dbpf-dspace.c:1371
assert(!dbpf_op_queue_empty(dbpf_completion_queue_array[context_id]));

According to the stack trace, this test() call followed a trove_dspace_iterate_handles() call within the trove_check_handle_ranges() function. This is part of the logic on startup that scans all of the handles in the storage space to update the list of available/used handles in trove-handle-mgmt.

We found that we can completely work around the problem by manually setting the coll_p->immediate_completion flag during the trove_check_handle_ranges() function. That forces the iterate_handles() function to do all of its processing up front without using a test function. There is just some sort of bad interaction when the two functions are used together.
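In rough terms, the hack looks like the following. This is just a sketch with a simplified stand-in for the collection struct, not the actual patch:

/* Simplified stand-in for the dbpf collection struct; the real one has
 * many more fields, but immediate_completion is the one that matters
 * here. */
struct collection_sketch
{
    int immediate_completion;
};

/* Sketch of the workaround: force immediate completion for the
 * duration of the startup handle scan so that iterate_handles() does
 * all of its work up front instead of going through the test() path. */
static void check_handle_ranges_workaround(struct collection_sketch *coll_p)
{
    int saved = coll_p->immediate_completion;

    coll_p->immediate_completion = 1;

    /* ... trove_dspace_iterate_handles() loop runs here ... */

    /* restore the previous setting (whether that is strictly necessary
     * depends on what else uses the flag) */
    coll_p->immediate_completion = saved;
}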

As a side note, setting the "ImmediateCompletion" config file option does not work around the problem, because that flag does not take effect until after this assertion occurs. The set_info calls in pvfs2-server just happen to be in the wrong order. We would probably not have used this approach anyway, because we haven't fully tested the performance impact of enabling immediate completion for everything.

Anyone have any suggestions about what the real problem is here? While the workaround is fine to keep us running for now, it seems like there is an underlying issue to be addressed.


Hi Phil,

It looks like the completion queue is empty but the state is set to OP_COMPLETED, which we assert should never happen. In the dbpf thread function, we essentially add anything to the completion queue that's either DBPF_OP_COMPLETE (1) or an error (which we assume to be negative). We leave 0 (DBPF_OP_CONTINUE) and other non-negative values for operations that need to be re-queued. There's a special case I've seen before, though, where a DB call returns an error that the dbpf_db_error_to_trove_error function doesn't recognize as a DB error to translate, so it returns -4243; but in the dspace code (including iterate_handles), we do:

ret = -dbpf_db_error_to_trove_error(db_ret);

so ret ends up being positive. I've tried to fix this in a recent version of the 2.6 branch and HEAD by checking that the error isn't -4243 or 4243 in the thread code, but in older versions I think the op gets added back to the queue or just ends up in la-la land.
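To make that concrete, here's a minimal, self-contained sketch of the sign flip; the translation function below is just a simplified stand-in, not the real dbpf code:

/* Minimal sketch of the sign flip described above. */
#include <stdio.h>

#define DBPF_OP_CONTINUE    0
#define DBPF_OP_COMPLETE    1
#define DBPF_ERROR_UNKNOWN  4243

/* stand-in for dbpf_db_error_to_trove_error(): recognized DB errors
 * translate to a positive trove error code; unrecognized ones come
 * back as -4243 */
static int db_error_to_trove_error(int db_ret)
{
    switch (db_ret)
    {
        case 0:
            return 0;
        /* ... recognized Berkeley DB error codes handled here ... */
        default:
            return -DBPF_ERROR_UNKNOWN;
    }
}

int main(void)
{
    int db_ret = -30999;  /* some DB error we don't translate */

    /* what the dspace service code (including iterate_handles) does: */
    int ret = -db_error_to_trove_error(db_ret);

    /* ret is now +4243: not negative (so not treated as an error) and
     * not DBPF_OP_COMPLETE, so the thread code re-queues the op
     * instead of moving it to the completion queue */
    printf("ret = %d\n", ret);
    return 0;
}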

In any case, it _might_ help to upgrade to the latest HEAD or 2.6 branch if possible. Also, you could test my theory by adding an assertion for anything that isn't DBPF_OP_COMPLETE in the dbpf-thread.c:dbpf_do_one_work_cycle function.
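Something along these lines would do it (a hypothetical helper; the exact shape of the work-cycle loop, and the name of the variable holding the service routine's return value, vary between versions):

#include <assert.h>

#define DBPF_OP_CONTINUE  0
#define DBPF_OP_COMPLETE  1

/* Hypothetical helper: call with the return value of the op's service
 * routine from inside dbpf_do_one_work_cycle().  Legitimate values are
 * a negative error, DBPF_OP_CONTINUE, or DBPF_OP_COMPLETE; anything
 * else (e.g. +4243 from the sign flip described earlier) trips the
 * assertion. */
static inline void dbpf_check_svc_return(int ret)
{
    assert(ret < 0 || ret == DBPF_OP_CONTINUE || ret == DBPF_OP_COMPLETE);
}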

If my theory is correct, then the next question is why DB is returning an error that trove doesn't understand. Did you upgrade Berkeley DB? What's the actual error, and why is iterate_handles triggering it?

If this isn't the problem, it would be helpful to know what the return value is from iterate_handles_op_svc.

The changes I made to dbpf-thread.c are at:

http://www.pvfs.org/fisheye/browse/PVFS/src/io/trove/trove-dbpf/dbpf-thread.c?r1=1.36&r2=1.37

I defined DBPF_ERROR_UNKNOWN to 4243.
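The gist of the change is to keep an untranslatable DB error from being re-queued. Roughly (a sketch of the idea only, not the literal diff; see the fisheye link above for that):

#define DBPF_ERROR_UNKNOWN  4243

/* Sketch of the thread-side handling: if the service routine's return
 * value is the untranslatable-error code, whichever sign it arrived
 * with, normalize it to a negative error so the op is completed with a
 * failure status rather than re-queued. */
int normalize_svc_return(int ret)
{
    if (ret == DBPF_ERROR_UNKNOWN || ret == -DBPF_ERROR_UNKNOWN)
    {
        return -DBPF_ERROR_UNKNOWN;
    }
    return ret;
}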

-sam


I apologize that I don't have an exact stack dump to paste in the email, but if we need any further information from the core file I think I can still get it loaded up on another machine to look at.

Oh, and one other detail: the memory usage of the servers looks fine during startup, so this doesn't appear to be a memory leak. There is quite a bit of CPU activity, but I am guessing that is just Berkeley DB keeping busy in the iteration function.

thanks,
-Phil



_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
