On Jul 18, 2006, at 4:00 PM, Julian Martin Kunkel wrote:

Hi,
enclosed you will find patches for the following issues:

Major changes:
* sync-coalesce:
If the last few operations that need to be synced finish with an error, then the other operations can be stalled, too, due to the handling by the request scheduler. (I found this bug a couple of weeks ago but did not know what caused the race condition.) It took me quite a while to track that bug...

Cool. We should try to incorporate the performance tests you have that show both success and failure numbers into CVS somehow...

The policy right now ensures that failed requests are never enqueued.
(Note: should we flush the db in case of an error?)


Maybe if we could differentiate between user errors (ENOENT) and unexpected system errors (ENOMEM, etc.), we could do that. Unfortunately, right now we don't differentiate, so some fellow might come along and do an ls on a directory or file that doesn't exist, get an error, cause a flush of the dbs, and stall the file creates that another user or process is trying to do. I would argue for not flushing for now.

* deleting in the background
During deletion, files are renamed and then deleted in the background, to shorten the response time of the server in case the file is very big...
Note: people have to put the storage dir on one filesystem.

Sorry if I missed previous conversations about this, but doesn't unlink just free the inode? Wouldn't rename take just as long, if not longer?

* Trove multiqueue support:
Now there is a thread each for read-only metadata, read-write metadata, I/O, and deleting in the background.
This patch improves the throughput of read-only ops while write ops happen. Usually read ops are expected to be cached effectively, while all write ops force disk operations.


This looks good. I have some comments about some specific pieces of the code, but I'll send a separate email with the comments inlined in the patch for those.

* Trove queues are changed to track an internal count of queued elems.
I also removed a few functions and added more _nolock variants.
move_op_to_completion_queue is removed, and some other functions are replaced with dbpf_move_op_to_completion_queue and dbpf_op_pop_front_nolock.
* dbpf_dspace_cancel is modified to guarantee that the operation is not finishing right now; it also changes to use id_gen_safe instead of id_gen_fast.

AFAICT, the only case where id_gen_safe_register _should_ be used is in returning an id back through the sysint calls. This is especially needed for pvfs2-client-core, where the id is then passed to the kernel module, so it must be an opaque 64-bit value. In all the other cases (pvfs internal use), we are really just passing around pointers. If the pointers somehow become invalid, using gen_safe instead of gen_fast isn't going to help us. I would argue that the other internal uses (it looks like there are a couple) of gen_safe should be changed to gen_fast.

Overall this does clean up the dbpf code nicely, so good work, Julian. Hopefully we can get some really good performance numbers out of all this. There are still some style quirks in different places, but those will go away over time, right? ;-)

-sam

These changes
make the interface more useful and remove some dependencies on its usage from the upper layers. Also, test and testsome were changed so that they can be called at the same time without possible memleaks / segfaults.

Minor changes:
* added a define for the mkdir syscall in dbpf.h
* added dbpf_op_get_status (and set status) to change the status of the return value atomically (this reduces the lines of code and makes the calls more consistent).
* Stripped out non-threaded Trove code.
* Enhanced request scheduler debugging
Added a new function, server_op_to_str, to pvfs2-internal.h, which allows fancy output of the op's name. The function is implemented in PINT-reqproto-encode.c; maybe this is not the right place for it?
It is used in the request scheduler, which now prints the whole queue on each enqueue op, with the current states of all ops for that particular handle (e.g. serviced or queued).
* performance debug:
I want to add a performance debugging option which allows printing a couple of interesting metrics on the server side that can be analysed post mortem (e.g. number of elements in the Trove queues, etc.).
For example, one might run a script to determine the time distribution of sync requests or I/O requests. I haven't added other logic yet but will, soon.
* replaced fsync with fdatasync
This might improve throughput on some journaling filesystems because metadata is not forced to be written.

enjoy,
Julian
<patchTrove.patch>
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
