Re: [Pvfs2-developers] threaded client-core and the device thread

Phil Carns Mon, 16 Oct 2006 06:25:48 -0700

Sam Lang wrote:

Hi All,
Dean and I are looking at trying to improve how efficiently requests move from the kernel module up through the device to client-core. I added the --threaded option to the client to allow the client-core to run with multiple threads (one each for bmi, dev, and main -- and also a remount thread, but let's ignore that for now), so the device thread should be able to keep pulling requests off the device without having to wait for bmi operations to complete.
I noticed a couple of things with the device thread that I wanted to ask about.
PINT_dev_test_unexpected takes an incount of 5, so it's only going to read at most 5 requests off the device for each call. Once it returns, each of the unexpected requests is added to the completed jobs array, and then we signal the jobs-completed condition variable _for each request_. It seems like this causes 5x as many context switches between the device thread and the main thread as we actually need.
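A minimal sketch of signalling once per batch instead of once per request, using plain pthreads. The queue struct, `publish_batch`, `device_thread`, `run_batched_demo`, and `QUEUE_CAP` are hypothetical illustrations, not PVFS2 code; `BATCH` just mirrors the incount of 5 above. The device thread appends a whole batch under one lock hold and calls `pthread_cond_signal` once, and the main thread drains everything queued when it wakes:

```c
#include <assert.h>
#include <pthread.h>

#define BATCH 5          /* mirrors the incount of 5 discussed above */
#define QUEUE_CAP 256    /* hypothetical capacity of the completed array */

/* Hypothetical "completed jobs" queue shared by the device and main
 * threads; an illustration only, not the PVFS2 job structure. */
struct completed_queue {
    int items[QUEUE_CAP];
    int count;
    int signals_sent;    /* how many times the device thread signalled */
    int rounds;          /* how many batches the device thread produces */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

/* Append a whole batch under one lock hold and signal once per batch. */
static void publish_batch(struct completed_queue *q, const int *reqs, int n)
{
    pthread_mutex_lock(&q->lock);
    for (int i = 0; i < n; i++) {
        assert(q->count < QUEUE_CAP);
        q->items[q->count++] = reqs[i];
    }
    q->signals_sent++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static void *device_thread(void *arg)
{
    struct completed_queue *q = arg;
    int batch[BATCH];
    for (int r = 0; r < q->rounds; r++) {
        for (int i = 0; i < BATCH; i++)
            batch[i] = r * BATCH + i;   /* stand-in for unexpected requests */
        publish_batch(q, batch, BATCH);
    }
    return NULL;
}

/* Run the device thread for `rounds` batches while the calling thread
 * plays the main thread; returns the number of requests consumed. */
int run_batched_demo(int rounds, int *signals_out)
{
    struct completed_queue q = { .count = 0, .signals_sent = 0, .rounds = rounds };
    pthread_t dev;
    int consumed = 0;

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_create(&dev, NULL, device_thread, &q);

    while (consumed < rounds * BATCH) {
        pthread_mutex_lock(&q.lock);
        while (q.count == 0)
            pthread_cond_wait(&q.not_empty, &q.lock);
        consumed += q.count;            /* drain everything queued so far */
        q.count = 0;
        pthread_mutex_unlock(&q.lock);
    }

    pthread_join(dev, NULL);
    *signals_out = q.signals_sent;
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return consumed;
}
```

With four batches of five requests, the device thread signals four times rather than twenty. Build with `-pthread`.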
Also, we poll every time before reading another request off the device. What about trying to read a number of requests off the device at once with one read (or possibly a readv, so we can keep separate buffers per request)?
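A sketch of the readv idea, assuming fixed-size requests and using a pipe as a stand-in for the device. `REQ_SIZE`, `MAX_BATCH`, `read_requests`, and `readv_demo` are hypothetical. A pipe is a byte stream, so `readv()` simply fills each buffer in turn; whether the real device hands back more than one request per read call depends on how its read handler is written, which this sketch does not model.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define REQ_SIZE 16     /* hypothetical fixed size of one request */
#define MAX_BATCH 8     /* hypothetical upper bound on requests per call */

/* Read up to n fixed-size requests from fd with a single readv() call,
 * one iovec (and so one separate buffer) per request. Returns the number
 * of complete requests read, or -1 on error. */
int read_requests(int fd, char bufs[][REQ_SIZE], int n)
{
    struct iovec iov[MAX_BATCH];
    if (n > MAX_BATCH)
        n = MAX_BATCH;
    for (int i = 0; i < n; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = REQ_SIZE;
    }
    ssize_t got = readv(fd, iov, n);
    if (got < 0)
        return -1;
    return (int)(got / REQ_SIZE);
}

/* Demo: write `count` requests into a pipe, read them back with one
 * readv(), and check that request i landed in buffer i. Returns the
 * number of requests read, or -1 on failure. */
int readv_demo(int count)
{
    int fds[2];
    char out[MAX_BATCH][REQ_SIZE];
    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < count; i++) {
        char req[REQ_SIZE];
        memset(req, 'a' + i, REQ_SIZE);
        if (write(fds[1], req, REQ_SIZE) != REQ_SIZE) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    close(fds[1]);
    int got = read_requests(fds[0], out, MAX_BATCH);
    close(fds[0]);
    for (int i = 0; i < got; i++)
        if (out[i][0] != 'a' + i)
            return -1;
    return got;
}
```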
Also, it looks like we do a malloc for each new request buffer, then a free once we're done with it, plus a memset of the info struct. It seems like we could manage the buffers on the stack instead of the heap and save on a few system calls there.
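A sketch of replacing per-request malloc()/free() with a fixed pool handed out through a free list. `POOL_SIZE`, `BUF_SIZE`, and the `pool_*` functions are hypothetical. The pool is not locked, so it assumes only the device thread touches it, and it is small enough to live as a local variable in the device thread's function. With a typical allocator such as glibc's, most malloc()/free() calls are served from the existing heap without a system call, so the saving is likely to be mostly allocator overhead.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 8     /* hypothetical max number of in-flight requests */
#define BUF_SIZE 256    /* hypothetical size of one request buffer */

struct buf_pool {
    char storage[POOL_SIZE * BUF_SIZE];  /* all buffers, allocated once */
    int free_list[POOL_SIZE];            /* stack of free buffer indices */
    int nfree;
};

void pool_init(struct buf_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        p->free_list[i] = i;
    p->nfree = POOL_SIZE;
}

/* Hand out a free buffer, or NULL if every buffer is in flight. */
char *pool_get(struct buf_pool *p)
{
    if (p->nfree == 0)
        return NULL;
    return p->storage + (size_t)p->free_list[--p->nfree] * BUF_SIZE;
}

/* Return a buffer obtained from pool_get() to the free list. */
void pool_put(struct buf_pool *p, char *buf)
{
    ptrdiff_t idx = (buf - p->storage) / BUF_SIZE;
    assert(idx >= 0 && idx < POOL_SIZE && p->nfree < POOL_SIZE);
    p->free_list[p->nfree++] = (int)idx;
}
```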
For both threaded and non-threaded modes, with the workload that Dean is using, he found that PINT_dev_test_unexpected always returned 5 requests in the outcount. So it looks like there are always requests sitting on the device, waiting to be read by client-core. Are we just not able to process requests fast enough through BMI and the state machines, or is the cost of polling and signaling every time we read a request off the device slowing us down? In other words, does it make sense to rework the code a little, or will we just get bottlenecked elsewhere?

I am just speculating, but of the things you list, I would guess that these two would be the most likely to show improvement without much coding effort:

- increasing the testcount to something higher than 5 (since it sounds like that is getting maxed out for this workload)

- fixing the "signalling on every request" problem

The need for multiple reads and the mallocs could be a problem, but I agree with Murali that problems in this area are more likely related to inefficient threading or I/O stalls than to CPU or memory overhead.


-Phil
_______________________________________________
Pvfs2-developers mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
