Re: [Pvfs2-developers] threaded client-core and the device thread

Murali Vilayannur Fri, 13 Oct 2006 20:00:42 -0700

Hi Sam,

Dean and I are looking at trying to push the efficiency of requests from the kernel module up through the device to client-core. I added the --threaded option to the client to allow the client-core to run with multiple threads (one each for bmi, dev, and main -- and also a remount thread, but let's ignore that for now), so the device thread should be able to keep pulling requests off the device without having to wait for bmi operations to complete.

Cool!

This could address some of the performance problems that Phil also had pointed out a while back, where multiple outstanding requests were slower than a single outstanding request.

PINT_dev_test_unexpected takes an incount of 5, so it's only going to read at most 5 requests off the device for each call. Once it returns, each of the unexpected requests is added to the completed jobs array and then we signal the jobs completed condition variable _for each request_. It seems like this will be 5x the number of context switches between the device thread and the main thread that we need.
Also, we poll every time before reading another request off the device. What about trying to read a number of requests off the device at once with one read (or possibly a readv so we can keep separate buffers per request)?
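
To make the signaling point concrete, here is a minimal sketch of posting a whole batch of completions under one lock and signaling the condition variable once, rather than once per request. The names (completion_queue, queue_post_batch, queue_drain, signals_sent) are hypothetical and are not the PVFS2 job interface; the signals_sent counter exists only to make the saving visible.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_COMPLETED 64

/* Hypothetical stand-in for the completed-jobs array (not PVFS2 code). */
struct completion_queue {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int jobs[MAX_COMPLETED];
    int count;
    int signals_sent;   /* illustration only: counts condvar signals */
};

static void queue_init(struct completion_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->ready, NULL);
    q->count = 0;
    q->signals_sent = 0;
}

/* Producer (device thread): queue up to n ids, then signal ONCE.
 * Returns how many ids were actually queued. */
static int queue_post_batch(struct completion_queue *q, const int *ids, int n)
{
    int added = 0;
    pthread_mutex_lock(&q->lock);
    while (added < n && q->count < MAX_COMPLETED)
        q->jobs[q->count++] = ids[added++];
    if (added > 0) {
        pthread_cond_signal(&q->ready);
        q->signals_sent++;
    }
    pthread_mutex_unlock(&q->lock);
    return added;
}

/* Consumer (main thread): block until something is queued, then take
 * up to max entries in FIFO order. Returns the number taken. */
static int queue_drain(struct completion_queue *q, int *out, int max)
{
    int n;
    assert(max > 0);
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->ready, &q->lock);
    n = q->count < max ? q->count : max;
    memcpy(out, q->jobs, (size_t)n * sizeof(int));
    memmove(q->jobs, q->jobs + n, (size_t)(q->count - n) * sizeof(int));
    q->count -= n;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

With five requests per call, the consumer is woken once and drains all five, instead of potentially being woken five times. Whether that shows up in the numbers depends on where the real bottleneck is.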

Hmm.. both of these are good points. I had dabbled with doing a readv a while back. It might make a difference, although I suspect this might be in the noise region, since if there are requests to be serviced, poll() will only take the time of a syscall, which should be pretty fast these days.. but worth a shot.
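
For reference, a rough sketch of what the readv variant could look like: one call scatters up to five requests into separate per-request buffers. This assumes fixed-size requests (REQ_SIZE is made up) and assumes the fd hands back several queued requests in a single read, which the device would have to support; the function name read_request_batch is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define REQ_SIZE  64   /* assumed fixed request size, for illustration */
#define MAX_BATCH 5    /* mirrors the incount of 5 discussed above */

/* Read up to MAX_BATCH requests with a single readv(), one buffer per
 * request. Returns the number of complete requests read, or -1 on
 * error. A trailing partial request, if any, is not counted. */
static int read_request_batch(int fd, char bufs[MAX_BATCH][REQ_SIZE])
{
    struct iovec iov[MAX_BATCH];
    ssize_t n;
    int i;

    for (i = 0; i < MAX_BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = REQ_SIZE;
    }

    n = readv(fd, iov, MAX_BATCH);
    if (n < 0)
        return -1;
    return (int)(n / REQ_SIZE);
}
```

On a pipe this fills the buffers in order with whatever is available, so three queued 64-byte records come back as a return value of 3 from one syscall. A character device only behaves this way if its read handler is written to return multiple requests per call.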

Also, it looks like we do a malloc for each new request buffer, and then a free once we're done with it, and a memset of the info struct. It seems like we could manage the buffers on the stack instead of the heap, and save on a few system calls there.

Now we are definitely in the noise region.. :) just kidding. glibc's malloc implementation should typically amortize overheads in invoking system calls (sbrk etc).
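
If someone does want to try the no-malloc route, a small fixed pool that lives in the caller's stack frame is one way to sketch it. Everything here (req_pool, pool_get, pool_put, the sizes) is hypothetical and assumes the buffers are not needed after the owning function returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REQ_SIZE   64  /* assumed request buffer size */
#define POOL_SLOTS 5   /* one slot per request in a batch */

/* Fixed set of request buffers; declare one on the stack and reuse it. */
struct req_pool {
    char          bufs[POOL_SLOTS][REQ_SIZE];
    unsigned char in_use[POOL_SLOTS];
};

static void pool_init(struct req_pool *p)
{
    memset(p->in_use, 0, sizeof(p->in_use));
}

/* Hand out a free buffer, or NULL if all slots are taken. */
static char *pool_get(struct req_pool *p)
{
    int i;
    for (i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->bufs[i];
        }
    }
    return NULL;
}

/* Return a buffer previously obtained from pool_get(). */
static void pool_put(struct req_pool *p, char *buf)
{
    ptrdiff_t off = buf - &p->bufs[0][0];
    assert(off >= 0 && off % REQ_SIZE == 0);
    assert(off / REQ_SIZE < POOL_SLOTS);
    p->in_use[off / REQ_SIZE] = 0;
}
```

This trades the malloc/free pair for a short linear scan over five flags. Given the amortization point above, any gain is likely small.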

For both threaded and nonthreaded, with the workload that Dean is using, he found that PINT_dev_test_unexpected always returned 5 requests in the outcount. So it looks like there are always requests sitting on the device, waiting to be read by client-core. Are we just not able to process requests fast enough through BMI and the state machines, or is the cost of polling and signaling every time we read a request off the device slowing us down? In other words, does it make sense to rework the code a little bit, or will we just get bottlenecked elsewhere?

It is definitely interesting to try all this out, but I am not sure if the bottlenecks are here or elsewhere.

What does this workload do by the way?

thanks,
Murali
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
