Hi Dean,
Searching for a slot in a 5-element array should not be that slow (I hope :)). The only other thing I can think of: by breaking buffers down into 4 KB (page-size) chunks for copying, we could be slowing things down when there are many such buffers or producer threads. Hmm, I am handwaving here. I will dig a little later this week and see if I can come up with something concrete.
thanks,
Murali

Yes, this is exactly what we are seeing. I have never tried a single buffer, but looking a bit into the kernel code I can see some areas that are a little inefficient.

1) In wait_for_a_slot, while holding a spinlock, a thread must linearly search through all buffers to find an available one. If one is found, then fine. If all buffers are full (which is probably the common case when doing large I/O), the thread sleeps until woken up, at which point it starts all over again, rescanning the entire list for a buffer. Technically, a thread could starve if it continually gets unlucky about finding an empty buffer. My guess is that this code is small and should run really fast no matter what, but who knows.
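To make the scan described in 1) concrete, here is a minimal user-space sketch of it. This is not the PVFS2 kernel code: struct slot_table, slot_try_get, slot_put, and the probes counter are hypothetical names made up for illustration, and the spinlock and the sleep/wake-up are left out and only noted in comments.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 5  /* matches the 5-element array mentioned in the thread */

/* Hypothetical stand-in for the kernel's buffer table; not the PVFS2 layout. */
struct slot_table {
    int in_use[SLOT_COUNT];  /* 1 if the buffer is taken, 0 if it is free */
    size_t probes;           /* entries examined so far, for illustration only */
};

/* Linear scan from index 0: claims and returns the first free slot, or
 * returns -1 if every slot is busy. In the kernel this scan would run under
 * the spinlock, and a -1 result would mean sleeping until woken and then
 * scanning again from the start. */
int slot_try_get(struct slot_table *t)
{
    assert(t != NULL);
    for (int i = 0; i < SLOT_COUNT; i++) {
        t->probes++;
        if (!t->in_use[i]) {
            t->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Mark a slot free again; a real implementation would also wake a waiter. */
void slot_put(struct slot_table *t, int idx)
{
    assert(t != NULL && idx >= 0 && idx < SLOT_COUNT);
    t->in_use[idx] = 0;
}
```

With only five entries, each scan touches at most five array elements, which is consistent with the guess that the search itself is cheap. Nothing in the scan gives priority to the thread that has waited longest, which is where the starvation concern comes from.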

2) With multiple buffers, the threads will be fighting over using kmap to copy the data to the mmapped buffer. From my understanding of the kernel (which may be outdated), there are very few kmap spinlocks available, effectively serializing the process of copying data into the mmapped buffers. As we increase the number of buffers, this contention will increase and the time to copy the data for any single buffer will increase.
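As a rough illustration of why the per-page mapping step in 2) matters, the following user-space sketch copies a buffer in 4 KB chunks and counts how many map operations that takes. It only simulates the pattern: map_chunk and unmap_chunk are hypothetical placeholders standing in for kmap/kunmap, struct copy_stats is made up for the example, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* assumed page size, per the 4 KB chunks in the thread */

/* Counts simulated map operations; each one stands in for a kmap call, which
 * is where the thread suspects contention. */
struct copy_stats {
    size_t map_calls;
};

/* Hypothetical placeholder for kmap: in user space the destination is already
 * addressable, so this only records that a mapping would have happened. */
unsigned char *map_chunk(unsigned char *base, size_t offset,
                         struct copy_stats *st)
{
    st->map_calls++;
    return base + offset;
}

/* Hypothetical placeholder for kunmap. */
void unmap_chunk(unsigned char *mapped)
{
    (void)mapped;  /* nothing to undo in this simulation */
}

/* Copy len bytes from src to dst one page-sized chunk at a time. */
void chunked_copy(unsigned char *dst, const unsigned char *src, size_t len,
                  struct copy_stats *st)
{
    assert(dst != NULL && src != NULL && st != NULL);
    for (size_t off = 0; off < len; off += CHUNK_SIZE) {
        size_t n = (len - off < CHUNK_SIZE) ? (len - off) : CHUNK_SIZE;
        unsigned char *p = map_chunk(dst, off, st);
        memcpy(p, src + off, n);
        unmap_chunk(p);
    }
}
```

In this model a 4 MB buffer needs 1024 map/unmap pairs. If those calls were serialized behind a small number of shared locks, as suggested above, the cost would grow with the number of threads copying at the same time. The sketch does not measure that; it only shows how many mapping steps are involved.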


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
