Hi Dean,
Searching for a slot in a 5-element array should not be that slow (I
hope :))
The only other thing I can think of: by breaking buffers into 4 KB
(page-size) chunks for copying, we could be slowing things down when
there are many such buffers/producer threads.
Hmm... oh well, I am handwaving. I will dig into this a bit later this
week and see if I can come up with something concrete.
thanks,
Murali
Yes, this is exactly what we are seeing. I have never tried a single
buffer, but looking a bit into the kernel code I can see some areas
that are a little inefficient.
1) In wait_for_a_slot, a thread must linearly search through all
buffers for an available one while holding a spinlock. If one is
found, fine. If all buffers are full (which is probably the common case
when doing large I/O), the thread sleeps until woken up, at which point
it starts all over again, rescanning the entire list for a buffer.
Technically, a thread could starve if it keeps getting unlucky about
finding an empty buffer. My guess is that this code is small and should
run really fast no matter what, but who knows...
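To make the scan-sleep-rescan pattern concrete, here is a minimal user-space sketch. It is not the pvfs2 kernel code: a pthread mutex stands in for the spinlock, a condition variable stands in for the kernel wait queue, and the names slot_pool, slot_acquire and slot_release are hypothetical. The pool size of 5 is taken from the reply above.

```c
#include <pthread.h>
#include <stdbool.h>

/* Pool size assumed from the "5 element array" mentioned in the thread. */
#define NUM_SLOTS 5

/* Hypothetical user-space analog of the slot array: the mutex models the
 * spinlock, the condition variable models the wait queue. */
struct slot_pool {
    pthread_mutex_t lock;
    pthread_cond_t  freed;
    bool            in_use[NUM_SLOTS];
};

void slot_pool_init(struct slot_pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->freed, NULL);
    for (int i = 0; i < NUM_SLOTS; i++)
        p->in_use[i] = false;
}

/* Linear scan under the lock. If every slot is busy, sleep until some
 * slot is released, then rescan the whole array from index 0. Nothing
 * here orders the waiters, so a thread that keeps losing the race after
 * each wakeup can, in principle, starve. */
int slot_acquire(struct slot_pool *p)
{
    pthread_mutex_lock(&p->lock);
    for (;;) {
        for (int i = 0; i < NUM_SLOTS; i++) {
            if (!p->in_use[i]) {
                p->in_use[i] = true;
                pthread_mutex_unlock(&p->lock);
                return i;
            }
        }
        /* All slots full: atomically drop the lock and sleep. */
        pthread_cond_wait(&p->freed, &p->lock);
    }
}

void slot_release(struct slot_pool *p, int i)
{
    pthread_mutex_lock(&p->lock);
    p->in_use[i] = false;
    /* Wake every waiter; they all race to rescan the array. */
    pthread_cond_broadcast(&p->freed);
    pthread_mutex_unlock(&p->lock);
}
```

With only 5 slots the scan itself is cheap; the cost worth measuring is more likely the wakeup storm from broadcasting to every sleeper when a single slot frees up. A FIFO ticket per waiter would be one way to rule out starvation, at the price of more bookkeeping.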
2) With multiple buffers, the threads will be contending for kmap to
copy the data into the mmapped buffers. From my understanding of the
kernel (which may be outdated), there are very few kmap spinlocks
available, effectively serializing the process of copying data into
the mmapped buffers. As we increase the number of buffers, this
contention will increase, and so will the time to copy the data for
any single buffer.
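The serialization concern can be modeled in user space as well. The sketch below assumes, as the email describes, that every per-page copy has to go through one shared mapping resource; a single global mutex (the hypothetical map_lock) stands in for that, and each 4 KB chunk takes it once. It is only a model of "map page, copy, unmap" under contention, not how kmap is actually implemented, and copy_in_pages, copy_job and run_producers are illustrative names.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Chunk size assumed to match the 4 KB page-size copies in the thread. */
#define PAGE_SZ 4096

/* Hypothetical stand-in for the shared mapping resource: every chunk
 * copy, from every thread, must take this one lock. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long map_acquisitions; /* protected by map_lock */

/* Copy len bytes one page-size chunk at a time, holding the global lock
 * around each chunk. Returns the number of chunks copied. */
size_t copy_in_pages(char *dst, const char *src, size_t len)
{
    size_t chunks = 0;
    for (size_t off = 0; off < len; off += PAGE_SZ) {
        size_t n = (len - off < PAGE_SZ) ? len - off : PAGE_SZ;
        pthread_mutex_lock(&map_lock);
        map_acquisitions++;
        memcpy(dst + off, src + off, n);
        pthread_mutex_unlock(&map_lock);
        chunks++;
    }
    return chunks;
}

/* One producer's work: copy its buffer into its own destination. */
struct copy_job {
    char       *dst;
    const char *src;
    size_t      len;
};

static void *copy_worker(void *arg)
{
    struct copy_job *job = arg;
    copy_in_pages(job->dst, job->src, job->len);
    return NULL;
}

/* Run nthreads (> 0) producers concurrently. They all funnel through
 * map_lock, so their chunk copies execute one at a time. Returns the
 * total number of lock acquisitions. */
unsigned long run_producers(struct copy_job *jobs, int nthreads)
{
    pthread_t tids[nthreads];
    map_acquisitions = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, copy_worker, &jobs[i]);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    /* pthread_join makes the workers' updates visible here. */
    return map_acquisitions;
}
```

A 10000-byte buffer costs three lock round trips (4096 + 4096 + 1808 bytes). With N producers, every one of those round trips competes for the same lock, so adding buffers adds waiting rather than parallelism. Timing run_producers for increasing thread counts would be a cheap way to check whether this effect is plausible before digging into the kernel side.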
_______________________________________________
Pvfs2-developers mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers