Sam Lang wrote:

On Aug 10, 2006, at 4:04 PM, Phil Carns wrote:

flow-proto-tuning.patch:
-----------
This patch adds "FlowBufferSizeBytes" and "FlowBuffersPerFlow" options to the configuration file format. They allow you to specify the buffer size that the default flow protocol will use as well as the maximum number of buffers to use per flow. Note that if you change either of these parameters, then you need to remount any active clients so that they pick up the configuration change before performing any I/O.
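For reference, a sketch of what the new options might look like in the server config file; the section placement and the values shown here are only illustrative assumptions, not defaults taken from the patch:

    <Defaults>
        # illustrative values only -- tune for your network and storage
        FlowBufferSizeBytes 262144
        FlowBuffersPerFlow 8
    </Defaults>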

max-aio.patch:
----------
This patch adds "TroveMaxConcurrentIO" to the configuration file format. It allows you to specify the maximum number of I/O operations that trove will allow to proceed concurrently (currently 16). Note from the previous email regarding AIO that depending on your access pattern, AIO may queue all of your operations anyway regardless of this setting. It probably doesn't have much effect unless you are accessing more than one file at a time, or if you are using an alternative to the stock AIO implementation.
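Again just as an illustration (section placement assumed, value matches the old hardcoded default mentioned above):

    <Defaults>
        # illustrative; 16 was the previously hardcoded limit
        TroveMaxConcurrentIO 16
    </Defaults>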


I had made the same change in Julian's branch,

Oh, ok.  We will switch over when that stuff hits trunk.

there are still a couple of things that aren't clear to me about this max value, though. First, it's a global value for all outstanding lio_listio calls the pvfs server makes, but based on your previous email comments about glibc's one-thread-per-fd oddity, it seems like we only want that value to max out per datafile. Also, after we hit the max we just queue the operations and post them once current ops have completed. If librt just queues ops and does them in FIFO order, though, it's pretty much the same thing. Why not just let librt handle the queuing? If we were to order the operations based on offsets, then it would make sense for us to queue, but we don't. Are we better at queueing than librt?

I agree that if you are using librt for aio, then this max value isn't doing much of anything :) librt's queueing isn't exactly the same thing, though. librt allows N operations in flight at a time (where N can be tuned using aio_init) by limiting the maximum number of threads that it will spawn. However, since it serializes on each fd, that limit never kicks in unless you are accessing N different files; otherwise it is really only going to do one thing at a time. The librt source that I looked at happened to default to N=16, just like Trove did.

I think the point of the aio limit in trove was to throttle I/O on the servers, but it turns out that librt was already throttling above and beyond that, so the trove limit wasn't actually decreasing the number of I/O operations posted to the kernel at all.

Maybe the throttling makes more sense when you bypass librt somehow (as in the previous patch) because then there is nothing to queue/throttle the operations besides trove?

At any rate, we decided to make this configurable before understanding the issues involved; it was just a hardcoded value we saw that looked like it should have been tunable.

I know Julian was looking at performance of aio and found results somewhere (I don't have a reference, sorry) that showed lio_listio did better in cases where multiple fds were passed to one lio_listio operation (right now we just do one fd with multiple segments to one lio_listio). I wonder if that difference is based on the glibc queuing behavior that you describe.

I would guess that the queueing behavior is the reason. I can't imagine that using separate files would make much difference once you get to the system call level.

Just a curiosity, but I wonder if the aio performance would change if we were to post multiple trove operations in the same lio_listio call, or possibly even break the bstream up into multiple files based on strip size... sounds crazy, right? :-)

On the former question, I guess it depends on who is better at coalescing: the kernel disk scheduler or the trove queue? Hopefully we can avoid splitting files up :)

-Phil

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
