On 28.11.2014 at 03:59, Ming Lei wrote:
> Hi Kevin,
>
> On Wed, Nov 26, 2014 at 10:46 PM, Kevin Wolf <kw...@redhat.com> wrote:
> > This improves the performance of requests because an ACB doesn't need to
> > be allocated on the heap any more. It also makes the code nicer and
> > smaller.
>
> I am not sure it is good way for linux aio optimization:
>
> - for raw image with some constraint, coroutine can be avoided since
> io_submit() won't sleep most of times
>
> - handling one time coroutine takes much time than handling malloc,
> memset and free on small buffer, following the test data:
>
> -- 241ns per coroutine
> -- 61ns per (malloc, memset, free for 128bytes)
Please finally stop making comparisons between completely unrelated things and trying to make a case against coroutines out of it. It simply doesn't make any sense.

The truth is that in the 'qemu-img bench' case as well as in the highest performing VM setup for Peter and me, the practically existing coroutine-based git branches perform better than the practically existing bypass branches. If you think that theoretically the bypass branches must be better, show us the patches and benchmarks. If you can't, let's merge the coroutine improvements (which improve more than just the case of raw images using no block layer features, including cases that benefit the average user) and be done.

> I still think we should figure out a fast path to avoid coroutine
> for linux-aio with raw image, otherwise it can't scale well for high
> IOPS device.
>
> Also we can use simple buf pool to avoid the dynamic allocation
> easily, can't we?

Yes, the change to g_slice_alloc() was a bad move performance-wise.

> > As a side effect, the codepath taken by aio=threads is changed to use
> > paio_submit_co(). This doesn't change the performance at this point.
> >
> > Results of qemu-img bench -t none -c 10000000 [-n] /dev/loop0:
> >
> >       |      aio=native       |      aio=threads
> >       | before   | with patch | before   | with patch
> > ------+----------+------------+----------+------------
> > run 1 | 29.921s  | 26.932s    | 35.286s  | 35.447s
> > run 2 | 29.793s  | 26.252s    | 35.276s  | 35.111s
> > run 3 | 30.186s  | 27.114s    | 35.042s  | 34.921s
> > run 4 | 30.425s  | 26.600s    | 35.169s  | 34.968s
> > run 5 | 30.041s  | 26.263s    | 35.224s  | 35.000s
> >
> > TODO: Do some more serious benchmarking in VMs with less variance.
> > Results of a quick fio run are vaguely positive.
>
> I will do the test with Paolo's fast path approach under
> VM I/O situation.

Currently, the best thing to compare it against is probably Peter's git branch at https://github.com/plieven/qemu.git perf_master2.
This patch is only a first step in a whole series of possible optimisations.

Kevin