On 11/26/2015 10:36 AM, Christian Borntraeger wrote:
> On 11/24/2015 07:00 PM, Paolo Bonzini wrote:
>> This large series is basically all that I would like to get into 2.6.
>> It is a combination of several pieces of work on dataplane and
>> multithreaded block layer.
>>
>> It's also a large part of why I would like someone else to look at
>> miscellaneous patches for a while (in case you've missed that). I
>> can foresee that following the reviews is going to be a huge time drain.
>>
>> With it I can get ~1300 Kiops on 8 disks (which I achieve with 2 iothreads
>> and 5 VCPUs). The bulk of the improvement actually comes from the first
>> 8 patches, but the rest of the series is what prepares for what's next
>> to come in QEMU 2.7 and later, such as a multiqueue block layer.
>>
>> It's tedious to review, with some pretty large patches (3, 32, 33, 35).
>
> For some unknown reason, this seems to be slightly slower than 2.5-rc1 on my
> old z196. (have not yet tested the z13)
>
> your branch is certainly better regarding malloc, but worse regarding others.

Using the first 8 patches or so (commit
be2f6b163e2b2a604f52a258fd932142c5974ffe, "vring: slim down allocation of
VirtQueueElements") is slightly faster than 2.5.0-rc1, so the regression
seems to come from some of the later patches.