Re: [Qemu-devel] Qemu and 32 PCIe devices

Paolo Bonzini Wed, 09 Aug 2017 00:33:56 -0700

On 09/08/2017 03:06, Laszlo Ersek wrote:
>>   20.14%  qemu-system-x86_64                  [.] render_memory_region
>>   17.14%  qemu-system-x86_64                  [.] subpage_register
>>   10.31%  qemu-system-x86_64                  [.] int128_add
>>    7.86%  qemu-system-x86_64                  [.] addrrange_end
>>    7.30%  qemu-system-x86_64                  [.] int128_ge
>>    4.89%  qemu-system-x86_64                  [.] int128_nz
>>    3.94%  qemu-system-x86_64                  [.] phys_page_compact
>>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc


Yes, this is the O(n^3) thing.  An optimized build should be faster
because int128 operations will be inlined and become much more efficient.

> With this patch, I only tested the "93 devices" case, as the slowdown
> became visible to the naked eye from the trace messages, as the firmware
> enabled more and more BARs / command registers (and inversely, the
> speedup was perceivable when the firmware disabled more and more BARs /
> command registers).

This is an interesting observation, and it's expected.  Looking at the
O(n^3) complexity more in detail you have N operations, where the "i"th
operates on "i" DMA address spaces, all of which have at least "i"
memory regions (at least 1 BAR per device).

So the total cost is sum i=1..N i^2 = N(N+1)(2N+1)/6 = O(n^3).
Expressing it as a sum shows why it gets slower as time progresses.

The solution is to note that those "i" address spaces are actually all
the same, so we can get it down to sum i=1..N i = N(N+1)/2 = O(n^2).

Thanks,

Paolo

Re: [Qemu-devel] Qemu and 32 PCIe devices

Reply via email to