Mark McLoughlin wrote:
My machine has four CPUs, with two 6MB L2 caches - each cache is shared
between two of the CPUs, so I set things up as follows:
pcpu#3 - netserver, I/O thread, vcpu#0
pcpu#4 - vcpu#1, virtio_net irq, netperf
which (hopefully) ensures that we're only doing one copy using one cache.
On Tue, 2008-11-04 at 08:23 -0700, David S. Ahern wrote:
>
> Mark McLoughlin wrote:
>
> > Note also that when tuning for a specific workload, which CPU
> > the I/O thread is pinned to is important.
> >
>
> Hi Mark:
>
> Can you give an example of when that has a noticeable effect?
>
> For example, if the guest handles network interrupts on vcpu0 and it is
> pinned to pcpu0
Mark McLoughlin wrote:
> Note also that when tuning for a specific workload, which CPU
> the I/O thread is pinned to is important.
>
Hi Mark:
Can you give an example of when that has a noticeable effect?
For example, if the guest handles network interrupts on vcpu0 and it is
pinned to pcpu0
On Fri, 2008-10-31 at 09:16 +, Mark McLoughlin wrote:
> +static void virtio_net_tx_bh(void *opaque)
>  {
>      VirtIONet *n = opaque;
>
> -    n->tx_timer_active = 0;
> -
> -    /* Just in case the driver is not ready on more */
> -    if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
> -
Mark McLoughlin wrote:
By removing the tx timer altogether and doing all the copies in the
I/O thread, we can keep the I/O churning away in parallel with the
guest generating more I/O.
On a multi-socket machine, you may also be doing the copy on the wrong
cache. We're also now increasing
On Thu, 2008-10-30 at 14:24 -0500, Anthony Liguori wrote:
> Instead of using an event fd, perhaps you could just schedule a bottom
> half? I think that would be a whole lot cleaner.
Nice, I hadn't noticed the bottom halves. Much cleaner, indeed.
Results are a little better too:
Mark McLoughlin wrote:
By removing the tx timer altogether and doing all the copies in the
I/O thread, we can keep the I/O churning away in parallel with the
guest generating more I/O.
In my tests, this significantly increases guest->host throughput,
causes a minor increase in host->guest throughput, reduces CPU
utilization.