Re: [RFC] kvmtool: add support for modern virtio-pci

2015-11-19 Thread Sasha Levin
On 11/19/2015 02:21 AM, Gerd Hoffmann wrote:
> On Wed, 2015-11-18 at 23:01 -0500, Sasha Levin wrote:
>> On 11/18/2015 11:00 PM, Sasha Levin wrote:
>>> Anyways, I debugged it for a bit and found that seabios attempts to write to
>>> the notification BAR; I'll look further tomorrow to narrow it down and fix it.
>>
>> Err, *read*, obviously.
>>
>> I've never implemented that because the kernel doesn't try to do that (it
>> doesn't make much sense, I think...).
> 
> It doesn't make sense indeed (kvmtool still shouldn't segfault though),
> and on a quick look I can't spot a place in seabios doing that ...
> 
> It's reading ISR, as part of device reset, to make sure any pending
> interrupts are cleared.

That was indeed the ISR field. Fixing that makes seabios reach the same point as
legacy virtio before failing.

I don't see the original correspondence about the seabios failures you've reported;
if you want to forward them over, we can look at it further.


Thanks,
Sasha

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] kvmtool: add support for modern virtio-pci

2015-11-18 Thread Sasha Levin
On 11/18/2015 03:22 AM, Gerd Hoffmann wrote:
> /me goes undust the kvmtool patches for seabios.
> 
> (see https://www.kraxel.org/cgit/seabios/commit/?h=kvmtool,
> build with CONFIG_KVMTOOL=y + CONFIG_DEBUG_LEVEL=9)
> 
> nilsson kraxel ~# ~kraxel/projects/kvmtool/lkvm run --name seabios
> --firmware /home/kraxel/projects/seabios/out-bios-kvmtool/bios.bin
> --disk /vmdisk/cloud/persistent/Fedora-Cloud-Base-22-20150521.x86_64.raw
>   # lkvm run -k /boot/vmlinuz-3.10.0-324.el7.x86_64 -m 448 -c 4 --name
> seabios

Thanks for testing! I didn't even think about seabios as a testing target.

I tried to do what you described, and built seabios with:

$ cat .config | grep 'KVMTOOL\|DEBUG'
CONFIG_KVMTOOL=y
CONFIG_DEBUG_LEVEL=9

But when booting, it just hangs on:

$ ./lkvm run --firmware ~/seabios/out/bios.bin -d dummy
  # lkvm run -k /boot/vmlinuz-4.2.0-17-generic -m 448 -c 4 --name guest-12566


And same result with --virtio-legacy...

What did I miss?


Thanks,
Sasha


Re: [RFC] kvmtool: add support for modern virtio-pci

2015-11-18 Thread Sasha Levin
On 11/18/2015 11:00 PM, Sasha Levin wrote:
> Anyways, I debugged it for a bit and found that seabios attempts to write to
> the notification BAR; I'll look further tomorrow to narrow it down and fix it.

Err, *read*, obviously.

I've never implemented that because the kernel doesn't try to do that (it
doesn't make much sense, I think...).


Thanks,
Sasha


Re: [RFC] kvmtool: add support for modern virtio-pci

2015-11-18 Thread Sasha Levin
On 11/18/2015 12:52 PM, Gerd Hoffmann wrote:
>   Hi,
> 
>> Thanks for testing! I didn't even think about seabios as a testing target.
> 
> Not surprising, support isn't upstream. I ran into a bunch of issues[1][2]
> last time I tried to combine the two, and nobody seemed to care, so the
> seabios patches were just sitting in a branch in my repo ...
> 
>> $ cat .config | grep 'KVMTOOL\|DEBUG'
>> CONFIG_KVMTOOL=y
>> CONFIG_DEBUG_LEVEL=9
> 
> Hmm, 'echo CONFIG_KVMTOOL=y > .config; make olddefconfig' should give you a
> working configuration.
> 
> Setting 'CONFIG_DEBUG_LEVEL=9' is useful for trouble-shooting as it
> makes the virtio drivers more verbose, but not mandatory to have.
> 
> Serial line support is needed to get output:
> 
> CONFIG_DEBUG_SERIAL=y
> CONFIG_DEBUG_SERIAL_PORT=0x3f8
> 
> Also I think rom size must be 128k b/c kvmtool expects it to be that
> way:
> 
> CONFIG_ROM_SIZE=128
> 
> But those are the defaults, and after "make olddefconfig" you should
> already have them ...

It was the ROM_SIZE option, it seems; it was set to 0 here.

Anyways, I debugged it for a bit and found that seabios attempts to write to
the notification BAR; I'll look further tomorrow to narrow it down and fix it.
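Pulling Gerd's instructions together, the seabios build for kvmtool boils down to something like the following (paths are illustrative; the config values are the ones quoted above):

```shell
# Build SeaBIOS with kvmtool support (branch from Gerd's repo, see above)
cd seabios
echo CONFIG_KVMTOOL=y > .config
make olddefconfig        # fills in the defaults: CONFIG_ROM_SIZE=128,
                         # CONFIG_DEBUG_SERIAL=y, CONFIG_DEBUG_SERIAL_PORT=0x3f8
grep -E 'KVMTOOL|ROM_SIZE|DEBUG_SERIAL' .config   # sanity-check the result
make

# Boot the result:
../kvmtool/lkvm run --firmware out/bios.bin --disk /path/to/disk.img
```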


Thanks,
Sasha



[RFC] kvmtool: add support for modern virtio-pci

2015-11-17 Thread Sasha Levin
This is a first go at adding support for the modern (based on the 1.0 virtio
spec) virtio-pci implementation.

kvmtool makes it simple to add additional transports such as this because of
its layering, so we are able to add it as a 3rd (after legacy virtio-pci and
virtio-mmio) transport layer, and still allow users to choose either
the legacy or the modern implementation (with the modern one as the
default).

The changes to the virtio devices are mostly the result of needing to support
>32bit features, and the different initialization method for VQs.

It's worth noting that supporting v1.0 implies any_layout, but some of our
devices made assumptions about the layout - which I've fixed. It's worth
keeping in mind that some probably went unnoticed, though.

To sum it up: this is a lightly tested version for feedback about the design
and to weed out major bugs people notice. Feedback is very welcome!

Signed-off-by: Sasha Levin <sasha.le...@oracle.com>
---
 Makefile  |   1 +
 builtin-run.c |   4 +
 include/kvm/kvm-config.h  |   1 +
 include/kvm/pci.h |   8 +-
 include/kvm/virtio-9p.h   |   2 +-
 include/kvm/{virtio-pci.h => virtio-pci-modern.h} |  23 +-
 include/kvm/virtio-pci.h  |   6 +-
 include/kvm/virtio.h  |  25 +-
 include/linux/virtio_pci.h| 199 +++
 net/uip/core.c|   7 +-
 virtio/9p.c   |  35 +-
 virtio/balloon.c  |  37 +-
 virtio/blk.c  |  50 +-
 virtio/console.c  |  42 +-
 virtio/core.c |  16 +
 virtio/mmio.c |  13 +-
 virtio/net.c  |  59 ++-
 virtio/pci.c  |   4 +-
 virtio/pci_modern.c   | 599 ++
 virtio/rng.c  |  29 +-
 virtio/scsi.c |  36 +-
 x86/include/kvm/kvm-arch.h|   2 +-
 22 files changed, 1109 insertions(+), 89 deletions(-)
 copy include/kvm/{virtio-pci.h => virtio-pci-modern.h} (69%)
 create mode 100644 include/linux/virtio_pci.h
 create mode 100644 virtio/pci_modern.c

diff --git a/Makefile b/Makefile
index 59622c3..13a12f8 100644
--- a/Makefile
+++ b/Makefile
@@ -67,6 +67,7 @@ OBJS  += virtio/net.o
 OBJS   += virtio/rng.o
 OBJS+= virtio/balloon.o
 OBJS   += virtio/pci.o
+OBJS   += virtio/pci_modern.o
 OBJS   += disk/blk.o
 OBJS   += disk/qcow.o
 OBJS   += disk/raw.o
diff --git a/builtin-run.c b/builtin-run.c
index edcaf3e..e133b10 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -128,6 +128,8 @@ void kvm_run_set_wrapper_sandbox(void)
" rootfs"), \
OPT_STRING('\0', "hugetlbfs", &(cfg)->hugetlbfs_path, "path",   \
"Hugetlbfs path"),  \
+   OPT_BOOLEAN('\0', "virtio-legacy", &(cfg)->old_virtio, "Use"\
+   " legacy virtio-pci devices"),  \
\
OPT_GROUP("Kernel options:"),   \
OPT_STRING('k', "kernel", &(cfg)->kernel_filename, "kernel",\
@@ -517,6 +519,8 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv)
kvm->cfg.vmlinux_filename = find_vmlinux();
kvm->vmlinux = kvm->cfg.vmlinux_filename;
 
+	default_transport = kvm->cfg.old_virtio ? VIRTIO_PCI : VIRTIO_PCI_MODERN;
+
if (kvm->cfg.nrcpus == 0)
kvm->cfg.nrcpus = nr_online_cpus;
 
diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
index 386fa8c..b1512a1 100644
--- a/include/kvm/kvm-config.h
+++ b/include/kvm/kvm-config.h
@@ -57,6 +57,7 @@ struct kvm_config {
bool no_dhcp;
bool ioport_debug;
bool mmio_debug;
+   bool old_virtio;
 };
 
 #endif
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index b0c28a1..19ec56a 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "kvm/devices.h"
@@ -81,7 +82,12 @@ struct pci_device_header {
u8  min_gnt;
u8  max_lat;
struct msix_cap msix;
-   u8  empty[136]; /* Rest of PCI config space */
+   struct virtio_pci_cap common_cap;
+   struct virtio_pci_notify_cap notify_cap;
+   struct virtio_pci_cap isr_cap

Re: [PATCH 2/2] kvmtool: assume dead vcpus are paused too

2015-11-05 Thread Sasha Levin
On 11/05/2015 09:41 AM, Will Deacon wrote:
> On Wed, Nov 04, 2015 at 06:51:12PM -0500, Sasha Levin wrote:
>> On 11/04/2015 06:51 AM, Will Deacon wrote:
>>> +	mutex_lock(&pause_lock);
>>> +
>>> +	/* The kvm->cpus array contains a null pointer in the last location */
>>> +	for (i = 0; ; i++) {
>>> +		if (kvm->cpus[i])
>>> +			pthread_kill(kvm->cpus[i]->thread, SIGKVMEXIT);
>>> +		else
>>> +			break;
>>> +	}
>>> +
>>> +	kvm__continue(kvm);
>>
>> In this scenario: if we grabbed pause_lock, signaled vcpu0 to exit, and it
>> did before we called kvm__continue(), we won't end up releasing pause_lock,
>> which might cause a lockup later, no?
> Hmm, yeah, maybe that should be an explicit mutex_unlock rather than a
> call to kvm__continue.

Yeah, that should do the trick.


Thanks,
Sasha


Re: [PATCH 2/2] kvmtool: assume dead vcpus are paused too

2015-11-04 Thread Sasha Levin
On 11/04/2015 06:51 AM, Will Deacon wrote:
> +	mutex_lock(&pause_lock);
> +
> + /* The kvm->cpus array contains a null pointer in the last location */
> + for (i = 0; ; i++) {
> + if (kvm->cpus[i])
> + pthread_kill(kvm->cpus[i]->thread, SIGKVMEXIT);
> + else
> + break;
> + }
> +
> + kvm__continue(kvm);

In this scenario: if we grabbed pause_lock, signaled vcpu0 to exit, and it did
before we called kvm__continue(), we won't end up releasing pause_lock, which
might cause a lockup later, no?


Thanks,
Sasha


Re: sanitizing kvmtool

2015-10-25 Thread Sasha Levin
On 10/25/2015 11:19 AM, Paolo Bonzini wrote:
> 
> 
> On 21/10/2015 19:07, Sasha Levin wrote:
>> On 10/19/2015 11:15 AM, Dmitry Vyukov wrote:
>>> But still: if result of a racy read is passed to guest, that can leak
>>> arbitrary host data into guest.
>>
>> I see what you're saying.
> 
> I don't... how can it leak arbitrary host data?  The memcpy cannot write
> out of bounds.

The issue I had in mind (simplified) is:

vcpu1                           vcpu2

guest writes idx
check if idx is valid
                                guest writes new idx
access (guest mem + idx)


So I'm not sure if we cover both the locking and the potential compiler tricks
well enough to prevent that scenario.


Thanks,
Sasha


Re: sanitizing kvmtool

2015-10-21 Thread Sasha Levin
On 10/19/2015 11:15 AM, Dmitry Vyukov wrote:
> On Mon, Oct 19, 2015 at 5:08 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
>> On 10/19/2015 10:47 AM, Dmitry Vyukov wrote:
>>>> Right, the memory areas that are accessed both by the hypervisor and the guest
>>>> should be treated as untrusted input, but the hypervisor is supposed to validate
>>>> the input carefully before using it - so I'm not sure how data races would
>>>> introduce anything new that we didn't catch during validation.
>>>
>>> One possibility would be: if result of a racy read is passed to guest,
>>> that can leak arbitrary host data into guest. Does not sound good.
>>> Also, without usage of proper atomic operations, it is basically
>>> impossible to verify untrusted data, as it can be changing under your
>>> feet. And storing data into a local variable does not prevent the data
>>> from changing.
>>
>> What's missing here is that the guest doesn't directly read/write the memory:
>> every time it accesses a memory that is shared with the host it will trigger
>> an exit, which will stop the vcpu thread that made the access and kernel side
>> kvm will pass the hypervisor the value the guest wrote (or the memory address
>> it attempted to read). The value/address can't change under us in that scenario.
> But still: if result of a racy read is passed to guest, that can leak
> arbitrary host data into guest.

I see what you're saying. I need to think about it a bit, maybe we do need
locking for each of the virtio devices we emulate.


On an unrelated note, a few of the reports are pointing to ioport__unregister():

==
WARNING: ThreadSanitizer: data race (pid=109228)
  Write of size 8 at 0x7d1cdf40 by main thread:
#0 free tsan/rtl/tsan_interceptors.cc:570 (lkvm+0x00443376)
#1 ioport__unregister ioport.c:138:2 (lkvm+0x004a9ff9)
#2 pci__exit pci.c:247:2 (lkvm+0x004ac857)
#3 init_list__exit util/init.c:59:8 (lkvm+0x004bca6e)
#4 kvm_cmd_run_exit builtin-run.c:645:2 (lkvm+0x004a68a7)
#5 kvm_cmd_run builtin-run.c:661 (lkvm+0x004a68a7)
#6 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
#7 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
#8 main main.c:18 (lkvm+0x004ac0b4)

  Previous read of size 8 at 0x7d1cdf40 by thread T55:
#0 rb_int_search_single util/rbtree-interval.c:14:17 (lkvm+0x004bf968)
#1 ioport_search ioport.c:41:9 (lkvm+0x004aa05f)
#2 kvm__emulate_io ioport.c:186 (lkvm+0x004aa05f)
#3 kvm_cpu__emulate_io x86/include/kvm/kvm-cpu-arch.h:41:9 
(lkvm+0x004aa718)
#4 kvm_cpu__start kvm-cpu.c:126 (lkvm+0x004aa718)
#5 kvm_cpu_thread builtin-run.c:174:6 (lkvm+0x004a6e3e)

  Thread T55 'kvm-vcpu-2' (tid=109285, finished) created by main thread at:
#0 pthread_create tsan/rtl/tsan_interceptors.cc:848 (lkvm+0x004478a3)
#1 kvm_cmd_run_work builtin-run.c:633:7 (lkvm+0x004a683f)
#2 kvm_cmd_run builtin-run.c:660 (lkvm+0x004a683f)
#3 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
#4 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
#5 main main.c:18 (lkvm+0x004ac0b4)

SUMMARY: ThreadSanitizer: data race ioport.c:138:2 in ioport__unregister
==

I think this is because we don't perform locking using pthread, but rather pause
the vm entirely - so the cpu threads it's pointing to aren't actually running
when we unregister ioports. Is there a way to annotate that for tsan?


Thanks,
Sasha


Re: Network hangs when communicating with host

2015-10-20 Thread Sasha Levin
On 10/20/2015 09:42 AM, Dmitry Vyukov wrote:
> I now have another issue. My binary fails to mmap a file within lkvm
> sandbox. The same binary works fine on host and in qemu. I've added
> strace into sandbox script, and here is the output:
> 
> [pid   837] openat(AT_FDCWD, "syzkaller-shm048878722", O_RDWR|O_CLOEXEC) = 5
> [pid   837] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 5,
> 0) = -1 EINVAL (Invalid argument)
> 
> I don't see anything that can potentially cause EINVAL here. Is it
> possible that lkvm somehow affects kernel behavior here?
> 
> I run lkvm as:
> 
> $ taskset 1 /kvmtool/lkvm sandbox --disk syz-0 --mem=2048 --cpus=2
> --kernel /arch/x86/boot/bzImage --network mode=user --sandbox
> /workdir/kvm/syz-0.sh

It's possible that something in the virtio-9p layer is broken. I'll give
it a look in the evening.


Thanks,
Sasha


Re: sanitizing kvmtool

2015-10-19 Thread Sasha Levin
On 10/19/2015 04:37 AM, Dmitry Vyukov wrote:
>> So in this case (and most of the other data race cases described in the full log) it
>> seems like ThreadSanitizer is mixing up different accesses by the guest to one underlying
>> block of memory on the host.
>>
>> Here, for example, T55 accesses the msix block of the virtio-net PCI device, and T58 is accessing
>> the virtqueue exposed by that device. While they both get to the same block of memory inside
>> > block of memory inside
> I don't understand this.
> Do you mean that this is a false positive? Or it is a real issue in lkvm?

I suspect it's a false positive because ThreadSanitizer sees the memory as one big
block, but the logic that makes sure we don't race here is within the guest vm,
which ThreadSanitizer doesn't see.


Thanks,
Sasha



Re: Network hangs when communicating with host

2015-10-19 Thread Sasha Levin
On 10/19/2015 05:28 AM, Dmitry Vyukov wrote:
> On Mon, Oct 19, 2015 at 11:22 AM, Andre Przywara <andre.przyw...@arm.com> wrote:
>> Hi Dmitry,
>>
>> On 19/10/15 10:05, Dmitry Vyukov wrote:
>>> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
>>>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>>>> Hello,
>>>>>
>>>>> I am trying to run a program in lkvm sandbox so that it communicates
>>>>> with a program on host. I run lkvm as:
>>>>>
>>>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>>>
>>>>> /my_prog then connects to a program on host over a tcp socket.
>>>>> I see that host receives some data, sends some data back, but then
>>>>> my_prog hangs on network read.
>>>>>
>>>>> To localize this I wrote 2 programs (attached). ping is run on host
>>>>> and pong is run from lkvm sandbox. They successfully establish tcp
>>>>> connection, but after some iterations both hang on read.
>>>>>
>>>>> Networking code in Go runtime is there for more than 3 years, widely
>>>>> used in production and does not have any known bugs. However, it uses
>>>>> epoll edge-triggered readiness notifications that known to be tricky.
>>>>> Is it possible that lkvm contains some networking bug? Can it be
>>>>> related to the data races in lkvm I reported earlier today?
>>
>> Just to let you know:
>> I think we have seen networking issues in the past - root over NFS had
>> issues IIRC. Will spent some time on debugging this and it looked like a
>> race condition in kvmtool's virtio implementation. I think pinning
>> kvmtool's virtio threads to one host core made this go away. However
>> although he tried hard (even by Will's standards!) he couldn't find
>> the real root cause or a fix at the time he looked at it, and we found
>> other ways to work around the issues (using virtio-blk or initrd's).
>>
>> So it's quite possible that there are issues. I haven't had time yet to
>> look at your sanitizer reports, but it looks like a promising approach
>> to find the root cause.
> 
> 
> Thanks, Andre!
> 
> ping/pong does not hang within at least 5 minutes when I run lkvm
> under taskset 1.
> 
> And, yeah, this pretty strongly suggests a data race. ThreadSanitizer
> can point you to the bug within a minute, so you just need to say
> "aha! it is here". Or maybe not. There are no guarantees. But if you
> already spent significant time on this, then checking the reports
> definitely looks like a good idea.

Okay, that's good to know.

I have a few busy days, but I'll definitely try to clear up these reports
as they seem to be pointing to real issues.


Thanks,
Sasha



Re: sanitizing kvmtool

2015-10-19 Thread Sasha Levin
On 10/19/2015 10:47 AM, Dmitry Vyukov wrote:
>> Right, the memory areas that are accessed both by the hypervisor and the guest
>> should be treated as untrusted input, but the hypervisor is supposed to validate
>> the input carefully before using it - so I'm not sure how data races would
>> introduce anything new that we didn't catch during validation.
> 
> One possibility would be: if result of a racy read is passed to guest,
> that can leak arbitrary host data into guest. Does not sound good.
> Also, without usage of proper atomic operations, it is basically
> impossible to verify untrusted data, as it can be changing under your
> feet. And storing data into a local variable does not prevent the data
> from changing.

What's missing here is that the guest doesn't directly read/write the memory:
every time it accesses memory that is shared with the host it will trigger
an exit, which will stop the vcpu thread that made the access, and kernel-side
kvm will pass the hypervisor the value the guest wrote (or the memory address
it attempted to read). The value/address can't change under us in that scenario.

> There are also some data races with free(), it does not look good at all.

I definitely agree there are quite a few real bugs there :)


Thanks,
Sasha



Re: sanitizing kvmtool

2015-10-19 Thread Sasha Levin
On 10/19/2015 10:24 AM, Dmitry Vyukov wrote:
> On Mon, Oct 19, 2015 at 4:19 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
>> On 10/19/2015 04:37 AM, Dmitry Vyukov wrote:
>>>> So in this case (and most of the other data race cases described in the full log) it
>>>> seems like ThreadSanitizer is mixing up different accesses by the guest to one underlying
>>>> block of memory on the host.
>>>>
>>>> Here, for example, T55 accesses the msix block of the virtio-net PCI device, and T58 is accessing
>>>> the virtqueue exposed by that device. While they both get to the same block of memory inside
>>> I don't understand this.
>>> Do you mean that this is a false positive? Or it is a real issue in lkvm?
>>
>> I suspect it's a false positive because ThreadSanitizer sees the memory as one big
>> block, but the logic that makes sure we don't race here is within the guest vm,
>> which ThreadSanitizer doesn't see.
> 
> But isn't the task of a hypervisor to sandbox the guest OS and to not
> trust it in any way, shape or form? What if the guest VM intentionally
> omits the synchronization? Can it exploit the host?

Right, the memory areas that are accessed both by the hypervisor and the guest
should be treated as untrusted input, but the hypervisor is supposed to validate
the input carefully before using it - so I'm not sure how data races would
introduce anything new that we didn't catch during validation.


Thanks,
Sasha


Re: sanitizing kvmtool

2015-10-17 Thread Sasha Levin
On 10/15/2015 06:21 AM, Dmitry Vyukov wrote:
> Hello,
> 
> I've run a set of sanitizers on
> git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git commit
> 3695adeb227813d96d9c41850703fb53a23845eb. I've just booted a VM and
> shut it down.
> 
> AddressSanitizer detected a heap-use-after-free:
> 
> AddressSanitizer: heap-use-after-free on address 0x6040df90 at pc
> 0x004f46d0 bp 0x7ffc79def2d0 sp 0x7ffc79def2c8
> READ of size 8 at 0x6040df90 thread T0
> #0 0x4f46cf in kvm__pause kvm.c:436:7
> #1 0x4f0d5d in ioport__unregister ioport.c:129:2
> #2 0x4efb2f in serial8250__exit hw/serial.c:446:7
> #3 0x516204 in init_list__exit util/init.c:59:8
> #4 0x4ea956 in kvm_cmd_run_exit builtin-run.c:645:2
> #5 0x4ea956 in kvm_cmd_run builtin-run.c:661
> #6 0x51596f in handle_command kvm-cmd.c:84:8
> #7 0x7fa398101ec4 in __libc_start_main
> /build/buildd/eglibc-2.19/csu/libc-start.c:287
> #8 0x41a505 in _start (lkvm+0x41a505)
> 
> 0x6040df90 is located 0 bytes inside of 40-byte region
> [0x6040df90,0x6040dfb8)
> freed by thread T0 here:
> #0 0x4b75a0 in free
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cc:30
> #1 0x4f29e6 in kvm_cpu__exit kvm-cpu.c:263:2
> #2 0x5160c4 in init_list__exit util/init.c:59:8
> 
> previously allocated by thread T0 here:
> #0 0x4b7a2c in calloc
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56
> #1 0x4f2491 in kvm_cpu__init kvm-cpu.c:221:14

I'm sending a patch to fix this + another issue it uncovered. This was caused by
the kvm cpu exit function being marked as a late call rather than a core call, so it
would free the vcpus before anything else had a chance to exit.

> 
> ThreadSanitizer detected a whole bunch of data races, for example:
> 
> WARNING: ThreadSanitizer: data race (pid=109228)
>   Write of size 1 at 0x7d6c0001f384 by thread T55:
> #0 memcpy 
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:608
> (lkvm+0x0044b28b)
> #1 virtio_pci__msix_mmio_callback virtio/pci.c:269:3 (lkvm+0x004b3ee0)
> #2 kvm__emulate_mmio mmio.c:131:3 (lkvm+0x004ac332)
> #3 kvm_cpu__emulate_mmio x86/include/kvm/kvm-cpu-arch.h:46:9
> (lkvm+0x004aa8c6)
> #4 kvm_cpu__start kvm-cpu.c:147 (lkvm+0x004aa8c6)
> #5 kvm_cpu_thread builtin-run.c:174:6 (lkvm+0x004a6e3e)
> 
>   Previous read of size 4 at 0x7d6c0001f384 by thread T58:
> #0 virtio_pci__signal_vq virtio/pci.c:290:29 (lkvm+0x004b36b6)
> #1 virtio_net_tx_thread virtio/net.c:210:4 (lkvm+0x004b1fb5)
> 
>   Location is heap block of size 1648 at 0x7d6c0001f100 allocated by
> main thread:
> #0 calloc 
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:544
> (lkvm+0x0043e812)
> #1 virtio_init virtio/core.c:191:12 (lkvm+0x004afa48)
> #2 virtio_net__init_one virtio/net.c:846:2 (lkvm+0x004b095d)
> #3 virtio_net__init virtio/net.c:868:3 (lkvm+0x004b0296)
> #4 init_list__init util/init.c:40:8 (lkvm+0x004bc7ee)
> #5 kvm_cmd_run_init builtin-run.c:621:6 (lkvm+0x004a6799)
> #6 kvm_cmd_run builtin-run.c:656 (lkvm+0x004a6799)
> #7 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
> #8 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
> #9 main main.c:18 (lkvm+0x004ac0b4)
> 
>   Thread T55 'kvm-vcpu-2' (tid=109285, running) created by main thread at:
> #0 pthread_create
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848
> (lkvm+0x004478a3)
> #1 kvm_cmd_run_work builtin-run.c:633:7 (lkvm+0x004a683f)
> #2 kvm_cmd_run builtin-run.c:660 (lkvm+0x004a683f)
> #3 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
> #4 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
> #5 main main.c:18 (lkvm+0x004ac0b4)
> 
>   Thread T58 'virtio-net-tx' (tid=109334, running) created by thread T53 at:
> #0 pthread_create
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848
> (lkvm+0x004478a3)
> #1 init_vq virtio/net.c:526:4 (lkvm+0x004b1523)
> #2 virtio_pci__io_out virtio/pci.c:219:3 (lkvm+0x004b484c)
> #3 kvm__emulate_io ioport.c:196:11 (lkvm+0x004aa0f8)
> #4 virtio_pci__io_mmio_callback virtio/pci.c:340:2 (lkvm+0x004b3e55)
> #5 kvm__emulate_mmio mmio.c:131:3 (lkvm+0x004ac332)
> #6 kvm_cpu__emulate_mmio x86/include/kvm/kvm-cpu-arch.h:46:9
> (lkvm+0x004aa8c6)
> #7 kvm_cpu__start kvm-cpu.c:147 (lkvm+0x004aa8c6)
> #8 kvm_cpu_thread builtin-run.c:174:6 (lkvm+0x004a6e3e)

So in this case (and most of the other data race cases described in the full log) it
seems like ThreadSanitizer is mixing up different accesses by the guest to one
underlying block of memory on the host.

Re: Network hangs when communicating with host

2015-10-16 Thread Sasha Levin
On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
> Hello,
> 
> I am trying to run a program in lkvm sandbox so that it communicates
> with a program on host. I run lkvm as:
> 
> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
> /arch/x86/boot/bzImage --network mode=user -- /my_prog
> 
> /my_prog then connects to a program on host over a tcp socket.
> I see that host receives some data, sends some data back, but then
> my_prog hangs on network read.
> 
> To localize this I wrote 2 programs (attached). ping is run on host
> and pong is run from lkvm sandbox. They successfully establish tcp
> connection, but after some iterations both hang on read.
> 
> Networking code in Go runtime is there for more than 3 years, widely
> used in production and does not have any known bugs. However, it uses
> epoll edge-triggered readiness notifications that are known to be tricky.
> Is it possible that lkvm contains some networking bug? Can it be
> related to the data races in lkvm I reported earlier today?
> 
> I am on commit 3695adeb227813d96d9c41850703fb53a23845eb.

Hey Dmitry,

How long does it take to reproduce? I've been running ping/pong as you've
described and it looks like it doesn't get stuck (read/writes keep going
on both sides).

Can you share your guest kernel config maybe?


Thanks,
Sasha


Re: kvm: odd time values since kvmclock: set scheduler clock stable

2015-05-26 Thread Sasha Levin
On 05/26/2015 09:21 AM, Luiz Capitulino wrote:
>> Sasha,
>>
>> Can you give the suggested patch (hypervisor patch...) a try please?
>> (with a patched guest, obviously).
>>
>> KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system
>> MSR
> I've tried your v2, it works for me. My test-case is very simple though:
> I just boot a VM, log in and reboot. This reproduces the issue Sasha
> reported 100% of the times for me (don't need multi-vcpu guest either).

Sorry for the delay, we had a long weekend here.

It seems to work fine here, no more jumps when booting.


Thanks,
Sasha


[added to the 3.18 stable tree] KVM: MIPS: Fix trace event to save PC directly

2015-03-14 Thread Sasha Levin
From: James Hogan <james.ho...@imgtec.com>

commit b3cffac04eca9af46e1e23560a8ee22b1bd36d43 upstream.

Currently the guest exit trace event saves the VCPU pointer to the
structure, and the guest PC is retrieved by dereferencing it when the
event is printed rather than directly from the trace record. This isn't
safe as the printing may occur long afterwards, after the PC has changed
and potentially after the VCPU has been freed. Usually this results in
the same (wrong) PC being printed for multiple trace events. It also
isn't portable as userland has no way to access the VCPU data structure
when interpreting the trace record itself.

Let's save the actual PC in the structure so that the correct value is
accessible later.

Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan <james.ho...@imgtec.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Ralf Baechle <r...@linux-mips.org>
Cc: Marcelo Tosatti <mtosa...@redhat.com>
Cc: Gleb Natapov <g...@kernel.org>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Acked-by: Steven Rostedt <rost...@goodmis.org>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Sasha Levin <sasha.le...@oracle.com>
---
 arch/mips/kvm/trace.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/kvm/trace.h b/arch/mips/kvm/trace.h
index c1388d4..bd6437f 100644
--- a/arch/mips/kvm/trace.h
+++ b/arch/mips/kvm/trace.h
@@ -24,18 +24,18 @@ TRACE_EVENT(kvm_exit,
TP_PROTO(struct kvm_vcpu *vcpu, unsigned int reason),
TP_ARGS(vcpu, reason),
TP_STRUCT__entry(
-   __field(struct kvm_vcpu *, vcpu)
+   __field(unsigned long, pc)
__field(unsigned int, reason)
),
 
TP_fast_assign(
-	__entry->vcpu = vcpu;
+	__entry->pc = vcpu->arch.pc;
	__entry->reason = reason;
	),

	TP_printk("[%s]PC: 0x%08lx",
		  kvm_mips_exit_types_str[__entry->reason],
-		  __entry->vcpu->arch.pc)
+		  __entry->pc)
 );
 
 #endif /* _TRACE_KVM_H */
-- 
2.1.0



Re: stand-alone kvmtool

2015-02-19 Thread Sasha Levin
On 02/13/2015 05:39 AM, Andre Przywara wrote:
 Hi,
 
 as I found it increasingly inconvenient to use kvmtool[1] as part of a
 Linux repository, I decided to give it a go and make it a stand-alone
 project. So I filtered all the respective commits, adjusted the paths in
 there (while keeping authorship and commit date, of course) and then
 added the missing bits to let it compile without a kernel tree nearby.
 The result is now available on:
 
 git://linux-arm.org/kvmtool.git
 http://linux-arm.org/kvmtool.git

Hi Andre,

What inconvenience is caused by having it sit inside the kernel tree
beyond an increased requirement in disk space?

Moving it out will make us lose all the new features and bug fixes we
gain from using the kernel code directly rather than copying it once
in a while.

With your suggestion we'll end up needing something that copies stuff
from the kernel into that standalone tree, just like what qemu does.


Thanks,
Sasha


Re: [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-17 Thread Sasha Levin
On 02/15/2015 01:01 AM, Raghavendra K T wrote:
 On 02/15/2015 11:25 AM, Raghavendra K T wrote:
 Paravirt spinlock clears slowpath flag after doing unlock.
 As explained by Linus currently it does:
  prev = *lock;
 	add_smp(&lock->tickets.head, TICKET_LOCK_INC);

  /* add_smp() is a full mb() */

 	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
  __ticket_unlock_slowpath(lock, prev);

 which is *exactly* the kind of things you cannot do with spinlocks,
 because after you've done the add_smp() and released the spinlock
 for the fast-path, you can't access the spinlock any more.  Exactly
 because a fast-path lock might come in, and release the whole data
 structure.

 Linus suggested that we should not do any writes to lock after unlock(),
 and we can move slowpath clearing to fastpath lock.

 So this patch implements the fix with:
 1. Moving slowpath flag to head (Oleg):
 Unlocked locks don't care about the slowpath flag; therefore we can keep
 it set after the last unlock, and clear it again on the first (try)lock.
 -- this removes the write after unlock. note that keeping slowpath flag would
 result in unnecessary kicks.
 By moving the slowpath flag from the tail to the head ticket we also avoid
 the need to access both the head and tail tickets on unlock.

 2. use xadd to avoid read/write after unlock that checks the need for
 unlock_kick (Linus):
 We further avoid the need for a read-after-release by using xadd;
 the prev head value will include the slowpath flag and indicate if we
 need to do PV kicking of suspended spinners -- on modern chips xadd
 isn't (much) more expensive than an add + load.

 Result:
   setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
   benchmark overcommit %improve
   kernbench  1x   -0.13
   kernbench  2x0.02
   dbench 1x   -1.77
   dbench 2x   -0.63

 [Jeremy: hinted missing TICKET_LOCK_INC for kick]
 [Oleg: Moving slowpath flag to head, ticket_equals idea]
 [PeterZ: Detailed changelog]

 Reported-by: Sasha Levin sasha.le...@oracle.com
 Suggested-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 
 Sasha, Hope this addresses invalid read concern you had with latest
 xadd based implementation.
 
 (Think we need to test without Oleg's "[PATCH] sched/completion:
 completion_done() should serialize with complete()" reported by Paul.)
 

I ran it for a while and everything seems to work correctly:

Tested-by: Sasha Levin sasha.le...@oracle.com


Thanks,
Sasha


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Sasha Levin
On 02/10/2015 04:30 AM, Raghavendra K T wrote:

 So I think Raghavendra's last version (which hopefully fixes the
 lockup problem that Sasha reported) together with changing that
 
 V2 did pass the stress, but getting confirmation Sasha would help.

I've been running it for the last two days, and didn't see any lockups
or other strange behaviour aside from some invalid reads which we
expected.


Thanks,
Sasha


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-06 Thread Sasha Levin
On 02/06/2015 02:42 PM, Davidlohr Bueso wrote:
 On Fri, 2015-02-06 at 08:25 -0800, Linus Torvalds wrote:
 On Fri, Feb 6, 2015 at 6:49 AM, Raghavendra K T
 raghavendra...@linux.vnet.ibm.com wrote:
 Paravirt spinlock clears slowpath flag after doing unlock.
 [ fix edited out ]

 So I'm not going to be applying this for 3.19, because it's much too
 late and the patch is too scary. Plus the bug probably effectively
 never shows up in real life (it is probably easy to trigger the
 speculative *read* but probably never the actual speculative write
 after dropping the lock last).

 This will need a lot of testing by the paravirt people - both
 performance and correctness. So *maybe* for 3.20, but maybe for even
 later, and then marked for stable, of course.

 Are there any good paravirt stress-tests that people could run for
 extended times?
 
 locktorture inside a VM should give a proper pounding.

Would it catch lifetime issues too? I thought it just tests out correctness.

I tried it and other unrelated stuff broke. I'll send separate mails for that...


Thanks,
Sasha


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-06 Thread Sasha Levin
On 02/06/2015 09:49 AM, Raghavendra K T wrote:
 Paravirt spinlock clears slowpath flag after doing unlock.
 As explained by Linus currently it does:
 prev = *lock;
 	add_smp(&lock->tickets.head, TICKET_LOCK_INC);
 
 /* add_smp() is a full mb() */
 
 	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);
 
 
 which is *exactly* the kind of things you cannot do with spinlocks,
 because after you've done the add_smp() and released the spinlock
 for the fast-path, you can't access the spinlock any more.  Exactly
 because a fast-path lock might come in, and release the whole data
 structure.
 
 Linus suggested that we should not do any writes to lock after unlock(),
 and we can move slowpath clearing to fastpath lock.
 
 However it brings additional case to be handled, viz., slowpath still
 could be set when somebody does arch_trylock. Handle that too by ignoring
 slowpath flag during lock availability check.
 
 Reported-by: Sasha Levin sasha.le...@oracle.com
 Suggested-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

With this patch, my VMs lock up quickly after boot with:

[  161.613469] BUG: spinlock lockup suspected on CPU#31, kworker/31:1/5213
[  161.613469]  lock: purge_lock.28981+0x0/0x40, .magic: dead4ead, .owner: 
kworker/7:1/6400, .owner_cpu: 7
[  161.613469] CPU: 31 PID: 5213 Comm: kworker/31:1 Not tainted 
3.19.0-rc7-next-20150204-sasha-00048-gee3a350 #1869
[  161.613469] Workqueue: events bpf_prog_free_deferred
[  161.613469]   f03dd27f 88056b227a88 
a1702276
[  161.613469]   88017cf7 88056b227aa8 
9e1d009c
[  161.613469]  a3edae60 86c3f830 88056b227ad8 
9e1d01d7
[  161.613469] Call Trace:
[  161.613469] dump_stack (lib/dump_stack.c:52)
[  161.613469] spin_dump (kernel/locking/spinlock_debug.c:68 (discriminator 8))
[  161.613469] do_raw_spin_lock (include/linux/nmi.h:48 
kernel/locking/spinlock_debug.c:119 kernel/locking/spinlock_debug.c:137)
[  161.613469] _raw_spin_lock (kernel/locking/spinlock.c:152)
[  161.613469] ? __purge_vmap_area_lazy (mm/vmalloc.c:615)
[  161.613469] __purge_vmap_area_lazy (mm/vmalloc.c:615)
[  161.613469] ? vm_unmap_aliases (mm/vmalloc.c:1021)
[  161.613469] vm_unmap_aliases (mm/vmalloc.c:1052)
[  161.613469] ? vm_unmap_aliases (mm/vmalloc.c:1021)
[  161.613469] ? __lock_acquire (kernel/locking/lockdep.c:2019 
kernel/locking/lockdep.c:3184)
[  161.613469] change_page_attr_set_clr (arch/x86/mm/pageattr.c:1394)
[  161.613469] ? debug_object_deactivate (lib/debugobjects.c:463)
[  161.613469] set_memory_rw (arch/x86/mm/pageattr.c:1662)
[  161.613469] ? __lock_is_held (kernel/locking/lockdep.c:3518)
[  161.613469] bpf_jit_free (include/linux/filter.h:377 
arch/x86/net/bpf_jit_comp.c:991)
[  161.613469] bpf_prog_free_deferred (kernel/bpf/core.c:646)
[  161.613469] process_one_work (kernel/workqueue.c:2014 
include/linux/jump_label.h:114 include/trace/events/workqueue.h:111 
kernel/workqueue.c:2019)
[  161.613469] ? process_one_work (./arch/x86/include/asm/atomic64_64.h:33 
include/asm-generic/atomic-long.h:38 kernel/workqueue.c:598 
kernel/workqueue.c:625 kernel/workqueue.c:2007)
[  161.613469] worker_thread (include/linux/list.h:189 kernel/workqueue.c:2147)
[  161.613469] ? process_one_work (kernel/workqueue.c:2091)
[  161.613469] kthread (kernel/kthread.c:207)
[  161.613469] ? finish_task_switch (./arch/x86/include/asm/current.h:14 
kernel/sched/sched.h:1058 kernel/sched/core.c:2258)
[  161.613469] ? flush_kthread_work (kernel/kthread.c:176)
[  161.613469] ret_from_fork (arch/x86/kernel/entry_64.S:283)
[  161.613469] ? flush_kthread_work (kernel/kthread.c:176)

And a few soft lockup messages inside the scheduler after that.


Thanks,
Sasha




Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-06 Thread Sasha Levin
On 02/06/2015 09:49 AM, Raghavendra K T wrote:
  static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
  {
 	if (TICKET_SLOWPATH_FLAG &&
- 		static_key_false(&paravirt_ticketlocks_enabled)) {
- 		arch_spinlock_t prev;
+ 		static_key_false(&paravirt_ticketlocks_enabled)) {
+ 		__ticket_t prev_head;
 
- 		prev = *lock;
+ 		prev_head = lock->tickets.head;
 	add_smp(&lock->tickets.head, TICKET_LOCK_INC);
 
 	/* add_smp() is a full mb() */
 
- 	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
- 		__ticket_unlock_slowpath(lock, prev);
+ 	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) {
+ 		BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
+ 		__ticket_unlock_kick(lock, prev_head);

Can we modify it slightly to avoid potentially accessing invalid memory:

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 5315887..cd22d73 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -144,13 +144,13 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock
 	if (TICKET_SLOWPATH_FLAG &&
 	    static_key_false(&paravirt_ticketlocks_enabled)) {
 		__ticket_t prev_head;
-
+		bool needs_kick = lock->tickets.tail & TICKET_SLOWPATH_FLAG;
 		prev_head = lock->tickets.head;
 		add_smp(&lock->tickets.head, TICKET_LOCK_INC);
 
 		/* add_smp() is a full mb() */
 
-		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) {
+		if (unlikely(needs_kick)) {
 			BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
 			__ticket_unlock_kick(lock, prev_head);
}


Thanks,
Sasha


Re: [PATCH 3/4] mm: gup: use get_user_pages_fast and get_user_pages_unlocked

2014-10-01 Thread Sasha Levin
Hi Andrea,

On 10/01/2014 04:56 AM, Andrea Arcangeli wrote:
 diff --git a/mm/mempolicy.c b/mm/mempolicy.c
 index 8f5330d..6606c10 100644
 --- a/mm/mempolicy.c
 +++ b/mm/mempolicy.c
 @@ -881,7 +881,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
 	struct page *p;
 	int err;
 
 -	err = get_user_pages(current, mm, addr & PAGE_MASK, 1, 0, 0, &p, NULL);
 +	err = get_user_pages_fast(addr & PAGE_MASK, 1, 0, &p);
 	if (err >= 0) {
   err = page_to_nid(p);
   put_page(p);

This change looks bogus. mmap_sem might get locked in do_get_mempolicy(), and
with this change we'll try locking it again in get_user_pages_fast.


Thanks,
Sasha


Re: [PATCH RFC] virtio-pci: share config interrupt between virtio devices

2014-09-21 Thread Sasha Levin
On 09/21/2014 04:09 AM, Michael S. Tsirkin wrote:
 The virtio 0.9.5 spec says that ISR is unused when in MSI-X mode. I 
  don't think that you can depend on the device to set the configuration 
  changed bit.
  The virtio 1.0 spec seems to have fixed that.
 Yes, virtio 0.9.5 has this bug. But in practice qemu always set this
 bit, so for qemu we could do that unconditionally.  Pekka's lkvm tool
 doesn't unfortunately.  It's easy to fix that, but it would be nicer to
 additionally probe for old versions of the tool, and disable IRQF_SHARED
 in that case.

 To complicate things, lkvm does not use a distinct subsystem vendor ID,
 in spite of the fact the virtio spec always required this explicitly.

I think I may be a bit confused here, but AFAIK we do set subsystem vendor
ID properly for our virtio-pci devices?

	vpci->pci_hdr = (struct pci_device_header) {
		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
		.device_id		= cpu_to_le16(device_id),
	[...]
		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),


Thanks,
Sasha


Re: [PATCH RFC] virtio-pci: share config interrupt between virtio devices

2014-09-21 Thread Sasha Levin
On 09/21/2014 11:02 AM, Michael S. Tsirkin wrote:
 On Sun, Sep 21, 2014 at 09:47:51AM -0400, Sasha Levin wrote:
  On 09/21/2014 04:09 AM, Michael S. Tsirkin wrote:
   The virtio 0.9.5 spec says that ISR is unused when in MSI-X mode. I 
don't think that you can depend on the device to set the 
configuration 
changed bit.
The virtio 1.0 spec seems to have fixed that.
   Yes, virtio 0.9.5 has this bug. But in practice qemu always set this
   bit, so for qemu we could do that unconditionally.  Pekka's lkvm tool
   doesn't unfortunately.  It's easy to fix that, but it would be nicer to
   additionally probe for old versions of the tool, and disable IRQF_SHARED
   in that case.
  
   To complicate things, lkvm does not use a distinct subsystem vendor ID,
   in spite of the fact the virtio spec always required this explicitly.
  
  I think I may be a bit confused here, but AFAIK we do set subsystem vendor
  ID properly for our virtio-pci devices?
  
  	vpci->pci_hdr = (struct pci_device_header) {
  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
  		.device_id		= cpu_to_le16(device_id),
 [...]
  		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
  
  
  Thanks,
  Sasha
 
 Yes but the spec says:
   The Subsystem Vendor ID should reflect the PCI Vendor ID of the 
 environment.
 
 IOW lkvm shouldn't reuse the ID from qemu, it should have its own
 (qemu and lkvm hypervisors being a different environment).
 
 virtio 1.0 have weakened this requirement:
   The PCI Subsystem Vendor ID and the PCI Subsystem Device ID MAY
   reflect the PCI Vendor and Device
   ID of the environment (for informational purposes by the driver).
 
 I reasoned that since it's for informational purposes only, there's no
 reason to make it a SHOULD.
 
 It might or might not be a good idea to change it back, worth
 considering.

Ow. The 0.9.5 spec also says:

(it's currently only used for informational purposes by the guest).

That and the combination of "should" rather than "must" (recommended rather than
required) prompted us to just put something that works in there and leave it be.


Thanks,
Sasha


[PATCH] KVM: MMU: Use hashtable for MMU page hash

2014-08-04 Thread Sasha Levin
Use the kernel hashtable interface instead of the hlist interface.
This allows us to eliminate some unneeded code and make the code
simpler.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 arch/x86/include/asm/kvm_host.h |4 ++--
 arch/x86/kvm/mmu.c  |   16 ++--
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5724601..2c8e3c5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -17,6 +17,7 @@
 #include <linux/tracepoint.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
+#include <linux/hashtable.h>
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -90,7 +91,6 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
 #define KVM_PERMILLE_MMU_PAGES 20
 #define KVM_MIN_ALLOC_MMU_PAGES 64
 #define KVM_MMU_HASH_SHIFT 10
-#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
 #define KVM_MIN_FREE_MMU_PAGES 5
 #define KVM_REFILL_PAGES 25
 #define KVM_MAX_CPUID_ENTRIES 80
@@ -556,7 +556,7 @@ struct kvm_arch {
unsigned int n_max_mmu_pages;
unsigned int indirect_shadow_pages;
unsigned long mmu_valid_gen;
-   struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
+   DECLARE_HASHTABLE(mmu_page_hash, KVM_MMU_HASH_SHIFT);
/*
 * Hash table of struct kvm_mmu_page.
 */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..db1ae90 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1525,7 +1525,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
 static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
 {
 	ASSERT(is_empty_shadow_page(sp->spt));
-	hlist_del(&sp->hash_link);
+	hash_del(&sp->hash_link);
 	list_del(&sp->link);
 	free_page((unsigned long)sp->spt);
 	if (!sp->role.direct)
@@ -1533,11 +1533,6 @@ static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
kmem_cache_free(mmu_page_header_cache, sp);
 }
 
-static unsigned kvm_page_table_hashfn(gfn_t gfn)
-{
-	return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
-}
-
 static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, u64 *parent_pte)
 {
@@ -1724,8 +1719,8 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
  * all the obsolete pages.
  */
 #define for_each_gfn_sp(_kvm, _sp, _gfn)				\
-	hlist_for_each_entry(_sp,					\
-	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
+	hash_for_each_possible((_kvm)->arch.mmu_page_hash, (_sp),	\
+			hash_link, (_gfn))				\
 		if ((_sp)->gfn != (_gfn)) {} else
 
 #define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn)		\
@@ -1973,8 +1968,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
return sp;
 	sp->gfn = gfn;
 	sp->role = role;
-	hlist_add_head(&sp->hash_link,
-		&vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
+	hash_add(vcpu->kvm->arch.mmu_page_hash, &sp->hash_link, gfn);
 	if (!direct) {
 		if (rmap_write_protect(vcpu->kvm, gfn))
 			kvm_flush_remote_tlbs(vcpu->kvm);
@@ -3885,6 +3879,8 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 
 static void init_kvm_mmu(struct kvm_vcpu *vcpu)
 {
+	hash_init(vcpu->kvm->arch.mmu_page_hash);
+
if (mmu_is_nested(vcpu))
return init_kvm_nested_mmu(vcpu);
else if (tdp_enabled)
-- 
1.7.10.4



Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled

2014-03-21 Thread Sasha Levin

On 03/21/2014 04:07 PM, Bjorn Helgaas wrote:

I think I figured out what the problem is.  In virtio_pci__init(), we
allocate some address space with pci_get_io_space_block(), save its
address in vpci-mmio_addr, and hook that address space up to
virtio_pci__io_mmio_callback with kvm__register_mmio().

But when we update the BAR value in pci__config_wr(), the address
space mapping is never updated.  I think this means that virtio-pci
can't tolerate its devices being moved by the OS.

In my opinion, this is a bug in linux-kvm.  We've managed to avoid
triggering this bug by preventing Linux from moving the BAR (either by
me reverting my patch, or by Sasha's linux-kvm change [1]).  But it's
not very robust to assume that the OS will never change the BAR, so
it's quite possible that you'll trip over this again in the future.


The purpose of KVM tool is to implement as much as possible of the KVM
interface and the virtio spec so that we'll have a good development/testing
environment with a very simple to understand codebase.

The issue you've mentioned is the evil side of the KVM tool. It never
tried (or claimed) to implement anything close to legacy hardware
interfaces. This means, for example, that it doesn't run any BIOS, that
APIC support is very limited, and that the kernel is just injected into
the virtual RAM and run from there.

It also means that we went into the PCI spec deep enough to get the code
to work with the kernel. The only reason we implemented MSI interrupts
for example is because they provide improved performance with KVM, not
because we were trying to get a complete implementation of the PCI spec.

So yes, the PCI implementation in the KVM tool is lacking and what we
have there might be broken by making the kernel conform more closely
to the spec, but we are always happy to adapt and improve our code to
work with any changes in the kernel.

To sum it up, If you'll end up adding a change to the kernel that is
valid according to the spec but breaks the KVM tool we'll just go ahead
and fix the tool. You really don't need to worry about breaking it.


Thanks,
Sasha


[PATCH] kvm tools: treat uids and gids in stat structure properly

2014-03-19 Thread Sasha Levin
Upstream commit "9p: Modify the stat structures to use kuid_t and kgid_t"
has modified the type of uid and gid in the stat structure, which breaks
the build for us.

This is a rather trivial conversion from u32 to kuid_t and kgid_t.

Reported-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/9p-pdu.c |4 +++-
 tools/kvm/virtio/9p.c |   11 ++-
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/virtio/9p-pdu.c b/tools/kvm/virtio/9p-pdu.c
index 9e95f3b..cc2e871 100644
--- a/tools/kvm/virtio/9p-pdu.c
+++ b/tools/kvm/virtio/9p-pdu.c
@@ -129,7 +129,9 @@ static int virtio_p9_decode(struct p9_pdu *pdu, const char *fmt, va_list ap)
{
struct p9_wstat *stbuf = va_arg(ap, struct p9_wstat *);
memset(stbuf, 0, sizeof(struct p9_wstat));
-	stbuf->n_uid = stbuf->n_gid = stbuf->n_muid = -1;
+	stbuf->n_uid = KUIDT_INIT(-1);
+	stbuf->n_gid = KGIDT_INIT(-1);
+	stbuf->n_muid = KUIDT_INIT(-1);
 	retval = virtio_p9_pdu_readf(pdu, "wwdQdddq",
 				&stbuf->size, &stbuf->type,
 				&stbuf->dev, &stbuf->qid,
diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c
index 25cdd8c..847eddb 100644
--- a/tools/kvm/virtio/9p.c
+++ b/tools/kvm/virtio/9p.c
@@ -441,8 +441,8 @@ static void virtio_p9_fill_stat(struct p9_dev *p9dev,
memset(statl, 0, sizeof(*statl));
 	statl->st_mode		= st->st_mode;
 	statl->st_nlink		= st->st_nlink;
-	statl->st_uid		= st->st_uid;
-	statl->st_gid		= st->st_gid;
+	statl->st_uid		= KUIDT_INIT(st->st_uid);
+	statl->st_gid		= KGIDT_INIT(st->st_gid);
 	statl->st_rdev		= st->st_rdev;
 	statl->st_size		= st->st_size;
 	statl->st_blksize	= st->st_blksize;
@@ -668,12 +668,13 @@ static void virtio_p9_setattr(struct p9_dev *p9dev,
 	    ((p9attr.valid & ATTR_CTIME)
 	     && !((p9attr.valid & ATTR_MASK) & ~ATTR_CTIME))) {
 		if (!(p9attr.valid & ATTR_UID))
-			p9attr.uid = -1;
+			p9attr.uid = KUIDT_INIT(-1);
 
 		if (!(p9attr.valid & ATTR_GID))
-			p9attr.gid = -1;
+			p9attr.gid = KGIDT_INIT(-1);
 
-		ret = lchown(fid->abs_path, p9attr.uid, p9attr.gid);
+		ret = lchown(fid->abs_path, __kuid_val(p9attr.uid),
+				__kgid_val(p9attr.gid));
 		if (ret < 0)
 			goto err_out;
}
-- 
1.7.2.5



[PATCH] kvm tools: mark our PCI card as PIO and MMIO able

2014-03-05 Thread Sasha Levin
A recent -next patch named "PCI: Ignore BAR contents when
firmware left decoding disabled" has pointed out that PCI
cards are supposed to declare whether they have PIO or
MMIO BARs, disabling decoding for the types they don't use.

Fix it by correctly marking our emulated PCI card as PIO/MMIO
enabled.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index fa7aa00..665d492 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -360,6 +360,7 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct 
virtio_device *vdev,
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
 		.device_id		= cpu_to_le16(device_id),
+		.command		= PCI_COMMAND_IO | PCI_COMMAND_MEMORY,
 		.header_type		= PCI_HEADER_TYPE_NORMAL,
 		.revision_id		= 0,
 		.class[0]		= class & 0xff,
-- 
1.8.3.2



[PATCH] kvm: mmu: delay mmu audit activation

2013-11-19 Thread Sasha Levin
We should not be using jump labels before they were initialized. Push back
the callback until after jump label initialization.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 arch/x86/kvm/mmu_audit.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index daff69e..1185fe7 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -296,4 +296,4 @@ static struct kernel_param_ops audit_param_ops = {
.get = param_get_bool,
 };
 
-module_param_cb(mmu_audit, &audit_param_ops, &mmu_audit, 0644);
+arch_param_cb(mmu_audit, &audit_param_ops, &mmu_audit, 0644);
-- 
1.7.10.4



Re: lkvm issue: --network mode=user,trans=mmio doesn't work

2013-10-12 Thread Sasha Levin

On 10/12/2013 05:49 AM, Pekka Enberg wrote:

Hi Peter,

(Adding bunch of CCs.)

On 10/12/13 11:05 AM, Péter Szabó wrote:

First, thank you very much for writing lkvm, it's awesome, and very
easy to set up.

If sending an e-mail to you is not the right way to report lkvm
issues, please tell me how I should do it.


Sure, you can report problems to me and I'll do my best to
try to get them sorted out.


I'm using afdf92030c7c43b0f9b32b7edbe07ac3b13780f1 from
git://github.com/penberg/linux-kvm.git and Linux kernel 3.2.51:

config-3.2.51 (42 KB)
https://mega.co.nz/#!hgxB1TDJ!SdbX-jp_yr8E6EUJl7t7Tzrh1p4qKxkTHieoss8yu_Y
bzImage-3.2.51 (2.0 MB)
https://mega.co.nz/#!5lBVmDZL!WpPRWA7ZflevBIPPNGNM_FkkY-ErBNQMoEbw0XePi5I

Config:

$ grep -E 'VIRTIO|MMIO' .config
CONFIG_NET_9P_VIRTIO=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_VIRTIO_MMIO=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y

I'm running `lkvm run --network mode=user,trans=mmio', and it fails to
create the eth0 device (`cat /proc/net/dev' doesn't show the device).
With `lkvm run --network mode=user' everything just works.

I have also tried kernels 3.0.99, 3.4.65 and 3.10.15, but they seem to
fail the same way.

Am I using it correctly?


Asias, Sasha, is mmio transport supported by virtio networking?


It should be. The ARM folks use it all the time (I suppose).

Did you properly define the network device within the guest? MMIO doesn't have automatic detection 
like PCI does.
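For reference, a virtio-mmio device is usually declared to the guest via the `virtio_mmio.device=` kernel command-line parameter, in the form `<size>@<baseaddr>:<irq>`; the concrete values below are placeholders for illustration, not addresses lkvm necessarily uses:

```
virtio_mmio.device=1K@0xd0000000:5
```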



Thanks,
Sasha



Re: [PATCH v2 2/3] kvm tools: remove periodic tick in favour of a polling thread

2013-09-07 Thread Sasha Levin

On 09/05/2013 12:39 PM, Jonathan Austin wrote:

Hi Sasha,

On 04/09/13 19:01, Sasha Levin wrote:

On 09/04/2013 01:48 PM, Pekka Enberg wrote:

On Wed, Sep 4, 2013 at 8:40 PM, Jonathan Austin jonathan.aus...@arm.com wrote:

'top' works on ARM with virtio console. I've just done some new testing
and with the serial console emulation and I see the same as you're reporting.
Previously with the 8250 emulation I'd booted to a prompt but didn't actually
test top...

I'm looking in to fixing this now... Looks like I need to find the right place
from which to call serial8250_flush_tx now that it isn't getting called every 
tick.

I've done the following and it fixes 'top' with serial8250:
---8<--
diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
index 931067f..a71e68d 100644
--- a/tools/kvm/hw/serial.c
+++ b/tools/kvm/hw/serial.c
@@ -260,6 +260,7 @@ static bool serial8250_out(struct ioport *ioport, struct kvm *kvm, u16 port,
 			dev->lsr &= ~UART_LSR_TEMT;
 			if (dev->txcnt == FIFO_LEN / 2)
 				dev->lsr &= ~UART_LSR_THRE;
+			serial8250_flush_tx(kvm, dev);
 		} else {
 			/* Should never happpen */
 			dev->lsr &= ~(UART_LSR_TEMT | UART_LSR_THRE);

->8---

I guess it's a shame that we'll be printing each character (admittedly the rate 
will always be
relatively low...) rather than flushing the buffer in a batch. Without a timer, 
though, I'm
not sure I see a better option - every N chars doesn't seem like a good one to 
me.

If you think that looks about right then I'll fold that in to the patch series, 
probably also
removing the call to serial8250_flush_tx() in serial8250__receive.


Yeah, looks good to me and makes top work again.


We might want to make sure performance isn't hit with stuff that's intensive on 
the serial console.


Indeed, the intention here is very much to reduce overhead...

Do you have an idea already of what you'd like to test?

I've written a little testcase that just prints an incrementing counter to the 
console in a tight
loop for 5 seconds (I've tested both buffered and unbuffered output). The 
measure of 'performance'
is how high we count in those 5 seconds.

These are averages (mean) of 5 runs on x86.

                +unbuffered+-buffered-+
native          |  36880   |  40354   |
                +----------+----------+
lkvm - original |  24302   |  25335   |
                +----------+----------+
lkvm - no-tick  |  22895   |  28202   |
                +----------+----------+

I ran these all on the framebuffer console. I found that the numbers on 
gnome-terminal seemed to be
affected by the activity level of other gui-ish things, and the numbers were 
different in
gnome-terminal and xterm. If you want to see more testing then a suggestion of 
a way we can take
host terminal performance out of the equation would make me more comfortable 
with the numbers. I do
think that as comparisons to each other they're reasonable as they are, though.

So at least in this reasonably artificial case it looks like a minor win for 
buffered output and a
minor loss in the unbuffered case.

Happy to try out other things if you're interested.


I've played around with it over here. Looks good to me. Thanks!



Thanks,
Sasha

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] kvm tools: remove periodic tick in favour of a polling thread

2013-09-04 Thread Sasha Levin

On 09/04/2013 01:48 PM, Pekka Enberg wrote:

On Wed, Sep 4, 2013 at 8:40 PM, Jonathan Austin jonathan.aus...@arm.com wrote:

'top' works on ARM with virtio console. I've just done some new testing
and with the serial console emulation and I see the same as you're reporting.
Previously with the 8250 emulation I'd booted to a prompt but didn't actually
test top...

I'm looking into fixing this now... Looks like I need to find the right place
from which to call serial8250_flush_tx now that it isn't getting called every 
tick.

I've done the following and it fixes 'top' with serial8250:
---8<---
diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
index 931067f..a71e68d 100644
--- a/tools/kvm/hw/serial.c
+++ b/tools/kvm/hw/serial.c
@@ -260,6 +260,7 @@ static bool serial8250_out(struct ioport *ioport, struct kvm *kvm, u16 port,
 		dev->lsr &= ~UART_LSR_TEMT;
 		if (dev->txcnt == FIFO_LEN / 2)
 			dev->lsr &= ~UART_LSR_THRE;
+		serial8250_flush_tx(kvm, dev);
 	} else {
 		/* Should never happpen */
 		dev->lsr &= ~(UART_LSR_TEMT | UART_LSR_THRE);

--->8---

I guess it's a shame that we'll be printing each character (admittedly the rate 
will always be
relatively low...) rather than flushing the buffer in a batch. Without a timer, 
though, I'm
not sure I see a better option - every N chars doesn't seem like a good one to 
me.

If you think that looks about right then I'll fold that into the patch series, 
probably also
removing the call to serial8250_flush_tx() in serial8250__receive.


Yeah, looks good to me and makes top work again.


We might want to make sure performance isn't hit with stuff that's intensive on 
the serial console.


Thanks,
Sasha



[PATCH v2] kvm tools: fix boot of guests with more than 4gb of ram

2013-07-07 Thread Sasha Levin
Commit "kvm tools: virtio: remove hardcoded assumptions
about guest page size" has introduced a bug that prevented
guests with more than 4gb of ram from booting.

The issue is that 'pfn' is a 32bit integer, so when multiplying
it by page size to get the actual page will cause an overflow if
the pfn referred to a memory area above 4gb.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/virtio.h | 6 ++
 tools/kvm/virtio/9p.c  | 2 +-
 tools/kvm/virtio/balloon.c | 2 +-
 tools/kvm/virtio/blk.c | 2 +-
 tools/kvm/virtio/console.c | 2 +-
 tools/kvm/virtio/net.c | 2 +-
 tools/kvm/virtio/rng.c | 2 +-
 tools/kvm/virtio/scsi.c| 2 +-
 8 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index 269ea4a..820b94a 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -90,4 +90,10 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		    struct virtio_ops *ops, enum virtio_trans trans,
 		    int device_id, int subsys_id, int class);
 int virtio_compat_add_message(const char *device, const char *config);
+
+static inline void *virtio_get_vq(struct kvm *kvm, u32 pfn, u32 page_size)
+{
+	return guest_flat_to_host(kvm, (u64)pfn * page_size);
+}
+
 #endif /* KVM__VIRTIO_H */
diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c
index b7a5723..127b13a 100644
--- a/tools/kvm/virtio/9p.c
+++ b/tools/kvm/virtio/9p.c
@@ -1267,7 +1267,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &p9dev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= virtio_get_vq(kvm, queue->pfn, page_size);
 	job		= &p9dev->jobs[vq];
 
 	vring_init(&queue->vring, VIRTQUEUE_NUM, p, align);
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
index d1b64fa..4863535 100644
--- a/tools/kvm/virtio/balloon.c
+++ b/tools/kvm/virtio/balloon.c
@@ -204,7 +204,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &bdev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= virtio_get_vq(kvm, queue->pfn, page_size);
 
 	thread_pool__init_job(&bdev->jobs[vq], kvm, virtio_bln_do_io, queue);
 	vring_init(&queue->vring, VIRTIO_BLN_QUEUE_SIZE, p, align);
diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index 44ac44b..ab18716 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -167,7 +167,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &bdev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= virtio_get_vq(kvm, queue->pfn, page_size);
 
 	vring_init(&queue->vring, VIRTIO_BLK_QUEUE_SIZE, p, align);
 
diff --git a/tools/kvm/virtio/console.c b/tools/kvm/virtio/console.c
index 1f88a4b..83c58bf 100644
--- a/tools/kvm/virtio/console.c
+++ b/tools/kvm/virtio/console.c
@@ -137,7 +137,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &cdev.vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= virtio_get_vq(kvm, queue->pfn, page_size);
 
 	vring_init(&queue->vring, VIRTIO_CONSOLE_QUEUE_SIZE, p, align);
 
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 7855cfc..298903c 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -410,7 +410,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &ndev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= virtio_get_vq(kvm, queue->pfn, page_size);
 
 	vring_init(&queue->vring, VIRTIO_NET_QUEUE_SIZE, p, align);
 
diff --git a/tools/kvm/virtio/rng.c b/tools/kvm/virtio/rng.c
index 2ce8afd..394de71 100644
--- a/tools/kvm/virtio/rng.c
+++ b/tools/kvm/virtio/rng.c
@@ -98,7 +98,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &rdev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= virtio_get_vq(kvm, queue->pfn, page_size);
 
 	job		= &rdev->jobs[vq];
 
diff --git a/tools/kvm/virtio/scsi.c b/tools/kvm/virtio/scsi.c
index 05b2dc6..8da2420 100644
--- a/tools/kvm/virtio/scsi.c
+++ b/tools/kvm/virtio/scsi.c
@@ -63,7 +63,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue

Re: [RFC 0/5] Introduce VM Sockets virtio transport

2013-06-27 Thread Sasha Levin

Hi Asias,

Looks nice! Some comments inline below (I've removed anything that mst already
commented on).

On 06/27/2013 03:59 AM, Asias He wrote:

Hello guys,

In commit d021c344051af91 (VSOCK: Introduce VM Sockets), VMware added VM
Sockets support. VM Sockets allows communication between virtual
machines and the hypervisor. VM Sockets is able to use different
hypervisor neutral transport to transfer data. Currently, only VMware
VMCI transport is supported.

This series introduces virtio transport for VM Sockets.

Any comments are appreciated! Thanks!

Code:
=
1) kernel bits
git://github.com/asias/linux.git vsock

2) userspace bits:
git://github.com/asias/linux-kvm.git vsock

Howto:
=
Make sure you have these kernel options:

   CONFIG_VSOCKETS=y
   CONFIG_VIRTIO_VSOCKETS=y
   CONFIG_VIRTIO_VSOCKETS_COMMON=y
   CONFIG_VHOST_VSOCK=m

$ git clone git://github.com/asias/linux-kvm.git
$ cd linux-kvm/tools/kvm
$ git checkout -b vsock origin/vsock
$ make
$ modprobe vhost_vsock
$ ./lkvm run -d os.img -k bzImage --vsock <guest_cid>

Test:
=
I hacked busybox's http server and wget to run over vsock. Start http
server in host and guest, download a 512MB file in guest and host
simultaneously for 6000 times. Managed to run the http stress test.

Also, I wrote a small libvsock.so to play the LD_PRELOAD trick and
managed to make sshd and ssh work over virtio-vsock without modifying
the source code.


Why did it require hacking in the first place? Does running with kvmtool
and just doing regular networking over virtio-net running on top of vsock
achieve the same goal?


Draft VM Sockets Virtio Device spec:
=
Appendix K: VM Sockets Device

The virtio VM sockets device is a virtio transport device for VM Sockets. VM
Sockets allows communication between virtual machines and the hypervisor.

Configuration:

Subsystem Device ID 13

Virtqueues:
 0:controlq; 1:receiveq0; 2:transmitq0 ... 2N+1:receiveqN; 2N+2:transmitqN


controlq is defined but not used, is there something in mind for it? if not,
does it make sense keeping it here? we can always re-add it to the end just
like in virtio-net.


Feature bits:
 Currently, no feature bits are defined.

Device configuration layout:

Two configuration fields are currently defined.

struct virtio_vsock_config {
__u32 guest_cid;
__u32 max_virtqueue_pairs;
} __packed;

The guest_cid field specifies the guest context id, which is like the guest IP
address. The max_virtqueue_pairs field specifies the maximum number of receive
and transmit virtqueue pairs (receiveq0 ...  receiveqN and transmitq0 ...
transmitqN respectively; N = max_virtqueue_pairs - 1 ) that can be configured.
The driver is free to use only one virtqueue pair, or it can use more to
achieve better performance.


How does the driver tell the device how many vqs it's planning on actually
using? Or is it assumed that all of them are in use?



Device Initialization:
The initialization routine should discover the device's virtqueues.

Device Operation:
Packets are transmitted by placing them in the transmitq0..transmitqN, and
buffers for incoming packets are placed in the receiveq0..receiveqN. In each
case, the packet itself is preceded by a header:

struct virtio_vsock_hdr {
__u32   src_cid;
__u32   src_port;
__u32   dst_cid;
__u32   dst_port;
__u32   len;
__u8type;
__u8op;
__u8shut;
__u64   fwd_cnt;
__u64   buf_alloc;
} __packed;

src_cid and dst_cid: specify the source and destination context id.
src_port and dst_port: specify the source and destination port.
len: specifies the size of the data payload, it could be zero if no data
payload is transferred.
type: specifies the type of the packet, it can be SOCK_STREAM or SOCK_DGRAM.
op: specifies the operation of the packet, it is defined as follows.

enum {
VIRTIO_VSOCK_OP_INVALID = 0,
VIRTIO_VSOCK_OP_REQUEST = 1,
VIRTIO_VSOCK_OP_NEGOTIATE = 2,
VIRTIO_VSOCK_OP_OFFER = 3,
VIRTIO_VSOCK_OP_ATTACH = 4,
VIRTIO_VSOCK_OP_RW = 5,
VIRTIO_VSOCK_OP_CREDIT = 6,
VIRTIO_VSOCK_OP_RST = 7,
VIRTIO_VSOCK_OP_SHUTDOWN = 8,
};

shut: specifies the shutdown mode when the socket is being shutdown. 1 is for
receive shutdown, 2 is for transmit shutdown, 3 is for both receive and transmit
shutdown.
fwd_cnt: specifies the number of bytes the receiver has forwarded to 
userspace.


For the previous packet? For the entire session? Reading ahead makes it clear, but
it's worth mentioning the context here just to make it easy for implementers.


buf_alloc: specifies the size of the receiver's recieve buffer in bytes.

  receive


Virtio VM socket connection creation:
1) Client 

Re: [PATCH] kvm tools: fix boot of guests with more than 4gb of ram

2013-06-25 Thread Sasha Levin

On 06/24/2013 08:58 PM, Michael Ellerman wrote:

On Sun, 2013-06-23 at 21:23 -0400, Sasha Levin wrote:

Commit "kvm tools: virtio: remove hardcoded assumptions
about guest page size" has introduced a bug that prevented
guests with more than 4gb of ram from booting.

The issue is that 'pfn' is a 32bit integer, so when multiplying
it by page size to get the actual page will cause an overflow if
the pfn referred to a memory area above 4gb.


Couldn't we just make pfn 64 bit?


pfn is passed to us by the guest virtio driver, and is 32 bit.


Thanks,
Sasha



[PATCH] kvm tools: fix boot of guests with more than 4gb of ram

2013-06-23 Thread Sasha Levin
Commit "kvm tools: virtio: remove hardcoded assumptions
about guest page size" has introduced a bug that prevented
guests with more than 4gb of ram from booting.

The issue is that 'pfn' is a 32bit integer, so when multiplying
it by page size to get the actual page will cause an overflow if
the pfn referred to a memory area above 4gb.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/9p.c  | 2 +-
 tools/kvm/virtio/balloon.c | 2 +-
 tools/kvm/virtio/blk.c | 2 +-
 tools/kvm/virtio/console.c | 2 +-
 tools/kvm/virtio/net.c | 2 +-
 tools/kvm/virtio/rng.c | 2 +-
 tools/kvm/virtio/scsi.c| 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c
index b7a5723..033b638 100644
--- a/tools/kvm/virtio/9p.c
+++ b/tools/kvm/virtio/9p.c
@@ -1267,7 +1267,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &p9dev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= guest_flat_to_host(kvm, (u64)queue->pfn * page_size);
 	job		= &p9dev->jobs[vq];
 
 	vring_init(&queue->vring, VIRTQUEUE_NUM, p, align);
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
index d1b64fa..557ea25 100644
--- a/tools/kvm/virtio/balloon.c
+++ b/tools/kvm/virtio/balloon.c
@@ -204,7 +204,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &bdev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= guest_flat_to_host(kvm, (u64)queue->pfn * page_size);
 
 	thread_pool__init_job(&bdev->jobs[vq], kvm, virtio_bln_do_io, queue);
 	vring_init(&queue->vring, VIRTIO_BLN_QUEUE_SIZE, p, align);
diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index 44ac44b..c977353 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -167,7 +167,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &bdev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= guest_flat_to_host(kvm, (u64)queue->pfn * page_size);
 
 	vring_init(&queue->vring, VIRTIO_BLK_QUEUE_SIZE, p, align);
 
diff --git a/tools/kvm/virtio/console.c b/tools/kvm/virtio/console.c
index 1f88a4b..3ac4c9f 100644
--- a/tools/kvm/virtio/console.c
+++ b/tools/kvm/virtio/console.c
@@ -137,7 +137,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &cdev.vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= guest_flat_to_host(kvm, (u64)queue->pfn * page_size);
 
 	vring_init(&queue->vring, VIRTIO_CONSOLE_QUEUE_SIZE, p, align);
 
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 7855cfc..34a5f27 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -410,7 +410,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &ndev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= guest_flat_to_host(kvm, (u64)queue->pfn * page_size);
 
 	vring_init(&queue->vring, VIRTIO_NET_QUEUE_SIZE, p, align);
 
diff --git a/tools/kvm/virtio/rng.c b/tools/kvm/virtio/rng.c
index 2ce8afd..d1754dc 100644
--- a/tools/kvm/virtio/rng.c
+++ b/tools/kvm/virtio/rng.c
@@ -98,7 +98,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &rdev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= guest_flat_to_host(kvm, (u64)queue->pfn * page_size);
 
 	job		= &rdev->jobs[vq];
 
diff --git a/tools/kvm/virtio/scsi.c b/tools/kvm/virtio/scsi.c
index 05b2dc6..ad6320d 100644
--- a/tools/kvm/virtio/scsi.c
+++ b/tools/kvm/virtio/scsi.c
@@ -63,7 +63,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	queue		= &sdev->vqs[vq];
 	queue->pfn	= pfn;
-	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
+	p		= guest_flat_to_host(kvm, (u64)queue->pfn * page_size);
 
 	vring_init(&queue->vring, VIRTIO_SCSI_QUEUE_SIZE, p, align);
 
-- 
1.8.1.2



Re: [PATCH] kvm tools: add status notification hook to virtio-mmio

2013-05-26 Thread Sasha Levin
On 05/21/2013 08:29 AM, Marc Zyngier wrote:
 Patch e03b449cbddf ("kvm tools: add status notification hook for
 virtio") converted virtio-net to use a new notification mechanism,
 but unfortunately left virtio-mmio behind.
 
 The direct consequence is that both arm and arm64 guests are left
 without any form of networking. Not good.
 
 Bring virtio-mmio into this brave new notified world, and feel the
 power of being able to ping again.
 
 Cc: Sasha Levin sasha.le...@oracle.com
 Cc: Will Deacon will.dea...@arm.com
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  tools/kvm/virtio/mmio.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)
 
 diff --git a/tools/kvm/virtio/mmio.c b/tools/kvm/virtio/mmio.c
 index a4e4855..afa2692 100644
 --- a/tools/kvm/virtio/mmio.c
 +++ b/tools/kvm/virtio/mmio.c
 @@ -144,16 +144,21 @@ static void virtio_mmio_config_out(u64 addr, void *data, u32 len,
 					struct virtio_device *vdev)
  {
 	struct virtio_mmio *vmmio = vdev->virtio;
 +	struct kvm *kvm = vmmio->kvm;
 	u32 val = 0;
 
 	switch (addr) {
 	case VIRTIO_MMIO_HOST_FEATURES_SEL:
 	case VIRTIO_MMIO_GUEST_FEATURES_SEL:
 	case VIRTIO_MMIO_QUEUE_SEL:
 -	case VIRTIO_MMIO_STATUS:
 		val = ioport__read32(data);
 		*(u32 *)(((void *)&vmmio->hdr) + addr) = val;
 		break;
 +	case VIRTIO_MMIO_STATUS:
 +		vmmio->hdr.status = ioport__read32(data);
 +		if (vdev->ops->notify_status)
 +			vdev->ops->notify_status(kvm, vmmio->dev, vmmio->hdr.status);
 +		break;
 	case VIRTIO_MMIO_GUEST_FEATURES:
 		if (vmmio->hdr.guest_features_sel == 0) {
 			val = ioport__read32(data);
 

Pekka, could you apply please?

Acked-by: Sasha Levin sasha.le...@oracle.com


Thanks,
Sasha


Re: [RFC v2 0/2] virtio_balloon: auto-ballooning support

2013-05-16 Thread Sasha Levin
On 05/09/2013 10:53 AM, Luiz Capitulino wrote:
 Hi,
 
 This series is a respin of automatic ballooning support I started
 working on last year. Patch 2/2 contains all relevant technical
 details and performance measurements results.
 
 This is in RFC state because it's a work in progress.

Hi Luiz,

Is there a virtio spec patch I could use to get it implemented on
kvmtool?


Thanks,
Sasha


Re: [RFC PATCH 04/11] kvm tools: console: unconditionally output to any console

2013-05-06 Thread Sasha Levin
On 05/03/2013 12:09 PM, Will Deacon wrote:
 On Fri, May 03, 2013 at 05:02:14PM +0100, Sasha Levin wrote:
 On 05/03/2013 05:19 AM, Pekka Enberg wrote:
 On Wed, May 1, 2013 at 6:50 PM, Will Deacon will.dea...@arm.com wrote:
 From: Marc Zyngier marc.zyng...@arm.com

 Kvmtool suppresses any output to a console that has not been elected
 as *the* console.

 While this makes sense on the input side (we want the input to be sent
 to one console driver only), it seems to be the wrong thing to do on
 the output side, as it effectively prevents the guest from switching
 from one console to another (think earlyprintk using 8250 to virtio
 console).

 After all, the guest *does* poke this device and outputs something
 there.

 Just remove the kvm->cfg.active_console test from the output paths.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Will Deacon will.dea...@arm.com

 Seems reasonable. Asias, Sasha?


 I remember trying it some time ago but dropped it for a reason I don't
 remember at the moment.

 Can I have the weekend to play with it to try and figure out why?
 
 There's no rush from my point of view (hence the RFC) so take as long as you
 need!

Looks good to me!


Thanks,
Sasha



Re: [RFC PATCH 04/11] kvm tools: console: unconditionally output to any console

2013-05-03 Thread Sasha Levin
On 05/03/2013 05:19 AM, Pekka Enberg wrote:
 On Wed, May 1, 2013 at 6:50 PM, Will Deacon will.dea...@arm.com wrote:
 From: Marc Zyngier marc.zyng...@arm.com

 Kvmtool suppresses any output to a console that has not been elected
 as *the* console.

 While this makes sense on the input side (we want the input to be sent
 to one console driver only), it seems to be the wrong thing to do on
 the output side, as it effectively prevents the guest from switching
 from one console to another (think earlyprintk using 8250 to virtio
 console).

 After all, the guest *does* poke this device and outputs something
 there.

 Just remove the kvm->cfg.active_console test from the output paths.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Will Deacon will.dea...@arm.com
 
 Seems reasonable. Asias, Sasha?
 

I remember trying it some time ago but dropped it for a reason I don't
remember at the moment.

Can I have the weekend to play with it to try and figure out why?


Thanks,
Sasha


[PATCH 2/8] kvm tools: fix vhost interaction with ctrl vq

2013-05-03 Thread Sasha Levin
We broke networking using vhost with the introduction of a ctrl vq;
make sure that that queue gets treated like a special case.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/net.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 2de9222..bef0039 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -357,6 +357,11 @@ static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
 	ndev->features = features;
 }
 
+static bool is_ctrl_vq(struct net_dev *ndev, u32 vq)
+{
+	return vq == (u32)(ndev->queue_pairs * 2);
+}
+
 static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 		   u32 pfn)
 {
@@ -377,10 +382,12 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	mutex_init(&ndev->io_lock[vq]);
 	pthread_cond_init(&ndev->io_cond[vq], NULL);
-	if (ndev->vhost_fd == 0) {
-		if (vq == (u32)(ndev->queue_pairs * 2))
-			pthread_create(&ndev->io_thread[vq], NULL, virtio_net_ctrl_thread, ndev);
-		else if (vq & 1)
+	if (is_ctrl_vq(ndev, vq)) {
+		pthread_create(&ndev->io_thread[vq], NULL, virtio_net_ctrl_thread, ndev);
+
+		return 0;
+	} else if (ndev->vhost_fd == 0) {
+		if (vq & 1)
 			pthread_create(&ndev->io_thread[vq], NULL, virtio_net_tx_thread, ndev);
 		else
 			pthread_create(&ndev->io_thread[vq], NULL, virtio_net_rx_thread, ndev);
@@ -453,7 +460,7 @@ static void notify_vq_eventfd(struct kvm *kvm, void *dev, u32 vq, u32 efd)
 	};
 	int r;
 
-	if (ndev->vhost_fd == 0)
+	if (ndev->vhost_fd == 0 || is_ctrl_vq(ndev, vq))
 		return;
 
 	r = ioctl(ndev->vhost_fd, VHOST_SET_VRING_KICK, &file);
-- 
1.8.2.1



[PATCH 1/8] kvm tools: pass virtio header size to uip_init

2013-05-03 Thread Sasha Levin
We want to make the size of the virtio net header opaque to
uip.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/uip.h | 1 +
 tools/kvm/net/uip/core.c| 8 
 tools/kvm/net/uip/tcp.c | 2 +-
 tools/kvm/net/uip/udp.c | 2 +-
 tools/kvm/virtio/net.c  | 1 +
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/include/kvm/uip.h b/tools/kvm/include/kvm/uip.h
index ac248d2..338582d 100644
--- a/tools/kvm/include/kvm/uip.h
+++ b/tools/kvm/include/kvm/uip.h
@@ -205,6 +205,7 @@ struct uip_info {
u32 dns_ip[UIP_DHCP_MAX_DNS_SERVER_NR];
char *domain_name;
u32 buf_nr;
+   u32 vnet_hdr_len;
 };
 
 struct uip_buf {
diff --git a/tools/kvm/net/uip/core.c b/tools/kvm/net/uip/core.c
index 4e5bb82..7a74261 100644
--- a/tools/kvm/net/uip/core.c
+++ b/tools/kvm/net/uip/core.c
@@ -172,10 +172,10 @@ int uip_init(struct uip_info *info)
 	}
 
 	list_for_each_entry(buf, buf_head, list) {
-		buf->vnet	= malloc(sizeof(struct virtio_net_hdr));
-		buf->vnet_len	= sizeof(struct virtio_net_hdr);
-		buf->eth	= malloc(1024*64 + sizeof(struct uip_pseudo_hdr));
-		buf->eth_len	= 1024*64 + sizeof(struct uip_pseudo_hdr);
+		buf->vnet_len	= info->vnet_hdr_len;
+		buf->vnet	= malloc(buf->vnet_len);
+		buf->eth_len	= 1024*64 + sizeof(struct uip_pseudo_hdr);
+		buf->eth	= malloc(buf->eth_len);
 
 		memset(buf->vnet, 0, buf->vnet_len);
 		memset(buf->eth, 0, buf->eth_len);
diff --git a/tools/kvm/net/uip/tcp.c b/tools/kvm/net/uip/tcp.c
index 9044f40..3c30ade 100644
--- a/tools/kvm/net/uip/tcp.c
+++ b/tools/kvm/net/uip/tcp.c
@@ -153,7 +153,7 @@ static int uip_tcp_payload_send(struct uip_tcp_socket *sk, u8 flag, u16 payload_
 	/*
 	 * virtio_net_hdr
 	 */
-	buf->vnet_len	= sizeof(struct virtio_net_hdr);
+	buf->vnet_len	= info->vnet_hdr_len;
 	memset(buf->vnet, 0, buf->vnet_len);
 
 	buf->eth_len	= ntohs(ip2->len) + uip_eth_hdrlen(&ip2->eth);
diff --git a/tools/kvm/net/uip/udp.c b/tools/kvm/net/uip/udp.c
index 31c417c..dd288c5 100644
--- a/tools/kvm/net/uip/udp.c
+++ b/tools/kvm/net/uip/udp.c
@@ -142,7 +142,7 @@ int uip_udp_make_pkg(struct uip_info *info, struct uip_udp_socket *sk, struct ui
 	/*
 	 * virtio_net_hdr
 	 */
-	buf->vnet_len	= sizeof(struct virtio_net_hdr);
+	buf->vnet_len	= info->vnet_hdr_len;
 	memset(buf->vnet, 0, buf->vnet_len);
 
 	buf->eth_len	= ntohs(ip2->len) + uip_eth_hdrlen(&ip2->eth);
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index c0a8f12..2de9222 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -668,6 +668,7 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 		ndev->info.guest_ip	= ntohl(inet_addr(params->guest_ip));
 		ndev->info.guest_netmask	= ntohl(inet_addr("255.255.255.0"));
 		ndev->info.buf_nr	= 20,
+		ndev->info.vnet_hdr_len	= sizeof(struct virtio_net_hdr);
 		uip_init(&ndev->info);
 		ndev->ops = &uip_ops;
 	}
-- 
1.8.2.1



[PATCH 3/8] kvm tools: add status notification hook for virtio

2013-05-03 Thread Sasha Levin
Some devices want to know their status; use this hook to allow them to
get that notification.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/virtio.h | 1 +
 tools/kvm/virtio/pci.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index 924279b..269ea4a 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -80,6 +80,7 @@ struct virtio_ops {
 	void (*notify_vq_eventfd)(struct kvm *kvm, void *dev, u32 vq, u32 efd);
 	int (*signal_vq)(struct kvm *kvm, struct virtio_device *vdev, u32 queueid);
 	int (*signal_config)(struct kvm *kvm, struct virtio_device *vdev);
+	void (*notify_status)(struct kvm *kvm, void *dev, u8 status);
 	int (*init)(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		    int device_id, int subsys_id, int class);
 	int (*exit)(struct kvm *kvm, struct virtio_device *vdev);
diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 227d567..fec8ce0 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -210,6 +210,8 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm *kvm, u16 port,
 		break;
 	case VIRTIO_PCI_STATUS:
 		vpci->status = ioport__read8(data);
+		if (vdev->ops->notify_status)
+			vdev->ops->notify_status(kvm, vpci->dev, vpci->status);
 		break;
 	default:
 		ret = virtio_pci__specific_io_out(kvm, vdev, port, data, size, offset);
-- 
1.8.2.1



[PATCH 7/8] kvm tools: use iovec functions in uip_rx

2013-05-03 Thread Sasha Levin
Simplifies the code a lot.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/uip.h |  4 ++--
 tools/kvm/net/uip/core.c| 47 +++--
 2 files changed, 5 insertions(+), 46 deletions(-)

diff --git a/tools/kvm/include/kvm/uip.h b/tools/kvm/include/kvm/uip.h
index 338582d..cb79e94 100644
--- a/tools/kvm/include/kvm/uip.h
+++ b/tools/kvm/include/kvm/uip.h
@@ -214,8 +214,8 @@ struct uip_buf {
 	int vnet_len;
 	int eth_len;
 	int status;
-	char *vnet;
-	char *eth;
+	unsigned char *vnet;
+	unsigned char *eth;
 	int id;
 };
 
diff --git a/tools/kvm/net/uip/core.c b/tools/kvm/net/uip/core.c
index 7a74261..e31efc2 100644
--- a/tools/kvm/net/uip/core.c
+++ b/tools/kvm/net/uip/core.c
@@ -4,6 +4,7 @@
 #include <linux/virtio_net.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
+#include "kvm/iovec.h"
 
 int uip_tx(struct iovec *iov, u16 out, struct uip_info *info)
 {
@@ -76,61 +77,19 @@ int uip_tx(struct iovec *iov, u16 out, struct uip_info *info)
 
 int uip_rx(struct iovec *iov, u16 in, struct uip_info *info)
 {
-	struct virtio_net_hdr *vnet;
-	struct uip_eth *eth;
 	struct uip_buf *buf;
-	int vnet_len;
-	int eth_len;
-	char *p;
 	int len;
-	int cnt;
-	int i;
 
 	/*
 	 * Sleep until there is a buffer for guest
 	 */
 	buf = uip_buf_get_used(info);
 
-	/*
-	 * Fill device to guest buffer, vnet hdr fisrt
-	 */
-	vnet_len = iov[0].iov_len;
-	vnet = iov[0].iov_base;
-	if (buf->vnet_len > vnet_len) {
-		len = -1;
-		goto out;
-	}
-	memcpy(vnet, buf->vnet, buf->vnet_len);
-
-	/*
-	 * Then, the real eth data
-	 * Note: Be sure buf->eth_len is not bigger than the buffer len that guest provides
-	 */
-	cnt = buf->eth_len;
-	p = buf->eth;
-	for (i = 1; i < in; i++) {
-		eth_len = iov[i].iov_len;
-		eth = iov[i].iov_base;
-		if (cnt > eth_len) {
-			memcpy(eth, p, eth_len);
-			cnt -= eth_len;
-			p += eth_len;
-		} else {
-			memcpy(eth, p, cnt);
-			cnt -= cnt;
-			break;
-		}
-	}
-
-	if (cnt) {
-		pr_warning("uip_rx error");
-		len = -1;
-		goto out;
-	}
+	memcpy_toiovecend(iov, buf->vnet, 0, buf->vnet_len);
+	memcpy_toiovecend(iov, buf->eth, buf->vnet_len, buf->eth_len);
 
 	len = buf->vnet_len + buf->eth_len;
 
-out:
 	uip_buf_set_free(info, buf);
 	return len;
 }
-- 
1.8.2.1



[PATCH 5/8] kvm tools: use correct vnet header size for mergable rx buffers

2013-05-03 Thread Sasha Levin
vnet header size depends on whether we use mergable rx buffers.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/net.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 2dbca09..15dbde3 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -243,7 +243,9 @@ static bool virtio_net__tap_init(struct net_dev *ndev)
goto fail;
}
 
-	hdr_len = sizeof(struct virtio_net_hdr);
+	hdr_len = (ndev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ?
+			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
+			sizeof(struct virtio_net_hdr);
 	if (ioctl(ndev->tap_fd, TUNSETVNETHDRSZ, &hdr_len) < 0)
 		pr_warning("Config tap device TUNSETVNETHDRSZ error");
 
@@ -679,7 +681,6 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 		ndev->info.guest_ip		= ntohl(inet_addr(params->guest_ip));
 		ndev->info.guest_netmask	= ntohl(inet_addr("255.255.255.0"));
 		ndev->info.buf_nr		= 20,
-		ndev->info.vnet_hdr_len	= sizeof(struct virtio_net_hdr);
 		ndev->ops = &uip_ops;
}
 
@@ -710,6 +711,9 @@ static void notify_status(struct kvm *kvm, void *dev, u8 status)
 		if (!virtio_net__tap_init(ndev))
 			die_perror("You have requested a TAP device, but creation of one has failed because");
 	} else {
+		ndev->info.vnet_hdr_len	= (ndev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ?
+						sizeof(struct virtio_net_hdr_mrg_rxbuf) :
+						sizeof(struct virtio_net_hdr);
 		uip_init(&ndev->info);
 	}
 }
-- 
1.8.2.1



[PATCH 8/8] kvm tools: virtio-net mergable rx buffers

2013-05-03 Thread Sasha Levin
Support mergable rx buffers for virtio-net. This helps reduce the amount
of memory the guest kernel has to allocate per rx vq.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/uip.h |  2 +-
 tools/kvm/net/uip/core.c|  2 +-
 tools/kvm/virtio/net.c  | 42 --
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/include/kvm/uip.h b/tools/kvm/include/kvm/uip.h
index cb79e94..4e63808 100644
--- a/tools/kvm/include/kvm/uip.h
+++ b/tools/kvm/include/kvm/uip.h
@@ -253,7 +253,7 @@ struct uip_tcp_socket {
 };
 
 struct uip_tx_arg {
-   struct virtio_net_hdr *vnet;
+   void *vnet;
struct uip_info *info;
struct uip_eth *eth;
int vnet_len;
diff --git a/tools/kvm/net/uip/core.c b/tools/kvm/net/uip/core.c
index e31efc2..789b814 100644
--- a/tools/kvm/net/uip/core.c
+++ b/tools/kvm/net/uip/core.c
@@ -8,7 +8,7 @@
 
 int uip_tx(struct iovec *iov, u16 out, struct uip_info *info)
 {
-   struct virtio_net_hdr *vnet;
+   void *vnet;
struct uip_tx_arg arg;
int eth_len, vnet_len;
struct uip_eth *eth;
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 15dbde3..7855cfc 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -8,6 +8,7 @@
 #include "kvm/irq.h"
 #include "kvm/uip.h"
 #include "kvm/guest_compat.h"
+#include "kvm/iovec.h"

 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
@@ -65,6 +66,13 @@ struct net_dev {
 static LIST_HEAD(ndevs);
 static int compat_id = -1;
 
+#define MAX_PACKET_SIZE 65550
+
+static bool has_virtio_feature(struct net_dev *ndev, u32 feature)
+{
+	return ndev->features & (1 << feature);
+}
+
 static void *virtio_net_rx_thread(void *p)
 {
struct iovec iov[VIRTIO_NET_QUEUE_SIZE];
@@ -73,7 +81,7 @@ static void *virtio_net_rx_thread(void *p)
struct net_dev *ndev = p;
u16 out, in;
u16 head;
-   int len;
+   size_t len, copied;
u32 id;
 
 	mutex_lock(&ndev->mutex);
@@ -92,10 +100,31 @@ static void *virtio_net_rx_thread(void *p)
 	mutex_unlock(&ndev->io_lock[id]);
 
 		while (virt_queue__available(vq)) {
+			unsigned char buffer[MAX_PACKET_SIZE + sizeof(struct virtio_net_hdr_mrg_rxbuf)];
+			struct iovec dummy_iov = {
+				.iov_base = buffer,
+				.iov_len  = sizeof(buffer),
+			};
+			struct virtio_net_hdr_mrg_rxbuf *hdr;
+
+			len = ndev->ops->rx(&dummy_iov, 1, ndev);
+			copied = 0;
 			head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
-			len = ndev->ops->rx(iov, in, ndev);
-			virt_queue__set_used_elem(vq, head, len);
-
+			hdr = (void *)iov[0].iov_base;
+			while (copied < len) {
+				size_t iovsize = min(len - copied, iov_size(iov, in));
+
+				memcpy_toiovecend(iov, buffer, copied, iovsize);
+				copied += iovsize;
+				if (has_virtio_feature(ndev, VIRTIO_NET_F_MRG_RXBUF))
+					hdr->num_buffers++;
+				virt_queue__set_used_elem(vq, head, iovsize);
+				if (copied == len)
+					break;
+				while (!virt_queue__available(vq))
+					sleep(0);
+				head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
+			}
 			/* We should interrupt guest right now, otherwise latency is huge. */
 			if (virtio_queue__should_signal(vq))
 				ndev->vdev.ops->signal_vq(kvm, &ndev->vdev, id);
@@ -243,7 +272,7 @@ static bool virtio_net__tap_init(struct net_dev *ndev)
goto fail;
}
 
-	hdr_len = (ndev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ?
+	hdr_len = has_virtio_feature(ndev, VIRTIO_NET_F_MRG_RXBUF) ?
 			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
 			sizeof(struct virtio_net_hdr);
 	if (ioctl(ndev->tap_fd, TUNSETVNETHDRSZ, &hdr_len) < 0)
@@ -351,6 +380,7 @@ static u32 get_host_features(struct kvm *kvm, void *dev)
 		| 1UL << VIRTIO_RING_F_EVENT_IDX
 		| 1UL << VIRTIO_RING_F_INDIRECT_DESC
 		| 1UL << VIRTIO_NET_F_CTRL_VQ
+		| 1UL << VIRTIO_NET_F_MRG_RXBUF
 		| 1UL << (ndev->queue_pairs > 1 ? VIRTIO_NET_F_MQ : 0);
 }
 
@@ -711,7 +741,7 @@ static void notify_status(struct kvm *kvm, void *dev, u8 status)
 		if (!virtio_net__tap_init(ndev))
 			die_perror("You have requested a TAP device, but creation of one has failed because");
} else

[PATCH 6/8] kvm tools: steal iovec handling routines from the kernel

2013-05-03 Thread Sasha Levin
They're hidden inside net/core/iovec.c. It'd be nice to just link to that
but they're not too generic and come with tons of net/ specific code we
don't want. So we just copy over the relevant parts.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/Makefile   |   1 +
 tools/kvm/include/kvm/iovec.h|  21 +++
 tools/kvm/include/linux/kernel.h |  10 
 tools/kvm/util/iovec.c   | 126 +++
 4 files changed, 158 insertions(+)
 create mode 100644 tools/kvm/include/kvm/iovec.h
 create mode 100644 tools/kvm/util/iovec.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index a0a0a9b..70accaa 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -86,6 +86,7 @@ OBJS  += net/uip/csum.o
 OBJS   += net/uip/dhcp.o
 OBJS   += kvm-cmd.o
 OBJS   += util/init.o
+OBJS	+= util/iovec.o
 OBJS   += util/rbtree.o
 OBJS   += util/threadpool.o
 OBJS   += util/parse-options.o
diff --git a/tools/kvm/include/kvm/iovec.h b/tools/kvm/include/kvm/iovec.h
new file mode 100644
index 000..fe79dd4
--- /dev/null
+++ b/tools/kvm/include/kvm/iovec.h
@@ -0,0 +1,21 @@
+#ifndef KVM_UTIL_IOVEC_H_
+#define KVM_UTIL_IOVEC_H_
+
+extern int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len);
+extern int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
+   size_t offset, int len);
+extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
+extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
+   size_t offset, int len);
+
+static inline size_t iov_size(const struct iovec *iovec, size_t len)
+{
+   size_t size = 0, i;
+
+	for (i = 0; i < len; i++)
+   size += iovec[i].iov_len;
+
+   return size;
+}
+
+#endif
diff --git a/tools/kvm/include/linux/kernel.h b/tools/kvm/include/linux/kernel.h
index 1e9abe9..f2bff5f 100644
--- a/tools/kvm/include/linux/kernel.h
+++ b/tools/kvm/include/linux/kernel.h
@@ -36,6 +36,16 @@
 	(void) (&_max1 == &_max2);		\
 	_max1 > _max2 ? _max1 : _max2; })

+#define min_t(type, x, y) ({			\
+	type __min1 = (x);			\
+	type __min2 = (y);			\
+	__min1 < __min2 ? __min1 : __min2; })
+
+#define max_t(type, x, y) ({			\
+	type __max1 = (x);			\
+	type __max2 = (y);			\
+	__max1 > __max2 ? __max1 : __max2; })
+
 #define true 1
 
 #endif
diff --git a/tools/kvm/util/iovec.c b/tools/kvm/util/iovec.c
new file mode 100644
index 000..0c8b9cf
--- /dev/null
+++ b/tools/kvm/util/iovec.c
@@ -0,0 +1,126 @@
+/*
+ * iovec manipulation routines.
+ *
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Fixes:
+ * Andrew Lunn :   Errors in iovec copying.
+ * Pedro Roque :   Added memcpy_fromiovecend and
+ * csum_..._fromiovecend.
+ * Andi Kleen  :   fixed error handling for 2.1
+ * Alexey Kuznetsov:   2.1 optimisations
+ * Andi Kleen  :   Fix csum*fromiovecend for IPv6.
+ */
+
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <sys/uio.h>
+#include "kvm/iovec.h"
+#include <string.h>
+
+/*
+ * Copy kernel to iovec. Returns -EFAULT on error.
+ *
+ * Note: this modifies the original iovec.
+ */
+
+int memcpy_toiovec(struct iovec *iov, unsigned char *kdata, int len)
+{
+	while (len > 0) {
+		if (iov->iov_len) {
+			int copy = min_t(unsigned int, iov->iov_len, len);
+			memcpy(iov->iov_base, kdata, copy);
+			kdata += copy;
+			len -= copy;
+			iov->iov_len -= copy;
+			iov->iov_base += copy;
+   }
+   iov++;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(memcpy_toiovec);
+
+/*
+ * Copy kernel to iovec. Returns -EFAULT on error.
+ */
+
+int memcpy_toiovecend(const struct iovec *iov, unsigned char *kdata,
+ size_t offset, int len)
+{
+   int copy;
+	for (; len > 0; ++iov) {
+		/* Skip over the finished iovecs */
+		if (unlikely(offset >= iov->iov_len)) {
+			offset -= iov->iov_len;
+			continue;
+		}
+		copy = min_t(unsigned int, iov->iov_len - offset, len);
+		memcpy(iov->iov_base + offset, kdata, copy);
+		offset = 0;
+		kdata += copy;
+   len -= copy

[PATCH 4/8] kvm tools: init network devices only when the virtio driver is ready to go

2013-05-03 Thread Sasha Levin
We may need to know what features are supported before we can init the
network on the host side.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/net.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index bef0039..2dbca09 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -58,6 +58,8 @@ struct net_dev {
struct uip_info info;
struct net_dev_operations   *ops;
struct kvm  *kvm;
+
+   struct virtio_net_params*params;
 };
 
 static LIST_HEAD(ndevs);
@@ -207,13 +209,13 @@ static void virtio_net_handle_callback(struct kvm *kvm, struct net_dev *ndev, int queue)
 	mutex_unlock(&ndev->io_lock[queue]);
 }
 
-static bool virtio_net__tap_init(const struct virtio_net_params *params,
-				 struct net_dev *ndev)
+static bool virtio_net__tap_init(struct net_dev *ndev)
 {
 	int sock = socket(AF_INET, SOCK_STREAM, 0);
 	int pid, status, offload, hdr_len;
 	struct sockaddr_in sin = {0};
 	struct ifreq ifr;
+	const struct virtio_net_params *params = ndev->params;

 	/* Did the user already gave us the FD? */
 	if (params->fd) {
@@ -496,6 +498,8 @@ static int set_size_vq(struct kvm *kvm, void *dev, u32 vq, int size)
 	return size;
 }
 
+static void notify_status(struct kvm *kvm, void *dev, u8 status);
+
 static struct virtio_ops net_dev_virtio_ops = (struct virtio_ops) {
.get_config = get_config,
.get_host_features  = get_host_features,
@@ -507,6 +511,7 @@ static struct virtio_ops net_dev_virtio_ops = (struct virtio_ops) {
.notify_vq  = notify_vq,
.notify_vq_gsi  = notify_vq_gsi,
.notify_vq_eventfd  = notify_vq_eventfd,
+   .notify_status  = notify_status,
 };
 
 static void virtio_net__vhost_init(struct kvm *kvm, struct net_dev *ndev)
@@ -652,6 +657,7 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 	list_add_tail(&ndev->list, &ndevs);

 	ndev->kvm = params->kvm;
+	ndev->params = params;

 	mutex_init(&ndev->mutex);
 	ndev->queue_pairs = max(1, min(VIRTIO_NET_NUM_QUEUES, params->mq));
@@ -667,8 +673,6 @@ static int virtio_net__init_one(struct virtio_net_params *params)

 	ndev->mode = params->mode;
 	if (ndev->mode == NET_MODE_TAP) {
-		if (!virtio_net__tap_init(params, ndev))
-			die_perror("You have requested a TAP device, but creation of one has failed because");
 		ndev->ops = &tap_ops;
 	} else {
 		ndev->info.host_ip		= ntohl(inet_addr(params->host_ip));
@@ -676,7 +680,6 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 		ndev->info.guest_netmask	= ntohl(inet_addr("255.255.255.0"));
 		ndev->info.buf_nr		= 20,
-		ndev->info.vnet_hdr_len	= sizeof(struct virtio_net_hdr);
-		uip_init(&ndev->info);
 		ndev->ops = &uip_ops;
 	}
 
@@ -696,6 +699,21 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 	return 0;
 }
 
+static void notify_status(struct kvm *kvm, void *dev, u8 status)
+{
+	struct net_dev *ndev = dev;
+
+	if (!(status & VIRTIO_CONFIG_S_DRIVER_OK))
+		return;
+
+	if (ndev->mode == NET_MODE_TAP) {
+		if (!virtio_net__tap_init(ndev))
+			die_perror("You have requested a TAP device, but creation of one has failed because");
+	} else {
+		uip_init(&ndev->info);
+	}
+}
+
 int virtio_net__init(struct kvm *kvm)
 {
int i;
@@ -706,7 +724,7 @@ int virtio_net__init(struct kvm *kvm)
}
 
 	if (kvm->cfg.num_net_devices == 0 && kvm->cfg.no_net == 0) {
-		struct virtio_net_params net_params;
+		static struct virtio_net_params net_params;

 		net_params = (struct virtio_net_params) {
 			.guest_ip	= kvm->cfg.guest_ip,
-- 
1.8.2.1



Re: [PATCH] kvm tools: virtio-net mergable rx buffers

2013-04-28 Thread Sasha Levin
On 04/28/2013 08:44 PM, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  On Wed, Apr 24, 2013 at 09:51:57AM +0300, Pekka Enberg wrote:
  Hi,
  
  On 04/23/2013 12:35 PM, Eric Northup wrote:
   Do you care about guests with drivers that don't negotiate
   VIRTIO_NET_F_MRG_RXBUF?
  
  On Wed, Apr 24, 2013 at 5:51 AM, Sasha Levin sasha.le...@oracle.com 
  wrote:
   We usually try to keep backward compatibility, but in this case
   mergable RX buffers are about 5 years old now, so it's safe to
   assume they'll be running in any guest.
  
   Unless there is a specific reason to allow working without them
   I'd rather keep the code simple in this case.
  
  Are there such guests around? What's the failure scenario for them
  after this patch?
  
  Pekka
 
  Warning: have not looked at the patch, just a general comment.
 
  I think it's reasonable to assume embedded guests such as PXE won't
  negotiate any features.  And, running old guests is one of the reasons
  people use virtualization at all. So 5 years is not a lot.
 
  In any case, stick to the device spec please, if you want it changed
  please send a spec patch, don't deviate from it randomly.
 Supporting old guests is an quality of implementation issue.  It's like
 any ABI: if noone will notice, you can remove stuff.
 
 But the case of I can receive GSO packets but I don't support mergeable
 buffers is a trivial one: you can support it by pretending the guest
 can't handle GSO :)
 
 If you want to support non-Linux guests (eg. bootloaders), you probably
 want to keep support for very dumb drivers with no mergable rxbufs
 though.

Yup, I'm planning on sending a version that supports older guests soonish.


Thanks,
Sasha


Re: [PATCH] kvm tools: remove arbitrary minimum RAM limitation

2013-04-24 Thread Sasha Levin
On 04/24/2013 04:25 AM, Michael Ellerman wrote:
 On Tue, 2013-04-23 at 10:57 -0400, Sasha Levin wrote:
 We don't really need 64MB of RAM to boot, it's a nice default if we don't
 have anything else - but it's not actually required for anything:
 
 Nice, I am carrying something similar locally so I can boot in 4K :)

Feel free to send stuff you carry locally over! :)


Thanks,
Sasha



Re: [PATCH] virtio-net: fill only rx queues which are being used

2013-04-23 Thread Sasha Levin
On 04/23/2013 03:08 AM, Michael S. Tsirkin wrote:
 On Mon, Apr 22, 2013 at 08:35:36PM -0400, Sasha Levin wrote:
 Due to MQ support we may allocate a whole bunch of rx queues but
 never use them. With this patch we'll save the space used by
 the receive buffers until they are actually in use:

 sh-4.2# free -h
  total   used   free sharedbuffers cached
 Mem:  490M35M   455M 0B 0B   4.1M
 -/+ buffers/cache:31M   459M
 Swap:   0B 0B 0B
 sh-4.2# ethtool -L eth0 combined 8
 sh-4.2# free -h
  total   used   free sharedbuffers cached
 Mem:  490M   162M   327M 0B 0B   4.1M
 -/+ buffers/cache:   158M   331M
 Swap:   0B 0B 0B

 Signed-off-by: Sasha Levin sasha.le...@oracle.com
 
 Overall the idea looks fine to me.
 
 I also ask myself whether we should enable multiqueue capability
 with big buffers. 130M extra memory seems excessive.
 Want to try on the kvmtools version that has mergeable buffers?
 Memory use should be much lower.

It is indeed, with mergable buffers:

sh-4.2# free -h
 total   used   free sharedbuffers cached
Mem:  490M18M   471M 0B 0B   4.1M
-/+ buffers/cache:14M   476M
Swap:   0B 0B 0B
sh-4.2# ethtool -L eth0 combined 8
sh-4.2# free -h
 total   used   free sharedbuffers cached
Mem:  490M26M   464M 0B 0B   4.1M
-/+ buffers/cache:22M   468M
Swap:   0B 0B 0B

(18MB? Nice! :) )


Thanks,
Sasha


[PATCH] kvm tools: remove arbitrary minimum RAM limitation

2013-04-23 Thread Sasha Levin
We don't really need 64MB of RAM to boot, it's a nice default if we don't
have anything else - but it's not actually required for anything:

sh-4.2# free -h
 total   used   free sharedbuffers cached
Mem:   20M15M   4.2M 0B 0B   4.2M
-/+ buffers/cache:11M   8.3M
Swap:   0B 0B 0B

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/builtin-run.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index c6f5862..4d7fbf9d 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -563,9 +563,6 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv)
 	if (!kvm->cfg.ram_size)
 		kvm->cfg.ram_size = get_ram_size(kvm->cfg.nrcpus);

-	if (kvm->cfg.ram_size < MIN_RAM_SIZE_MB)
-		die("Not enough memory specified: %lluMB (min %lluMB)", kvm->cfg.ram_size, MIN_RAM_SIZE_MB);
-
 	if (kvm->cfg.ram_size > host_ram_size())
 		pr_warning("Guest memory size %lluMB exceeds host physical RAM size %lluMB", kvm->cfg.ram_size, host_ram_size());
 
-- 
1.8.2.1



Re: [PATCH] kvm tools: virtio-net mergable rx buffers

2013-04-23 Thread Sasha Levin
(off list)

On 04/22/2013 08:32 PM, Sasha Levin wrote:
 Support mergable rx buffers for virtio-net. This helps reduce the amount
 of memory the guest kernel has to allocate per rx vq.

One of the benefits of this is that even one virtio-net device with just
one pair of vqs will require much less memory - so you could try booting
smaller guests than before!


Thanks,
Sasha


[PATCH v2] virtio-net: fill only rx queues which are being used

2013-04-23 Thread Sasha Levin
Due to MQ support we may allocate a whole bunch of rx queues but
never use them. With this patch we'll save the space used by
the receive buffers until they are actually in use:

sh-4.2# free -h
 total   used   free sharedbuffers cached
Mem:  490M35M   455M 0B 0B   4.1M
-/+ buffers/cache:31M   459M
Swap:   0B 0B 0B
sh-4.2# ethtool -L eth0 combined 8
sh-4.2# free -h
 total   used   free sharedbuffers cached
Mem:  490M   162M   327M 0B 0B   4.1M
-/+ buffers/cache:   158M   331M
Swap:   0B 0B 0B

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 drivers/net/virtio_net.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6bfc511..196e721 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -581,7 +581,7 @@ static void refill_work(struct work_struct *work)
bool still_empty;
int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		struct receive_queue *rq = &vi->rq[i];

 		napi_disable(&rq->napi);
@@ -636,7 +636,7 @@ static int virtnet_open(struct net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		/* Make sure we have some buffers: if oom use wq. */
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
@@ -900,6 +900,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	struct scatterlist sg;
 	struct virtio_net_ctrl_mq s;
 	struct net_device *dev = vi->dev;
+	int i;

 	if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
 		return 0;
@@ -912,8 +913,12 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 		dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n",
 			 queue_pairs);
 		return -EINVAL;
-	} else
+	} else {
+		for (i = vi->curr_queue_pairs; i < queue_pairs; i++)
+			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
+				schedule_delayed_work(&vi->refill, 0);
 		vi->curr_queue_pairs = queue_pairs;
+	}

 	return 0;
 }
@@ -1568,7 +1573,7 @@ static int virtnet_probe(struct virtio_device *vdev)
}
 
/* Last of all, set up some receive buffers. */
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		try_fill_recv(&vi->rq[i], GFP_KERNEL);
 
/* If we didn't even get one input buffer, we're useless. */
@@ -1692,7 +1697,7 @@ static int virtnet_restore(struct virtio_device *vdev)
 
 	netif_device_attach(vi->dev);

-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->curr_queue_pairs; i++)
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
 
-- 
1.8.2.1



Re: [PATCH] kvm tools: virtio-net mergable rx buffers

2013-04-23 Thread Sasha Levin
On 04/23/2013 12:35 PM, Eric Northup wrote:
 Do you care about guests with drivers that don't negotiate
 VIRTIO_NET_F_MRG_RXBUF?

We usually try to keep backward compatibility, but in this case
mergable RX buffers are about 5 years old now, so it's safe to
assume they'll be running in any guest.

Unless there is a specific reason to allow working without them
I'd rather keep the code simple in this case.

 On Mon, Apr 22, 2013 at 5:32 PM, Sasha Levin sasha.le...@oracle.com wrote:
 +			copied = memcpy_toiovecend(iov, in, buffer, len);
 +			len -= copied;
 +			hdr->num_buffers++;
 +			virt_queue__set_used_elem(vq, head, copied);
 +			if (len == 0)
 +				break;
 +			head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
 
 Need to check that virt_queue__available(vq) first?

Yup. I wonder why it didn't blow up running 'ping -f' with a huge packet size.


Thanks,
Sasha



Re: virtio-net mq vq initialization

2013-04-22 Thread Sasha Levin
On 04/15/2013 01:58 AM, Jason Wang wrote:
 Initializing them only when they're actually needed will do the trick here.
 I don't see so much memory allocation with qemu, the main problem I
 guess here is kvmtool does not support mergeable rx buffers ( does it?
 ). So guest must allocate 64K per packet which would be even worse in
 multiqueue case. If mergeable rx buffer was used, the receive buffer
 will only occupy about 256 * 4K (1M) which seems pretty acceptable here.

Yup, kvmtool doesn't do mergable RX at the moment.

I'll add support for that in, thanks for pointing it out!


Thanks,
Sasha


[PATCH] kvm tools: virtio-net mergable rx buffers

2013-04-22 Thread Sasha Levin
Support mergable rx buffers for virtio-net. This helps reduce the amount
of memory the guest kernel has to allocate per rx vq.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/uip.h  |  4 ++--
 tools/kvm/include/kvm/util.h |  3 +++
 tools/kvm/net/uip/core.c | 54 +---
 tools/kvm/net/uip/tcp.c  |  2 +-
 tools/kvm/net/uip/udp.c  |  2 +-
 tools/kvm/util/util.c| 15 
 tools/kvm/virtio/net.c   | 37 ++
 7 files changed, 55 insertions(+), 62 deletions(-)

diff --git a/tools/kvm/include/kvm/uip.h b/tools/kvm/include/kvm/uip.h
index ac248d2..fa82f10 100644
--- a/tools/kvm/include/kvm/uip.h
+++ b/tools/kvm/include/kvm/uip.h
@@ -252,7 +252,7 @@ struct uip_tcp_socket {
 };
 
 struct uip_tx_arg {
-   struct virtio_net_hdr *vnet;
+   struct virtio_net_hdr_mrg_rxbuf *vnet;
struct uip_info *info;
struct uip_eth *eth;
int vnet_len;
@@ -332,7 +332,7 @@ static inline u16 uip_eth_hdrlen(struct uip_eth *eth)
 }
 
 int uip_tx(struct iovec *iov, u16 out, struct uip_info *info);
-int uip_rx(struct iovec *iov, u16 in, struct uip_info *info);
+int uip_rx(unsigned char *buffer, u32 length, struct uip_info *info);
 int uip_init(struct uip_info *info);
 
 int uip_tx_do_ipv4_udp_dhcp(struct uip_tx_arg *arg);
diff --git a/tools/kvm/include/kvm/util.h b/tools/kvm/include/kvm/util.h
index 0df9f0d..6f8ac83 100644
--- a/tools/kvm/include/kvm/util.h
+++ b/tools/kvm/include/kvm/util.h
@@ -22,6 +22,7 @@
 #include <sys/param.h>
 #include <sys/types.h>
 #include <linux/types.h>
+#include <sys/uio.h>
 
 #ifdef __GNUC__
 #define NORETURN __attribute__((__noreturn__))
@@ -94,4 +95,6 @@ struct kvm;
 void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);

+int memcpy_toiovecend(const struct iovec *iov, int iovlen, unsigned char *kdata, size_t len);
+
 #endif /* KVM__UTIL_H */
diff --git a/tools/kvm/net/uip/core.c b/tools/kvm/net/uip/core.c
index 4e5bb82..d9e9993 100644
--- a/tools/kvm/net/uip/core.c
+++ b/tools/kvm/net/uip/core.c
@@ -7,7 +7,7 @@
 
 int uip_tx(struct iovec *iov, u16 out, struct uip_info *info)
 {
-   struct virtio_net_hdr *vnet;
+   struct virtio_net_hdr_mrg_rxbuf *vnet;
struct uip_tx_arg arg;
int eth_len, vnet_len;
struct uip_eth *eth;
@@ -74,63 +74,21 @@ int uip_tx(struct iovec *iov, u16 out, struct uip_info *info)
return vnet_len + eth_len;
 }
 
-int uip_rx(struct iovec *iov, u16 in, struct uip_info *info)
+int uip_rx(unsigned char *buffer, u32 length, struct uip_info *info)
 {
-   struct virtio_net_hdr *vnet;
-   struct uip_eth *eth;
struct uip_buf *buf;
-   int vnet_len;
-   int eth_len;
-   char *p;
int len;
-   int cnt;
-   int i;
 
/*
 * Sleep until there is a buffer for guest
 */
buf = uip_buf_get_used(info);
 
-	/*
-	 * Fill device to guest buffer, vnet hdr fisrt
-	 */
-	vnet_len = iov[0].iov_len;
-	vnet = iov[0].iov_base;
-	if (buf->vnet_len > vnet_len) {
-		len = -1;
-		goto out;
-	}
-	memcpy(vnet, buf->vnet, buf->vnet_len);
-
-	/*
-	 * Then, the real eth data
-	 * Note: Be sure buf->eth_len is not bigger than the buffer len that guest provides
-	 */
-	cnt = buf->eth_len;
-	p = buf->eth;
-	for (i = 1; i < in; i++) {
-		eth_len = iov[i].iov_len;
-		eth = iov[i].iov_base;
-		if (cnt > eth_len) {
-			memcpy(eth, p, eth_len);
-			cnt -= eth_len;
-			p += eth_len;
-		} else {
-			memcpy(eth, p, cnt);
-			cnt -= cnt;
-			break;
-		}
-	}
-
-	if (cnt) {
-		pr_warning("uip_rx error");
-		len = -1;
-		goto out;
-	}
+	memcpy(buffer, buf->vnet, buf->vnet_len);
+	memcpy(buffer + buf->vnet_len, buf->eth, buf->eth_len);

 	len = buf->vnet_len + buf->eth_len;

-out:
 	uip_buf_set_free(info, buf);
return len;
 }
@@ -172,8 +130,8 @@ int uip_init(struct uip_info *info)
}
 
list_for_each_entry(buf, buf_head, list) {
-		buf->vnet	= malloc(sizeof(struct virtio_net_hdr));
-		buf->vnet_len	= sizeof(struct virtio_net_hdr);
+		buf->vnet	= malloc(sizeof(struct virtio_net_hdr_mrg_rxbuf));
+		buf->vnet_len	= sizeof(struct virtio_net_hdr_mrg_rxbuf);
 		buf->eth	= malloc(1024*64 + sizeof(struct uip_pseudo_hdr));
 		buf->eth_len	= 1024*64 + sizeof(struct uip_pseudo_hdr);
 
diff --git a/tools/kvm/net/uip/tcp.c b/tools/kvm/net/uip/tcp.c
index 9044f40..a99b95e 100644
--- a/tools/kvm/net/uip/tcp.c
+++ b

[PATCH] virtio-net: fill only rx queues which are being used

2013-04-22 Thread Sasha Levin
Due to MQ support we may allocate a whole bunch of rx queues but
never use them. With this patch we'll save the space used by
the receive buffers until they are actually in use:

sh-4.2# free -h
 total   used   free sharedbuffers cached
Mem:  490M35M   455M 0B 0B   4.1M
-/+ buffers/cache:31M   459M
Swap:   0B 0B 0B
sh-4.2# ethtool -L eth0 combined 8
sh-4.2# free -h
 total   used   free sharedbuffers cached
Mem:  490M   162M   327M 0B 0B   4.1M
-/+ buffers/cache:   158M   331M
Swap:   0B 0B 0B

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 drivers/net/virtio_net.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6bfc511..4d82d17 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -581,7 +581,7 @@ static void refill_work(struct work_struct *work)
bool still_empty;
int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		struct receive_queue *rq = &vi->rq[i];

 		napi_disable(&rq->napi);
@@ -636,7 +636,7 @@ static int virtnet_open(struct net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		/* Make sure we have some buffers: if oom use wq. */
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
@@ -900,6 +900,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	struct scatterlist sg;
 	struct virtio_net_ctrl_mq s;
 	struct net_device *dev = vi->dev;
+	int i;

 	if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
 		return 0;
@@ -912,8 +913,13 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 		dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n",
 			 queue_pairs);
 		return -EINVAL;
-	} else
+	} else {
+		if (queue_pairs > vi->curr_queue_pairs)
+			for (i = 0; i < queue_pairs; i++)
+				if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
+					schedule_delayed_work(&vi->refill, 0);
 		vi->curr_queue_pairs = queue_pairs;
+	}

 	return 0;
 }
@@ -1568,7 +1574,7 @@ static int virtnet_probe(struct virtio_device *vdev)
}
 
/* Last of all, set up some receive buffers. */
-	for (i = 0; i < vi->max_queue_pairs; i++) {
+	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		try_fill_recv(&vi->rq[i], GFP_KERNEL);
 
/* If we didn't even get one input buffer, we're useless. */
@@ -1692,7 +1698,7 @@ static int virtnet_restore(struct virtio_device *vdev)
 
 	netif_device_attach(vi->dev);

-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->curr_queue_pairs; i++)
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
 
-- 
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-net: fill only rx queues which are being used

2013-04-22 Thread Sasha Levin
On 04/23/2013 12:13 AM, Rusty Russell wrote:
 Sasha Levin sasha.le...@oracle.com writes:
 Due to MQ support we may allocate a whole bunch of rx queues but
 never use them. With this patch we'll save the space used by
 the receive buffers until they are actually in use:
 
 Idea is good, implementation needs a tiny tweak:
 
 @@ -912,8 +913,13 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
  	dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n",
  		 queue_pairs);
  	return -EINVAL;
  -} else
  +} else {
  +	if (queue_pairs > vi->curr_queue_pairs)
  +		for (i = 0; i < queue_pairs; i++)
  +			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
  +				schedule_delayed_work(&vi->refill, 0);
  	vi->curr_queue_pairs = queue_pairs;
  +}
  
  	return 0;
  }
 
 You don't want to refill existing queues, so you don't need the if.
 
 	for (i = vi->curr_queue_pairs; i < queue_pairs; i++) {
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);

That makes more sense, I'll resend.

 We don't free up buffers when we're reducing queues, but I consider that
 a corner case.

It didn't bother anyone up until now, and the spec doesn't state anything
about it - so I preferred to just leave that alone. Unless the ARM folks
would find it useful?


Thanks,
Sasha


Re: [PATCH] x86: Add a Kconfig shortcut for a kvm-bootable kernel

2013-04-16 Thread Sasha Levin
On 04/16/2013 12:18 PM, Borislav Petkov wrote:
 On Sun, Apr 14, 2013 at 01:03:20PM +0200, Borislav Petkov wrote:
 On Sun, Apr 14, 2013 at 12:31:12PM +0300, Pekka Enberg wrote:
 I obviously support having something like this in mainline. I wonder
 though if we could just call this default standalone KVM guest
 config instead of emphasizing testing angle.

 /me nods agreeingly...

 And it should be under HYPERVISOR_GUEST where the rest of this stuff
 resides. Good point.
 
 Sanity check question:
 
 Why not add the select stuff, i.e. this:
 
   select NET
   select NETDEVICES
   select PCI
   select BLOCK
   select BLK_DEV
   select NETWORK_FILESYSTEMS
   select INET
   select EXPERIMENTAL
   select TTY
   select SERIAL_8250
   select SERIAL_8250_CONSOLE
   select IP_PNP
   select IP_PNP_DHCP
   select BINFMT_ELF
   select PCI_MSI
   select HAVE_ARCH_KGDB
   select DEBUG_KERNEL
   select KGDB
   select KGDB_SERIAL_CONSOLE
   select VIRTUALIZATION
   select VIRTIO
   select VIRTIO_RING
   select VIRTIO_PCI
   select VIRTIO_BLK
   select VIRTIO_CONSOLE
   select VIRTIO_NET
   select 9P_FS
   select NET_9P
   select NET_9P_VIRTIO
 
 to the option below which we already have. It is in the same sense a KVM
 guest support deal.
 
 Hmm.
 
 KVM people, any objections?
 
 config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 depends on PARAVIRT
 select PARAVIRT_CLOCK
 default y
 ---help---
   This option enables various optimizations for running under the KVM
   hypervisor. It includes a paravirtualized clock, so that instead
   of relying on a PIT (or probably other) emulation by the
   underlying device model, the host provides the guest with
   timing infrastructure such as time of day, and system time

KVM guests don't need a serial device, KGDB, DEBUG_KERNEL or 9p in particular.


Thanks,
Sasha


Re: virtio-net mq vq initialization

2013-04-14 Thread Sasha Levin
On 04/14/2013 06:01 AM, Michael S. Tsirkin wrote:
 On Sat, Apr 13, 2013 at 05:23:41PM -0400, Sasha Levin wrote:
 On 04/12/2013 07:36 AM, Rusty Russell wrote:
 Sasha Levin sasha.le...@oracle.com writes:
 On 04/11/2013 12:36 PM, Will Deacon wrote:
 Hello folks,

 Here's the latest round of ARM fixes and updates for kvmtool. Most of
 this is confined to the arm/ subdirectory, with the exception of a fix
 to the virtio-mmio vq definitions due to the multi-queue work from
 Sasha. I'm not terribly happy about that code though, since it seriously
 increases the memory footprint of the guest.

 Without multi-queue, we can boot Debian Wheezy to a prompt in 38MB. With
 the new changes, that increases to 170MB! Any chance we can try and tackle
 this regression please? I keep getting bitten by the OOM killer :(

 (cc Rusty, MST)

 The spec defines the operation of a virtio-net device with regards to 
 multiple
 queues as follows:

 
 Device Initialization

1. The initialization routine should identify the receive and 
 transmission
 virtqueues, up to N+1 of each kind. If VIRTIO_NET_F_MQ feature
 bit is negotiated, N=max_virtqueue_pairs-1, otherwise identify N=0.

[...]

5. Only receiveq0, transmitq0 and controlq are used by default. To use 
 more
 queues driver must negotiate the VIRTIO_NET_F_MQ feature; initialize
 up to max_virtqueue_pairs of each of transmit and receive queues; execute
 VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command specifying
 the number of the transmit and receive queues that is going to be
 used and wait until the device consumes the controlq buffer and acks this
 command.
 

 And kvmtool follows that to the letter: It will initialize the maximum 
 amount of
 queues it can support during initialization and will start using them only 
 when
 the device tells it it should use them.

 As Will has stated, this causes a memory issue since all the data 
 structures that hold
 all possible queues get initialized regardless of whether we actually need 
 them or not,
 which is quite troublesome for systems with small RAM.


 Rusty, MST, would you be open to a spec and code change that would 
 initialize the
 RX/TX vqs on demand instead of on device initialization? Or is there an 
 easier way
 to work around this issue?

 I'm confused.  kvmtool is using too much memory, or the guest?  If
 kvmtool, the Device Initialization section above applies to the driver,
 not the device.  If the guest, well, the language says UP TO N+1.  You
 want a small guest, don't use them all.  Or any...

 What am I missing?

 It's in the guest - sorry. I was only trying to say that kvmtool doesn't do 
 anything
 odd with regards to initializing virtio-net.

 The thing is that there should be a difference between just allowing a 
 larger number
 of queues and actually using them (i.e. enabling them with ethtool). Right 
 now I see
 the kernel lose 130MB just by having kvmtool offer 8 queue pairs, without 
 actually
 using those queues.

 Yes, we can make it configurable in kvmtool (and I will make it so so the 
 arm folks
 could continue working with tiny guests) but does it make sense that you 
 have to do
 this configuration in *2* places? First in the hypervisor and then inside 
 the guest?

 Memory usage should ideally depend on whether you are actually using 
 multiple queues,
 not on whether you just allow using those queues.


 Thanks,
 Sasha
 
 8 queues eat up 130MB?  Most of the memory is likely for the buffers?  I
 think we could easily allocate these lazily as queues are enabled,
 without protocol changes. It's harder to clean them as there's no way to
 reset a specific queue, but maybe that' good enough for your purposes?
 

Yup, this is how it looks in the guest right after booting:

Without virtio-net mq:

# free
             total       used       free     shared    buffers     cached
Mem:        918112     158052     760060          0          0       4308
-/+ buffers/cache:     153744     764368

With queue pairs = 8:

# free
             total       used       free     shared    buffers     cached
Mem:        918112     289168     628944          0          0       4244
-/+ buffers/cache:     284924     633188


Initializing them only when they're actually needed will do the trick here.

We could also expose the facility to delete a single vq, and add a note to
the spec saying that if the amount of actual vq pairs was reduced below what
it was before, the now inactive queues become invalid and would need to be
re-initialized. It's not pretty but it would let both device and driver free
up those vqs.


Thanks,
Sasha


Re: virtio-net mq vq initialization

2013-04-14 Thread Sasha Levin
On 04/14/2013 11:53 AM, Michael S. Tsirkin wrote:
  
  Initializing them only when they're actually needed will do the trick here.
 Not initializing, adding the buffers. In the current spec, initialization
 is always done before DRIVER_OK.

Yeah, that's better, but we're going to need a spec change either way since even
adding the buffers is specifically stated to happen before DRIVER_OK:


6. The receive virtqueues should be filled with receive buffers. This 
is described
in detail below in “Setting Up Receive Buffers”.


Thanks,
Sasha


[PATCH] virtio-spec: document virtio-9p

2013-04-14 Thread Sasha Levin
Add basic documentation for virtio-9p. I can expand more on device operation,
but I don't think there's anything significant enough for the spec to be
mentioned there. Please let me know if I'm wrong.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 virtio-spec.lyx | 206 
 1 file changed, 206 insertions(+)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index c23a345..73a0567 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -57,6 +57,7 @@
 \html_css_as_file 0
 \html_be_strict false
 \author -1930653948 Amos Kong 
+\author -1728740334 Sasha Levin sasha.le...@oracle.com
 \author -875410574 Pawel Moll 
 \author -608949062 Rusty Russell,,, 
 \author -385801441 Cornelia Huck cornelia.h...@de.ibm.com
@@ -619,7 +620,13 @@ Appendix I
 \begin_inset Text
 
 \begin_layout Plain Layout
+
+\change_inserted -1728740334 1365526571
+Appendix J
+\change_deleted -1728740334 1365526568
 -
+\change_unchanged
+
 \end_layout
 
 \end_inset
@@ -9678,6 +9685,204 @@ For MMC devices (inquiry type 5) there would be some overlap between this
 \end_layout
 
 \end_deeper
+\end_deeper
+\begin_layout Chapter*
+
+\change_inserted -1728740334 1365526640
+Appendix J: 9p Transport
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1728740334 1365526873
+The virtio transport for 9p provides an interface to transfer 9p messages
+ between the device and the driver.
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted -1728740334 1365526612
+Configuration
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1728740334 1365526876
+Subsystem
+\begin_inset space ~
+\end_inset
+
+Device
+\begin_inset space ~
+\end_inset
+
+ID 9
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1728740334 1365526891
+Virtqueues 0:requests.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1728740334 1365527214
+Feature
+\begin_inset space ~
+\end_inset
+
+bits
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted -1728740334 1365529363
+VIRTIO_9P_MOUNT_TAG(0) The mount point is specified in the device configuration.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted -1728740334 1365527300
+Device
+\begin_inset space ~
+\end_inset
+
+configuration
+\begin_inset space ~
+\end_inset
+
+layout Both fields of this configuration are available only if VIRTIO_9P_MOUNT_T
+AG was negotiated:
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted -1728740334 1365527046
+
+struct virtio_9p_config {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1728740334 1365527050
+
+   /* length of the tag name */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1728740334 1365527055
+
+   __u16 tag_len;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1728740334 1365527056
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1728740334 1365527065
+
+   /* non-NULL terminated tag name */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1728740334 1365527072
+
+   __u8 tag[0];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1728740334 1365527072
+
+} __attribute__((packed));
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted -1728740334 1365527670
+tag_len is the length of the mount tag string.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1728740334 1365527700
+tag is the name of the mount tag.
+ The tag should not be terminated by NULL.
+\end_layout
+
+\end_deeper
+\begin_layout Section*
+
+\change_inserted -1728740334 1365526612
+Device Initialization
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted -1728740334 1365527786
+The requests virtqueue is initialized.
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted -1728740334 1365527828
+If VIRTIO_9P_MOUNT_TAG was negotiated:
+\end_layout
+
+\begin_deeper
+\begin_layout Enumerate
+
+\change_inserted -1728740334 1365527872
+the tag_len field is read from the configuration to ascertain the length
+ of the mount tag.
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted -1728740334 1365527876
+tag_len bytes are read from the tag field and are used as the name of the
+ mount tag.
+\end_layout
+
+\end_deeper
+\begin_layout Enumerate
+
+\change_inserted -1728740334 1365527948
+The device is now ready to process requests.
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted -1728740334 1365526612
+Device Operation
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1728740334 1365529176
+When the driver wants to send a 9P request it places the descriptor of
+ the PDU in the queue.
+ It will be processed by the device according to the 9P protocol specifications
+ (available at http://ericvh.github.io/9p-rfc/) and returned to the driver.
+\end_layout
+
 \begin_layout Chapter*
 Appendix X: virtio-mmio
 \end_layout
-- 
1.8.1.5

Re: virtio-net mq vq initialization

2013-04-13 Thread Sasha Levin
On 04/12/2013 07:36 AM, Rusty Russell wrote:
 Sasha Levin sasha.le...@oracle.com writes:
 On 04/11/2013 12:36 PM, Will Deacon wrote:
 Hello folks,

 Here's the latest round of ARM fixes and updates for kvmtool. Most of
 this is confined to the arm/ subdirectory, with the exception of a fix
 to the virtio-mmio vq definitions due to the multi-queue work from
 Sasha. I'm not terribly happy about that code though, since it seriously
 increases the memory footprint of the guest.

 Without multi-queue, we can boot Debian Wheezy to a prompt in 38MB. With
 the new changes, that increases to 170MB! Any chance we can try and tackle
 this regression please? I keep getting bitten by the OOM killer :(

 (cc Rusty, MST)

 The spec defines the operation of a virtio-net device with regards to 
 multiple
 queues as follows:

 
 Device Initialization

  1. The initialization routine should identify the receive and 
 transmission
 virtqueues, up to N+1 of each kind. If VIRTIO_NET_F_MQ feature
 bit is negotiated, N=max_virtqueue_pairs-1, otherwise identify N=0.

  [...]

  5. Only receiveq0, transmitq0 and controlq are used by default. To use 
 more
 queues driver must negotiate the VIRTIO_NET_F_MQ feature; initialize
 up to max_virtqueue_pairs of each of transmit and receive queues; execute
 VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command specifying
 the number of the transmit and receive queues that is going to be
 used and wait until the device consumes the controlq buffer and acks this
 command.
 

 And kvmtool follows that to the letter: It will initialize the maximum 
 amount of
 queues it can support during initialization and will start using them only 
 when
 the device tells it it should use them.

 As Will has stated, this causes a memory issue since all the data structures 
 that hold
 all possible queues get initialized regardless of whether we actually need 
 them or not,
 which is quite troublesome for systems with small RAM.


 Rusty, MST, would you be open to a spec and code change that would 
 initialize the
 RX/TX vqs on demand instead of on device initialization? Or is there an 
 easier way
 to work around this issue?
 
 I'm confused.  kvmtool is using too much memory, or the guest?  If
 kvmtool, the Device Initialization section above applies to the driver,
 not the device.  If the guest, well, the language says UP TO N+1.  You
 want a small guest, don't use them all.  Or any...
 
 What am I missing?

It's in the guest - sorry. I was only trying to say that kvmtool doesn't do 
anything
odd with regards to initializing virtio-net.

The thing is that there should be a difference between just allowing a larger 
number
of queues and actually using them (i.e. enabling them with ethtool). Right now 
I see
the kernel lose 130MB just by having kvmtool offer 8 queue pairs, without 
actually
using those queues.

Yes, we can make it configurable in kvmtool (and I will make it so so the arm 
folks
could continue working with tiny guests) but does it make sense that you have 
to do
this configuration in *2* places? First in the hypervisor and then inside the 
guest?

Memory usage should ideally depend on whether you are actually using multiple 
queues,
not on whether you just allow using those queues.


Thanks,
Sasha


[PATCH] kvm tools: make virtio-net mq max queues configurable

2013-04-13 Thread Sasha Levin
This patch makes the maximum amount of vqs configurable. To use it pass a 'mq'
option to network device configuration. For example:

vm run -n mode=tap,mq=4

Will allow up to 4 queue pairs for that network device.

Note that not specifying mq, or setting mq=0, will disable virtio-net
multiqueuing.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/virtio-net.h |  1 +
 tools/kvm/virtio/net.c | 19 +--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio-net.h b/tools/kvm/include/kvm/virtio-net.h
index db43d98..0f4d1e5 100644
--- a/tools/kvm/include/kvm/virtio-net.h
+++ b/tools/kvm/include/kvm/virtio-net.h
@@ -16,6 +16,7 @@ struct virtio_net_params {
 	int mode;
 	int vhost;
 	int fd;
+	int mq;
 };
 
 int virtio_net__init(struct kvm *kvm);
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 9911d77..c0a8f12 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -43,7 +43,7 @@ struct net_dev {
 
 	struct virt_queue		vqs[VIRTIO_NET_NUM_QUEUES * 2 + 1];
 	struct virtio_net_config	config;
-	u32				features, rx_vqs, tx_vqs;
+	u32				features, rx_vqs, tx_vqs, queue_pairs;
 
 	pthread_t			io_thread[VIRTIO_NET_NUM_QUEUES * 2 + 1];
 	struct mutex			io_lock[VIRTIO_NET_NUM_QUEUES * 2 + 1];
@@ -159,7 +159,7 @@ static void *virtio_net_ctrl_thread(void *p)
 	u16 out, in, head;
 	struct net_dev *ndev = p;
 	struct kvm *kvm = ndev->kvm;
-	u32 id = ndev->config.max_virtqueue_pairs * 2;
+	u32 id = ndev->queue_pairs * 2;
 	struct virt_queue *vq = &ndev->vqs[id];
 	struct virtio_net_ctrl_hdr *ctrl;
 	virtio_net_ctrl_ack *ack;
@@ -197,7 +197,7 @@ static void *virtio_net_ctrl_thread(void *p)
 
 static void virtio_net_handle_callback(struct kvm *kvm, struct net_dev *ndev, int queue)
 {
-	if (queue >= (ndev->config.max_virtqueue_pairs * 2 + 1)) {
+	if ((u32)queue >= (ndev->queue_pairs * 2 + 1)) {
 		pr_warning("Unknown queue index %u", queue);
 		return;
 	}
@@ -334,6 +334,8 @@ static u8 *get_config(struct kvm *kvm, void *dev)
 
 static u32 get_host_features(struct kvm *kvm, void *dev)
 {
+	struct net_dev *ndev = dev;
+
 	return 1UL << VIRTIO_NET_F_MAC
 		| 1UL << VIRTIO_NET_F_CSUM
 		| 1UL << VIRTIO_NET_F_HOST_UFO
@@ -345,7 +347,7 @@ static u32 get_host_features(struct kvm *kvm, void *dev)
 		| 1UL << VIRTIO_RING_F_EVENT_IDX
 		| 1UL << VIRTIO_RING_F_INDIRECT_DESC
 		| 1UL << VIRTIO_NET_F_CTRL_VQ
-		| 1UL << VIRTIO_NET_F_MQ;
+		| 1UL << (ndev->queue_pairs > 1 ? VIRTIO_NET_F_MQ : 0);
 }
 
 static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
@@ -376,7 +378,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 	mutex_init(&ndev->io_lock[vq]);
 	pthread_cond_init(&ndev->io_cond[vq], NULL);
 	if (ndev->vhost_fd == 0) {
-		if (vq == (u32)(ndev->config.max_virtqueue_pairs * 2))
+		if (vq == (u32)(ndev->queue_pairs * 2))
 			pthread_create(&ndev->io_thread[vq], NULL, virtio_net_ctrl_thread, ndev);
 		else if (vq & 1)
 			pthread_create(&ndev->io_thread[vq], NULL, virtio_net_tx_thread, ndev);
@@ -574,6 +576,8 @@ static int set_net_param(struct kvm *kvm, struct virtio_net_params *p,
 		p->vhost = atoi(val);
 	} else if (strcmp(param, "fd") == 0) {
 		p->fd = atoi(val);
+	} else if (strcmp(param, "mq") == 0) {
+		p->mq = atoi(val);
 	} else
 		die("Unknown network parameter %s", param);
 
@@ -643,8 +647,11 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 	ndev->kvm = params->kvm;
 
 	mutex_init(&ndev->mutex);
+	ndev->queue_pairs = max(1, min(VIRTIO_NET_NUM_QUEUES, params->mq));
 	ndev->config.status = VIRTIO_NET_S_LINK_UP;
-	ndev->config.max_virtqueue_pairs = VIRTIO_NET_NUM_QUEUES;
+	if (ndev->queue_pairs > 1)
+		ndev->config.max_virtqueue_pairs = ndev->queue_pairs;
+
 	for (i = 0 ; i < 6 ; i++) {
 		ndev->config.mac[i]		= params->guest_mac[i];
 		ndev->info.guest_mac.addr[i]	= params->guest_mac[i];
-- 
1.8.1.5



Re: [PATCH 0/5] Usual batch of random ARM fixes for kvmtool

2013-04-11 Thread Sasha Levin
On 04/11/2013 12:36 PM, Will Deacon wrote:
 Without multi-queue, we can boot Debian Wheezy to a prompt in 38MB. With
 the new changes, that increases to 170MB! Any chance we can try and tackle
 this regression please? I keep getting bitten by the OOM killer :(

That's definitely unwanted.

I'll look into it and try sending something out today/tomorrow.


Thanks,
Sasha


Re: kvmtool : [PATCH] PowerPc : Fix compilation for ppc64

2013-04-11 Thread Sasha Levin
On 04/11/2013 12:53 PM, Prerna Saxena wrote:
 On 04/10/2013 09:05 PM, Sasha Levin wrote:
 Hm, what would LD create before this patch? I thought that the default
 would be to create a binary that corresponds to the platform you're
 building in, so if you build on ppc64 you'd get ppc64 binaries, no?

 
 Hi Sasha,
 Thanks for the prompt response.
 Powerpc had historically supported 32 bit userspace on a 64 bit kernel,
 before everything moved 64 bit.
 
 I'd hit this issue since the default output of 'ld' was turning out to
 be 'elf32-powerpc' on my ppc64 build machine. This was running ld-2.22.
 I found that adding '--oformat=elf64-powerpc' to the Makefile helped me
 tide over it, so I sent a patch to that end.
 Today, I verified on another ppc64 machine that ld is automatically
 choosing 'elf64-powerpc'. This machine is running 'ld-2.23'
 
 So, this patch can be ignored, since it appears to be a toolchain
 dependency. Or, we could put it in place, to ensure kvmtool builds don't
 break even if the toolchain is not perfectly configured.
 As you suggest :)

What worries me about this patch is that it will break build on 32bit
machines.

I don't know if those are even supported these days or not, but if they are -
we need something different to handle that.


Thanks,
Sasha



virtio-net mq vq initialization (was: [PATCH 0/5] Usual batch of random ARM fixes for kvmtool)

2013-04-11 Thread Sasha Levin
On 04/11/2013 12:36 PM, Will Deacon wrote:
 Hello folks,
 
 Here's the latest round of ARM fixes and updates for kvmtool. Most of
 this is confined to the arm/ subdirectory, with the exception of a fix
 to the virtio-mmio vq definitions due to the multi-queue work from
 Sasha. I'm not terribly happy about that code though, since it seriously
 increases the memory footprint of the guest.
 
 Without multi-queue, we can boot Debian Wheezy to a prompt in 38MB. With
 the new changes, that increases to 170MB! Any chance we can try and tackle
 this regression please? I keep getting bitten by the OOM killer :(

(cc Rusty, MST)

The spec defines the operation of a virtio-net device with regards to multiple
queues as follows:


Device Initialization

1. The initialization routine should identify the receive and 
transmission
virtqueues, up to N+1 of each kind. If VIRTIO_NET_F_MQ feature
bit is negotiated, N=max_virtqueue_pairs-1, otherwise identify N=0.

[...]

5. Only receiveq0, transmitq0 and controlq are used by default. To use 
more
queues driver must negotiate the VIRTIO_NET_F_MQ feature; initialize
up to max_virtqueue_pairs of each of transmit and receive queues; execute
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command specifying
the number of the transmit and receive queues that is going to be
used and wait until the device consumes the controlq buffer and acks this
command.


And kvmtool follows that to the letter: It will initialize the maximum amount of
queues it can support during initialization and will start using them only when
the device tells it it should use them.

As Will has stated, this causes a memory issue since all the data structures 
that hold
all possible queues get initialized regardless of whether we actually need them 
or not,
which is quite troublesome for systems with small RAM.


Rusty, MST, would you be open to a spec and code change that would initialize 
the
RX/TX vqs on demand instead of on device initialization? Or is there an easier 
way
to work around this issue?


Thanks,
Sasha


[PATCH 1/3] kvm tools: Increase amount of possible interrupts per PCI device

2013-04-08 Thread Sasha Levin
We've limited the amount of interrupts per PCI device to be the maximum amount 
of VQs.

Now that the maximum amount of VQs has increased, time to increase this as well.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/include/kvm/virtio-pci.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/kvm/include/kvm/virtio-pci.h b/tools/kvm/include/kvm/virtio-pci.h
index 6d9a558..9b06392 100644
--- a/tools/kvm/include/kvm/virtio-pci.h
+++ b/tools/kvm/include/kvm/virtio-pci.h
@@ -6,7 +6,7 @@
 
 #include <linux/types.h>
 
-#define VIRTIO_PCI_MAX_VQ  3
+#define VIRTIO_PCI_MAX_VQ  32
 #define VIRTIO_PCI_MAX_CONFIG  1
 
 struct kvm;
-- 
1.8.1.5



[PATCH 2/3] kvm-tools: support virtio-net ctrl queue

2013-04-08 Thread Sasha Levin
We don't need much out of it at this point, but we will need it for virtio-net
mq patch.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/net.c | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 68bd107..253d167 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -27,9 +27,10 @@
 #include <sys/eventfd.h>
 
 #define VIRTIO_NET_QUEUE_SIZE		256
-#define VIRTIO_NET_NUM_QUEUES		2
+#define VIRTIO_NET_NUM_QUEUES		3
 #define VIRTIO_NET_RX_QUEUE		0
 #define VIRTIO_NET_TX_QUEUE		1
+#define VIRTIO_NET_CTRL_QUEUE		2
 
 struct net_dev;
 
@@ -144,6 +145,31 @@ static void *virtio_net_tx_thread(void *p)
 
 }
 
+static void virtio_net_handle_ctrl(struct kvm *kvm, struct net_dev *ndev)
+{
+	struct iovec iov[VIRTIO_NET_QUEUE_SIZE];
+	u16 out, in, head;
+	struct virtio_net_ctrl_hdr *ctrl;
+	virtio_net_ctrl_ack *ack;
+
+	head = virt_queue__get_iov(&ndev->vqs[VIRTIO_NET_CTRL_QUEUE], iov, &out, &in, kvm);
+	ctrl = iov[0].iov_base;
+	ack = iov[out].iov_base;
+
+	switch (ctrl->class) {
+	default:
+		*ack = VIRTIO_NET_ERR;
+		break;
+	}
+
+	virt_queue__set_used_elem(&ndev->vqs[VIRTIO_NET_CTRL_QUEUE], head, iov[out].iov_len);
+
+	if (virtio_queue__should_signal(&ndev->vqs[VIRTIO_NET_CTRL_QUEUE]))
+		ndev->vdev.ops->signal_vq(kvm, &ndev->vdev, VIRTIO_NET_CTRL_QUEUE);
+
+	return;
+}
+
 static void virtio_net_handle_callback(struct kvm *kvm, struct net_dev *ndev, int queue)
 {
 	switch (queue) {
@@ -157,6 +183,9 @@ static void virtio_net_handle_callback(struct kvm *kvm, struct net_dev *ndev, int queue)
 		pthread_cond_signal(&ndev->io_rx_cond);
 		mutex_unlock(&ndev->io_rx_lock);
 		break;
+	case VIRTIO_NET_CTRL_QUEUE:
+		virtio_net_handle_ctrl(kvm, ndev);
+		break;
 	default:
 		pr_warning("Unknown queue index %u", queue);
 	}
@@ -310,7 +339,8 @@ static u32 get_host_features(struct kvm *kvm, void *dev)
 		| 1UL << VIRTIO_NET_F_GUEST_TSO4
 		| 1UL << VIRTIO_NET_F_GUEST_TSO6
 		| 1UL << VIRTIO_RING_F_EVENT_IDX
-		| 1UL << VIRTIO_RING_F_INDIRECT_DESC;
+		| 1UL << VIRTIO_RING_F_INDIRECT_DESC
+		| 1UL << VIRTIO_NET_F_CTRL_VQ;
 }
 
 static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
-- 
1.8.1.5



[PATCH 3/3] kvm tools: virtio-net multiqueue support

2013-04-08 Thread Sasha Levin
This patch adds support for multiple virtio-net queues. Each queue gets
assigned to its own corresponding rx/tx thread.

The max amount of queue pairs is 8 right now; it's up to the guest
to decide the actual amount of queues it wants using ethtool. For
example, if I want 2 queue pairs on eth0:

ethtool -L eth0 combined 2

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/net.c | 155 +++--
 1 file changed, 84 insertions(+), 71 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 253d167..9911d77 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -27,10 +27,7 @@
 #include <sys/eventfd.h>
 
 #define VIRTIO_NET_QUEUE_SIZE		256
-#define VIRTIO_NET_NUM_QUEUES		3
-#define VIRTIO_NET_RX_QUEUE		0
-#define VIRTIO_NET_TX_QUEUE		1
-#define VIRTIO_NET_CTRL_QUEUE		2
+#define VIRTIO_NET_NUM_QUEUES		8
 
 struct net_dev;
 
@@ -44,17 +41,13 @@ struct net_dev {
 	struct virtio_device		vdev;
 	struct list_head		list;
 
-	struct virt_queue		vqs[VIRTIO_NET_NUM_QUEUES];
+	struct virt_queue		vqs[VIRTIO_NET_NUM_QUEUES * 2 + 1];
 	struct virtio_net_config	config;
-	u32				features;
+	u32				features, rx_vqs, tx_vqs;
 
-	pthread_t			io_rx_thread;
-	struct mutex			io_rx_lock;
-	pthread_cond_t			io_rx_cond;
-
-	pthread_t			io_tx_thread;
-	struct mutex			io_tx_lock;
-	pthread_cond_t			io_tx_cond;
+	pthread_t			io_thread[VIRTIO_NET_NUM_QUEUES * 2 + 1];
+	struct mutex			io_lock[VIRTIO_NET_NUM_QUEUES * 2 + 1];
+	pthread_cond_t			io_cond[VIRTIO_NET_NUM_QUEUES * 2 + 1];
 
 	int				vhost_fd;
 	int				tap_fd;
@@ -79,17 +72,22 @@ static void *virtio_net_rx_thread(void *p)
 	u16 out, in;
 	u16 head;
 	int len;
+	u32 id;
+
+	mutex_lock(&ndev->mutex);
+	id = ndev->rx_vqs++ * 2;
+	mutex_unlock(&ndev->mutex);
 
 	kvm__set_thread_name("virtio-net-rx");
 
 	kvm = ndev->kvm;
-	vq = &ndev->vqs[VIRTIO_NET_RX_QUEUE];
+	vq = &ndev->vqs[id];
 
 	while (1) {
-		mutex_lock(&ndev->io_rx_lock);
+		mutex_lock(&ndev->io_lock[id]);
 		if (!virt_queue__available(vq))
-			pthread_cond_wait(&ndev->io_rx_cond, &ndev->io_rx_lock.mutex);
-		mutex_unlock(&ndev->io_rx_lock);
+			pthread_cond_wait(&ndev->io_cond[id], &ndev->io_lock[id].mutex);
+		mutex_unlock(&ndev->io_lock[id]);
 
 		while (virt_queue__available(vq)) {
 			head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
@@ -97,9 +95,8 @@ static void *virtio_net_rx_thread(void *p)
 			virt_queue__set_used_elem(vq, head, len);
 
 			/* We should interrupt guest right now, otherwise latency is huge. */
-			if (virtio_queue__should_signal(&ndev->vqs[VIRTIO_NET_RX_QUEUE]))
-				ndev->vdev.ops->signal_vq(kvm, &ndev->vdev,
-							  VIRTIO_NET_RX_QUEUE);
+			if (virtio_queue__should_signal(vq))
+				ndev->vdev.ops->signal_vq(kvm, &ndev->vdev, id);
 		}
 	}
 
@@ -117,17 +114,22 @@ static void *virtio_net_tx_thread(void *p)
 	u16 out, in;
 	u16 head;
 	int len;
+	u32 id;
+
+	mutex_lock(&ndev->mutex);
+	id = ndev->tx_vqs++ * 2 + 1;
+	mutex_unlock(&ndev->mutex);
 
 	kvm__set_thread_name("virtio-net-tx");
 
 	kvm = ndev->kvm;
-	vq = &ndev->vqs[VIRTIO_NET_TX_QUEUE];
+	vq = &ndev->vqs[id];
 
 	while (1) {
-		mutex_lock(&ndev->io_tx_lock);
+		mutex_lock(&ndev->io_lock[id]);
 		if (!virt_queue__available(vq))
-			pthread_cond_wait(&ndev->io_tx_cond, &ndev->io_tx_lock.mutex);
-		mutex_unlock(&ndev->io_tx_lock);
+			pthread_cond_wait(&ndev->io_cond[id], &ndev->io_lock[id].mutex);
+		mutex_unlock(&ndev->io_lock[id]);
 
 		while (virt_queue__available(vq)) {
 			head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
@@ -135,8 +137,8 @@ static void *virtio_net_tx_thread(void *p)
 			virt_queue__set_used_elem(vq, head, len);
 		}
 
-		if (virtio_queue__should_signal(&ndev->vqs[VIRTIO_NET_TX_QUEUE]))
-			ndev->vdev.ops->signal_vq(kvm, &ndev->vdev, VIRTIO_NET_TX_QUEUE);
+		if (virtio_queue__should_signal(vq))
+			ndev

Re: [PATCH v4 3/6] KVM: Initialize irqfd from kvm_init().

2013-04-02 Thread Sasha Levin
On 02/28/2013 04:22 AM, Cornelia Huck wrote:
 Currently, eventfd introduces module_init/module_exit functions
 to initialize/cleanup the irqfd workqueue. This only works, however,
 if no other module_init/module_exit functions are built into the
 same module.
 
 Let's just move the initialization and cleanup to kvm_init and kvm_exit.
 This way, it is also clearer where kvm startup may fail.
 
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com

I'm seeing this during boot:

[6.763302] ------------[ cut here ]------------
[6.763763] WARNING: at kernel/workqueue.c:4204 destroy_workqueue+0x1df/0x3d0()
[6.764507] Modules linked in:
[6.764792] Pid: 1, comm: swapper/0 Tainted: G        W    3.9.0-rc5-next-20130402-sasha-00015-g3522ec5 #324
[6.765654] Call Trace:
[6.765875]  [811074fb] warn_slowpath_common+0x8b/0xc0
[6.766436]  [81107545] warn_slowpath_null+0x15/0x20
[6.766947]  [8112ca7f] destroy_workqueue+0x1df/0x3d0
[6.768631]  [8100d880] kvm_irqfd_exit+0x10/0x20
[6.77]  [81004dbb] kvm_init+0x2ab/0x310
[6.770607]  [86183dc0] ? cpu_has_kvm_support+0x4d/0x4d
[6.771241]  [86183fb4] vmx_init+0x1f4/0x437
[6.771709]  [86183dc0] ? cpu_has_kvm_support+0x4d/0x4d
[6.772266]  [810020f2] do_one_initcall+0xb2/0x1b0
[6.772995]  [86180021] kernel_init_freeable+0x15d/0x1ef
[6.773857]  [8617f801] ? loglevel+0x31/0x31
[6.774609]  [83d51230] ? rest_init+0x140/0x140
[6.775551]  [83d51239] kernel_init+0x9/0xf0
[6.776162]  [83dbf37c] ret_from_fork+0x7c/0xb0
[6.776662]  [83d51230] ? rest_init+0x140/0x140
[6.777241] ---[ end trace 10bba684ced4346a ]---

And I think it has something to do with this patch.


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm tools: remove unneeded checks in qcow code

2012-12-20 Thread Sasha Levin
We already know q!=NULL at that point, no need to retest.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/disk/qcow.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index ee2992e..64a2550 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -1361,8 +1361,7 @@ free_header:
 	if (q->header)
 		free(q->header);
 free_qcow:
-	if (q)
-		free(q);
+	free(q);
 
 	return NULL;
 }
@@ -1492,8 +1491,7 @@ free_header:
 	if (q->header)
 		free(q->header);
 free_qcow:
-	if (q)
-		free(q);
+	free(q);
 
 	return NULL;
 }
-- 
1.8.0



[PATCH] kvm tools: correctly compare pointer with NULL instead of 0

2012-12-20 Thread Sasha Levin
Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/hw/pci-shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c
index 00e5d93..ec3f771 100644
--- a/tools/kvm/hw/pci-shmem.c
+++ b/tools/kvm/hw/pci-shmem.c
@@ -356,7 +356,7 @@ int pci_shmem__init(struct kvm *kvm)
char *mem;
int r;
 
-   if (shmem_region == 0)
+   if (shmem_region == NULL)
return 0;
 
/* Register good old INTx */
-- 
1.8.0



[PATCH] kvm tools: remove unneeded check from disk code

2012-12-20 Thread Sasha Levin
We already know 'disk' is non-null.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/disk/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/kvm/disk/core.c b/tools/kvm/disk/core.c
index dd59751..4e9bda0 100644
--- a/tools/kvm/disk/core.c
+++ b/tools/kvm/disk/core.c
@@ -105,7 +105,7 @@ struct disk_image *disk_image__new(int fd, u64 size,
}
 
 #ifdef CONFIG_HAS_AIO
-	if (disk) {
+	{
 		pthread_t thread;
 
 		disk->evt = eventfd(0, 0);
-- 
1.8.0



Re: [PATCH v2 1/8] rbtree: include linux/compiler.h for definition of __always_inline

2012-11-22 Thread Sasha Levin
On 11/22/2012 10:58 AM, Will Deacon wrote:
 Commit 29fc7c5a4f516d388fb6e1f6d24bfb04b8093e54 upstream.
 
 rb_erase_augmented() is a static function annotated with
 __always_inline.  This causes a compile failure when attempting to use
 the rbtree implementation as a library (e.g.  kvm tool):
 
  rbtree_augmented.h:125:24: error: expected `=', `,', `;', `asm' or `__attribute__' before `void'

On a side note, our rbtree-interval is broken at the moment: the kernel side
changed the implementation and (IMO) broke augmented rbtrees, and several
patches in our own code then tried to fix the breakage without identifying
the problem correctly, leading to more subtle breakage.

If you see things broken with mmio, that might be the reason.

I have a fix for that which goes to both kernel and our code, but I'm
waiting to verify that the kernel side is indeed broken.


Thanks,
Sasha


Re: [RFC PATCH 13/16] kvm tools: keep track of registered memory banks in struct kvm

2012-11-20 Thread Sasha Levin
On 11/20/2012 12:15 PM, Will Deacon wrote:
 Hi Sasha,
 
 On Tue, Nov 13, 2012 at 04:37:38AM +, Sasha Levin wrote:
 On 11/12/2012 06:57 AM, Will Deacon wrote:
 struct kvm {
 	struct kvm_arch		arch;
 	struct kvm_config	cfg;
 @@ -49,6 +56,7 @@ struct kvm {
 	u64			ram_size;
 	void			*ram_start;
 	u64			ram_pagesize;
 +	struct list_head	mem_banks;

 These memory banks actually look like a perfect example to use our augmented 
 interval rb-tree,
 can we switch them to use it, or is it a list on purpose?
 
 I found some time to look at this today but unfortunately they're not as
 ideally suited to the interval tree as they look: the problem being that we
 need to search for banks by both host virtual address *and* guest physical
 address depending on the translation that we're doing.
 
 We could have two separate tress, but that seems like overkill given the
 likely number of banks.

Makes sense. We can convert it later if we need to as well.


Thanks,
Sasha



[PATCH] kvm tools: disable LTO by default

2012-11-19 Thread Sasha Levin
A bug seen only with LTO enabled was reported by Ron Minnich. Since it
appears to be a linker issue, we disable LTO by default until it's more stable.

We can still run LTO builds by setting LTO=1.

Reported-by: Ron Minnich rminn...@gmail.com
Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/Makefile | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index c105de1..de11060 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -221,9 +221,11 @@ ifeq ($(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO) -static),y)
LIBS_STATOPT+= -laio
 endif
 
-FLAGS_LTO := -flto
-ifeq ($(call try-cc,$(SOURCE_HELLO),$(FLAGS_LTO)),y)
-   CFLAGS  += $(FLAGS_LTO)
+ifeq ($(LTO),1)
+   FLAGS_LTO := -flto
+   ifeq ($(call try-cc,$(SOURCE_HELLO),$(FLAGS_LTO)),y)
+   CFLAGS  += $(FLAGS_LTO)
+   endif
 endif
 
 ifneq ($(call try-build,$(SOURCE_STATIC),-static,),y)
-- 
1.8.0



Re: [RFC PATCH 13/16] kvm tools: keep track of registered memory banks in struct kvm

2012-11-13 Thread Sasha Levin
On 11/13/2012 07:16 AM, Will Deacon wrote:
 On Tue, Nov 13, 2012 at 04:37:38AM +, Sasha Levin wrote:
 On 11/12/2012 06:57 AM, Will Deacon wrote:
  +struct kvm_mem_bank {
  +	struct list_head	list;
  +	unsigned long		guest_phys_addr;
  +	void			*host_addr;
  +	unsigned long		size;
  +};

 Can we just reuse struct kvm_userspace_memory_region here? We're also using
 different data types for guest_phys_addr and size than what's in
 kvm_userspace_memory_region - that can't be good.
 
 I looked briefly at doing that when I wrote the multi-bank stuff, but I hit
 a couple of issues:
 
   - kvmtool itself tends to use void * for host addresses, rather than
 the __u64 userspace_addr in kvm_userspace_memory_region
 
   - kvm_userspace_memory_region is a superset of what we need (not the
 end of the world I guess)
 
 so you end up casting address types a fair amount. Still, I'll revisit it
 and see if I can come up with something cleaner.

That's a good point. We used void* while the kernel side is using u64, which
looks odd.

In that case, let's get everything moved to u64 (obviously not in the scope of
this patch series).


Thanks,
Sasha



Re: [RFC PATCH 08/16] kvm tools: add generic device registration mechanism

2012-11-12 Thread Sasha Levin
On 11/12/2012 06:57 AM, Will Deacon wrote:
 PCI devices are currently registered into the pci_devices array via the
 pci__register function, which can then be indexed later by architecture
 code to construct device tree nodes. For MMIO devices, there is no such
 utility.
 
 Rather than invent a similar mechanism for MMIO, this patch creates a
 global device registration mechanism, which allows the device type to be
 specified when registered or indexing a device. Current users of the pci
 registration code are migrated to the new infrastructure and virtio MMIO
 devices are registered at init time.
 
 Signed-off-by: Will Deacon will.dea...@arm.com
 ---
  tools/kvm/Makefile  |1 +
  tools/kvm/devices.c |   24 +
  tools/kvm/hw/pci-shmem.c|8 ++-
  tools/kvm/hw/vesa.c |8 ++-
  tools/kvm/include/kvm/devices.h |   21 ++
  tools/kvm/include/kvm/pci.h |2 -
  tools/kvm/include/kvm/virtio-mmio.h |1 +
  tools/kvm/include/kvm/virtio-pci.h  |2 +
  tools/kvm/pci.c |   40 --
  tools/kvm/powerpc/irq.c |3 +-
  tools/kvm/powerpc/spapr_pci.c   |2 +-
  tools/kvm/virtio/mmio.c |7 ++
  tools/kvm/virtio/pci.c  |7 +-
  13 files changed, 93 insertions(+), 33 deletions(-)
  create mode 100644 tools/kvm/devices.c
  create mode 100644 tools/kvm/include/kvm/devices.h
 
 diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
 index c105de1..5da416f 100644
 --- a/tools/kvm/Makefile
 +++ b/tools/kvm/Makefile
 @@ -50,6 +50,7 @@ OBJS+= builtin-run.o
  OBJS += builtin-setup.o
  OBJS += builtin-stop.o
  OBJS += builtin-version.o
 +OBJS += devices.o
  OBJS += disk/core.o
  OBJS += framebuffer.o
  OBJS += guest_compat.o
 diff --git a/tools/kvm/devices.c b/tools/kvm/devices.c
 new file mode 100644
 index 000..f9666b9
 --- /dev/null
 +++ b/tools/kvm/devices.c
 @@ -0,0 +1,24 @@
  +#include "kvm/devices.h"
  +#include "kvm/kvm.h"
  +
  +#include <linux/err.h>
 +
 +static struct device_header *devices[KVM_MAX_DEVICES];

Does it really have a hard limit at KVM_MAX_DEVICES? Or can we turn it into
something more dynamic (list/tree/whatever)?


Thanks,
Sasha


Re: [RFC PATCH 13/16] kvm tools: keep track of registered memory banks in struct kvm

2012-11-12 Thread Sasha Levin
On 11/12/2012 06:57 AM, Will Deacon wrote:
 When registering memory banks for a guest, it is useful to keep the
 range information around for translating between guest and host address
 spaces.
 
 This patch adds a list of kvm_mem_bank structures to struct kvm, which
 is updated when a new bank is registered.
 
 Signed-off-by: Will Deacon will.dea...@arm.com
 ---
  tools/kvm/include/kvm/kvm.h |8 
  tools/kvm/kvm.c |   23 ++-
  2 files changed, 30 insertions(+), 1 deletions(-)
 
 diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
 index cf959ea..96dd81b 100644
 --- a/tools/kvm/include/kvm/kvm.h
 +++ b/tools/kvm/include/kvm/kvm.h
 @@ -35,6 +35,13 @@ struct kvm_ext {
   int code;
  };
  
  +struct kvm_mem_bank {
  +	struct list_head	list;
  +	unsigned long		guest_phys_addr;
  +	void			*host_addr;
  +	unsigned long		size;
  +};

 Can we just reuse struct kvm_userspace_memory_region here? We're also using
 different data types for guest_phys_addr and size than what's in
 kvm_userspace_memory_region - that can't be good.

  struct kvm {
  	struct kvm_arch		arch;
  	struct kvm_config	cfg;
  @@ -49,6 +56,7 @@ struct kvm {
  	u64			ram_size;
  	void			*ram_start;
  	u64			ram_pagesize;
  +	struct list_head	mem_banks;

These memory banks actually look like a perfect example to use our augmented 
interval rb-tree,
can we switch them to use it, or is it a list on purpose?


Thanks,
Sasha



Re: [RFC PATCH 14/16] kvm tools: teach guest_flat_to_host about memory banks starting above 0

2012-11-12 Thread Sasha Levin
On 11/12/2012 06:57 AM, Will Deacon wrote:
 Running a guest with multiple banks of memory based above 0 causes the
 guest_flat_to_host address conversion to fail, as it is assumed that
 guest memory addresses are offset linearly from 0.
 
 This patch changes the translation function so that the kvm_mem_bank
 structures registered by kvm__register_mem are used to translate guest
 addresses, rather than use an offset from the start of host memory.
 
 Signed-off-by: Will Deacon will.dea...@arm.com
 ---

Hmm... we probably need something to teach E820 about this new memory layout
thingie if we start mapping multiple memory banks.

Not urgent for now though - I guess.


Thanks,
Sasha


Re: [PATCH v11] kvm: notify host when the guest is panicked

2012-11-09 Thread Sasha Levin
On Mon, Nov 5, 2012 at 8:58 PM, Hu Tao hu...@cn.fujitsu.com wrote:
 But in the case of panic notification, more dependency means more
 chances of failure of panic notification. Say, if we use a virtio device
 to do panic notification, then we will fail if: virtio itself has
 problems, virtio for some reason can't be deployed(neither built-in or
 as a module), or guest doesn't support virtio, etc.

Add polling to your virtio device. If it didn't notify of a panic but
takes more than 20 seconds to answer your poll request, you can assume
it's dead.

Actually, just use virtio-serial and something in userspace on the guest.

 We choose IO because compared to virtio device, it is not that heavy and
 less problematic.

Less problematic? Heavy? Are there any known issues with virtio that
should be fixed? You make virtio sound like an old IDE drive or
something.


Thanks,
Sasha


Re: [PATCH v11] kvm: notify host when the guest is panicked

2012-10-30 Thread Sasha Levin
On Tue, Oct 30, 2012 at 9:48 PM, Wen Congyang we...@cn.fujitsu.com wrote:
 At 10/31/2012 09:12 AM, Marcelo Tosatti Wrote:
 It has been asked earlier why a simple virtio device is not usable
 for this (with no response IIRC).

 1. We can't use virtio device when the kernel is booting.

So the issue here is the small window between the point the guest
becomes self-aware and the point the virtio drivers are loaded,
right?

I agree that if something happens during that interval, a
virtio-notifier driver won't catch that, but anything beyond that is
better done with a virtio driver, so how is the generic infrastructure
added in this patch useful to anything beyond detecting panics in that
initial interval?

 2. The virtio's driver can be built as a module, and if it is not loaded
and the kernel is panicked, there is no way to notify the host.

Even if the suggested virtio-notifier driver is built as a module, it
would get auto-loaded when the guest is booting, so I'm not sure about
this point?

 3. I/O port is more reliable than virtio device.
If virtio's driver has some bug, and it cause kernel panicked, we can't
use it. The I/O port is more reliable because it only depends on notifier
chain(If we use virtio device, it also depends on notifier chain).

This is like suggesting that we let KVM emulate virtio-blk on its own,
parallel to the virtio implementation, so that even if there's a
problem with virtio-blk, KVM can still emulate one itself.

Furthermore, why stop at virtio? What if the KVM code has a bug and it
doesn't pass IO properly? Or the x86 code? we still want panic
notifications if that happens...


Thanks,
Sasha


[PATCH] kvm tools: don't crash on virtio MSI-X reset

2012-10-29 Thread Sasha Levin
Handle VIRTIO_MSI_NO_VECTOR by not trying to use it as a valid vector.

We still need to remove the GSI and everything, but this is enough
to prevent crashes and keep everything working properly for now.

Reported-by: Kirill A. Shutemov kirill.shute...@linux.intel.com
Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/virtio/pci.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 3acaa3a..adc8efc 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -146,6 +146,8 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
 	switch (offset) {
 	case VIRTIO_MSI_CONFIG_VECTOR:
 		vec = vpci->config_vector = ioport__read16(data);
+		if (vec == VIRTIO_MSI_NO_VECTOR)
+			break;
 
 		gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);
 
@@ -154,6 +156,9 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
 	case VIRTIO_MSI_QUEUE_VECTOR:
 		vec = vpci->vq_vector[vpci->queue_selector] = ioport__read16(data);
 
+		if (vec == VIRTIO_MSI_NO_VECTOR)
+			break;
+
 		gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);
 		vpci->gsis[vpci->queue_selector] = gsi;
 		if (vdev->ops->notify_vq_gsi)
@@ -253,7 +258,7 @@ int virtio_pci__signal_vq(struct kvm *kvm, struct virtio_device *vdev, u32 vq)
 	struct virtio_pci *vpci = vdev->virtio;
 	int tbl = vpci->vq_vector[vq];
 
-	if (virtio_pci__msix_enabled(vpci)) {
+	if (virtio_pci__msix_enabled(vpci) && tbl != VIRTIO_MSI_NO_VECTOR) {
 		if (vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_MASKALL) ||
 		    vpci->msix_table[tbl].ctrl & cpu_to_le16(PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
 
@@ -277,7 +282,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev)
 	struct virtio_pci *vpci = vdev->virtio;
 	int tbl = vpci->config_vector;
 
-	if (virtio_pci__msix_enabled(vpci)) {
+	if (virtio_pci__msix_enabled(vpci) && tbl != VIRTIO_MSI_NO_VECTOR) {
 		if (vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_MASKALL) ||
 		    vpci->msix_table[tbl].ctrl & cpu_to_le16(PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
 
@@ -286,7 +291,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev)
 		}
 
 		if (vpci->features & VIRTIO_PCI_F_SIGNAL_MSI)
-			virtio_pci__signal_msi(kvm, vpci, vpci->config_vector);
+			virtio_pci__signal_msi(kvm, vpci, tbl);
 		else
 			kvm__irq_trigger(kvm, vpci->config_gsi);
 	} else {
-- 
1.7.12.4



Re: [RFC 2/3] lockdep: be nice about compiling from userspace

2012-10-25 Thread Sasha Levin
On 10/25/2012 04:05 AM, Ingo Molnar wrote:
 
 * Sasha Levin sasha.le...@oracle.com wrote:
 
 We can rather easily make lockdep work from userspace, although 3 issues
 remain which I'm not sure about:

  - Kernel naming - we can just wrap init_utsname() to return kvmtool related
 utsname, is that what we want though?

  - static_obj() - I don't have a better idea than calling mprobe(), which 
 sounds
 wrong as well.

  - debug_show_all_locks() - we don't actually call it from userspace yet, 
 but I think
 we might want to, so I'm not sure how to make it pretty using existing 
 kernel code.

 Signed-off-by: Sasha Levin sasha.le...@oracle.com
 ---
  kernel/lockdep.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

  diff --git a/kernel/lockdep.c b/kernel/lockdep.c
  index 7981e5b..fdd3670 100644
  --- a/kernel/lockdep.c
  +++ b/kernel/lockdep.c
  @@ -567,10 +567,12 @@ static void lockdep_print_held_locks(struct task_struct *curr)
   
   static void print_kernel_ident(void)
   {
  +#ifdef __KERNEL__
   	printk("%s %.*s %s\n", init_utsname()->release,
   		(int)strcspn(init_utsname()->version, " "),
   		init_utsname()->version,
   		print_tainted());
  +#endif
 
 I guess wrapping init_utsname() is not worth it. Although 
 kvmtool could provide the host system's utsname - kernel 
 identity is useful for debugging info.
 
 You could generate a Git hash version string like tools/perf/ 
 does (see PERF_VERSION and tools/perf/util/PERF-VERSION-GEN), 
 and put that into the -version field.
 
 -release could be the kvmtool version, and print_tainted() 
 could return an empty string.
 
 That way you could provide init_utsname() and could remove this 
 #ifdef.

Yeah, we already generate the version string for
'lkvm version' anyways, so I guess I'll just add init_utsname().


  }
  
  static int very_verbose(struct lock_class *class)
  @@ -586,6 +588,7 @@ static int very_verbose(struct lock_class *class)
    */
   static int static_obj(void *obj)
   {
  +#ifdef __KERNEL__
   	unsigned long start = (unsigned long) _stext,
   		      end   = (unsigned long) _end,
   		      addr  = (unsigned long) obj;
  @@ -609,6 +612,8 @@ static int static_obj(void *obj)
   	 * module static or percpu var?
   	 */
   	return is_module_address(addr) || is_module_percpu_address(addr);
  +#endif
  +	return 1;
 
 Could you put an:
 
 #ifndef static_obj
 
 around it? Then kvmtool could define its own trivial version of 
 static_obj():
 
   #define static_obj(x) 1U
 
 or so.
 
  @@ -4108,7 +4113,7 @@ void debug_check_no_locks_held(struct task_struct *task)
   	if (unlikely(task->lockdep_depth > 0))
   		print_held_locks_bug(task);
   }
  -
  +#ifdef __KERNEL__
   void debug_show_all_locks(void)
   {
   	struct task_struct *g, *p;
 
 I guess a show-all-locks functionality would be useful to 
 kvmtool as well?

Regarding the above two,

Yes, we can wrap both static_obj() and debug_show_all_locks() with #ifndefs
and let kvmtool provide its own version of those two.

The question here is more whether the lockdep maintainers would be OK with
adding that, considering there's no in-kernel justification for it.


Thanks,
Sasha



Re: [PATCH 1/5] kvm tools: check ioctl return value for msi signal

2012-10-25 Thread Sasha Levin
On Thu, Oct 25, 2012 at 5:15 AM, William Dauchy will...@gandi.net wrote:
 On Oct25 10:01, Pekka Enberg wrote:
 Is this something that happens on your machine? I wonder if it makes more
 sense to return the error from virtio_pci__signal_vq() and
 virtio_pci__signal_config() and make the callers deal with it.

 no but I thought it was worth not keeping a system call unchecked;
 particularly when you are debugging something (ballooning in my case),
 just wanted to make sure the call was indeed ok.

I wonder if we should revert to using good old IRQ injection in case
this one fails.


Thanks,
Sasha


Re: [RFC 2/3] lockdep: be nice about compiling from userspace

2012-10-25 Thread Sasha Levin
On 10/25/2012 01:06 PM, Ingo Molnar wrote:
 * Sasha Levin sasha.le...@oracle.com wrote:
 Yes, we can wrap both static_obj() and debug_show_all_locks() 
 with #ifndefs and let kvmtool provide it's own version of 
 those two.
 
 Only static_obj() - I see no immediate reason why you shouldn't 
 be able to utilize debug_show_all_locks(). 'vm debug -a' already 
 lists all backtraces on all vcpus - so 'vm debug lockdep -a' 
 could list all current locks and indicate which one is held and 
 by whom.

I'm not sure how we'd make debug_show_all_locks() work in userspace,
since it would require us to wrap do_each_thread() & friends to iterate
over all our task_structs.

I was thinking about writing a corresponding debug_show_all_locks() that
would simply iterate a list of our dummy task_structs.


Thanks,
Sasha


Re: [PATCH 2/5] kvm tools: use the correct config vector interrupt

2012-10-25 Thread Sasha Levin
On Thu, Oct 25, 2012 at 3:03 AM, Pekka Enberg penb...@kernel.org wrote:
 On Wed, 24 Oct 2012, William Dauchy wrote:
 when registering the config interrupt, the later is registered in
 vcpi-config_vector and not in vpci-vq_vector

 introduced in:
 a841f15 kvm tools: Use the new KVM_SIGNAL_MSI ioctl to inject
 interrupts directly.

 Signed-off-by: William Dauchy will...@gandi.net
 ---
  tools/kvm/virtio/pci.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
 index ab1119a..f4ea3c9 100644
 --- a/tools/kvm/virtio/pci.c
 +++ b/tools/kvm/virtio/pci.c
  @@ -288,7 +288,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev)
   	}
 
   	if (vpci->features & VIRTIO_PCI_F_SIGNAL_MSI)
  -		virtio_pci__signal_msi(kvm, vpci, vpci->vq_vector[vpci->config_vector]);
  +		virtio_pci__signal_msi(kvm, vpci, vpci->config_vector);
   	else
   		kvm__irq_trigger(kvm, vpci->config_gsi);
   	} else {

 Sasha?

Indeed, we tried signaling the config vector by signaling vq0, woops.

Acked-by: Sasha Levin levinsasha...@gmail.com


Re: [BUG] lkvm crash on crashkernel boot

2012-10-25 Thread Sasha Levin
On Thu, Oct 25, 2012 at 8:16 AM, Kirill A. Shutemov
kirill.shute...@linux.intel.com wrote:
 On Thu, Oct 25, 2012 at 10:17:27AM +0300, Pekka Enberg wrote:
 On Wed, Oct 24, 2012 at 6:27 PM, Kirill A. Shutemov
 kirill.shute...@linux.intel.com wrote:
  Hi,
 
  I've tried to play with kexec using lkvm. Unfortunately, lkvm crashes when
  I try to switch to crashkernel.
 
  I use Linus tree + penberg/kvmtool/next + one x86 mm patch[1].
 
  Kernel is defconfig + kvmconfig. I use the same kernel image for system and
  crash env.
 
  Host:
 
  % lkvm run --cpus 1 -m 1024 --params 'crashkernel=256M loglevel=8'
 
  Guest:
 
  # kexec -p bzImage --reuse-cmdline
  # echo c  /proc/sysrq-trigger
  ...
  [0.947984] loop: module loaded
  [0.950078] virtio-pci :00:01.0: irq 40 for MSI/MSI-X
  [0.950925] virtio-pci :00:01.0: irq 41 for MSI/MSI-X
  [0.952944] virtio-pci :00:01.0: irq 42 for MSI/MSI-X
   zsh: segmentation fault (core dumped)  lkvm run --cpus 1 -m 1024 --params 'crashkernel=256M loglevel=8'

 This seems to work OK on my machine.

  Guest kernel is somewhere in virtio_net initialization (for the second
  time). I'm too lazy to find exact line.
 
  Backtrace:
 
   #0  irq__add_msix_route (kvm=kvm@entry=0xbf8010, msg=0xe3d090) at x86/irq.c:210
   #1  0x0041b3bf in virtio_pci__specific_io_out.isra.5 (offset=<optimized out>,
       data=<optimized out>, kvm=0xbf8010) at virtio/pci.c:150
   #2  virtio_pci__io_out.9406 (ioport=<optimized out>, kvm=0xbf8010, port=<optimized out>,
       data=<optimized out>, size=<optimized out>) at virtio/pci.c:208
   #3  0x0040f8c3 in kvm__emulate_io (count=<optimized out>, size=2, direction=1,
       data=<optimized out>, port=25108, kvm=0xbf8010) at ioport.c:165
   #4  kvm_cpu__start (cpu=<optimized out>) at x86/include/kvm/kvm-cpu-arch.h:41
   #5  0x00416ca2 in kvm_cpu_thread.2824 (arg=<optimized out>) at builtin-run.c:176
   #6  0x7f701ebd0b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
   #7  0x7f701e1fe70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
   #8  0x in ?? ()

 Looks like vpci-msix_table might not be initialized properly. Sasha,
 Asias, care to take a look at this?

 vec is 0x in virtio_pci__specific_io_out() on crash.

 Let's add proper bounds checking there. It does not solve the issue
 with booting the crashkernel, but it fixes the lkvm crash.

 With the patch below I've got:

 [0.988004] NET: Registered protocol family 17
 [0.988550] 9pnet: Installing 9P2000 support
 [0.989006] virtio-pci :00:02.0: irq 40 for MSI/MSI-X
 [0.989889] virtio-pci :00:02.0: irq 41 for MSI/MSI-X
 [0.991117] virtio-pci :00:02.0: irq 40 for MSI/MSI-X
 [0.991716] virtio-pci :00:02.0: irq 41 for MSI/MSI-X
 [0.993028] 9pnet_virtio: probe of virtio1 failed with error -2
 [0.993811] virtio-pci :00:03.0: irq 40 for MSI/MSI-X
 [0.993895] virtio-pci :00:03.0: irq 41 for MSI/MSI-X
 [0.995186] virtio-pci :00:03.0: irq 40 for MSI/MSI-X
 [0.995899] virtio-pci :00:03.0: irq 41 for MSI/MSI-X
 [0.997030] 9pnet_virtio: probe of virtio2 failed with error -2
 [0.997891] Key type dns_resolver registered
 [0.998536] PM: Hibernation image not present or could not be loaded.
 [0.998902] registered taskstats version 1
 [1.001163]   Magic number: 0:241:128
 [1.001887] console [netcon0] enabled
 [1.002881] netconsole: network logging started
 [1.175863] Switching to clocksource tsc
 [   13.017445] ALSA device list:
 [   13.017834]   No soundcards found.
 [   13.018382] md: Waiting for all devices to be available before
 autodetect
 [   13.019090] md: If you don't use raid, use raid=noautodetect
 [   13.019867] md: Autodetecting RAID arrays.
 [   13.020280] md: Scanned 0 and added 0 devices.
 [   13.020728] md: autorun ...
 [   13.021008] md: ... autorun DONE.
 [   13.021405] 9pnet_virtio: no channels available
 [   13.021958] VFS: Cannot open root device "root" or unknown-block(0,0): error -2
 [   13.022749] Please append a correct root= boot option; here are the
 available partitions:
 [   13.023641] Kernel panic - not syncing: VFS: Unable to mount root fs on
 unknown-block(0,0)
 [   13.024462] Pid: 1, comm: swapper/0 Not tainted 3.7.0-rc2+ #20
 [   13.024638] Call Trace:
 [   13.024638]  [8174ae94] panic+0xb6/0x1b5
 [   13.024638]  [81cc7e0c] mount_block_root+0x183/0x221
 [   13.024638]  [81cc7fa4] mount_root+0xfa/0x105
 [   13.024638]  [81cc80ec] prepare_namespace+0x13d/0x16a
 [   13.024638]  [81729ee6] kernel_init+0x1c6/0x2e0
 [   13.024638]  [81cc75af] ? do_early_param+0x8c/0x8c
 [   13.024638]  [81729d20] ? rest_init+0x70/0x70
 [   13.024638]  [8175db2c] ret_from_fork+0x7c/0xb0
 [   13.024638]  [81729d20] ? rest_init+0x70/0x70
 [   13.024638] Rebooting in 1 seconds..  Warning: serial8250__exit failed.


   # KVM session ended normally.

 

[PATCH 2/2] kvm tools: name threads

2012-10-24 Thread Sasha Levin
Give threads a meaningful name. This makes debugging much easier, and
everything else much prettier.

Suggested-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/disk/core.c   | 2 ++
 tools/kvm/include/kvm/kvm.h | 7 +++
 tools/kvm/ioeventfd.c   | 2 ++
 tools/kvm/kvm-ipc.c | 2 ++
 tools/kvm/net/uip/tcp.c | 3 +++
 tools/kvm/net/uip/udp.c | 3 +++
 tools/kvm/ui/sdl.c  | 2 ++
 tools/kvm/ui/vnc.c  | 2 ++
 tools/kvm/util/threadpool.c | 2 ++
 tools/kvm/virtio/blk.c  | 2 ++
 tools/kvm/virtio/net.c  | 4 
 11 files changed, 31 insertions(+)

diff --git a/tools/kvm/disk/core.c b/tools/kvm/disk/core.c
index b313b28..dd59751 100644
--- a/tools/kvm/disk/core.c
+++ b/tools/kvm/disk/core.c
@@ -63,6 +63,8 @@ static void *disk_image__thread(void *param)
int nr, i;
u64 dummy;
 
+	kvm__set_thread_name("disk-image-io");
+
 	while (read(disk->evt, &dummy, sizeof(dummy)) > 0) {
 		nr = io_getevents(disk->ctx, 1, ARRAY_SIZE(event), event, &notime);
 		for (i = 0; i < nr; i++)
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index b460656..1c7fab7 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -10,6 +10,7 @@
 #include <linux/types.h>
 #include <time.h>
 #include <signal.h>
+#include <sys/prctl.h>
 
 #define SIGKVMEXIT (SIGRTMIN + 0)
 #define SIGKVMPAUSE(SIGRTMIN + 1)
@@ -118,4 +119,10 @@ static inline void *guest_flat_to_host(struct kvm *kvm, 
unsigned long offset)
 
 bool kvm__supports_extension(struct kvm *kvm, unsigned int extension);
 
+static inline void kvm__set_thread_name(const char *name)
+{
+   prctl(PR_SET_NAME, name);
+}
+
 #endif /* KVM__KVM_H */
+
diff --git a/tools/kvm/ioeventfd.c b/tools/kvm/ioeventfd.c
index a68d8d0..ff665d4 100644
--- a/tools/kvm/ioeventfd.c
+++ b/tools/kvm/ioeventfd.c
@@ -24,6 +24,8 @@ static void *ioeventfd__thread(void *param)
 {
u64 tmp = 1;
 
+	kvm__set_thread_name("ioeventfd-worker");
+
for (;;) {
int nfds, i;
 
diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
index d23edd2..bdcc0d1 100644
--- a/tools/kvm/kvm-ipc.c
+++ b/tools/kvm/kvm-ipc.c
@@ -260,6 +260,8 @@ static void *kvm_ipc__thread(void *param)
struct epoll_event event;
struct kvm *kvm = param;
 
+	kvm__set_thread_name("kvm-ipc");
+
for (;;) {
int nfds;
 
diff --git a/tools/kvm/net/uip/tcp.c b/tools/kvm/net/uip/tcp.c
index 711a716..830aa3f 100644
--- a/tools/kvm/net/uip/tcp.c
+++ b/tools/kvm/net/uip/tcp.c
@@ -1,5 +1,6 @@
 #include "kvm/uip.h"
 
+#include "kvm/kvm.h"
 #include <linux/virtio_net.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
@@ -176,6 +177,8 @@ static void *uip_tcp_socket_thread(void *p)
int len, left, ret;
u8 *payload, *pos;
 
+	kvm__set_thread_name("uip-tcp");
+
sk = p;
 
payload = malloc(UIP_MAX_TCP_PAYLOAD);
diff --git a/tools/kvm/net/uip/udp.c b/tools/kvm/net/uip/udp.c
index d4518b2..5b6ec1c 100644
--- a/tools/kvm/net/uip/udp.c
+++ b/tools/kvm/net/uip/udp.c
@@ -1,5 +1,6 @@
 #include "kvm/uip.h"
 
+#include "kvm/kvm.h"
 #include <linux/virtio_net.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
@@ -160,6 +161,8 @@ static void *uip_udp_socket_thread(void *p)
int nfds;
int i;
 
+	kvm__set_thread_name("uip-udp");
+
info = p;
 
do {
diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c
index 172a12b..9994490 100644
--- a/tools/kvm/ui/sdl.c
+++ b/tools/kvm/ui/sdl.c
@@ -206,6 +206,8 @@ static void *sdl__thread(void *p)
SDL_Event ev;
Uint32 flags;
 
+	kvm__set_thread_name("kvm-sdl-worker");
+
 	if (SDL_Init(SDL_INIT_VIDEO) != 0)
 		die("Unable to initialize SDL");
 
diff --git a/tools/kvm/ui/vnc.c b/tools/kvm/ui/vnc.c
index efdc0f4..12e4bd5 100644
--- a/tools/kvm/ui/vnc.c
+++ b/tools/kvm/ui/vnc.c
@@ -185,6 +185,8 @@ static void *vnc__thread(void *p)
char argv[1][1] = {{0}};
int argc = 1;
 
+	kvm__set_thread_name("kvm-vnc-worker");
+
 	server = rfbGetScreen(&argc, (char **) argv, fb->width, fb->height, 8, 3, 4);
 	server->frameBuffer	= fb->mem;
 	server->alwaysShared	= TRUE;
diff --git a/tools/kvm/util/threadpool.c b/tools/kvm/util/threadpool.c
index 85ac7e7..a363831 100644
--- a/tools/kvm/util/threadpool.c
+++ b/tools/kvm/util/threadpool.c
@@ -78,6 +78,8 @@ static void *thread_pool__threadfunc(void *param)
 {
pthread_cleanup_push(thread_pool__threadfunc_cleanup, NULL);
 
+	kvm__set_thread_name("threadpool-worker");
+
while (running) {
struct thread_pool__job *curjob = NULL;
 
diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index cff38aa..f76342c 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -179,6 +179,8 @@ static void *virtio_blk_thread(void *dev)
u64 data;
int r

[PATCH 1/2] kvm tools: use correct init group for the PCI controller

2012-10-24 Thread Sasha Levin
PCI controller is what deals with PCI devices, and it depends on
vcpus being there, so it should be in the dev_base group.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index a28b5e2..c77d3cc 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -203,7 +203,7 @@ int pci__init(struct kvm *kvm)
 
return 0;
 }
-base_init(pci__init);
+dev_base_init(pci__init);
 
 int pci__exit(struct kvm *kvm)
 {
@@ -212,4 +212,4 @@ int pci__exit(struct kvm *kvm)
 
return 0;
 }
-base_exit(pci__exit);
+dev_base_exit(pci__exit);
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 1/3] kvm tools: use mutex abstraction instead of pthread mutex

2012-10-24 Thread Sasha Levin
We already have something to wrap pthread with mutex_[init,lock,unlock]
calls. This patch creates a new struct mutex abstraction and moves
everything to work with it.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/hw/serial.c  | 10 +-
 tools/kvm/include/kvm/mutex.h  | 22 ++
 tools/kvm/include/kvm/qcow.h   |  2 +-
 tools/kvm/include/kvm/threadpool.h |  4 ++--
 tools/kvm/include/kvm/uip.h| 10 +-
 tools/kvm/net/uip/buf.c|  4 ++--
 tools/kvm/net/uip/core.c   |  6 +++---
 tools/kvm/net/uip/tcp.c|  6 +++---
 tools/kvm/net/uip/udp.c|  2 +-
 tools/kvm/util/threadpool.c|  8 
 tools/kvm/virtio/blk.c |  4 ++--
 tools/kvm/virtio/console.c |  4 ++--
 tools/kvm/virtio/net.c | 14 +++---
 13 files changed, 51 insertions(+), 45 deletions(-)

diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
index a177a7f..53b684a 100644
--- a/tools/kvm/hw/serial.c
+++ b/tools/kvm/hw/serial.c
@@ -22,7 +22,7 @@
 #define UART_IIR_TYPE_BITS 0xc0
 
 struct serial8250_device {
-   pthread_mutex_t mutex;
+	struct mutex		mutex;
u8  id;
 
u16 iobase;
@@ -55,7 +55,7 @@ struct serial8250_device {
 static struct serial8250_device devices[] = {
/* ttyS0 */
[0] = {
-   .mutex  = PTHREAD_MUTEX_INITIALIZER,
+   .mutex  = MUTEX_INITIALIZER,
 
.id = 0,
.iobase = 0x3f8,
@@ -65,7 +65,7 @@ static struct serial8250_device devices[] = {
},
/* ttyS1 */
[1] = {
-   .mutex  = PTHREAD_MUTEX_INITIALIZER,
+   .mutex  = MUTEX_INITIALIZER,
 
.id = 1,
.iobase = 0x2f8,
@@ -75,7 +75,7 @@ static struct serial8250_device devices[] = {
},
/* ttyS2 */
[2] = {
-   .mutex  = PTHREAD_MUTEX_INITIALIZER,
+   .mutex  = MUTEX_INITIALIZER,
 
.id = 2,
.iobase = 0x3e8,
@@ -85,7 +85,7 @@ static struct serial8250_device devices[] = {
},
/* ttyS3 */
[3] = {
-   .mutex  = PTHREAD_MUTEX_INITIALIZER,
+   .mutex  = MUTEX_INITIALIZER,
 
.id = 3,
.iobase = 0x2e8,
diff --git a/tools/kvm/include/kvm/mutex.h b/tools/kvm/include/kvm/mutex.h
index 3286cea..4f31025 100644
--- a/tools/kvm/include/kvm/mutex.h
+++ b/tools/kvm/include/kvm/mutex.h
@@ -10,23 +10,29 @@
  * to write user-space code! :-)
  */
 
-#define DEFINE_MUTEX(mutex) pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER
-
-static inline void mutex_init(pthread_mutex_t *mutex)
+struct mutex {
+   pthread_mutex_t mutex;
+};
+#define MUTEX_INITIALIZER (struct mutex) { .mutex = PTHREAD_MUTEX_INITIALIZER }
+   
+#define DEFINE_MUTEX(mtx) struct mutex mtx = MUTEX_INITIALIZER
+
+static inline void mutex_init(struct mutex *lock)
 {
-   if (pthread_mutex_init(mutex, NULL) != 0)
+	if (pthread_mutex_init(&lock->mutex, NULL) != 0)
 		die("unexpected pthread_mutex_init() failure!");
 }
 
-static inline void mutex_lock(pthread_mutex_t *mutex)
+static inline void mutex_lock(struct mutex *lock)
 {
-   if (pthread_mutex_lock(mutex) != 0)
+	if (pthread_mutex_lock(&lock->mutex) != 0)
 		die("unexpected pthread_mutex_lock() failure!");
+
 }
 
-static inline void mutex_unlock(pthread_mutex_t *mutex)
+static inline void mutex_unlock(struct mutex *lock)
 {
-   if (pthread_mutex_unlock(mutex) != 0)
+	if (pthread_mutex_unlock(&lock->mutex) != 0)
 		die("unexpected pthread_mutex_unlock() failure!");
 }
 
diff --git a/tools/kvm/include/kvm/qcow.h b/tools/kvm/include/kvm/qcow.h
index e032a1e..f849246 100644
--- a/tools/kvm/include/kvm/qcow.h
+++ b/tools/kvm/include/kvm/qcow.h
@@ -74,7 +74,7 @@ struct qcow_header {
 };
 
 struct qcow {
-   pthread_mutex_t mutex;
+	struct mutex		mutex;
struct qcow_header  *header;
struct qcow_l1_tabletable;
struct qcow_refcount_table  refcount_table;
diff --git a/tools/kvm/include/kvm/threadpool.h 
b/tools/kvm/include/kvm/threadpool.h
index abe46ea..bacb243 100644
--- a/tools/kvm/include/kvm/threadpool.h
+++ b/tools/kvm/include/kvm/threadpool.h
@@ -15,7 +15,7 @@ struct thread_pool__job {
void*data;
 
int signalcount;
-   pthread_mutex_t mutex;
+	struct mutex		mutex;

[RFC 2/3] lockdep: be nice about compiling from userspace

2012-10-24 Thread Sasha Levin
We can rather easily make lockdep work from userspace, although 3 issues
remain which I'm not sure about:

 - Kernel naming - we can just wrap init_utsname() to return kvmtool related
utsname, is that what we want though?

 - static_obj() - I don't have a better idea than calling mprobe(), which sounds
wrong as well.

 - debug_show_all_locks() - we don't actually call it from userspace yet, but I 
think
we might want to, so I'm not sure how to make it pretty using existing kernel 
code.

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 kernel/lockdep.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 7981e5b..fdd3670 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -567,10 +567,12 @@ static void lockdep_print_held_locks(struct task_struct 
*curr)
 
 static void print_kernel_ident(void)
 {
+#ifdef __KERNEL__
 	printk("%s %.*s %s\n", init_utsname()->release,
 		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version,
 		print_tainted());
+#endif
 }
 
 static int very_verbose(struct lock_class *class)
@@ -586,6 +588,7 @@ static int very_verbose(struct lock_class *class)
  */
 static int static_obj(void *obj)
 {
+#ifdef __KERNEL__
unsigned long start = (unsigned long) _stext,
  end   = (unsigned long) _end,
  addr  = (unsigned long) obj;
@@ -609,6 +612,8 @@ static int static_obj(void *obj)
 * module static or percpu var?
 */
return is_module_address(addr) || is_module_percpu_address(addr);
+#endif
+   return 1;
 }
 
 /*
@@ -4108,7 +4113,7 @@ void debug_check_no_locks_held(struct task_struct *task)
 	if (unlikely(task->lockdep_depth > 0))
print_held_locks_bug(task);
 }
-
+#ifdef __KERNEL__
 void debug_show_all_locks(void)
 {
struct task_struct *g, *p;
@@ -4166,7 +4171,7 @@ retry:
 	read_unlock(&tasklist_lock);
 }
 EXPORT_SYMBOL_GPL(debug_show_all_locks);
-
+#endif
 /*
  * Careful: only use this function if you are sure that
  * the task cannot run in parallel!
-- 
1.7.12.4



[RFC 3/3] kvm tools: use lockdep to detect locking issues

2012-10-24 Thread Sasha Levin
Not only the kernel can use lockdep to detect locking issues, so can we!

Just interface into lockdep and let it do exactly what it does in the kernel
to detect different locking issues.

For example, this stupid code:

mutex_lock(&mutex_A);
mutex_lock(&mutex_A);

Would result in a spew from lockdep:

=
[ INFO: possible recursive locking detected ]
-
kvm-main/3481 is trying to acquire lock:
 (&mutex_A){..}, at: ./vm(kvm_cmd_run+0x5a) [0x426eba]

but task is already holding lock:
 (&mutex_A){..}, at: ./vm(kvm_cmd_run+0x5a) [0x426eba]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&mutex_A);
  lock(&mutex_A);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by kvm-main/3481:
 #0:  (&mutex_A){..}, at: ./vm(kvm_cmd_run+0x5a) [0x426eba]

stack backtrace:
./vm[0x40afe4]
./vm[0x40c4bc]
./vm(lock_acquire+0x3e)[0x426fce]
./vm[0x40e988]
./vm(kvm_cmd_run+0x5a)[0x426eba]
./vm(handle_command+0x41)[0x4206f1]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f73139e56c5]
./vm[0x40985d]

Or things like releasing a lock twice:

mutex_lock(&mutex_A);
mutex_unlock(&mutex_A);
mutex_unlock(&mutex_A);

Would result in a spew:

=
[ BUG: bad unlock balance detected! ]
-
kvm-main/3643 is trying to release lock (&mutex_A) at:
./vm(kvm_cmd_run+0x5a) [0x426e1a]
but there are no more locks to release!

other info that might help us debug this:
no locks held by kvm-main/3643.

stack backtrace:
./vm[0x40afe4]
./vm[0x412688]
./vm(lock_release+0xb0)[0x413f70]
./vm[0x40e8ee]
./vm(kvm_cmd_run+0x5a)[0x426e1a]
./vm(handle_command+0x41)[0x420651]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f2ed58446c5]
./vm[0x40985d]


And even a more complex ABBA issue such as:

mutex_lock(&mutex_A);
mutex_lock(&mutex_B);
mutex_unlock(&mutex_B);
mutex_unlock(&mutex_A);

mutex_lock(&mutex_B);
mutex_lock(&mutex_A);
mutex_unlock(&mutex_A);
mutex_unlock(&mutex_B);

Would result in a spew from lockdep:

==
[ INFO: possible circular locking dependency detected ]
---
kvm-main/2159 is trying to acquire lock:
 (&mutex_A){..}, at: ./vm(kvm_cmd_run+0x5a) [0x426dda]

but task is already holding lock:
 (&mutex_B){..}, at: ./vm(kvm_cmd_run+0x5a) [0x426dda]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&mutex_B){..}:
./vm[0x415aa7]
./vm[0x40c6b6]
./vm(lock_acquire+0x3e)[0x426eee]
./vm[0x4120e0]
./vm(kvm_cmd_run+0x5a)[0x426dda]
./vm(handle_command+0x41)[0x420611]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd640ab76c5]
./vm[0x40985d]

-> #0 (&mutex_A){..}:
./vm[0x415aa7]
./vm[0x40c46a]
./vm(lock_acquire+0x3e)[0x426eee]
./vm[0x41219e]
./vm(kvm_cmd_run+0x5a)[0x426dda]
./vm(handle_command+0x41)[0x420611]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd640ab76c5]
./vm[0x40985d]

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mutex_B);
                               lock(&mutex_A);
                               lock(&mutex_B);
  lock(&mutex_A);

 *** DEADLOCK ***

1 lock held by kvm-main/2159:
 #0:  (&mutex_B){..}, at: ./vm(kvm_cmd_run+0x5a) [0x426dda]

stack backtrace:
./vm[0x40afe4]
./vm[0x40c411]
./vm(lock_acquire+0x3e)[0x426eee]
./vm[0x41219e]
./vm(kvm_cmd_run+0x5a)[0x426dda]
./vm(handle_command+0x41)[0x420611]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd640ab76c5]
./vm[0x40985d]

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 tools/kvm/Makefile|  14 ++-
 tools/kvm/builtin-run.c   |  18 ++-
 tools/kvm/include/kvm/kvm.h   |   2 +
 tools/kvm/include/kvm/mutex.h |  19 +++-
 tools/kvm/include/linux/bitops.h  |   1 -
 tools/kvm/include/linux/debug_locks.h |   7 ++
 tools/kvm/include/linux/ftrace.h  | 132 +
 tools/kvm/include/linux/gfp.h | 144 +++
 tools/kvm/include/linux/hardirq.h | 209 ++
 tools/kvm/include/linux/hash.h|   7 ++
 tools/kvm/include/linux/irqflags.h|  50 
 tools/kvm/include/linux/kallsyms.h| 129 +
 tools/kvm/include/linux/kernel.h  |  20 +++-
 tools/kvm/include/linux/lockdep.h |  56 +
 tools/kvm/include/linux/module.h  |   2 +-
 tools/kvm/include/linux/moduleparam.h |   6 +
 tools/kvm/include/linux/rcu.h |  19 
 tools/kvm/include/linux/spinlock.h|  33 ++
 tools/kvm/include/linux/stacktrace.h  |  31 +
 tools/kvm/include/linux/types.h   |   8 ++
 tools/kvm/virtio/pci.c|   2 +-
 21 files changed

Re: [PATCH 3/5] kvm tools: correctly handle mem ballooning

2012-10-24 Thread Sasha Levin
This one was actually correct originally.

When we inflate (-i) the balloon we increase num_pages, which is a
counter of how many pages the balloon has.


Thanks,
Sasha

On Wed, Oct 24, 2012 at 8:51 AM, William Dauchy will...@gandi.net wrote:
 from the virtio_balloon.h:
 num_pages is the number of pages host wants Guest to give up.

 Our implementation is doing the reverse, resulting in decreasing pages
 when increasing needed and so on.
 Reverse the implementation.

 Signed-off-by: William Dauchy will...@gandi.net
 ---
  tools/kvm/virtio/balloon.c |8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
 index e08f628..2201b9f 100644
 --- a/tools/kvm/virtio/balloon.c
 +++ b/tools/kvm/virtio/balloon.c
 @@ -162,12 +162,12 @@ static void handle_mem(struct kvm *kvm, int fd, u32 
 type, u32 len, u8 *msg)

 mem = *(int *)msg;
 if (mem > 0) {
 -   bdev.config.num_pages += 256 * mem;
 -   } else if (mem < 0) {
 -   if (bdev.config.num_pages < (u32)(256 * (-mem)))
 +   if (bdev.config.num_pages < ((u32)(256 * mem)))
 return;

 -   bdev.config.num_pages += 256 * mem;
 +   bdev.config.num_pages -= 256 * mem;
 +   } else if (mem < 0) {
 +   bdev.config.num_pages -= 256 * mem;
 }

 /* Notify that the configuration space has changed */
 --
 1.7.10.4



[PATCH] kvm, async_pf: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT

2012-10-19 Thread Sasha Levin
KVM_PV_REASON_PAGE_NOT_PRESENT kicks cpu out of idleness, but we haven't
marked that spot as an exit from idleness.

Not doing so can cause RCU warnings such as:

[  732.788386] ===
[  732.789803] [ INFO: suspicious RCU usage. ]
[  732.790032] 3.7.0-rc1-next-20121019-sasha-2-g6d8d02d-dirty #63 Tainted: 
GW
[  732.790032] ---
[  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally 
while idle!
[  732.790032]
[  732.790032] other info that might help us debug this:
[  732.790032]
[  732.790032]
[  732.790032] RCU used illegally from idle CPU!
[  732.790032] rcu_scheduler_active = 1, debug_locks = 1
[  732.790032] RCU used illegally from extended quiescent state!
[  732.790032] 2 locks held by trinity-child31/8252:
[  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [83a67528] __schedule+0x178/0x8f0
[  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [81152bde] cpuacct_charge+0xe/0x200
[  732.790032]
[  732.790032] stack backtrace:
[  732.790032] Pid: 8252, comm: trinity-child31 Tainted: GW
3.7.0-rc1-next-20121019-sasha-2-g6d8d02d-dirty #63
[  732.790032] Call Trace:
[  732.790032]  [8118266b] lockdep_rcu_suspicious+0x10b/0x120
[  732.790032]  [81152c60] cpuacct_charge+0x90/0x200
[  732.790032]  [81152bde] ? cpuacct_charge+0xe/0x200
[  732.790032]  [81158093] update_curr+0x1a3/0x270
[  732.790032]  [81158a6a] dequeue_entity+0x2a/0x210
[  732.790032]  [81158ea5] dequeue_task_fair+0x45/0x130
[  732.790032]  [8114ae29] dequeue_task+0x89/0xa0
[  732.790032]  [8114bb9e] deactivate_task+0x1e/0x20
[  732.790032]  [83a67c29] __schedule+0x879/0x8f0
[  732.790032]  [8117e20d] ? trace_hardirqs_off+0xd/0x10
[  732.790032]  [810a37a5] ? kvm_async_pf_task_wait+0x1d5/0x2b0
[  732.790032]  [83a67cf5] schedule+0x55/0x60
[  732.790032]  [810a37c4] kvm_async_pf_task_wait+0x1f4/0x2b0
[  732.790032]  [81139e50] ? abort_exclusive_wait+0xb0/0xb0
[  732.790032]  [81139c25] ? prepare_to_wait+0x25/0x90
[  732.790032]  [810a3a66] do_async_page_fault+0x56/0xa0
[  732.790032]  [83a6a6e8] async_page_fault+0x28/0x30

Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 arch/x86/kernel/kvm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b3e5e51..4180a87 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -247,7 +247,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long 
error_code)
break;
case KVM_PV_REASON_PAGE_NOT_PRESENT:
/* page is swapped out by the host. */
+   rcu_irq_enter();
+   exit_idle();
kvm_async_pf_task_wait((u32)read_cr2());
+   rcu_irq_exit();
break;
case KVM_PV_REASON_PAGE_READY:
rcu_irq_enter();
-- 
1.7.12.3


