Re: sanitizing kvmtool

2015-10-25 Thread Dmitry Vyukov
On Thu, Oct 22, 2015 at 1:07 AM, Sasha Levin <sasha.le...@oracle.com> wrote:
> On 10/19/2015 11:15 AM, Dmitry Vyukov wrote:
>> On Mon, Oct 19, 2015 at 5:08 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
>>> On 10/19/2015 10:47 AM, Dmitry Vyukov wrote:
>>>>> Right, the memory areas that are accessed both by the hypervisor and the guest
>>>>> should be treated as untrusted input, but the hypervisor is supposed to validate
>>>>> the input carefully before using it - so I'm not sure how data races would
>>>>> introduce anything new that we didn't catch during validation.
>>>>
>>>> One possibility would be: if the result of a racy read is passed to the guest,
>>>> that can leak arbitrary host data into the guest. That does not sound good.
>>>> Also, without proper atomic operations, it is basically impossible to verify
>>>> untrusted data, as it can change under your feet. And copying the data into a
>>>> local variable does not prevent it from changing.
>>>
>>> What's missing here is that the guest doesn't directly read/write the memory:
>>> every time it accesses memory that is shared with the host it will trigger
>>> an exit, which will stop the vcpu thread that made the access, and kernel-side
>>> kvm will pass the hypervisor the value the guest wrote (or the memory address
>>> it attempted to read). The value/address can't change under us in that scenario.
>> But still: if the result of a racy read is passed to the guest, that can leak
>> arbitrary host data into the guest.
>
> I see what you're saying. I need to think about it a bit; maybe we do need
> locking for each of the virtio devices we emulate.
>
>
> On an unrelated note, a few of the reports are pointing to ioport__unregister():
>
> ==
> WARNING: ThreadSanitizer: data race (pid=109228)
>   Write of size 8 at 0x7d1cdf40 by main thread:
> #0 free tsan/rtl/tsan_interceptors.cc:570 (lkvm+0x00443376)
> #1 ioport__unregister ioport.c:138:2 (lkvm+0x004a9ff9)
> #2 pci__exit pci.c:247:2 (lkvm+0x004ac857)
> #3 init_list__exit util/init.c:59:8 (lkvm+0x004bca6e)
> #4 kvm_cmd_run_exit builtin-run.c:645:2 (lkvm+0x004a68a7)
> #5 kvm_cmd_run builtin-run.c:661 (lkvm+0x004a68a7)
> #6 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
> #7 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
> #8 main main.c:18 (lkvm+0x004ac0b4)
>
>   Previous read of size 8 at 0x7d1cdf40 by thread T55:
> #0 rb_int_search_single util/rbtree-interval.c:14:17 (lkvm+0x004bf968)
> #1 ioport_search ioport.c:41:9 (lkvm+0x004aa05f)
> #2 kvm__emulate_io ioport.c:186 (lkvm+0x004aa05f)
> #3 kvm_cpu__emulate_io x86/include/kvm/kvm-cpu-arch.h:41:9 (lkvm+0x004aa718)
> #4 kvm_cpu__start kvm-cpu.c:126 (lkvm+0x004aa718)
> #5 kvm_cpu_thread builtin-run.c:174:6 (lkvm+0x004a6e3e)
>
>   Thread T55 'kvm-vcpu-2' (tid=109285, finished) created by main thread at:
> #0 pthread_create tsan/rtl/tsan_interceptors.cc:848 (lkvm+0x004478a3)
> #1 kvm_cmd_run_work builtin-run.c:633:7 (lkvm+0x004a683f)
> #2 kvm_cmd_run builtin-run.c:660 (lkvm+0x004a683f)
> #3 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
> #4 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
> #5 main main.c:18 (lkvm+0x004ac0b4)
>
> SUMMARY: ThreadSanitizer: data race ioport.c:138:2 in ioport__unregister
> ==
>
> I think this is because we don't perform locking using pthread, but rather pause
> the vm entirely - so the cpu threads it's pointing to aren't actually running when
> we unregister ioports. Is there a way to annotate that for tsan?


I've looked at brlock and I think I understand it now. Reader threads
write to the eventfd to notify that they are stopped, the writer reads
from the eventfd, and tsan considers this write->read pair as
synchronization. I suspect that this report is caused by the same
use-after-free on the cpu array: kvm__pause probably takes the fast path
when it should not.
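
For reference, this is the shape of the handshake I mean - a minimal sketch
under my reading of it (not kvmtool's actual code, and the helper names are
made up):

/*
 * Minimal sketch of the pause handshake: each vcpu thread writes a token to
 * an eventfd once it has parked, and the pausing thread reads one token per
 * vcpu.  ThreadSanitizer models the write()->read() on the eventfd as a
 * release/acquire pair, so everything a vcpu did before parking
 * happens-before everything the pauser does after the read - no extra tsan
 * annotations should be needed for this part.
 */
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

static int pause_efd;

static void pause_init(void)
{
	pause_efd = eventfd(0, 0);
}

/* Called from a vcpu thread once it has stopped touching shared state. */
static void vcpu_report_paused(void)
{
	uint64_t one = 1;

	write(pause_efd, &one, sizeof(one));	/* release */
	/* ... block here until resumed ... */
}

/* Called from the pausing thread; returns once all ncpus vcpus have parked. */
static void wait_all_paused(int ncpus)
{
	uint64_t cnt;
	int seen = 0;

	while (seen < ncpus) {
		read(pause_efd, &cnt, sizeof(cnt));	/* acquire */
		seen += cnt;
	}
}

So if the report is real, the suspect is not the handshake itself but the code
running before it, e.g. with the cpu array already freed.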


Re: Network hangs when communicating with host

2015-10-20 Thread Dmitry Vyukov
I now have another issue. My binary fails to mmap a file within the lkvm
sandbox. The same binary works fine on the host and in qemu. I've added
strace to the sandbox script, and here is the output:

[pid   837] openat(AT_FDCWD, "syzkaller-shm048878722", O_RDWR|O_CLOEXEC) = 5
[pid   837] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = -1 EINVAL (Invalid argument)

I don't see anything that can potentially cause EINVAL here. Is it
possible that lkvm somehow affects kernel behavior here?
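
In case it helps to separate the Go runtime from the kernel, here is a
stand-alone C equivalent of the two failing calls (just a sketch; the file
name is taken from the strace above):

/* Repro sketch: open the existing shm file and mmap 1 MB of it MAP_SHARED,
 * mirroring the strace output above.  mmap returns EINVAL for things like a
 * bad length/offset or missing MAP_SHARED/MAP_PRIVATE, none of which seem
 * to apply to the arguments shown. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	int fd = open("syzkaller-shm048878722", O_RDWR | O_CLOEXEC);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	void *p = mmap(NULL, 1048576, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		printf("mmap: %s\n", strerror(errno));
	else
		printf("mmap ok: %p\n", p);
	return 0;
}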

I run lkvm as:

$ taskset 1 /kvmtool/lkvm sandbox --disk syz-0 --mem=2048 --cpus=2
--kernel /arch/x86/boot/bzImage --network mode=user --sandbox
/workdir/kvm/syz-0.sh

On Mon, Oct 19, 2015 at 4:20 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
> On 10/19/2015 05:28 AM, Dmitry Vyukov wrote:
>> On Mon, Oct 19, 2015 at 11:22 AM, Andre Przywara <andre.przyw...@arm.com> wrote:
>>> Hi Dmitry,
>>>
>>> On 19/10/15 10:05, Dmitry Vyukov wrote:
>>>> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
>>>>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I am trying to run a program in lkvm sandbox so that it communicates
>>>>>> with a program on host. I run lkvm as:
>>>>>>
>>>>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>>>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>>>>
>>>>>> /my_prog then connects to a program on host over a tcp socket.
>>>>>> I see that host receives some data, sends some data back, but then
>>>>>> my_prog hangs on network read.
>>>>>>
>>>>>> To localize this I wrote 2 programs (attached). ping is run on host
>>>>>> and pong is run from lkvm sandbox. They successfully establish tcp
>>>>>> connection, but after some iterations both hang on read.
>>>>>>
>>>>>> The networking code in the Go runtime has been there for more than 3 years, is
>>>>>> widely used in production, and does not have any known bugs. However, it uses
>>>>>> epoll edge-triggered readiness notifications, which are known to be tricky.
>>>>>> Is it possible that lkvm contains some networking bug? Can it be
>>>>>> related to the data races in lkvm I reported earlier today?
>>>
>>> Just to let you know:
>>> I think we have seen networking issues in the past - root over NFS had
>>> issues IIRC. Will spent some time on debugging this and it looked like a
>>> race condition in kvmtool's virtio implementation. I think pinning
>>> kvmtool's virtio threads to one host core made this go away. However,
>>> although he tried hard (even by Will's standards!) he couldn't find
>>> the real root cause or a fix at the time he looked at it, and we found
>>> other ways to work around the issues (using virtio-blk or initrds).
>>>
>>> So it's quite possible that there are issues. I haven't had time yet to
>>> look at your sanitizer reports, but it looks like a promising approach
>>> to find the root cause.
>>
>>
>> Thanks, Andre!
>>
>> ping/pong does not hang for at least 5 minutes when I run lkvm
>> under taskset 1.
>>
>> And, yeah, this pretty strongly suggests a data race. ThreadSanitizer
>> can point you to the bug within a minute, so you just need to say
>> "aha! it is here". Or maybe not. There are no guarantees. But if you
>> have already spent significant time on this, then checking the reports
>> definitely looks like a good idea.
>
> Okay, that's good to know.
>
> I have a few busy days, but I'll definitely try to clear up these reports
> as they seem to be pointing to real issues.
>
>
> Thanks,
> Sasha
>


Re: Network hangs when communicating with host

2015-10-19 Thread Dmitry Vyukov
On Mon, Oct 19, 2015 at 11:22 AM, Andre Przywara <andre.przyw...@arm.com> wrote:
> Hi Dmitry,
>
> On 19/10/15 10:05, Dmitry Vyukov wrote:
>> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
>>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>>> Hello,
>>>>
>>>> I am trying to run a program in lkvm sandbox so that it communicates
>>>> with a program on host. I run lkvm as:
>>>>
>>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>>
>>>> /my_prog then connects to a program on host over a tcp socket.
>>>> I see that host receives some data, sends some data back, but then
>>>> my_prog hangs on network read.
>>>>
>>>> To localize this I wrote 2 programs (attached). ping is run on host
>>>> and pong is run from lkvm sandbox. They successfully establish tcp
>>>> connection, but after some iterations both hang on read.
>>>>
>>>> The networking code in the Go runtime has been there for more than 3 years, is
>>>> widely used in production, and does not have any known bugs. However, it uses
>>>> epoll edge-triggered readiness notifications, which are known to be tricky.
>>>> Is it possible that lkvm contains some networking bug? Can it be
>>>> related to the data races in lkvm I reported earlier today?
>
> Just to let you know:
> I think we have seen networking issues in the past - root over NFS had
> issues IIRC. Will spent some time on debugging this and it looked like a
> race condition in kvmtool's virtio implementation. I think pinning
> kvmtool's virtio threads to one host core made this go away. However,
> although he tried hard (even by Will's standards!) he couldn't find
> the real root cause or a fix at the time he looked at it, and we found
> other ways to work around the issues (using virtio-blk or initrds).
>
> So it's quite possible that there are issues. I haven't had time yet to
> look at your sanitizer reports, but it looks like a promising approach
> to find the root cause.


Thanks, Andre!

ping/pong does not hang for at least 5 minutes when I run lkvm
under taskset 1.

And, yeah, this pretty strongly suggests a data race. ThreadSanitizer
can point you to the bug within a minute, so you just need to say
"aha! it is here". Or maybe not. There are no guarantees. But if you
have already spent significant time on this, then checking the reports
definitely looks like a good idea.


Re: sanitizing kvmtool

2015-10-19 Thread Dmitry Vyukov
On Sat, Oct 17, 2015 at 4:16 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
> On 10/15/2015 06:21 AM, Dmitry Vyukov wrote:
>> Hello,
>>
>> I've run a set of sanitizers on
>> git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git commit
>> 3695adeb227813d96d9c41850703fb53a23845eb. I've just booted a VM and
>> shut it down.
>>
>> AddressSanitizer detected a heap-use-after-free:
>>
>> AddressSanitizer: heap-use-after-free on address 0x6040df90 at pc
>> 0x004f46d0 bp 0x7ffc79def2d0 sp 0x7ffc79def2c8
>> READ of size 8 at 0x6040df90 thread T0
>> #0 0x4f46cf in kvm__pause kvm.c:436:7
>> #1 0x4f0d5d in ioport__unregister ioport.c:129:2
>> #2 0x4efb2f in serial8250__exit hw/serial.c:446:7
>> #3 0x516204 in init_list__exit util/init.c:59:8
>> #4 0x4ea956 in kvm_cmd_run_exit builtin-run.c:645:2
>> #5 0x4ea956 in kvm_cmd_run builtin-run.c:661
>> #6 0x51596f in handle_command kvm-cmd.c:84:8
>> #7 0x7fa398101ec4 in __libc_start_main
>> /build/buildd/eglibc-2.19/csu/libc-start.c:287
>> #8 0x41a505 in _start (lkvm+0x41a505)
>>
>> 0x6040df90 is located 0 bytes inside of 40-byte region
>> [0x6040df90,0x6040dfb8)
>> freed by thread T0 here:
>> #0 0x4b75a0 in free
>> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cc:30
>> #1 0x4f29e6 in kvm_cpu__exit kvm-cpu.c:263:2
>> #2 0x5160c4 in init_list__exit util/init.c:59:8
>>
>> previously allocated by thread T0 here:
>> #0 0x4b7a2c in calloc
>> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56
>> #1 0x4f2491 in kvm_cpu__init kvm-cpu.c:221:14
>
> I'm sending a patch to fix this + another issue it uncovered. This was caused by
> the kvm cpu exit function being marked as a late call rather than a core call, so it
> would free the vcpus before anything else had a chance to exit.
>
>>
>> ThreadSanitizer detected a whole bunch of data races, for example:
>>
>> WARNING: ThreadSanitizer: data race (pid=109228)
>>   Write of size 1 at 0x7d6c0001f384 by thread T55:
>> #0 memcpy 
>> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:608
>> (lkvm+0x0044b28b)
>> #1 virtio_pci__msix_mmio_callback virtio/pci.c:269:3 
>> (lkvm+0x004b3ee0)
>> #2 kvm__emulate_mmio mmio.c:131:3 (lkvm+0x004ac332)
>> #3 kvm_cpu__emulate_mmio x86/include/kvm/kvm-cpu-arch.h:46:9
>> (lkvm+0x004aa8c6)
>> #4 kvm_cpu__start kvm-cpu.c:147 (lkvm+0x004aa8c6)
>> #5 kvm_cpu_thread builtin-run.c:174:6 (lkvm+0x004a6e3e)
>>
>>   Previous read of size 4 at 0x7d6c0001f384 by thread T58:
>> #0 virtio_pci__signal_vq virtio/pci.c:290:29 (lkvm+0x004b36b6)
>> #1 virtio_net_tx_thread virtio/net.c:210:4 (lkvm+0x004b1fb5)
>>
>>   Location is heap block of size 1648 at 0x7d6c0001f100 allocated by
>> main thread:
>> #0 calloc 
>> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:544
>> (lkvm+0x0043e812)
>> #1 virtio_init virtio/core.c:191:12 (lkvm+0x004afa48)
>> #2 virtio_net__init_one virtio/net.c:846:2 (lkvm+0x004b095d)
>> #3 virtio_net__init virtio/net.c:868:3 (lkvm+0x004b0296)
>> #4 init_list__init util/init.c:40:8 (lkvm+0x004bc7ee)
>> #5 kvm_cmd_run_init builtin-run.c:621:6 (lkvm+0x004a6799)
>> #6 kvm_cmd_run builtin-run.c:656 (lkvm+0x004a6799)
>> #7 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
>> #8 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
>> #9 main main.c:18 (lkvm+0x004ac0b4)
>>
>>   Thread T55 'kvm-vcpu-2' (tid=109285, running) created by main thread at:
>> #0 pthread_create
>> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848
>> (lkvm+0x004478a3)
>> #1 kvm_cmd_run_work builtin-run.c:633:7 (lkvm+0x004a683f)
>> #2 kvm_cmd_run builtin-run.c:660 (lkvm+0x004a683f)
>> #3 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
>> #4 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
>> #5 main main.c:18 (lkvm+0x004ac0b4)
>>
>>   Thread T58 'virtio-net-tx' (tid=109334, running) created by thread T53 at:
>> #0 pthread_create
>> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848
>> (lkvm+0x004478a3)

Re: sanitizing kvmtool

2015-10-19 Thread Dmitry Vyukov
On Mon, Oct 19, 2015 at 4:35 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
> On 10/19/2015 10:24 AM, Dmitry Vyukov wrote:
>> On Mon, Oct 19, 2015 at 4:19 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
>>> On 10/19/2015 04:37 AM, Dmitry Vyukov wrote:
>>>>> So in this case (and most of the other data race cases described in the full log) it
>>>>> seems like ThreadSanitizer is mixing up different accesses by the guest to one underlying
>>>>> block of memory on the host.
>>>>>
>>>>> Here, for example, T55 accesses the msix block of the virtio-net PCI device, and T58 is accessing
>>>>> the virtqueue exposed by that device. While they both get to the same block of memory inside
>>>> I don't understand this.
>>>> Do you mean that this is a false positive? Or is it a real issue in lkvm?
>>>
>>> I suspect it's a false positive because ThreadSanitizer sees the memory as one big
>>> block, but the logic that makes sure we don't race here is within the guest vm, which
>>> ThreadSanitizer doesn't see.
>>
>> But isn't the task of a hypervisor to sandbox the guest OS and to not
>> trust it in any way, shape or form? What if the guest VM intentionally
>> omits the synchronization? Can it exploit the host?
>
> Right, the memory areas that are accessed both by the hypervisor and the guest
> should be treated as untrusted input, but the hypervisor is supposed to validate
> the input carefully before using it - so I'm not sure how data races would
> introduce anything new that we didn't catch during validation.


One possibility would be: if the result of a racy read is passed to the guest,
that can leak arbitrary host data into the guest. That does not sound good.
Also, without proper atomic operations, it is basically impossible to verify
untrusted data, as it can change under your feet. And copying the data into a
local variable does not prevent it from changing.

There are also some data races involving free(); that does not look good at all.
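
Roughly the double-fetch pattern I mean, with illustrative names (not actual
lkvm code); the point is that only a single atomic snapshot can be validated,
since the shared location itself keeps changing:

/* Double-fetch problem on guest-shared memory: desc->len lives in memory the
 * guest can write concurrently. */
#include <stdint.h>
#include <string.h>

#define BUF_MAX 4096

struct shared_desc {
	uint32_t len;				/* written by the guest */
};

/* Racy: validates one read, then acts on a second read that the guest may
 * have changed in between, so the check proves nothing. */
static int copy_racy(void *dst, const void *src, struct shared_desc *desc)
{
	if (desc->len > BUF_MAX)		/* fetch #1 */
		return -1;
	memcpy(dst, src, desc->len);		/* fetch #2: may now exceed BUF_MAX */
	return 0;
}

/* Safer: fetch the untrusted value exactly once, atomically, and only ever
 * use the local copy afterwards. */
static int copy_once(void *dst, const void *src, struct shared_desc *desc)
{
	uint32_t len = __atomic_load_n(&desc->len, __ATOMIC_ACQUIRE);

	if (len > BUF_MAX)
		return -1;
	memcpy(dst, src, len);
	return 0;
}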


Re: sanitizing kvmtool

2015-10-19 Thread Dmitry Vyukov
On Mon, Oct 19, 2015 at 4:19 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
> On 10/19/2015 04:37 AM, Dmitry Vyukov wrote:
>>> So in this case (and most of the other data race cases described in the full log) it
>>> seems like ThreadSanitizer is mixing up different accesses by the guest to one underlying
>>> block of memory on the host.
>>>
>>> Here, for example, T55 accesses the msix block of the virtio-net PCI device, and T58 is accessing
>>> the virtqueue exposed by that device. While they both get to the same block of memory inside
>> I don't understand this.
>> Do you mean that this is a false positive? Or is it a real issue in lkvm?
>
> I suspect it's a false positive because ThreadSanitizer sees the memory as one big
> block, but the logic that makes sure we don't race here is within the guest vm, which
> ThreadSanitizer doesn't see.


But isn't the task of a hypervisor to sandbox the guest OS and to not
trust it in any way, shape or form? What if the guest VM intentionally
omits the synchronization? Can it exploit the host?


Re: sanitizing kvmtool

2015-10-19 Thread Dmitry Vyukov
On Mon, Oct 19, 2015 at 5:08 PM, Sasha Levin <sasha.le...@oracle.com> wrote:
> On 10/19/2015 10:47 AM, Dmitry Vyukov wrote:
>>> Right, the memory areas that are accessed both by the hypervisor and the guest
>>> should be treated as untrusted input, but the hypervisor is supposed to validate
>>> the input carefully before using it - so I'm not sure how data races would
>>> introduce anything new that we didn't catch during validation.
>>
>> One possibility would be: if the result of a racy read is passed to the guest,
>> that can leak arbitrary host data into the guest. That does not sound good.
>> Also, without proper atomic operations, it is basically impossible to verify
>> untrusted data, as it can change under your feet. And copying the data into a
>> local variable does not prevent it from changing.
>
> What's missing here is that the guest doesn't directly read/write the memory:
> every time it accesses memory that is shared with the host it will trigger
> an exit, which will stop the vcpu thread that made the access, and kernel-side
> kvm will pass the hypervisor the value the guest wrote (or the memory address
> it attempted to read). The value/address can't change under us in that scenario.

But still: if the result of a racy read is passed to the guest, that can leak
arbitrary host data into the guest.
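
To make the distinction concrete, here is roughly the exit path in question,
sketched against the standard KVM userspace API rather than lkvm's exact code.
The value delivered through kvm_run is indeed a stable snapshot; what the
device emulation then reads out of guest RAM is not:

/* Sketch of a vcpu run loop (standard KVM API; not lkvm's code).  The pio
 * data below was copied into the kvm_run page by the kernel and cannot
 * change under the hypervisor.  Anything the device emulation subsequently
 * reads straight out of guest memory (virtqueue rings, the msix table, ...)
 * is still shared with running vcpus and can change concurrently. */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
{
	for (;;) {
		if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
			break;

		if (run->exit_reason == KVM_EXIT_IO) {
			/* Stable snapshot inside the kvm_run mapping. */
			uint8_t *data = (uint8_t *)run + run->io.data_offset;

			printf("port 0x%x %s, %u byte(s) x %u, first byte 0x%x\n",
			       run->io.port,
			       run->io.direction == KVM_EXIT_IO_OUT ? "out" : "in",
			       (unsigned)run->io.size, run->io.count, data[0]);
		}
		/* A virtio handler called from here still parses descriptors
		 * directly from guest RAM - that is where the racy-read
		 * concern applies. */
	}
}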


Re: sanitizing kvmtool

2015-10-15 Thread Dmitry Vyukov
removed bad email address

On Thu, Oct 15, 2015 at 12:21 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
> Hello,
>
> I've run a set of sanitizers on
> git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git commit
> 3695adeb227813d96d9c41850703fb53a23845eb. I've just booted a VM and
> shut it down.
>
> AddressSanitizer detected a heap-use-after-free:
>
> AddressSanitizer: heap-use-after-free on address 0x6040df90 at pc
> 0x004f46d0 bp 0x7ffc79def2d0 sp 0x7ffc79def2c8
> READ of size 8 at 0x6040df90 thread T0
> #0 0x4f46cf in kvm__pause kvm.c:436:7
> #1 0x4f0d5d in ioport__unregister ioport.c:129:2
> #2 0x4efb2f in serial8250__exit hw/serial.c:446:7
> #3 0x516204 in init_list__exit util/init.c:59:8
> #4 0x4ea956 in kvm_cmd_run_exit builtin-run.c:645:2
> #5 0x4ea956 in kvm_cmd_run builtin-run.c:661
> #6 0x51596f in handle_command kvm-cmd.c:84:8
> #7 0x7fa398101ec4 in __libc_start_main
> /build/buildd/eglibc-2.19/csu/libc-start.c:287
> #8 0x41a505 in _start (lkvm+0x41a505)
>
> 0x6040df90 is located 0 bytes inside of 40-byte region
> [0x6040df90,0x6040dfb8)
> freed by thread T0 here:
> #0 0x4b75a0 in free
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cc:30
> #1 0x4f29e6 in kvm_cpu__exit kvm-cpu.c:263:2
> #2 0x5160c4 in init_list__exit util/init.c:59:8
>
> previously allocated by thread T0 here:
> #0 0x4b7a2c in calloc
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56
> #1 0x4f2491 in kvm_cpu__init kvm-cpu.c:221:14
>
>
> ThreadSanitizer detected a whole bunch of data races, for example:
>
> WARNING: ThreadSanitizer: data race (pid=109228)
>   Write of size 1 at 0x7d6c0001f384 by thread T55:
> #0 memcpy 
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:608
> (lkvm+0x0044b28b)
> #1 virtio_pci__msix_mmio_callback virtio/pci.c:269:3 (lkvm+0x004b3ee0)
> #2 kvm__emulate_mmio mmio.c:131:3 (lkvm+0x004ac332)
> #3 kvm_cpu__emulate_mmio x86/include/kvm/kvm-cpu-arch.h:46:9
> (lkvm+0x004aa8c6)
> #4 kvm_cpu__start kvm-cpu.c:147 (lkvm+0x004aa8c6)
> #5 kvm_cpu_thread builtin-run.c:174:6 (lkvm+0x004a6e3e)
>
>   Previous read of size 4 at 0x7d6c0001f384 by thread T58:
> #0 virtio_pci__signal_vq virtio/pci.c:290:29 (lkvm+0x004b36b6)
> #1 virtio_net_tx_thread virtio/net.c:210:4 (lkvm+0x004b1fb5)
>
>   Location is heap block of size 1648 at 0x7d6c0001f100 allocated by
> main thread:
> #0 calloc 
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:544
> (lkvm+0x0043e812)
> #1 virtio_init virtio/core.c:191:12 (lkvm+0x004afa48)
> #2 virtio_net__init_one virtio/net.c:846:2 (lkvm+0x004b095d)
> #3 virtio_net__init virtio/net.c:868:3 (lkvm+0x004b0296)
> #4 init_list__init util/init.c:40:8 (lkvm+0x004bc7ee)
> #5 kvm_cmd_run_init builtin-run.c:621:6 (lkvm+0x004a6799)
> #6 kvm_cmd_run builtin-run.c:656 (lkvm+0x004a6799)
> #7 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
> #8 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
> #9 main main.c:18 (lkvm+0x004ac0b4)
>
>   Thread T55 'kvm-vcpu-2' (tid=109285, running) created by main thread at:
> #0 pthread_create
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848
> (lkvm+0x004478a3)
> #1 kvm_cmd_run_work builtin-run.c:633:7 (lkvm+0x004a683f)
> #2 kvm_cmd_run builtin-run.c:660 (lkvm+0x004a683f)
> #3 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
> #4 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
> #5 main main.c:18 (lkvm+0x004ac0b4)
>
>   Thread T58 'virtio-net-tx' (tid=109334, running) created by thread T53 at:
> #0 pthread_create
> /usr/local/google/home/dvyukov/src/llvm/build/../projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848
> (lkvm+0x004478a3)
> #1 init_vq virtio/net.c:526:4 (lkvm+0x004b1523)
> #2 virtio_pci__io_out virtio/pci.c:219:3 (lkvm+0x004b484c)
> #3 kvm__emulate_io ioport.c:196:11 (lkvm+0x004aa0f8)
> #4 virtio_pci__io_mmio_callback virtio/pci.c:340:2 (lkvm+0x004b3e55)
> #5 kvm__emulate_mmio mmio.c:131:3 (lkvm+0x004ac332)
> #6 kvm_cpu__emulate_mmio x86/include/kvm/kvm-cpu-arch.h:46:9
> (lkvm+0x004aa8c6)
> #7 kvm_cpu__start kvm-cpu.c:147 (lkvm+0x004aa8c6)
> #8 kvm_cpu_thread builtin-run.c:174:6 (lkvm+0x004a6e3e)
>
>
> and mutex unlock in a wrong thread (not su

Network hangs when communicating with host

2015-10-15 Thread Dmitry Vyukov
Hello,

I am trying to run a program in lkvm sandbox so that it communicates
with a program on host. I run lkvm as:

./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
/arch/x86/boot/bzImage --network mode=user -- /my_prog

/my_prog then connects to a program on host over a tcp socket.
I see that host receives some data, sends some data back, but then
my_prog hangs on network read.

To localize this I wrote 2 programs (attached). ping is run on host
and pong is run from lkvm sandbox. They successfully establish tcp
connection, but after some iterations both hang on read.

The networking code in the Go runtime has been there for more than 3 years, is
widely used in production, and does not have any known bugs. However, it uses
epoll edge-triggered readiness notifications, which are known to be tricky.
Is it possible that lkvm contains some networking bug? Can it be
related to the data races in lkvm I reported earlier today?

I am on commit 3695adeb227813d96d9c41850703fb53a23845eb.

Thank you
// Presumably the "pong" program (run inside the lkvm sandbox): it dials the
// host and then alternates one-byte write/read forever.
package main

import (
	"log"
	"net"
)

func main() {
	c, err := net.Dial("tcp", "192.168.33.1:39921")
	if err != nil {
		log.Fatalf("failed to dial: %v", err)
	}
	log.Printf("connected")
	for {
		var buf [1]byte
		n, err := c.Write(buf[:])
		if err != nil || n != 1 {
			log.Fatalf("failed to write: %v", err)
		}
		log.Printf("write")
		n, err = c.Read(buf[:])
		if err != nil || n != 1 {
			log.Fatalf("failed to read: %v", err)
		}
		log.Printf("read")
	}
}
// Presumably the "ping" program (run on the host): it listens, accepts one
// connection from the sandbox, and alternates one-byte read/write forever.
package main

import (
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:39921")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	log.Printf("listening")
	c, err := ln.Accept()
	if err != nil {
		log.Fatalf("failed to accept: %v", err)
	}
	log.Printf("connected")
	for {
		var buf [1]byte
		n, err := c.Read(buf[:])
		if err != nil || n != 1 {
			log.Fatalf("failed to read: %v", err)
		}
		log.Printf("read")
		n, err = c.Write(buf[:])
		if err != nil || n != 1 {
			log.Fatalf("failed to write: %v", err)
		}
		log.Printf("write")
	}
}