Re: [Qemu-devel] [RFC 0/2] Reduce the VM downtime by about 300us

2015-08-25 Thread Marcin Gibuła
On 2015-08-25 at 07:52, Liang Li wrote: This patch is for KVM live migration optimization; it fixes the issue which commit 317b0a6d8ba tries to fix in another way, and it can reduce the live migration VM downtime by about 300us. *This patch is not tested for the issue commit 317b0a6d8ba tries

Re: [Qemu-devel] about the patch 'kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation'

2015-08-14 Thread Marcin Gibuła
On 2015-08-14 at 03:23, Li, Liang Z wrote: On Thu, Aug 13, 2015 at 01:25:29AM +, Li, Liang Z wrote: Hi Paolo, Marcelo, could you please point out what issue the patch 317b0a6d8ba44e tries to fix? I found that in live migration cpu_synchronize_all_states will be called twice, and it will

Re: [Qemu-devel] about the patch 'kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation'

2015-08-14 Thread Marcin Gibuła
Thanks for your reply. I have read the thread in your email; what is meant by 'switching from old to new disk'? Could you give a detailed description? The test case was like this (using libvirt):
1. Get the VM running (Linux, using kvmclock).
2. Use blockcopy to copy disk data from one location

Re: [Qemu-devel] about the patch 'kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation'

2015-08-14 Thread Marcin Gibuła
So, the problem is caused by stop_vm(RUN_STATE_PAUSED); in this case env->tsc is not updated, which leads to the issue. Is that right? I think so. If cpu_clean_all_dirty() is needed just for the APIC status reason, I think we can do the cpu_synchronize_all_states() in do_vm_stop and

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-10-10 Thread Marcin Gibuła
Does anybody know why the APIC state loaded by the first call to kvm_arch_get_registers() is wrong, in the first place? What exactly is different in the APIC state in the second kvm_arch_get_registers() call, and when/why does it change? If cpu_synchronize_state() does the wrong thing if it is

Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls

2014-08-05 Thread Marcin Gibuła
Gibuła m.gib...@beyond.pl Reported-by: Marcin Gibuła m.gib...@beyond.pl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com Still hangs... Backtrace still looks like this:
Thread 1 (Thread 0x7f3d5313a900 (LWP 17440)):
#0  0x7f3d4f38f286 in ppoll () from /lib64/libc.so.6
#1  0x7f3d5347465b

Re: [Qemu-devel] [PATCH v2 0/2] thread-pool: avoid fd usage and fix nested aio_poll() deadlock

2014-08-05 Thread Marcin Gibuła
On 15.07.2014 17:17, Paolo Bonzini wrote: On 15/07/2014 16:44, Stefan Hajnoczi wrote: v2: * Leave BH scheduled so that the code can be simplified [Paolo] These patches convert thread-pool.c from EventNotifier to QEMUBH. They then solve the deadlock when nested aio_poll() calls are made.

Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls

2014-08-05 Thread Marcin Gibuła
Gibuła m.gib...@beyond.pl Reported-by: Marcin Gibuła m.gib...@beyond.pl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com This patch fixes the block-commit hang when using linux-aio, so: Tested-by: Marcin Gibuła m.gib...@beyond.pl -- mg

Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls

2014-08-04 Thread Marcin Gibuła
@canonical.com Cc: Marcin Gibuła m.gib...@beyond.pl Reported-by: Marcin Gibuła m.gib...@beyond.pl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com I'll test it tomorrow. -- mg

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-08-04 Thread Marcin Gibuła
On 2014-07-31 13:27, Marcin Gibuła wrote: Can you dump *env before and after the call to kvm_arch_get_registers? Yes, but it seems they are equal - I used memcmp() to compare them. Is there any other side effect that cpu_synchronize_all_states() may have? I think I found it. The reason

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-31 Thread Marcin Gibuła
Can you dump *env before and after the call to kvm_arch_get_registers? Yes, but it seems they are equal - I used memcmp() to compare them. Is there any other side effect that cpu_synchronize_all_states() may have? I think I found it. The reason for the hang is that when the second call to

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-30 Thread Marcin Gibuła
On 29.07.2014 18:58, Paolo Bonzini wrote: On 18/07/2014 10:48, Paolo Bonzini wrote: It is easy to find out if the fix is related to 1 or 2/3: just write
if (cpu->kvm_vcpu_dirty) {
    printf("do_kvm_cpu_synchronize_state_always: look at 2/3\n");

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-30 Thread Marcin Gibuła
On 2014-07-30 15:38, Paolo Bonzini wrote: On 30/07/2014 14:02, Marcin Gibuła wrote: without it: s/without/with/ of course...
called do_kvm_cpu_synchronize_state_always
called do_kvm_cpu_synchronize_state_always
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-18 Thread Marcin Gibuła
Could you try the attached patch? It's an incredibly ugly workaround that calls cpu_synchronize_all_states() in a way that bypasses the lazy execution logic. But it works for me. If that works for you as well, it's somehow related to lazy execution of cpu_synchronize_all_states. -- mg Yes, it's working

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-18 Thread Marcin Gibuła
Does it fix the problem with libvirt migration timing out for you as well? Oh, forgot to mention - yes, all migration-related problems are fixed. Though the release is right now in a freeze phase, I'd like to ask the maintainers to consider the possibility of fixing the problem on top of the current tree instead

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-18 Thread Marcin Gibuła
The name of the hack^Wfunction is tricky, because compared to do_kvm_cpu_synchronize_state there are three things you change:
1) you always synchronize the state
2) the next call to do_kvm_cpu_synchronize_state will do kvm_arch_get_registers
Yes.
3) the next CPU entry will call

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-18 Thread Marcin Gibuła
On 2014-07-18 11:37, Paolo Bonzini wrote: On 18/07/2014 11:32, Marcin Gibuła wrote: 3) the next CPU entry will call kvm_arch_put_registers:
if (cpu->kvm_vcpu_dirty) {
    kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
    cpu->kvm_vcpu_dirty = false
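A minimal standalone sketch of the lazy-synchronization state machine under discussion; the names mirror QEMU's but the types and bodies are mocks for illustration, not the real implementation:

/* Standalone mock of the lazy vcpu-state synchronization discussed in
 * this thread; names mirror QEMU's, bodies are stubs. */
#include <stdbool.h>
#include <stdio.h>

typedef struct CPUState {
    bool kvm_vcpu_dirty;              /* QEMU's copy is newer than KVM's */
} CPUState;

static void kvm_arch_get_registers(CPUState *cpu)
{
    (void)cpu;
    printf("get_registers (ioctl into KVM)\n");
}

static void kvm_arch_put_registers(CPUState *cpu)
{
    (void)cpu;
    printf("put_registers (ioctl into KVM)\n");
}

/* Lazy path: skip the fetch when QEMU already owns the state. */
static void do_kvm_cpu_synchronize_state(CPUState *cpu)
{
    if (!cpu->kvm_vcpu_dirty) {
        kvm_arch_get_registers(cpu);
        cpu->kvm_vcpu_dirty = true;   /* next vcpu entry must write back */
    }
}

/* On the next vcpu entry, a dirty state is pushed back into the kernel;
 * this is point 3) above. */
static void kvm_cpu_exec_enter(CPUState *cpu)
{
    if (cpu->kvm_vcpu_dirty) {
        kvm_arch_put_registers(cpu);  /* KVM_PUT_RUNTIME_STATE in QEMU */
        cpu->kvm_vcpu_dirty = false;
    }
}

int main(void)
{
    CPUState cpu = { .kvm_vcpu_dirty = false };
    do_kvm_cpu_synchronize_state(&cpu);  /* fetches registers */
    do_kvm_cpu_synchronize_state(&cpu);  /* no-op: state already dirty */
    kvm_cpu_exec_enter(&cpu);            /* writes back, clears dirty */
    return 0;
}

This is why a helper that always fetches registers and marks the vcpu dirty changes behavior twice over: it forces a fresh get now, and it forces a put on the next guest entry.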

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-17 Thread Marcin Gibuła
Yes, exactly. An iSCSI-based setup can take some minutes to deploy, given a prepared image, and I have a one hundred percent hit rate for the original issue with it. I've reproduced your IO hang with 2.0 and both 9b1786829aefb83f37a8f3135e3ea91c56001b56 and a096b3a6732f846ec57dc28b47ee9435aa0609bf

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-17 Thread Marcin Gibuła
2.1-rc2 behaves exactly the same. Interestingly enough, resetting the guest system causes I/O to work again. So it's not qemu that hangs on IO; rather, it fails to notify the guest about completed operations that were issued during migration. And it's somehow caused by calling cpu_synchronize_all_states()

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-17 Thread Marcin Gibuła
On 2014-07-17 21:18, Dr. David Alan Gilbert wrote: I don't know if this is the same case, but Gerd showed me a migration failure that might be related. 2.0 seems OK, 2.1-rc0 is broken (and I've not found another working point in between yet). The test case involves booting a fedora livecd

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-16 Thread Marcin Gibuła
Andrey, can you please provide instructions on how to create a reproducible environment? The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing. Marcelo, the

Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration

2014-07-16 Thread Marcin Gibuła
Tested on an iscsi pool, though there is a no-cache requirement; rbd with cache disabled may survive one migration, but the iscsi backend always hangs. As before, just rolling back the problematic commit fixes the problem, and adding cpu_synchronize_all_states to migration.c makes no difference at a

Re: [Qemu-devel] [PATCH uq/master] kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation

2014-07-15 Thread Marcin Gibuła
@@ -65,6 +66,7 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
     cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
+    assert(time.tsc_timestamp <= migration_tsc);
     delta = migration_tsc - time.tsc_timestamp;
     if (time.tsc_shift < 0) {
         delta =
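For context, a standalone sketch of the pvclock-style conversion this hunk guards: scale the TSC delta by tsc_shift, then apply the 32.32 fixed-point tsc_to_system_mul. The field names follow the kvmclock structure, but the helper itself is illustrative, not QEMU code (and uses GCC/Clang's unsigned __int128):

/* Sketch of TSC-delta to nanoseconds conversion, pvclock style. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t tsc_delta_to_ns(uint64_t migration_tsc, uint64_t tsc_timestamp,
                                int8_t tsc_shift, uint32_t tsc_to_system_mul)
{
    /* The assertion from the patch: an unsigned delta must not wrap. */
    assert(tsc_timestamp <= migration_tsc);
    uint64_t delta = migration_tsc - tsc_timestamp;

    if (tsc_shift < 0) {
        delta >>= -tsc_shift;   /* negative shift scales fast TSCs down */
    } else {
        delta <<= tsc_shift;
    }
    /* tsc_to_system_mul is a 32.32 fixed-point ns-per-cycle factor. */
    return (uint64_t)(((unsigned __int128)delta * tsc_to_system_mul) >> 32);
}

int main(void)
{
    /* Illustrative 2.5 GHz TSC: 0.4 ns per cycle => mul = 0.4 * 2^32. */
    uint32_t mul = (uint32_t)(0.4 * 4294967296.0);
    printf("%llu ns\n",          /* ~400000 ns for 1e6 cycles */
           (unsigned long long)tsc_delta_to_ns(1000000, 0, 0, mul));
    return 0;
}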

Re: [Qemu-devel] [PATCH v2 0/2] thread-pool: avoid fd usage and fix nested aio_poll() deadlock

2014-07-15 Thread Marcin Gibuła
On 2014-07-15 17:17, Paolo Bonzini wrote: On 15/07/2014 16:44, Stefan Hajnoczi wrote: v2: * Leave BH scheduled so that the code can be simplified [Paolo] These patches convert thread-pool.c from EventNotifier to QEMUBH. They then solve the deadlock when nested aio_poll() calls are

Re: [Qemu-devel] [PATCH v2] thread-pool: fix deadlock when callbacks depend on each other

2014-06-04 Thread Marcin Gibuła
On 04.06.2014 12:01, Stefan Hajnoczi wrote: On Mon, Jun 02, 2014 at 09:15:27AM +0200, Marcin Gibuła wrote: When two coroutines submit I/O and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur. bdrv_drain_all() is a very heavy-weight operation

Re: [Qemu-devel] [PATCH v2] kvmclock: Ensure time in migration never goes backward

2014-06-03 Thread Marcin Gibuła
Can you give this patch a try? It should read the guest TSC values after stopping the VM. Yes, this patch fixes that. Thanks, -- mg

[Qemu-devel] [PATCH v2] thread-pool: fix deadlock when callbacks depend on each other

2014-06-02 Thread Marcin Gibuła
When two coroutines submit I/O and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur. This is because both requests may have completed before the thread pool notifier got called. Then, when the notifier gets executed and the first coroutine calls aio_poll() to

Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

2014-06-02 Thread Marcin Gibuła
I'll test it tomorrow. I assume you want to avoid calling event_notifier_set() until the function is reentered via aio_poll? Yes. But actually, I need to check if it's possible to fix bdrv_drain_all. If you're in coroutine context, you can defer the draining to a safe point using a bottom half.
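A standalone toy of the bottom-half idea mentioned here: instead of draining inside coroutine context, schedule the heavy operation and let the main loop run it at a safe point. The bh_schedule()/main_loop_run_bh() pair below is a mock standing in for QEMU's real qemu_bh_new()/qemu_bh_schedule() API:

/* Toy bottom-half queue illustrating "defer the drain to a safe point". */
#include <stdbool.h>
#include <stdio.h>

typedef void BHFunc(void *opaque);

typedef struct BH {
    BHFunc *cb;
    void *opaque;
    bool scheduled;
} BH;

static BH pending_bh;

static void bh_schedule(BHFunc *cb, void *opaque)
{
    pending_bh = (BH){ .cb = cb, .opaque = opaque, .scheduled = true };
}

/* Runs once per main-loop iteration, outside any coroutine/callback. */
static void main_loop_run_bh(void)
{
    if (pending_bh.scheduled) {
        pending_bh.scheduled = false;
        pending_bh.cb(pending_bh.opaque);
    }
}

static void drain_bh(void *opaque)
{
    (void)opaque;
    printf("the bdrv_drain_all() equivalent runs here, at a safe point\n");
}

int main(void)
{
    /* In coroutine context we must not drain directly... */
    bh_schedule(drain_bh, NULL);   /* ...so defer it instead. */
    main_loop_run_bh();            /* the main loop picks it up later */
    return 0;
}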

Re: [Qemu-devel] [PATCH v2] kvmclock: Ensure time in migration never goes backward

2014-06-02 Thread Marcin Gibuła
+    cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
+
+    delta = migration_tsc - time.tsc_timestamp;
Hi, when I was testing live storage migration with libvirt I found out that this patch can cause the virtual machine to hang when completing a mirror job. This is (probably)

Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

2014-06-01 Thread Marcin Gibuła
Good catch! The main problem with the patch is that you need to use atomic_inc/atomic_dec to increment and decrement pool->pending_completions. Ok. Secondarily, event_notifier_set is pretty heavy-weight; does it work if you wrap the loop like this?
restart:
    QLIST_FOREACH_SAFE(elem,
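A standalone sketch of the restart-loop pattern that snippet begins (a hand-rolled list instead of the QLIST macros; illustrative, not the actual patch). Because a completion callback may re-enter the dispatcher via a nested aio_poll() and mutate the list, the scan starts over after every callback:

/* Restart-scan dispatch: safe against callbacks that mutate the list. */
#include <stdio.h>
#include <stdlib.h>

typedef struct Elem {
    struct Elem *next;
    int done;                      /* set when the worker finished */
    void (*cb)(struct Elem *);
} Elem;

static Elem *head;

static void run_completions(void)
{
restart:
    for (Elem *e = head, **prev = &head; e; prev = &e->next, e = e->next) {
        if (e->done) {
            *prev = e->next;       /* unlink before invoking the callback */
            e->cb(e);              /* may free e, may recurse into us */
            goto restart;          /* list may have changed: rescan */
        }
    }
}

static void cb(Elem *e)
{
    printf("completed %p\n", (void *)e);
    free(e);
}

int main(void)
{
    for (int i = 0; i < 3; i++) {
        Elem *e = calloc(1, sizeof(*e));
        e->done = 1;
        e->cb = cb;
        e->next = head;
        head = e;
    }
    run_completions();
    return 0;
}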

[Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

2014-05-31 Thread Marcin Gibuła
When two coroutines submit I/O and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur. This is because both requests may have completed before the thread pool notifier got called. Then, when the notifier gets executed and the first coroutine calls aio_poll() to

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-30 Thread Marcin Gibuła
1. Debug bdrv_drain_all() and find out whether there are any I/O requests remaining. I believe that's what happens:
Context 1:
- commit_one_iteration makes a write request (req A)
- request A is handed to the io thread, qemu_coroutine_yield() is called
Context 2:
- VM makes a write request (req

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-29 Thread Marcin Gibuła
Please try disabling I/O limits on the drive and try again. Is there anything else I could try? I've captured a trace of the hung VM with the following events traced: bdrv_*, paio_*, thread_pool_*, commit_*, qcow2_*, plus debug code that prints requests from tracked_requests in bdrv_requests_pending

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-28 Thread Marcin Gibuła
What happens if you omit #7 virDomainGetBlockJobInfo()? Does it still hang 1/10 times? Yes, it still hangs. Can you post the QEMU command-line so we know the precise VM configuration? (ps aux | grep qemu) /usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-28 Thread Marcin Gibuła
/usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 1536 -realtime mlock=on -smp 2,sockets=2,cores=10,threads=1 -uuid 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -no-user-config -nodefaults -chardev

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-26 Thread Marcin Gibuła
Two options for making progress on this bug: 1. Debug bdrv_drain_all() and find out whether there are any I/O requests remaining. Yes, there is one request pending on the active layer of the disk that is being committed (on the bs->tracked_requests list). IO threads die off because they have nothing

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
On 23.05.2014 10:19, Paolo Bonzini wrote: On 22/05/2014 23:05, Marcin Gibuła wrote: Some more info. The VM was doing a lot of write IO during this test. QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver? I'll try. However, RBD is used only as read

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
On 23.05.2014 10:19, Paolo Bonzini wrote: On 22/05/2014 23:05, Marcin Gibuła wrote: Some more info. The VM was doing a lot of write IO during this test. QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver? Hi, I've reproduced it without RBD. Backtrace

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
I see that you have a mix of aio=native and aio=threads. I can't say much about the aio=native disks (perhaps try to reproduce without them?), but there are definitely no worker threads for the other disks that bdrv_drain_all() would have to wait for. True. But I/O was being done only to the qcow2

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is nothing to wait for. Can you check which of

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
The condition that is true is: if (!QLIST_EMPTY(&bs->tracked_requests)), and it's returned for the intermediate qcow2 which is being committed. Btw - it's also the disk that is being pounded with writes during the commit. -- mg
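A simplified standalone sketch of the shape of that check; the real bdrv_requests_pending() in QEMU also considers throttled requests and the file child, while this mock keeps only the two conditions relevant here (in-flight tracked requests and the backing chain):

/* Mock of the "does this disk still have requests in flight?" check. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct TrackedRequest {
    struct TrackedRequest *next;
} TrackedRequest;

typedef struct BlockDriverState {
    TrackedRequest *tracked_requests;     /* in-flight I/O on this node */
    struct BlockDriverState *backing_hd;  /* backing-file chain */
} BlockDriverState;

static bool bdrv_requests_pending(BlockDriverState *bs)
{
    if (bs->tracked_requests) {   /* the !QLIST_EMPTY() condition above */
        return true;
    }
    if (bs->backing_hd && bdrv_requests_pending(bs->backing_hd)) {
        return true;
    }
    return false;
}

int main(void)
{
    TrackedRequest req = { NULL };
    BlockDriverState intermediate = { NULL, NULL };
    BlockDriverState top = { NULL, &intermediate };
    intermediate.tracked_requests = &req; /* the committed node is busy */
    printf("pending: %d\n", bdrv_requests_pending(&top)); /* prints 1 */
    return 0;
}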

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
If you see a pending request on a RADOS block device (rbd) then it would be good to dig deeper into QEMU's block/rbd.c driver to see why it's not completing that request. Are you using qcow2 on top of rbd? Hi, I've already recreated this without rbd and with stock qemu 2.0. -- mg

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
On 2014-05-23 15:14, Marcin Gibuła wrote: bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though

[Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-22 Thread Marcin Gibuła
Hi, I've encountered a deadlock in qemu during some stress testing. The test is making snapshots, committing them and constantly querying for block job info. The version of QEMU is 2.0.0-rc3 (the backtrace below says rc2, but it's manually patched to rc3), and there seem to be no changes in block

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-22 Thread Marcin Gibuła
On 2014-05-22 22:49, Marcin Gibuła wrote:
Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
#0  0x7f6998020286 in ppoll () from /lib64/libc.so.6
#1  0x7f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-22 Thread Marcin Gibuła
I've encountered a deadlock in qemu during some stress testing. The test is making snapshots, committing them and constantly querying for block job info. What is the exact command you used for triggering the block-commit? Was it via direct HMP or QMP, or indirectly via libvirt? Via libvirt.

Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward

2014-05-06 Thread Marcin Gibuła
What is the host clocksource? (cat /sys/devices/system/clocksource/clocksource0/current_clocksource)
tsc
And kernel version?
3.12.17
But I've seen this problem on earlier versions as well (3.8, 3.10). -- mg

Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward

2014-05-06 Thread Marcin Gibuła
Yes, and it isn't. Any ideas why it's not? This patch really just uses the guest-visible kvmclock time rather than the host's view of it on migration. There is definitely something very broken on the host's side, since it does return a smaller time than the guest-exposed interface indicates.

Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward

2014-05-05 Thread Marcin Gibuła
On 2014-05-05 15:51, Alexander Graf wrote: When we migrate, we ask the kernel about its current belief on what the guest time would be. However, I've seen cases where the kvmclock guest structure indicates a time more recent than the kvm-returned time. Hi, is it possible to have kvmclock

Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward

2014-05-05 Thread Marcin Gibuła
Is it possible to have kvmclock jumping forward? Because I have a reproducible case where, in about 1 per 20 VM restores, the VM freezes for a couple of hours and then resumes with a date a few hundred years ahead. It happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all
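The arithmetic matches the symptom: the delta in kvmclock_current_nsec() is unsigned, so if the guest-visible tsc_timestamp is even slightly ahead of the TSC the host reports at restore, the subtraction wraps to nearly 2^64 cycles, which at an (illustrative) 2.5 GHz TSC is roughly 234 years, i.e. a date a few hundred years ahead. A quick standalone check:

/* Why a tiny backwards TSC step becomes "a few hundred years ahead":
 * the delta in kvmclock_current_nsec() is unsigned, so it wraps. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t migration_tsc = 1000;   /* host's idea of the guest TSC */
    uint64_t tsc_timestamp = 2000;   /* guest struct is 1000 cycles ahead */
    uint64_t delta = migration_tsc - tsc_timestamp;      /* wraps! */

    double ghz = 2.5;                /* illustrative TSC frequency */
    double years = delta / (ghz * 1e9) / (365.25 * 24 * 3600);
    printf("delta = %llu cycles ~ %.0f years\n",
           (unsigned long long)delta, years);            /* ~234 years */
    return 0;
}

This is exactly the case the assert(time.tsc_timestamp <= migration_tsc) hunk quoted earlier in this archive would catch.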

Re: [Qemu-devel] Unresponsive linux guest once migrated

2014-04-15 Thread Marcin Gibuła
On 2014-04-15 20:53, Dr. David Alan Gilbert wrote: * Marcus (shadow...@gmail.com) wrote: I can answer some of the questions. It's been 3 months or so since I looked into it. I ended up disabling kvmclock from the qemu command line and moving on. I saw it with CentOS 6.5 and Ubuntu 12.04

[Qemu-devel] qemu 2.0.0-rc2 crash

2014-04-10 Thread Marcin Gibuła
Hi, I've been playing with QEMU 2.0-rc2 and found a crash that isn't there in 1.7.1. The virtual machine is created via libvirt, and when I query it with 'dommemstat' it crashes with the following backtrace: Program received signal SIGSEGV, Segmentation fault. 0x7f5883655c0a in

Re: [Qemu-devel] qemu 2.0.0-rc2 crash

2014-04-10 Thread Marcin Gibuła
On 2014-04-10 15:43, Marcel Apfelbaum wrote: On Thu, 2014-04-10 at 14:55 +0200, Marcin Gibuła wrote: Hi, I've been playing with QEMU 2.0-rc2 and found a crash that isn't there in 1.7.1. Hi Marcin, Thanks for reporting the bug! Do you have a development environment? If you do

Re: [Qemu-devel] Unresponsive linux guest once migrated

2014-04-02 Thread Marcin Gibuła
It's looking good so far, after a few migrations (it takes a while to test because I'm waiting at least 5 hours between migrations). I'll be happier once I've done a couple of weeks of this without any failures! Does anyone have any hints on how to debug this thing? :( I've tried to put the hung

Re: [Qemu-devel] Unresponsive linux guest once migrated

2014-04-02 Thread Marcin Gibuła
Can you give: 1) A backtrace from the guest ('thread apply all bt full' in gdb) You mean from gdb attached to the hung guest? I'll try to get it. From what I remember it looks rather normal - busy executing guest code. 2) What are the earliest/newest qemu versions you've seen this

Re: [Qemu-devel] Unresponsive linux guest once migrated

2014-04-02 Thread Marcin Gibuła
On 02.04.2014 11:39, Dr. David Alan Gilbert wrote: * Marcin Gibuła (m.gib...@beyond.pl) wrote: Can you give: 1) A backtrace from the guest ('thread apply all bt full' in gdb) You mean from gdb attached to the hung guest? I'll try to get it. From what I remember it looks rather

Re: [Qemu-devel] Unresponsive linux guest once migrated

2014-04-02 Thread Marcin Gibuła
Yes, that's where it gets weird. I've never seen this on a fresh VM. It needs to be idle for a couple of hours at least, and even then it doesn't always hang. So your OS is just sitting at a text console, running nothing special? When you reboot after the migration, what's the last thing you see in

Re: [Qemu-devel] Unresponsive linux guest once migrated

2014-03-31 Thread Marcin Gibuła
I've seen a very similar problem on our installation. Have you tried running with kvm-clock explicitly disabled (either via no-kvmclock in the guest kernel or with -kvmclock in qemu)? No, I haven't tried it yet (I've confirmed kvm-clock is currently being used). I'll have a look at it. Did it help
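For reference, the qemu-side switch appears verbatim in the command line quoted earlier in this archive (the 2014-05-28 block-commit entry): the -cpu SandyBridge,-kvmclock argument masks the kvmclock feature out of the guest-visible CPUID, while booting the guest kernel with no-kvmclock achieves the same from inside the guest.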

Re: [Qemu-devel] Unresponsive linux guest once migrated

2014-03-27 Thread Marcin Gibuła
On 2014-03-27 23:52, Chris Dunlop wrote: Hi, I have a problem where I migrate a linux guest VM, and on the receiving side the guest goes to 100% cpu as seen by the host, and the guest itself is unresponsive, e.g. not responding to ping etc. The only way out I've found is to destroy the

Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry

2014-02-07 Thread Marcin Gibuła
Do you use xbzrle for live migration? No - I'm really stuck right now with this. The biggest problem is I can't reproduce it with test machines ;-( Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple

Re: [Qemu-devel] migration question: disk images on nfs server

2014-02-07 Thread Marcin Gibuła
For NFS you need to use the sync mount option to force the NFS client to sync to the server on writes. Isn't opening with O_DIRECT enough? (for the linux nfs client, at least) -- mg

Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry

2014-02-07 Thread Marcin Gibuła
You mean to reproduce? I'm more interested in seeing what type of corruption is happening; if you've got a test VM that corrupts memory, and we can run a program in that VM that writes a known pattern into memory and checks it, then seeing what changed after migration might give a clue. But

Re: [Qemu-devel] migration question: disk images on nfs server

2014-02-07 Thread Marcin Gibuła
It is more an NFS issue: if you have a file on NFS that two users on two different hosts are accessing (at least one writing to it), you will need to enforce the sync option. Even if you flush all the data and close the file, the NFS client can still have cached data that it didn't sync to the server.

Re: [Qemu-devel] migration question: disk images on nfs server

2014-02-07 Thread Marcin Gibuła
On 07.02.2014 14:36, Orit Wasserman wrote: Do you know if it applies to linux O_DIRECT writes as well? From the open(2) man page: The behaviour of O_DIRECT with NFS will differ from local filesystems. Older kernels, or kernels configured in certain ways, may not support
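A minimal standalone example of the constraints being debated (the 4096-byte alignment is an assumption; the block size can differ). Since on NFS O_DIRECT alone may not guarantee coherence between two hosts, which is the reason for the sync mount advice above, the sketch adds an explicit fdatasync() as a belt-and-braces step:

/* O_DIRECT write: aligned buffer, aligned size, explicit sync. */
#define _GNU_SOURCE            /* O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    /* O_DIRECT requires alignment (typically the logical block size). */
    if (posix_memalign(&buf, 4096, 4096) != 0) {
        return 1;
    }
    memset(buf, 0xAB, 4096);

    int fd = open("testfile.img", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, buf, 4096) != 4096) { perror("write"); return 1; }
    /* Force it out to the server, in case the client still caches. */
    if (fdatasync(fd) != 0) { perror("fdatasync"); return 1; }

    close(fd);
    free(buf);
    return 0;
}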

Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry

2014-02-06 Thread Marcin Gibuła
On 06.02.2014 15:03, Stefan Priebe - Profihost AG wrote: some more things which happen during migration:
php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in

Re: [Qemu-devel] troubleshooting live migration

2014-01-16 Thread Marcin Gibuła
I tried -no-hpet, and was still able to replicate the 'lapic' issue. I find it interesting that I can only trigger it if the VM has been running a while. Hi, I've seen identical crashes with live migration in our environment. It looks identical - the VM has to be idle for some time, and after