Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe (s.pri...@profihost.ag) wrote:
> On 13.02.2014 21:06, Dr. David Alan Gilbert wrote:
>> [...]
>> Hi Stefan, I've just posted a patch to qemu-devel that fixes two bugs that we found; I've only tried a small stressapptest run and it seems to survive with them (where it didn't before); you might like to try it if you're up for rebuilding QEMU. It's the one entitled '[PATCH] Fix two XBZRLE corruption issues'. I'll try and get a larger run done myself, but I'd be interested to hear if it fixes it for you (or anyone else who hit the problem).
> Yes, it works fine - now no crash, but it's slower than without XBZRLE ;-)
> Without XBZRLE: I needed migrate_downtime 4, around 60s
> With XBZRLE: I needed migrate_downtime 16 and 240s

Hmm; how did that compare with the previous (broken) XBZRLE time? (i.e. was XBZRLE always slower for you?)

If you're driving this from the HMP/command interface, then the result of the 'info migrate' command at the end of each of those runs would be interesting.

Another thing you could try is changing the xbzrle_cache_zero_page in arch_init.c that I added so it reads as:

static void xbzrle_cache_zero_page(ram_addr_t current_addr)
{
    if (ram_bulk_stage || !migrate_use_xbzrle()) {
        return;
    }

    if (!cache_is_cached(XBZRLE.cache, current_addr)) {
        return;
    }

    /* We don't care if this fails to allocate a new cache page
     * as long as it updated an old one */
    cache_insert(XBZRLE.cache, current_addr, ZERO_TARGET_PAGE);
}

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
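For readers reconstructing the test: the HMP flow under discussion looks roughly like this. This is a sketch only - the cache size and destination address are illustrative placeholders, not values taken from this thread:

  (qemu) migrate_set_capability xbzrle on
  (qemu) migrate_set_cache_size 256m
  (qemu) migrate -d tcp:desthost:4444
  (qemu) info migrate

With XBZRLE active, 'info migrate' reports cache statistics (cache size, XBZRLE pages, cache misses) alongside the usual transferred/remaining RAM figures, which is the data being asked for above.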
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe (s.pri...@profihost.ag) wrote:
> On 10.02.2014 17:07, Dr. David Alan Gilbert wrote:
>> * Stefan Priebe (s.pri...@profihost.ag) wrote:
>>> I could fix it by explicitly disabling XBZRLE - it seems it's automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.
>> Stefan, can you give me some more info on your hardware and migration setup? That stressapptest (which is a really nice find!) really batters the memory, and it means the migration isn't converging for me, so I'm curious what your setup is.
> That one is developed by Google and has been known to me for a few years. Google found that memtest and co. are not good enough to stress-test memory.

Hi Stefan,

I've just posted a patch to qemu-devel that fixes two bugs that we found; I've only tried a small stressapptest run and it seems to survive with them (where it didn't before); you might like to try it if you're up for rebuilding QEMU. It's the one entitled '[PATCH] Fix two XBZRLE corruption issues'.

I'll try and get a larger run done myself, but I'd be interested to hear if it fixes it for you (or anyone else who hit the problem).

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 13.02.2014 21:06, Dr. David Alan Gilbert wrote:
> [...]
> Hi Stefan, I've just posted a patch to qemu-devel that fixes two bugs that we found; I've only tried a small stressapptest run and it seems to survive with them (where it didn't before); you might like to try it if you're up for rebuilding QEMU. It's the one entitled '[PATCH] Fix two XBZRLE corruption issues'

Thanks! I'd really love to try them, but neither Google nor I can find them.
http://osdir.com/ml/qemu-devel/2014-02/

Stefan

> I'll try and get a larger run done myself, but I'd be interested to hear if it fixes it for you (or anyone else who hit the problem).
>
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Got it here: http://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg02341.html

Will try it ASAP.

On 13.02.2014 21:06, Dr. David Alan Gilbert wrote:
> [...]
> Hi Stefan, I've just posted a patch to qemu-devel that fixes two bugs that we found; I've only tried a small stressapptest run and it seems to survive with them (where it didn't before); you might like to try it if you're up for rebuilding QEMU. It's the one entitled '[PATCH] Fix two XBZRLE corruption issues'. I'll try and get a larger run done myself, but I'd be interested to hear if it fixes it for you (or anyone else who hit the problem).
>
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 13.02.2014 21:06, Dr. David Alan Gilbert wrote:
> [...]
> Hi Stefan, I've just posted a patch to qemu-devel that fixes two bugs that we found; I've only tried a small stressapptest run and it seems to survive with them (where it didn't before); you might like to try it if you're up for rebuilding QEMU. It's the one entitled '[PATCH] Fix two XBZRLE corruption issues'. I'll try and get a larger run done myself, but I'd be interested to hear if it fixes it for you (or anyone else who hit the problem).

Yes, it works fine - now no crash, but it's slower than without XBZRLE ;-)

Without XBZRLE: I needed migrate_downtime 4, around 60s
With XBZRLE: I needed migrate_downtime 16 and 240s

> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 02/08/2014 09:23 PM, Stefan Priebe wrote:
> I could fix it by explicitly disabling XBZRLE - it seems it's automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.

XBZRLE is disabled by default (actually, all capabilities are off by default).
What version of QEMU are you using that you need to disable it explicitly?
Maybe you ran a migration with XBZRLE and cancelled it, so it stayed on?

Orit

> Stefan
> [...]
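A quick way to check what the source QEMU actually has enabled is the HMP capability query - a sketch, since the exact list printed varies by QEMU version:

  (qemu) info migrate_capabilities
  xbzrle: off
  ...

If xbzrle shows 'on' there even though nothing set it for this run, that would point at the stale-capability scenario Orit suggests.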
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 11.02.2014 14:32, Orit Wasserman wrote:
> On 02/08/2014 09:23 PM, Stefan Priebe wrote:
>> I could fix it by explicitly disabling XBZRLE - it seems it's automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.
> XBZRLE is disabled by default (actually, all capabilities are off by default).
> What version of QEMU are you using that you need to disable it explicitly?
> Maybe you ran a migration with XBZRLE and cancelled it, so it stayed on?

No real idea why this happens - but yes, this seems to be a problem for me.

But the bug in XBZRLE is still there ;-)

Stefan

> Orit
> [...]
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 02/11/2014 03:33 PM, Stefan Priebe - Profihost AG wrote:
> On 11.02.2014 14:32, Orit Wasserman wrote:
>> [...]
>> Maybe you ran a migration with XBZRLE and cancelled it, so it stayed on?
> No real idea why this happens - but yes, this seems to be a problem for me.

I checked upstream QEMU and it is still off by default (and always has been).

> But the bug in XBZRLE is still there ;-)

We need to understand the exact scenario in order to understand the problem. What exact version of QEMU are you using? Can you try with the latest upstream version? There were some fixes to the XBZRLE code.

Orit

> Stefan
> [...]
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 11.02.2014 14:45, Orit Wasserman wrote:
> On 02/11/2014 03:33 PM, Stefan Priebe - Profihost AG wrote:
>> [...]
>> No real idea why this happens - but yes, this seems to be a problem for me.
> I checked upstream QEMU and it is still off by default (and always has been).

Maybe I had it on in the past and the VM was still running from an older migration.

>> But the bug in XBZRLE is still there ;-)
> We need to understand the exact scenario in order to understand the problem. What exact version of QEMU are you using?

QEMU 1.7.0

> Can you try with the latest upstream version? There were some fixes to the XBZRLE code.

Sadly not - I have some custom patches (not related to XBZRLE) which won't apply to current upstream. But I could cherry-pick the ones you have in mind - if you give me the commit IDs.

Stefan

> Orit
> [...]
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe (s.pri...@profihost.ag) wrote:
> I could fix it by explicitly disabling XBZRLE - it seems it's automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.

Ah right, yes, that would make sense for the type of errors you're seeing, and it does make it easier to tie down.

Dave

> Stefan
> [...]

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe (s.pri...@profihost.ag) wrote:
> I could fix it by explicitly disabling XBZRLE - it seems it's automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.

Stefan, can you give me some more info on your hardware and migration setup? That stressapptest (which is a really nice find!) really batters the memory, and it means the migration isn't converging for me, so I'm curious what your setup is.

What CPU have you got?
How many cores are you giving each guest?
What network technology are you migrating over?
Other than XBZRLE, what else do you have enabled?
How long is the migrate taking for you?

Thanks,

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 10.02.2014 17:07, Dr. David Alan Gilbert wrote:
> * Stefan Priebe (s.pri...@profihost.ag) wrote:
>> I could fix it by explicitly disabling XBZRLE - it seems it's automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.
> Stefan, can you give me some more info on your hardware and migration setup? That stressapptest (which is a really nice find!) really batters the memory, and it means the migration isn't converging for me, so I'm curious what your setup is.

That one is developed by Google and has been known to me for a few years. Google found that memtest and co. are not good enough to stress-test memory.

> What CPU have you got?

Dual Xeon E5-2695v2

> How many cores are you giving each guest?

16

> What network technology are you migrating over?

10Gb/s

> Other than XBZRLE, what else do you have enabled?

nothing

> How long is the migrate taking for you?

with migration_downtime = 4s, around 10s

Stefan

> Thanks,
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

I could fix it by explicitly disabling XBZRLE - it seems it's automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.

Stefan

On 07.02.2014 21:10, Stefan Priebe wrote:
> [...]
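The explicit disable described here is a one-liner on the HMP monitor; the QMP form below is an equivalent sketch (capability name as in QEMU's migration API):

  (qemu) migrate_set_capability xbzrle off

  { "execute": "migrate-set-capabilities",
    "arguments": { "capabilities": [
      { "capability": "xbzrle", "state": false } ] } }

Capabilities are set on the source QEMU and persist for the life of that process, which fits the "stays on after an earlier XBZRLE migration" theory discussed elsewhere in this thread.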
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Do you use XBZRLE for live migration?

----- Original Message -----
From: Stefan Priebe s.pri...@profihost.ag
To: Dr. David Alan Gilbert dgilb...@redhat.com
Cc: Alexandre DERUMIER aderum...@odiso.com, qemu-devel qemu-devel@nongnu.org
Sent: Thursday, 6 February 2014 21:00:27
Subject: Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Hi,

On 06.02.2014 20:51, Dr. David Alan Gilbert wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> Some more things which happen during migration:
>> php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
>> php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000]
>> cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000]
> OK, so let's just assume some part of memory (or CPU state, or memory loaded off disk...)
> You said before that it was happening on a 32GB image - is it *only* happening on a 32GB or bigger VM, or is it just more likely?

Not image, memory. I've only seen this with VMs having more than 16GB or 32GB of memory. But maybe this also indicates that the migration just takes longer.

> I think you also said you were using 1.7; have you tried an older version - i.e. is this a regression in 1.7, or don't we know?

Don't know. Sadly I cannot reproduce this with test VMs, only with production ones.

Stefan

> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 07.02.2014 09:15, Alexandre DERUMIER wrote:
> Do you use XBZRLE for live migration?

No - I'm really stuck right now with this. The biggest problem is that I can't reproduce it with test machines ;-(

Stefan

> ----- Original Message -----
> [...]
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
> On 07.02.2014 09:15, Alexandre DERUMIER wrote:
>> Do you use XBZRLE for live migration?
> No - I'm really stuck right now with this. The biggest problem is that I can't reproduce it with test machines ;-(

Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test?

Dave

> Stefan
> [...]

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 07.02.2014 10:15, Dr. David Alan Gilbert wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> On 07.02.2014 09:15, Alexandre DERUMIER wrote:
>>> Do you use XBZRLE for live migration?
>> No - I'm really stuck right now with this. The biggest problem is that I can't reproduce it with test machines ;-(
> Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test?

You mean to reproduce? I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM, but this works fine. I also tried running a MySQL bench while migrating on a test VM, and this works too ;-(

Stefan

> Dave
> [...]
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

>>>> Do you use XBZRLE for live migration?
>>> No - I'm really stuck right now with this. The biggest problem is that I can't reproduce it with test machines ;-(
>> Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test?
> You mean to reproduce? I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM, but this works fine. I also tried running a MySQL bench while migrating on a test VM, and this works too ;-(

Have you tried letting the test VM run idle for some time before migrating? (like 18-24 hours)

Having the same (or a very similar) problem, I had better luck reproducing it by not using freshly started VMs.

--
mg
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Hi,

On 07.02.2014 10:29, Marcin Gibuła wrote:
> [...]
> Have you tried letting the test VM run idle for some time before migrating? (like 18-24 hours)
> Having the same (or a very similar) problem, I had better luck reproducing it by not using freshly started VMs.

No, I haven't tried this; I will do so soon.

Stefan
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
> On 07.02.2014 10:15, Dr. David Alan Gilbert wrote:
>> [...]
>> Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test?
> You mean to reproduce?

I'm more interested in seeing what type of corruption is happening; if you've got a test VM that corrupts memory, and we can run a program in that VM that writes a known pattern into memory and checks it, then see what changed after migration, it might give a clue. But obviously this would only be of any use if run on the VM that actually fails.

> I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM, but this works fine. I also tried running a MySQL bench while migrating on a test VM, and this works too ;-(

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 07.02.2014 10:31, Dr. David Alan Gilbert wrote:
> [...]
> I'm more interested in seeing what type of corruption is happening; if you've got a test VM that corrupts memory, and we can run a program in that VM that writes a known pattern into memory and checks it, then see what changed after migration, it might give a clue. But obviously this would only be of any use if run on the VM that actually fails.

Right, that makes sense - sadly I still don't know how to reproduce it. Any app ideas I can try?

>> I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM, but this works fine. I also tried running a MySQL bench while migrating on a test VM, and this works too ;-(
> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

>> You mean to reproduce?
> I'm more interested in seeing what type of corruption is happening; if you've got a test VM that corrupts memory, and we can run a program in that VM that writes a known pattern into memory and checks it, then see what changed after migration, it might give a clue. But obviously this would only be of any use if run on the VM that actually fails.

Hi,

Seeing a similar issue in my company, I would be happy to run such tests. Do you have any test suite I could run, or some leads on how to write it?

--
mg
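No such test suite is attached anywhere in the thread; what follows is only a minimal sketch of the pattern-write-and-verify idea David describes. The buffer size, pattern, and reporting format are arbitrary choices made here for illustration:

/* write a position-dependent pattern into a large buffer, then
 * re-verify it forever; run inside the guest across a migration */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    size_t mb = (argc > 1) ? strtoul(argv[1], NULL, 0) : 1024;
    size_t words = mb * 1024 * 1024 / sizeof(uint64_t);
    uint64_t *buf = malloc(words * sizeof(uint64_t));
    if (!buf) { perror("malloc"); return 1; }

    for (size_t i = 0; i < words; i++)
        buf[i] = 0xb5b5b5b5b5b5b5b5ULL ^ i;   /* known pattern */

    for (unsigned pass = 0; ; pass++) {
        unsigned long bad = 0;
        for (size_t i = 0; i < words; i++) {
            uint64_t expect = 0xb5b5b5b5b5b5b5b5ULL ^ i;
            if (buf[i] != expect) {
                bad++;
                printf("miscompare at %p: read 0x%016llx expected 0x%016llx\n",
                       (void *)&buf[i], (unsigned long long)buf[i],
                       (unsigned long long)expect);
                buf[i] = expect;              /* rearm for the next pass */
            }
        }
        printf("pass %u: %lu miscompares over %zu MB\n", pass, bad, mb);
        sleep(1);                             /* let the pages go idle */
    }
}

Keeping the buffer mostly idle between passes matters here, since the reports in this thread involve long-running VMs whose memory has gone quiet before the migration.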
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Hi,

I was able to reproduce with a longer-running test VM running the Google stress test. And it happens exactly when the migration finishes; it does not happen while the migration is running.

The Google stress test output displays memory errors:

Page Error: miscompare on CPU 5(0x) at 0x7f52431341c0(0x0:DIMM Unknown): read:0x004000bf, reread:0x004000bf expected:0x0040ffbf
Report Error: miscompare : DIMM Unknown : 1 : 571s
Page Error: miscompare on CPU 5(0x) at 0x7f52431341c8(0x0:DIMM Unknown): read:0x002000df, reread:0x002000df expected:0x0020ffdf
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34020(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34028(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34060(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34068(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c340a0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c340a8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c340e0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c340e8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34120(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34128(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34160(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34168(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c341a0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c341a8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c341e0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c341e8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34220(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34228(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34260(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34268(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c342a0(0x0:DIMM
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
> Hi,
>
> I was able to reproduce with a longer-running test VM running the Google stress test.

Hmm, that's quite a fun set of differences; I think I'd like to understand whether the pattern is related to the pattern of what the test is doing.

Can you just give an explanation of exactly how you ran that test? What you installed, and how exactly you ran it. Then Marcin and I can try and replicate it.

Dave

> And it happens exactly when the migration finishes; it does not happen while the migration is running.
>
> The Google stress test output displays memory errors:
> [...]

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Hi,

On 07.02.2014 13:21, Dr. David Alan Gilbert wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> Hi,
>>
>> I was able to reproduce with a longer-running test VM running the Google stress test.
> Hmm, that's quite a fun set of differences; I think I'd like to understand whether the pattern is related to the pattern of what the test is doing.
> Can you just give an explanation of exactly how you ran that test? What you installed, and how exactly you ran it.

While migrating, I still have no reliable way to reproduce it, but I'll try to find one. I can force the problem without migration when starting with:

bin/stressapptest -s 3600 -m 20 -i 20 -C 20 --force_errors

(--force_errors = inject false errors to test error handling)

Stefan

> Then Marcin and I can try and replicate it.
>
> Dave
>
>> And it happens exactly when the migration finishes; it does not happen while the migration is running.
>> The Google stress test output displays memory errors:
>> [...]
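For anyone trying to replicate the run above: going by stressapptest's help text, -s is the run time in seconds, -m the number of memory copy threads, -i the number of invert threads, and -C the number of CPU-stress threads. A run during migration might look like the line below; the -M memory size is an assumption added here to keep the test within guest RAM, not a value from this thread:

  ./stressapptest -s 3600 -M 8192 -m 20 -i 20 -C 20

Without --force_errors, any miscompare it reports should then be real corruption rather than injected errors.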
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

On 07/02/2014 13:30, Stefan Priebe - Profihost AG wrote:
>> I was able to reproduce with a longer-running test VM running the Google stress test.
> Hmm, that's quite a fun set of differences; I think I'd like to understand whether the pattern is related to the pattern of what the test is doing.

Stefan, can you try to reproduce it:

- with Unix migration between two QEMUs on the same host
- with different hosts
- with a different network (e.g. just a cross cable between two machines)

Paolo
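The first variant needs no network at all; a sketch with placeholder paths (the rest of both command lines is elided):

  # destination QEMU, waiting on a Unix socket
  qemu-system-x86_64 ... -incoming unix:/tmp/mig.sock

  # on the source monitor
  (qemu) migrate -d unix:/tmp/mig.sock

If the corruption still shows up here, the TCP transport and the physical network are ruled out, which is presumably the point of Paolo's ordering.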
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Hi,

On 07.02.2014 13:44, Paolo Bonzini wrote:
> On 07/02/2014 13:30, Stefan Priebe - Profihost AG wrote:
>> I was able to reproduce with a longer-running test VM running the Google stress test.
> [...]
> Stefan, can you try to reproduce it:

First of all, I now have a memory image of a VM where I can reproduce it. Reproducing does NOT work if I boot the VM freshly; I need to let it run for some hours. Then, just when the migration finishes, there is a short time frame where the Google stress app reports memory errors; once the migration has finished, it runs fine again. It seems to me it is related to pause and unpause/resume?

> - with Unix migration between two QEMUs on the same host

now tested = same issue

> - with different hosts

already tested = same issue

> - with a different network (e.g. just a cross cable between two machines)

already tested = same issue

Greets,
Stefan
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
> Hi,
>
> On 07.02.2014 13:44, Paolo Bonzini wrote:
>> [...]
>> Stefan, can you try to reproduce it:
> First of all, I now have a memory image of a VM where I can reproduce it. Reproducing does NOT work if I boot the VM freshly; I need to let it run for some hours. Then, just when the migration finishes, there is a short time frame where the Google stress app reports memory errors; once the migration has finished, it runs fine again. It seems to me it is related to pause and unpause/resume?

But do you have to pause/resume it to cause the error? Have you got cases where you boot it, then leave it running for a few hours, and then it fails if you migrate it?

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Hi,

On 07.02.2014 14:08, Dr. David Alan Gilbert wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> [...]
>> It seems to me it is related to pause and unpause/resume?
> But do you have to pause/resume it to cause the error? Have you got cases where you boot it, then leave it running for a few hours, and then it fails if you migrate it?

Yes, but isn't migration always a pause/unpause at the end? I thought migration_downtime is the value for how long that very small pause/unpause is allowed to take.

Stefan
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

* Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
> Hi,
>
> On 07.02.2014 14:08, Dr. David Alan Gilbert wrote:
>> [...]
>> But do you have to pause/resume it to cause the error? Have you got cases where you boot it, then leave it running for a few hours, and then it fails if you migrate it?
> Yes, but isn't migration always a pause/unpause at the end? I thought migration_downtime is the value for how long that very small pause/unpause is allowed to take.

There's a heck of a lot of other stuff that goes on in migration, and that downtime isn't quite the same. If it can be reproduced with just suspend/resume stuff, then that's a different place to start looking than if it's migration-only.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
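The suspend/resume half can be tested in isolation with the monitor's stop/cont pair - a sketch, with the in-guest checker left running across the pause:

  (qemu) stop
  (qemu) cont

If the checker stays clean across many stop/cont cycles but fails across a migration, the fault narrows to the RAM transfer itself rather than the vCPU pause, which is the distinction being drawn here.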
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry

Hi,

On 07.02.2014 14:15, Dr. David Alan Gilbert wrote:
> [...]
> There's a heck of a lot of other stuff that goes on in migration, and that downtime isn't quite the same. If it can be reproduced with just suspend/resume stuff, then that's a different place to start looking than if it's migration-only.

Ah, OK, now I get it. No, I can't reproduce it with suspend/resume. But while migrating, it happens directly at the end, when the switch from host A to B happens.

> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Il 07/02/2014 14:04, Stefan Priebe - Profihost AG ha scritto: First of all, I now have a memory image of a VM where I can reproduce it. You mean you start that VM with -incoming 'exec:cat /path/to/vm.img'? But the Google stress test doesn't report any error until you start migration _and_ it finishes? That sounds good enough. Can you upload the image somewhere (doesn't have to be a public place, you can contact David or others off-list)? Reproducing does NOT work if I boot the VM freshly; I need to let it run for some hours. Then, just as the migration finishes, there is a short time frame in which the Google stress app reports memory errors; once the migration has finished it runs fine again. It seems to me it is related to pause and unpause/resume? - with Unix migration between two QEMUs on the same host (see the sketch below): now tested = same issue - with different hosts: already tested = same issue - with a different network (e.g. just a crossover cable between two machines): already tested = same issue Another test: - start the VM with -S, migrate, do errors appear on the destination? Thanks, Paolo
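A sketch of the same-host Unix-socket test from the list above, combined with the -S experiment (memory size, disk image and socket path are placeholders, not Stefan's actual setup):

    # destination: identical hardware config, started paused, listening on a local socket
    qemu-system-x86_64 -m 4096 -drive file=test.img -S \
        -incoming unix:/tmp/mig.sock

    # source: same command line minus -incoming/-S; then from its HMP monitor:
    (qemu) migrate unix:/tmp/mig.sock
    (qemu) info migrate        # wait for "status: completed"

    # on the destination's monitor, resume and watch the guest for errors:
    (qemu) cont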
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Hi, Am 07.02.2014 14:19, schrieb Paolo Bonzini: Il 07/02/2014 14:04, Stefan Priebe - Profihost AG ha scritto: First of all, I now have a memory image of a VM where I can reproduce it. You mean you start that VM with -incoming 'exec:cat /path/to/vm.img'? But the Google stress test doesn't report any error until you start migration _and_ it finishes? Sorry, no: I meant I have a VM where I saved the memory to disk, so I don't need to wait hours until I can reproduce it, as it does not happen with a freshly started VM. So it's a state file, I think. Another test: - start the VM with -S, migrate, do errors appear on the destination? I started with -S and the errors appear AFTER resuming/unpausing the VM. So it is fine until I resume it on the new host. Stefan
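One common way to produce such a state file, for anyone trying to reproduce this (the path is a placeholder, and the thread doesn't say which mechanism Stefan actually used): stream the migration data to a file instead of a socket, then feed it back with -incoming, as in Paolo's 'exec:cat' example.

    # on the long-running VM's HMP monitor: save the state to disk
    (qemu) migrate "exec:cat > /var/tmp/vm.state"

    # later, start a fresh QEMU with the same machine type and devices:
    qemu-system-x86_64 ... -incoming "exec:cat /var/tmp/vm.state"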
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
It's always the same pattern: there are too many 0 bits where set bits were expected. Only seen:
read:0x ... expected:0x
or read:0x ... expected:0x
or read:0xbf00bf00 ... expected:0xbfffbfff
or read:0x ... expected:0xb5b5b5b5b5b5b5b5
No idea if this helps. Stefan Am 07.02.2014 14:39, schrieb Stefan Priebe - Profihost AG: I started with -S and the errors appear AFTER resuming/unpausing the VM. So it is fine until I resume it on the new host. Stefan
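For context on what these read/expected lines mean: stressapptest-style tools fill memory with known patterns and report every word that reads back differently. An illustrative sketch in C (not stressapptest's actual code; a real tool keeps cycling through write/verify passes while the migration happens externally):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        const size_t words = 1024 * 1024;                /* 8 MiB test buffer */
        const uint64_t pattern = 0xb5b5b5b5b5b5b5b5ULL;  /* one pattern seen above */
        uint64_t *buf = malloc(words * sizeof(*buf));
        size_t i, errors = 0;

        if (!buf) {
            return 1;
        }
        for (i = 0; i < words; i++) {
            buf[i] = pattern;                            /* write phase */
        }
        /* ... the migration / pause-resume under test happens here ... */
        for (i = 0; i < words; i++) {
            if (buf[i] != pattern) {                     /* verify phase */
                printf("read:0x%016" PRIx64 " ... expected:0x%016" PRIx64 "\n",
                       buf[i], pattern);
                errors++;
            }
        }
        free(buf);
        return errors ? 1 : 0;
    }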
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
* Stefan Priebe (s.pri...@profihost.ag) wrote: Anything I could try or debug to help find the problem? I think the most useful thing would be to see whether the problem is new in the 1.7 you're using or has existed for a while; depending on the machine type you used, it might be possible to load that image on an earlier (or newer) QEMU and try the same test. However, if the problem doesn't repeat reliably, that can be hard. If you have any way of simplifying the configuration of the VM, that would be good; e.g. if you could get a failure on something without graphics (-nographic) and USB. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
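A sketch of what Dave is suggesting, assuming the state was saved as a migration stream (paths, machine type and memory size below are placeholders): pin the machine type so another build presents the same virtual hardware, and strip the config down.

    # the state only loads on a binary that provides the machine type it was
    # saved with (list the types a build supports with: qemu-system-x86_64 -M ?)
    /path/to/other-version/qemu-system-x86_64 \
        -M pc-i440fx-1.5 -m 32768 \
        -nographic \
        -incoming "exec:cat /var/tmp/vm.state"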
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Am 07.02.2014 21:02, schrieb Dr. David Alan Gilbert: I think the most useful thing would be to see whether the problem is new in the 1.7 you're using or has existed for a while; depending on the machine type you used, it might be possible to load that image on an earlier (or newer) QEMU and try the same test. However, if the problem doesn't repeat reliably, that can be hard. I first saw this with QEMU 1.5 but was not able to reproduce it for months. 1.4 was working fine. If you have any way of simplifying the configuration of the VM, that would be good; e.g. if you could get a failure on something without graphics (-nographic) and USB. Sadly not ;-( Stefan
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Anything I could try or debug to help find the problem? Stefan Am 07.02.2014 14:45, schrieb Stefan Priebe - Profihost AG: It's always the same pattern: there are too many 0 bits where set bits were expected. Only seen:
read:0x ... expected:0x
or read:0x ... expected:0x
or read:0xbf00bf00 ... expected:0xbfffbfff
or read:0x ... expected:0xb5b5b5b5b5b5b5b5
No idea if this helps. Stefan
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Do you force rbd_cache=true in ceph.conf? If yes, do you use cache=writeback? According to the Ceph docs, http://ceph.com/docs/next/rbd/qemu-rbd/: Important: If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, filesystems on top of rbd can be corrupted. - Mail original - De: Stefan Priebe s.pri...@profihost.ag À: pve-de...@pve.proxmox.com, qemu-devel qemu-devel@nongnu.org Envoyé: Mercredi 5 Février 2014 18:51:15 Objet: [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry Hello, after live migrating machines with a lot of memory (32GB, 48GB, ...) I pretty often see services crashing after migration, and the guest kernel prints:
[1707620.031806] swap_free: Bad swap file entry 00377410
[1707620.031806] swap_free: Bad swap file entry 00593c48
[1707620.031807] swap_free: Bad swap file entry 03201430
[1707620.031807] swap_free: Bad swap file entry 01bc5900
[1707620.031807] swap_free: Bad swap file entry 0173ce40
[1707620.031808] swap_free: Bad swap file entry 011c0270
[1707620.031808] swap_free: Bad swap file entry 03c58ae8
[1707660.749059] BUG: Bad rss-counter state mm:88064d09f380 idx:1 val:1536
[1707660.749937] BUG: Bad rss-counter state mm:88064d09f380 idx:2 val:-1536
QEMU is 1.7. Does anybody know a fix? Greets, Stefan
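For reference, the safe combination the Ceph docs describe looks like this (the pool and image names are placeholders):

    # ceph.conf on the client host
    [client]
        rbd cache = true

    # matching QEMU drive option: writeback, so guest flushes reach librbd
    -drive file=rbd:rbd/vm-disk,format=raw,cache=writeback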
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Am 06.02.2014 12:14, schrieb Alexandre DERUMIER: Do you force rbd_cache=true in ceph.conf? No. If yes, do you use cache=writeback? Yes. So this should be safe. PS: all my guests do not even have !!SWAP!! # free|grep Swap Swap: 0 0 0 Stefan
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
PS: all my guests do not even have !!SWAP!! Not sure it is related to the swap file. I found a similar problem here, triggered by suspend/resume on ext4: http://lkml.indiana.edu/hypermail/linux/kernel/1106.3/01340.html Maybe it is a guest kernel bug?
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Maybe; sadly I've no idea. I'm only using a 3.10 kernel with XFS. Stefan Am 06.02.2014 12:40, schrieb Alexandre DERUMIER: Not sure it is related to the swap file. I found a similar problem here, triggered by suspend/resume on ext4: http://lkml.indiana.edu/hypermail/linux/kernel/1106.3/01340.html Maybe it is a guest kernel bug?
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Some more things which happen during migration:
php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000]
cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000]
Stefan
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
On 06.02.2014 15:03, Stefan Priebe - Profihost AG wrote: Some more things which happen during migration:
php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000]
cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000]
Hi, I've seen memory corruption after live (and offline) migrations as well. But in our environment it's mostly (though not only) seen as timer corruption - guests hang or have an insane date in the future. But I've seen segfaults and oopses as well. Sadly it's very hard for me to reproduce it reliably, but it occurs on all types of Linux guests - all versions of Ubuntu, CentOS, Debian, etc. - so it doesn't seem to be tied to a specific guest kernel version. I've never seen Windows crashing, though. There was another guy here on qemu-devel who had a similar issue and fixed it by running the guest with no-kvmclock. I've tested QEMU 1.4 - 1.6 and kernels 3.4 - 3.10. -- mg
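The no-kvmclock workaround mg mentions is a guest kernel boot parameter; a sketch of trying it on a Debian-style guest (file locations and the update-grub step vary by distro):

    # /etc/default/grub in the guest
    GRUB_CMDLINE_LINUX_DEFAULT="quiet no-kvmclock"
    # then: update-grub && reboot
    # verify the guest fell back to another clocksource:
    # cat /sys/devices/system/clocksource/clocksource0/current_clocksource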
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
* Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: Some more things which happen during migration:
php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000]
cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000]
OK, so let's just assume some part of memory (or CPU state, or memory loaded off disk...) is getting corrupted. You said before that it was happening on a 32GB image - is it *only* happening on 32GB or bigger VMs, or is it just more likely there? I think you also said you were using 1.7; have you tried an older version - i.e. is this a regression in 1.7, or don't we know? Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Hi, Am 06.02.2014 20:51, schrieb Dr. David Alan Gilbert: You said before that it was happening on a 32GB image - is it *only* happening on 32GB or bigger VMs, or is it just more likely there? Not image, memory. I've only seen this with VMs having more than 16GB or 32GB of memory. But maybe that just indicates that the migration takes longer. I think you also said you were using 1.7; have you tried an older version - i.e. is this a regression in 1.7, or don't we know? Don't know. Sadly I cannot reproduce this with test VMs, only with production ones. Stefan