[Qemu-devel] livepatch kernel to use md-clear without reboot
Hello, I'm interested whether it is possible to livepatch the guest kernel to use md-clear without a reboot (the host kernel and microcode are already patched). Greets, Stefan
Re: [Qemu-devel] cpu.fail / MDS fixes
On 15.05.19 19:54, Daniel P. Berrangé wrote: > On Wed, May 15, 2019 at 07:13:56PM +0200, Stefan Priebe - Profihost AG wrote: >> Hello list, >> >> I've updated my host to kernel 4.19.43 and applied the following patch >> to my qemu 2.12.1: >> https://bugzilla.suse.com/attachment.cgi?id=798722 >> >> But my guest running 4.19.43 still says: >> Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state >> unknown >> >> while the host says: >> Vulnerable: Clear CPU buffers attempted, SMT Host state unknown > > That suggests your host OS hasn't got the new microcode installed > or has not loaded it. No, it does not. A host without loaded microcode looks like this: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable but in my case it is: Mitigation: Clear CPU buffers; SMT vulnerable on the host, as hyper-threading is still enabled. > You want the host to report that it is Mitigated, and for the > host's /proc/cpuinfo to report "md-clear" exists. > >> I expected the guest to be able to use the new microcode. > > You've not said what CPU model you've given to the guest. > > You need either "-cpu host", or if using a named CPU model > you need to explicitly turn on the "md-clear" feature > (and all previous fixes) > > eg "-cpu Haswell,+spec-ctrl,+ssbd,+md-clear" Hah, yes, you're right, I need to specify +md-clear. Thanks! > Regards, > Daniel >
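A sketch of the checks discussed above (the sysfs paths are the standard Linux ones; the Haswell model and extra flags are just Daniel's example, pick whichever named model you actually use):

```shell
# Host side: once the new microcode is loaded, the mds sysfs entry should
# say "Mitigation: Clear CPU buffers" and /proc/cpuinfo should list md_clear.
cat /sys/devices/system/cpu/vulnerabilities/mds
grep -m1 -o md_clear /proc/cpuinfo

# Guest side: with a named CPU model the feature must be enabled explicitly,
# e.g.:
#   qemu-system-x86_64 ... -cpu Haswell,+spec-ctrl,+ssbd,+md-clear
# then re-check the same sysfs entry inside the guest after a guest reboot.
```

With "-cpu host" the flag is passed through automatically; the explicit +md-clear is only needed for named models.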
[Qemu-devel] cpu.fail / MDS fixes
Hello list, I've updated my host to kernel 4.19.43 and applied the following patch to my qemu 2.12.1: https://bugzilla.suse.com/attachment.cgi?id=798722 But my guest running 4.19.43 still says: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown while the host says: Vulnerable: Clear CPU buffers attempted, SMT Host state unknown I expected the guest to be able to use the new microcode. Greets, Stefan
Re: [Qemu-devel] Overcommiting cpu results in all vms offline
On 17.09.2018 11:40, Jack Wang wrote: > Stefan Priebe - Profihost AG wrote on Mon, 17 Sep 2018 at 9:00 AM: >> >> Hi, >> >> On 17.09.2018 08:38, Jack Wang wrote: >>> Stefan Priebe - Profihost AG wrote on Sun, 16 Sep 2018 at 3:31 PM: >>>> >>>> Hello, >>>> >>>> while overcommitting CPU I had several situations where all VMs went >>>> offline while two VMs saturated all cores. >>>> >>>> I believed all VMs would stay online but would just not be able to use all >>>> their cores? >>>> >>>> My original idea was to automate live migration on high host load to move >>>> VMs to another node, but that only makes sense if all VMs stay online. >>>> >>>> Is this expected? Anything special needed to achieve this? >>>> >>>> Greets, >>>> Stefan >>>> >>> Hi, Stefan, >>> >>> Do you have any logs from when all VMs go offline? >>> Maybe the OOM killer played a role there? >> >> After reviewing I think this is memory related, but OOM did not play a role. >> All kvm processes were spinning, trying to get >100% CPU, and I was not >> able to even log in via ssh. After 5-10 minutes I was able to log in. > So the VMs are not really offline; what is the result if you run > query-status via qmp? I can't, as I can't connect to the host in that state. >> There were about 150 GB of free memory. >> >> Relevant settings (no local storage involved): >> vm.dirty_background_ratio: >> 3 >> vm.dirty_ratio: >> 10 >> vm.min_free_kbytes: >> 10567004 >> >> # cat /sys/kernel/mm/transparent_hugepage/defrag >> always defer [defer+madvise] madvise never >> >> # cat /sys/kernel/mm/transparent_hugepage/enabled >> [always] madvise never >> >> After that I had the following traces on the host node: >> https://pastebin.com/raw/0VhyQmAv > > The call trace looks like a Ceph-related deadlock or hang. Yes, but I can also show you traces where nothing from Ceph is involved; the only thing they have in common is that they start in page_fault. >> Thanks! >> >> Greets, >> Stefan
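Jack's query-status suggestion can be scripted against the QMP monitor socket; a minimal sketch, assuming the monitor is exposed as a UNIX socket (the path below is hypothetical) and socat is installed:

```shell
# QMP requires a capabilities handshake before any command is accepted.
SOCK=/var/run/qemu/vm317.qmp   # hypothetical socket path, adjust to your setup
printf '%s\n' \
  '{"execute":"qmp_capabilities"}' \
  '{"execute":"query-status"}' | socat -t 5 - "UNIX-CONNECT:$SOCK"
# A running guest should answer with something like
# {"return": {"status": "running", "running": true, ...}}
```

If even this stalls while the host is thrashing, that points at the host rather than the VMs themselves.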
Re: [Qemu-devel] Overcommiting cpu results in all vms offline
Maybe a missing piece: vm.overcommit_memory=0 Greets, Stefan On 17.09.2018 09:00, Stefan Priebe - Profihost AG wrote: > Hi, > > On 17.09.2018 08:38, Jack Wang wrote: >> Stefan Priebe - Profihost AG wrote on Sun, 16 Sep 2018 at 3:31 PM: >>> >>> Hello, >>> >>> while overcommitting CPU I had several situations where all VMs went offline >>> while two VMs saturated all cores. >>> >>> I believed all VMs would stay online but would just not be able to use all >>> their cores? >>> >>> My original idea was to automate live migration on high host load to move >>> VMs to another node, but that only makes sense if all VMs stay online. >>> >>> Is this expected? Anything special needed to achieve this? >>> >>> Greets, >>> Stefan >>> >> Hi, Stefan, >> >> Do you have any logs from when all VMs go offline? >> Maybe the OOM killer played a role there? > > After reviewing I think this is memory related, but OOM did not play a role. > All kvm processes were spinning, trying to get >100% CPU, and I was not > able to even log in via ssh. After 5-10 minutes I was able to log in. > > There were about 150 GB of free memory. > > Relevant settings (no local storage involved): > vm.dirty_background_ratio: > 3 > vm.dirty_ratio: > 10 > vm.min_free_kbytes: > 10567004 > > # cat /sys/kernel/mm/transparent_hugepage/defrag > always defer [defer+madvise] madvise never > > # cat /sys/kernel/mm/transparent_hugepage/enabled > [always] madvise never > > After that I had the following traces on the host node: > https://pastebin.com/raw/0VhyQmAv > > Thanks! > > Greets, > Stefan >
Re: [Qemu-devel] Overcommiting cpu results in all vms offline
Hi, On 17.09.2018 08:38, Jack Wang wrote: > Stefan Priebe - Profihost AG wrote on Sun, 16 Sep 2018 at 3:31 PM: >> >> Hello, >> >> while overcommitting CPU I had several situations where all VMs went offline >> while two VMs saturated all cores. >> >> I believed all VMs would stay online but would just not be able to use all >> their cores? >> >> My original idea was to automate live migration on high host load to move >> VMs to another node, but that only makes sense if all VMs stay online. >> >> Is this expected? Anything special needed to achieve this? >> >> Greets, >> Stefan >> > Hi, Stefan, > > Do you have any logs from when all VMs go offline? > Maybe the OOM killer played a role there? After reviewing I think this is memory related, but OOM did not play a role. All kvm processes were spinning, trying to get >100% CPU, and I was not able to even log in via ssh. After 5-10 minutes I was able to log in. There were about 150 GB of free memory. Relevant settings (no local storage involved): vm.dirty_background_ratio: 3 vm.dirty_ratio: 10 vm.min_free_kbytes: 10567004 # cat /sys/kernel/mm/transparent_hugepage/defrag always defer [defer+madvise] madvise never # cat /sys/kernel/mm/transparent_hugepage/enabled [always] madvise never After that I had the following traces on the host node: https://pastebin.com/raw/0VhyQmAv Thanks! Greets, Stefan
[Qemu-devel] Overcommiting cpu results in all vms offline
Hello, while overcommitting CPU I had several situations where all VMs went offline while two VMs saturated all cores. I believed all VMs would stay online but would just not be able to use all their cores? My original idea was to automate live migration on high host load to move VMs to another node, but that only makes sense if all VMs stay online. Is this expected? Anything special needed to achieve this? Greets, Stefan
Re: [Qemu-devel] Qemu and Spectre_V4 + l1tf + IBRS_FW
On 17.08.2018 11:41, Daniel P. Berrangé wrote: > On Fri, Aug 17, 2018 at 08:44:38AM +0200, Stefan Priebe - Profihost AG wrote: >> Hello, >> >> I haven't found anything on the web regarding qemu and the mentioned variants. >> >> While my host says: >> l1tf:Mitigation: PTE Inversion; VMX: SMT vulnerable, L1D conditional >> cache flushes >> meltdown:Mitigation: PTI >> spec_store_bypass:Mitigation: Speculative Store Bypass disabled via >> prctl and seccomp >> spectre_v1:Mitigation: __user pointer sanitization >> spectre_v2:Mitigation: Full generic retpoline, IBPB, IBRS_FW >> >> My guests booted with pcid and spec-ctrl only say: >> l1tf:Mitigation: PTE Inversion >> meltdown:Mitigation: PTI >> spec_store_bypass:Vulnerable >> spectre_v1:Mitigation: __user pointer sanitization >> spectre_v2:Mitigation: Full generic retpoline, IBPB >> >> * What about spec_store_bypass in Qemu? > > The guest needs an 'ssbd' feature for Intel CPU models and either a > 'virt-ssbd' or 'amd-ssbd' feature for AMD CPU models. Ah, thanks. That works fine. >> * What about the IBRS_FW feature? > > I'm not sure what IBRS_FW is referring to, but don't worry about it. > The fact that the guest kernel says "Mitigation" instead of "Vulnerable" > means you are protected with your current config. > > For Intel CPU models Spectre v2 needs the guest to have the 'spec-ctrl' > feature. On AMD models Spectre v2 needs the guest to have the 'ibpb' feature. > >> * What about L1TF? > > No extra CPU flags are required for QEMU guests for L1TF. The new CPU > feature is merely a perf optimization for the host hypervisor fixes. > > Note that with L1TF there are extra steps you need to consider wrt > hyperthreading, that won't be reflected in the 'vulnerabilities' > data published by the kernel. > > You can read more about the procedure for dealing with L1TF in > virt hosts in the "Resolve" tab of this article: > > https://access.redhat.com/security/vulnerabilities/L1TF > >> Or are those just irrelevant to Qemu guests? >> Would be great to have some information. > > We have some QEMU docs providing guidance on guest CPU model/feature config > but they are not yet published. In the meantime this blog post of mine gives > the same info, covering what's needed for Spectre v2, Meltdown and SSBD and > guidance in general for CPU config: > > > https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/ Thanks, that one was really helpful. Greets, Stefan
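The feature names Daniel mentions combine on the -cpu option; a sketch (the CPU model names are only examples, substitute whichever named model you already run):

```shell
# Intel named model: spec-ctrl covers Spectre v2, ssbd covers SSBD
qemu-system-x86_64 -cpu IvyBridge,+pcid,+spec-ctrl,+ssbd ...

# AMD named model: ibpb for Spectre v2, virt-ssbd (or amd-ssbd) for SSBD
qemu-system-x86_64 -cpu EPYC,+ibpb,+virt-ssbd ...
```

After a guest reboot, /sys/devices/system/cpu/vulnerabilities/spec_store_bypass inside the guest should flip from "Vulnerable" to "Mitigation: ...".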
[Qemu-devel] Qemu and Spectre_V4 + l1tf + IBRS_FW
Hello, I haven't found anything on the web regarding qemu and the mentioned variants. While my host says: l1tf:Mitigation: PTE Inversion; VMX: SMT vulnerable, L1D conditional cache flushes meltdown:Mitigation: PTI spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp spectre_v1:Mitigation: __user pointer sanitization spectre_v2:Mitigation: Full generic retpoline, IBPB, IBRS_FW My guests booted with pcid and spec-ctrl only say: l1tf:Mitigation: PTE Inversion meltdown:Mitigation: PTI spec_store_bypass:Vulnerable spectre_v1:Mitigation: __user pointer sanitization spectre_v2:Mitigation: Full generic retpoline, IBPB * What about spec_store_bypass in Qemu? * What about the IBRS_FW feature? * What about L1TF? Or are those just irrelevant to Qemu guests? Would be great to have some information. Thanks a lot! Greets, Stefan
Re: [Qemu-devel] how to pass pcid to guest?
On 08.01.2018 23:07, Eric Blake wrote: > On 01/08/2018 02:03 PM, Stefan Priebe - Profihost AG wrote: >> Hello, >> >> for meltdown mitigation and performance it's important to have the pcid >> flag passed down to the guest (f.e. >> https://groups.google.com/forum/m/#!topic/mechanical-sympathy/L9mHTbeQLNU). > > Indeed; you are still waiting on the qemu patch mentioned here: > https://www.qemu.org/2018/01/04/spectre/ > > which is still undergoing the review process, but should be up (in the > form of 2.11.1) "in the next few days". OK, thanks. The performance difference is significant: no pcid: # time for i in $(seq 1 1 50); do du -sx /; done ... real 0m26.614s user 0m17.548s sys 0m9.056s kvm started with +pcid: # time for i in $(seq 1 1 50); do du -sx /; done ... real 0m14.734s user 0m7.755s sys 0m6.973s Greets, Stefan
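Based on the command line quoted in the original post, the fix boils down to adding +pcid to the named model once a QEMU with the patch (2.11.1 or later) is installed; a sketch keeping the other flags as in the thread:

```shell
# Expose pcid explicitly on a named CPU model (QEMU 2.11.1+)
qemu-system-x86_64 ... \
  -cpu IvyBridge,+pcid,+kvm_pv_unhalt,+kvm_pv_eoi,enforce,vendor=GenuineIntel

# Verify inside the guest afterwards:
grep -m1 -c ' pcid ' /proc/cpuinfo
```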
[Qemu-devel] how to pass pcid to guest?
Hello, for meltdown mitigation and performance it's important to have the pcid flag passed down to the guest (f.e. https://groups.google.com/forum/m/#!topic/mechanical-sympathy/L9mHTbeQLNU). My host shows the flag: # grep ' pcid ' /proc/cpuinfo | wc -l 56 But the guest does not: # grep pcid /proc/cpuinfo # Guest was started with: -cpu IvyBridge,+kvm_pv_unhalt,+kvm_pv_eoi,enforce,vendor=GenuineIntel Qemu is 2.9.1 Thanks! Greets, Stefan
Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
Thanks! But it's very difficult to get all the opinions together. SUSE Enterprise told me to update: - kernel - qemu - Intel microcode And they have already released updates for all of them. Stefan Excuse my typos, sent from my mobile phone. > On 05.01.2018 09:33, Paolo Bonzini <pbonz...@redhat.com> wrote: > >> On 04/01/2018 21:15, Stefan Priebe - Profihost AG wrote: >> attached the relevant patch for everybody who needs it. > > This is the original patch from Intel, which doesn't work unless you > have a patched kernel (which you almost certainly don't have) and > doesn't even warn you about that. > > In other words, it's rubbish. Please read > https://www.qemu.org/2018/01/04/spectre/ several times, until you > understand why there is no urgent need to update QEMU. > > Days are 24 hours for QEMU developers just like for you (and believe me, > we wished several times that they weren't during the last two months). > We are prioritizing the fixes according to their effect in mitigating > the vulnerability, their applicability and the availability of patches > to the lower levels of the stack. Right now, the most urgent part is > the simple mitigations that can go in Linux 4.15 and stable kernels. > > Paolo
Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
attached the relevant patch for everybody who needs it. Greets, Stefan Am 04.01.2018 um 16:53 schrieb Paolo Bonzini: > On 04/01/2018 09:35, Alexandre DERUMIER wrote: >>>> So you need: >>>> 1.) intel / amd cpu microcode update >>>> 2.) qemu update to pass the new MSR and CPU flags from the microcode >>>> update >>>> 3.) host kernel update >>>> 4.) guest kernel update >> >> are you sure we need to patch guest kernel if we are able to patch qemu ? > > Patching the guest kernel is only required to protect the guest kernel > from guest usermode. > >> If I understand, patching the host kernel, should avoid that a vm is reading >> memory of another vm. >> (the most critical) > > Correct. > >> patching the guest kernel, to avoid that a process from the vm have access >> to memory of another process of same vm. > > Correct. > > The QEMU updates are pretty boring, mostly taking care of new MSR and > CPUID flags (and adding new CPU models). > > They are not needed to protect the guest from "Meltdown", only > "Spectre"---the former only needs a guest kernel update. Also, to have > any effect, the guest kernels must also have "Spectre" patches which > aren't upstream yet for either KVM or the rest of Linux. So the QEMU > patches are much less important than the kernel side. > >>> https://access.redhat.com/solutions/3307851 >>> "Impacts of CVE-2017-5754, CVE-2017-5753, and CVE-2017-5715 to Red Hat >>> Virtualization products" > > It mostly repeats the contents of the RHEL document > https://access.redhat.com/security/vulnerabilities/speculativeexecution, > with some information specific to RHV. > > Thanks, > > Paolo > >> i don't have one but the content might be something like this: >> https://www.suse.com/de-de/support/kb/doc/?id=7022512 >> >> So you need: >> 1.) intel / amd cpu microcode update >> 2.) qemu update to pass the new MSR and CPU flags from the microcode update >> 3.) host kernel update >> 4.) 
guest kernel update >> >> The microcode update and the kernel update is publicly available but i'm >> missing the qemu one. >> >> Greets, >> Stefan >> >>> - Mail original - >>> De: "aderumier" <aderum...@odiso.com> >>> À: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> >>> Cc: "qemu-devel" <qemu-devel@nongnu.org> >>> Envoyé: Jeudi 4 Janvier 2018 08:24:34 >>> Objet: Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches >>> >>>>> Can anybody point me to the relevant qemu patches? >>> >>> I don't have find them yet. >>> >>> Do you known if a vm using kvm64 cpu model is protected or not ? >>> >>> - Mail original - >>> De: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> >>> À: "qemu-devel" <qemu-devel@nongnu.org> >>> Envoyé: Jeudi 4 Janvier 2018 07:27:01 >>> Objet: [Qemu-devel] CVE-2017-5715: relevant qemu patches >>> >>> Hello, >>> >>> i've seen some vendors have updated qemu regarding meltdown / spectre. >>> >>> f.e.: >>> >>> CVE-2017-5715: QEMU was updated to allow passing through new MSR and >>> CPUID flags from the host VM to the CPU, to allow enabling/disabling >>> branch prediction features in the Intel CPU. (bsc#1068032) >>> >>> Can anybody point me to the relevant qemu patches? >>> >>> Thanks! >>> >>> Greets, >>> Stefan >>> >> >> > >From b4fdfeb4545c09a0fdf01edc938f9cce8fcaa5c6 Mon Sep 17 00:00:00 2001 From: Wei Wang <wei.w.w...@intel.com> Date: Tue, 7 Nov 2017 16:39:49 +0800 Subject: [PATCH] i386/kvm: MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD CPUID(EAX=0X7,ECX=0).EDX[26]/[27] indicates the support of MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD. Expose the CPUID to the guest. Also add the support of transferring the MSRs during live migration. 
Signed-off-by: Wei Wang <wei.w.w...@intel.com>
[BR: BSC#1068032 CVE-2017-5715]
Signed-off-by: Bruce Rogers <brog...@suse.com>
---
 target/i386/cpu.c | 3 ++-
 target/i386/cpu.h | 4
 target/i386/kvm.c | 15 ++-
 target/i386/machine.c | 20
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 55f72b679f..01761db3fc 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2823,13 +2823,14 @@ vo
Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
Nobody? Is this something they did on their own? Stefan On 04.01.2018 07:27, Stefan Priebe - Profihost AG wrote: > Hello, > > I've seen that some vendors have updated qemu regarding meltdown / spectre, > e.g.: > > CVE-2017-5715: QEMU was updated to allow passing through new MSR and > CPUID flags from the host VM to the CPU, to allow enabling/disabling > branch prediction features in the Intel CPU. (bsc#1068032) > > Can anybody point me to the relevant qemu patches? > > Thanks! > > Greets, > Stefan >
Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
Am 04.01.2018 um 09:35 schrieb Alexandre DERUMIER: >>> So you need: >>> 1.) intel / amd cpu microcode update >>> 2.) qemu update to pass the new MSR and CPU flags from the microcode update >>> 3.) host kernel update >>> 4.) guest kernel update > > are you sure we need to patch guest kernel if we are able to patch qemu ? >> I have some pretty old guest (linux and windows) > > If I understand, patching the host kernel, should avoid that a vm is reading > memory of another vm. > (the most critical) Yes - this was just to complete the mitigation on all layers. > > patching the guest kernel, to avoid that a process from the vm have access to > memory of another process of same vm. Yes. Stefan > > > > - Mail original - > De: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> > À: "aderumier" <aderum...@odiso.com> > Cc: "qemu-devel" <qemu-devel@nongnu.org> > Envoyé: Jeudi 4 Janvier 2018 09:17:41 > Objet: Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches > > Am 04.01.2018 um 08:27 schrieb Alexandre DERUMIER: >> does somebody have a redhat account to see te content of: >> >> https://access.redhat.com/solutions/3307851 >> "Impacts of CVE-2017-5754, CVE-2017-5753, and CVE-2017-5715 to Red Hat >> Virtualization products" > > i don't have one but the content might be something like this: > https://www.suse.com/de-de/support/kb/doc/?id=7022512 > > So you need: > 1.) intel / amd cpu microcode update > 2.) qemu update to pass the new MSR and CPU flags from the microcode update > 3.) host kernel update > 4.) guest kernel update > > The microcode update and the kernel update is publicly available but i'm > missing the qemu one. > > Greets, > Stefan > >> - Mail original - >> De: "aderumier" <aderum...@odiso.com> >> À: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> >> Cc: "qemu-devel" <qemu-devel@nongnu.org> >> Envoyé: Jeudi 4 Janvier 2018 08:24:34 >> Objet: Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches >> >>>> Can anybody point me to the relevant qemu patches? 
>> >> I don't have find them yet. >> >> Do you known if a vm using kvm64 cpu model is protected or not ? >> >> - Mail original - >> De: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> >> À: "qemu-devel" <qemu-devel@nongnu.org> >> Envoyé: Jeudi 4 Janvier 2018 07:27:01 >> Objet: [Qemu-devel] CVE-2017-5715: relevant qemu patches >> >> Hello, >> >> i've seen some vendors have updated qemu regarding meltdown / spectre. >> >> f.e.: >> >> CVE-2017-5715: QEMU was updated to allow passing through new MSR and >> CPUID flags from the host VM to the CPU, to allow enabling/disabling >> branch prediction features in the Intel CPU. (bsc#1068032) >> >> Can anybody point me to the relevant qemu patches? >> >> Thanks! >> >> Greets, >> Stefan >> >
Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
On 04.01.2018 08:27, Alexandre DERUMIER wrote: > does somebody have a redhat account to see the content of: > > https://access.redhat.com/solutions/3307851 > "Impacts of CVE-2017-5754, CVE-2017-5753, and CVE-2017-5715 to Red Hat > Virtualization products" I don't have one, but the content might be something like this: https://www.suse.com/de-de/support/kb/doc/?id=7022512 So you need: 1.) intel / amd cpu microcode update 2.) qemu update to pass the new MSR and CPU flags from the microcode update 3.) host kernel update 4.) guest kernel update The microcode update and the kernel update are publicly available, but I'm missing the qemu one. Greets, Stefan > - Original message - > From: "aderumier" <aderum...@odiso.com> > To: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> > Cc: "qemu-devel" <qemu-devel@nongnu.org> > Sent: Thursday, 4 January 2018 08:24:34 > Subject: Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches > >>> Can anybody point me to the relevant qemu patches? > > I haven't found them yet. > > Do you know if a VM using the kvm64 CPU model is protected or not? > > - Original message - > From: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> > To: "qemu-devel" <qemu-devel@nongnu.org> > Sent: Thursday, 4 January 2018 07:27:01 > Subject: [Qemu-devel] CVE-2017-5715: relevant qemu patches > > Hello, > > i've seen some vendors have updated qemu regarding meltdown / spectre. > > f.e.: > > CVE-2017-5715: QEMU was updated to allow passing through new MSR and > CPUID flags from the host VM to the CPU, to allow enabling/disabling > branch prediction features in the Intel CPU. (bsc#1068032) > > Can anybody point me to the relevant qemu patches? > > Thanks! > > Greets, > Stefan >
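The four update layers listed above can each be checked from a shell; a rough sketch (paths and flag names assumed for reasonably recent kernels and a patched QEMU build):

```shell
# 1) microcode: the loaded revision shows up in /proc/cpuinfo
grep -m1 microcode /proc/cpuinfo

# 2) qemu: a patched build should know the new CPU feature flag
qemu-system-x86_64 -cpu help | grep -i spec-ctrl

# 3) + 4) host and guest kernel: run this on both sides
grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null
```

If step 3 reports "Mitigation" on the host but the guest still reports "Vulnerable", the missing piece is usually the guest CPU model config, not the guest kernel.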
[Qemu-devel] CVE-2017-5715: relevant qemu patches
Hello, I've seen that some vendors have updated qemu regarding meltdown / spectre, e.g.: CVE-2017-5715: QEMU was updated to allow passing through new MSR and CPUID flags from the host VM to the CPU, to allow enabling/disabling branch prediction features in the Intel CPU. (bsc#1068032) Can anybody point me to the relevant qemu patches? Thanks! Greets, Stefan
Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
On 03.01.2018 09:14, Alexandre DERUMIER wrote: > Hi Stefan, > >>> The tap devices on the target vm show dropped RX packets on BOTH tap >>> interfaces - strangely with the same amount of pkts? > > that's strange indeed. > if you tcpdump the tap interfaces, do you see incoming traffic only on 1 > interface, or randomly on both? Completely independent random traffic, as it should be. > (can you provide the network configuration in the guest for both interfaces ?) Inside the guest? Where the drop counter stays 0? auto eth0 iface eth0 inet dhcp auto eth1 iface eth1 inet static address 192.168.0.2 netmask 255.255.255.0 That's it. > I see that you have enabled multiqueue on 1 of the interfaces; did you > set up the multiqueue part correctly inside the guest? Uh oh? What is needed inside the guest? > do you have enough vcpus to handle all the queues ? Yes. Stefan > - Original message - > From: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> > To: "qemu-devel" <qemu-devel@nongnu.org> > Sent: Tuesday, 2 January 2018 12:17:29 > Subject: [Qemu-devel] dropped pkts with Qemu on tap interace (RX) > > Hello, > > currently I'm trying to fix a problem where we have "random" missing > packets. > > We're doing an ssh connect from machine a to machine b every 5 minutes > via rsync and ssh. > > Sometimes it happens that we get this cron message: > "Connection to 192.168.0.2 closed by remote host. > rsync: connection unexpectedly closed (0 bytes received so far) [sender] > rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] > ssh: connect to host 192.168.0.2 port 22: Connection refused" > > The tap devices on the target vm show dropped RX packets on BOTH tap > interfaces - strangely with the same amount of pkts?
> > # ifconfig tap317i0; ifconfig tap317i1 > tap317i0 Link encap:Ethernet HWaddr 6e:cb:65:94:bb:bf > UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 > RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0 > TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:177991267 (169.7 MiB) TX bytes:910412749 (868.2 MiB) > > tap317i1 Link encap:Ethernet HWaddr 96:f8:b5:d0:9a:07 > UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 > RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0 > TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:1597564313 (1.4 GiB) TX bytes:3517734365 (3.2 GiB) > > Any ideas how to inspect this issue? > > Greets, > Stefan >
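For the multiqueue question above: with queues=4 on the netdev, the guest has to enable the extra virtio-net queue pairs itself via ethtool; a sketch of the guest-side step (interface name taken from this thread):

```shell
# Show how many combined channels the virtio-net device supports and uses
ethtool -l eth1

# Enable all queue pairs in the guest (should match the host's queues=4)
ethtool -L eth1 combined 4
```

By default the guest uses only one queue pair even when the host offers more.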
Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
Am 03.01.2018 um 04:57 schrieb Wei Xu: > On Tue, Jan 02, 2018 at 10:17:25PM +0100, Stefan Priebe - Profihost AG wrote: >> >> Am 02.01.2018 um 18:04 schrieb Wei Xu: >>> On Tue, Jan 02, 2018 at 04:24:33PM +0100, Stefan Priebe - Profihost AG >>> wrote: >>>> Hi, >>>> Am 02.01.2018 um 15:20 schrieb Wei Xu: >>>>> On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG >>>>> wrote: >>>>>> Hello, >>>>>> >>>>>> currently i'm trying to fix a problem where we have "random" missing >>>>>> packets. >>>>>> >>>>>> We're doing an ssh connect from machine a to machine b every 5 minutes >>>>>> via rsync and ssh. >>>>>> >>>>>> Sometimes it happens that we get this cron message: >>>>>> "Connection to 192.168.0.2 closed by remote host. >>>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender] >>>>>> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] >>>>>> ssh: connect to host 192.168.0.2 port 22: Connection refused" >>>>> >>>>> Hi Stefan, >>>>> What kind of virtio-net backend are you using? Can you paste your qemu >>>>> command line here? >>>> >>>> Sure netdev part: >>>> -netdev >>>> type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on >>>> -device >>>> virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 >>>> -netdev >>>> type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4 >>>> -device >>>> virtio-net-pci,mac=6A:8E:74:45:1A:0B,nedev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301 >>> >>> According to what you have mentioned, the traffic is not heavy for the >>> guests, >>> the dropping shouldn't happen for regular case. >> >> The avg traffic is around 300kb/s. >> >>> What is your hardware platform? 
>> >> Dual Intel Xeon E5-2680 v4 >> >>> and Which versions are you using for both >>> guest/host kernel >> Kernel v4.4.103 >> >>> and qemu? >> 2.9.1 >> >>> Are there other VMs on the same host? >> Yes. > > What about the CPU load? Host: 80-90% Idle LoadAvg: 6-7 VM: 97%-99% Idle >>>>> 'Connection refused' usually means that the client gets a TCP Reset rather >>>>> than losing packets, so this might not be a relevant issue. >>>> >>>> Mhm so you mean these might be two seperate ones? >>> >>> Yes. >>> >>>> >>>>> Also you can do a tcpdump on both guests and see what happened to SSH >>>>> packets >>>>> (tcpdump -i tapXXX port 22). >>>> >>>> Sadly not as there's too much traffic on that part as rsync is syncing >>>> every 5 minutes through ssh. >>> >>> You can do a tcpdump for the entire traffic from the guest and host and >>> compare >>> what kind of packets are dropped if the traffic is not overloaded. >> >> Are you sure? I don't get why the same amount and same kind of packets >> should be received by both tap which are connected to different bridges >> to different HW and physical interfaces. > > Exactly, possibly this would be a host or guest kernel bug cos than qemu issue > you are using vhost kernel as the backend and the two stats are independent, > you might have to check out what is happening inside the traffic. What do you mean by inside the traffic? Stefan
Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
Am 02.01.2018 um 18:04 schrieb Wei Xu: > On Tue, Jan 02, 2018 at 04:24:33PM +0100, Stefan Priebe - Profihost AG wrote: >> Hi, >> Am 02.01.2018 um 15:20 schrieb Wei Xu: >>> On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG >>> wrote: >>>> Hello, >>>> >>>> currently i'm trying to fix a problem where we have "random" missing >>>> packets. >>>> >>>> We're doing an ssh connect from machine a to machine b every 5 minutes >>>> via rsync and ssh. >>>> >>>> Sometimes it happens that we get this cron message: >>>> "Connection to 192.168.0.2 closed by remote host. >>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender] >>>> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] >>>> ssh: connect to host 192.168.0.2 port 22: Connection refused" >>> >>> Hi Stefan, >>> What kind of virtio-net backend are you using? Can you paste your qemu >>> command line here? >> >> Sure netdev part: >> -netdev >> type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on >> -device >> virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 >> -netdev >> type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4 >> -device >> virtio-net-pci,mac=6A:8E:74:45:1A:0B,nedev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301 > > According to what you have mentioned, the traffic is not heavy for the guests, > the dropping shouldn't happen for regular case. The avg traffic is around 300kb/s. > What is your hardware platform? Dual Intel Xeon E5-2680 v4 > and Which versions are you using for both > guest/host kernel Kernel v4.4.103 > and qemu? 2.9.1 > Are there other VMs on the same host? Yes. >>> 'Connection refused' usually means that the client gets a TCP Reset rather >>> than losing packets, so this might not be a relevant issue. 
>> >> Mhm so you mean these might be two seperate ones? > > Yes. > >> >>> Also you can do a tcpdump on both guests and see what happened to SSH >>> packets >>> (tcpdump -i tapXXX port 22). >> >> Sadly not as there's too much traffic on that part as rsync is syncing >> every 5 minutes through ssh. > > You can do a tcpdump for the entire traffic from the guest and host and > compare > what kind of packets are dropped if the traffic is not overloaded. Are you sure? I don't get why the same amount and same kind of packets should be received by both tap which are connected to different bridges to different HW and physical interfaces. Stefan > Wei > >> >>>> The tap devices on the target vm shows dropped RX packages on BOTH tap >>>> interfaces - strangely with the same amount of pkts? >>>> >>>> # ifconfig tap317i0; ifconfig tap317i1 >>>> tap317i0 Link encap:Ethernet HWaddr 6e:cb:65:94:bb:bf >>>> UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 >>>> RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0 >>>> TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0 >>>> collisions:0 txqueuelen:1000 >>>> RX bytes:177991267 (169.7 MiB) TX bytes:910412749 (868.2 MiB) >>>> >>>> tap317i1 Link encap:Ethernet HWaddr 96:f8:b5:d0:9a:07 >>>> UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 >>>> RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0 >>>> TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0 >>>> collisions:0 txqueuelen:1000 >>>> RX bytes:1597564313 (1.4 GiB) TX bytes:3517734365 (3.2 GiB) >>>> >>>> Any ideas how to inspect this issue? >>> >>> It seems both tap interfaces lose RX pkts, dropping pkts of RX means the >>> host(backend) cann't receive packets from the guest as fast as the guest >>> sends. >> >> Inside the guest i see no dropped packets at all. It's only on the host >> and strangely on both taps at the same value? And both are connected to >> absolutely different networks. 
>> >>> Are you running some symmetrical test on both guests? >> >> No. >> >> Stefan >> >> >>> Wei >>> >>>> >>>> Greets, >>>> Stefan >>>>
Re: [Qemu-devel] dropped pkts with Qemu on tap interface (RX)
Hi, On 02.01.2018 at 15:20, Wei Xu wrote: > On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG wrote: >> Hello, >> >> currently i'm trying to fix a problem where we have "random" missing >> packets. >> >> We're doing an ssh connect from machine a to machine b every 5 minutes >> via rsync and ssh. >> >> Sometimes it happens that we get this cron message: >> "Connection to 192.168.0.2 closed by remote host. >> rsync: connection unexpectedly closed (0 bytes received so far) [sender] >> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] >> ssh: connect to host 192.168.0.2 port 22: Connection refused" > > Hi Stefan, > What kind of virtio-net backend are you using? Can you paste your qemu > command line here? Sure, netdev part: -netdev type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -netdev type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4 -device virtio-net-pci,mac=6A:8E:74:45:1A:0B,netdev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301 > 'Connection refused' usually means that the client gets a TCP Reset rather > than losing packets, so this might not be a relevant issue. Mhm, so you mean these might be two separate issues? > Also you can do a tcpdump on both guests and see what happened to SSH packets > (tcpdump -i tapXXX port 22). Sadly not, as there's too much traffic on that port - rsync is syncing every 5 minutes through ssh. >> The tap devices on the target VM show dropped RX packets on BOTH tap >> interfaces - strangely with the same number of pkts?
>> >> # ifconfig tap317i0; ifconfig tap317i1 >> tap317i0 Link encap:Ethernet HWaddr 6e:cb:65:94:bb:bf >> UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 >> RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0 >> TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:1000 >> RX bytes:177991267 (169.7 MiB) TX bytes:910412749 (868.2 MiB) >> >> tap317i1 Link encap:Ethernet HWaddr 96:f8:b5:d0:9a:07 >> UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 >> RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0 >> TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:1000 >> RX bytes:1597564313 (1.4 GiB) TX bytes:3517734365 (3.2 GiB) >> >> Any ideas how to inspect this issue? > > It seems both tap interfaces lose RX pkts, dropping pkts of RX means the > host(backend) cann't receive packets from the guest as fast as the guest > sends. Inside the guest i see no dropped packets at all. It's only on the host and strangely on both taps at the same value? And both are connected to absolutely different networks. > Are you running some symmetrical test on both guests? No. Stefan > Wei > >> >> Greets, >> Stefan >>
[Qemu-devel] dropped pkts with Qemu on tap interface (RX)
Hello, currently I'm trying to track down a problem where we have "random" missing packets. We're doing an ssh connect from machine A to machine B every 5 minutes via rsync and ssh. Sometimes it happens that we get this cron message: "Connection to 192.168.0.2 closed by remote host. rsync: connection unexpectedly closed (0 bytes received so far) [sender] rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] ssh: connect to host 192.168.0.2 port 22: Connection refused" The tap devices on the target VM show dropped RX packets on BOTH tap interfaces - strangely with the same number of pkts? # ifconfig tap317i0; ifconfig tap317i1 tap317i0 Link encap:Ethernet HWaddr 6e:cb:65:94:bb:bf UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0 TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:177991267 (169.7 MiB) TX bytes:910412749 (868.2 MiB) tap317i1 Link encap:Ethernet HWaddr 96:f8:b5:d0:9a:07 UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0 TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1597564313 (1.4 GiB) TX bytes:3517734365 (3.2 GiB) Any ideas on how to inspect this issue? Greets, Stefan
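Since the drops show up only on the host side, one practical next step is to sample the kernel's per-interface counters alongside the cron schedule and check whether the two taps really increment in lockstep. A minimal sketch - the tap names are the ones from this report, and `/sys/class/net/<if>/statistics` is the standard Linux location for these counters:

```shell
#!/bin/sh
# Print RX drop/packet counters for the given interfaces; run it from
# cron or under watch(1) and correlate jumps with the failing rsync runs.
show_drops() {
    for ifc in "$@"; do
        stats=/sys/class/net/$ifc/statistics
        if [ -d "$stats" ]; then
            echo "$ifc rx_dropped=$(cat "$stats/rx_dropped") rx_packets=$(cat "$stats/rx_packets")"
        else
            echo "$ifc: no such interface"
        fi
    done
}

show_drops tap317i0 tap317i1
```

If both counters always jump by the same amount at the same moment, the common factor is more likely on the host side (vhost worker or bridge path) than in either guest network.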
Re: [Qemu-devel] kvm: virtio-net: saved image requires TUN_F_UFO support
Hello, Am 22.11.2017 um 20:41 schrieb Dr. David Alan Gilbert: > * Paolo Bonzini (pbonz...@redhat.com) wrote: >> On 06/11/2017 12:09, Stefan Priebe - Profihost AG wrote: >>> HI Paolo, >>> >>> could this patchset be related? >> >> Uh oh, yes it should. Jason, any ways to fix it? I suppose we need to >> disable UFO in the newest machine types, but do we also have to do >> (software) UFO in vhost-net and QEMU for migration compatibility? > > Was there a solution to this? it will be this one: https://patchwork.ozlabs.org/patch/840094/ Stefan > Dave > >> Thanks, >> >> Paolo >> >>> Greets, >>> Stefan >>> >>> Am 06.11.2017 um 10:52 schrieb Stefan Priebe - Profihost AG: >>>> Hi Paolo, >>>> >>>> Am 06.11.2017 um 10:49 schrieb Paolo Bonzini: >>>>> On 06/11/2017 10:48, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Paolo, >>>>>> >>>>>> Am 06.11.2017 um 10:40 schrieb Paolo Bonzini: >>>>>>> On 06/11/2017 10:38, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> i've upgraded some servers from kernel 4.4 to 4.12 - both running Qemu >>>>>>>> 2.9.1. >>>>>>>> >>>>>>>> If i migrate a VM from a host running kernel 4.4 to a host running 4.12 >>>>>>>> i get: >>>>>>>> >>>>>>>> kvm: virtio-net: saved image requires TUN_F_UFO support >>>>>>>> kvm: Failed to load virtio-net-device:tmp >>>>>>>> kvm: Failed to load virtio-net:virtio >>>>>>>> kvm: error while loading state for instance 0x0 of device >>>>>>>> ':00:12.0/virtio-net' >>>>>>>> kvm: load of migration failed: Invalid argument >>>>>>>> >>>>>>>> >>>>>>>> while migrating from 4.12 to 4.4 works fine. >>>>>>>> >>>>>>>> Can anybody help? Is this expected? >>>>>>> >>>>>>> Can you check why peer_has_ufo failed (in hw/net/virtio-net.c)? >>>>>> >>>>>> May be - how can i archieve this? Patching the code is not a problem if >>>>>> you can give me a hint. >>>>>> >>>>>>> Also, did this ioctl fail when the tap device was set up on the 4.12 >>>>>>> destination? 
>>>>>>> int tap_probe_has_ufo(int fd) >>>>>>> { >>>>>>> unsigned offload; >>>>>>> >>>>>>> offload = TUN_F_CSUM | TUN_F_UFO; >>>>>>> >>>>>>> if (ioctl(fd, TUNSETOFFLOAD, offload) < 0) >>>>>>> return 0; >>>>>>> >>>>>>> return 1; >>>>>>> } >>>>>> >>>>>> Should there be any kernel output or how can i detect / check it? >>>>> >>>>> For both, the simplest answer is probably just using printf. >>>> >>>> arg i missed an important part. The kernel is an opensuse SLE15 one. >>>> >>>> I've seen it contains the following patchset: >>>> https://www.spinics.net/lists/netdev/msg443821.html >>>> >>>> Greets, >>>> Stefan >>>> >> >> > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK >
[Qemu-devel] kvm: Failed to flush the L2 table cache: Input/output error
Hello, while using qemu 2.9.1 to back up a disk, I sometimes get the following output: Formatting '/mnt/qemu-249-2017_11_19-04_00_05.qcow2', fmt=qcow2 size=236223201280 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 followed by: kvm: Failed to flush the L2 table cache: Input/output error kvm: Failed to flush the refcount block cache: Input/output error When this "Failed to flush" error happens, the whole backup is incomplete. The host kernel is a 4.4-based openSUSE 42.3 kernel. Is this a known bug? Greets, Stefan
Re: [Qemu-devel] kvm: virtio-net: saved image requires TUN_F_UFO support
Hello, Am 10.11.2017 um 05:18 schrieb Jason Wang: > > > On 2017年11月08日 19:22, Jason Wang wrote: >> >> >> On 2017年11月08日 18:46, Paolo Bonzini wrote: >>> On 08/11/2017 09:21, Jason Wang wrote: >>>> >>>> On 2017年11月08日 17:05, Stefan Priebe - Profihost AG wrote: >>>>> Am 08.11.2017 um 08:54 schrieb Jason Wang: >>>>>> On 2017年11月08日 15:41, Stefan Priebe - Profihost AG wrote: >>>>>>> Hi Paolo, >>>>>>> >>>>>>> Am 06.11.2017 um 12:27 schrieb Paolo Bonzini: >>>>>>>> On 06/11/2017 12:09, Stefan Priebe - Profihost AG wrote: >>>>>>>>> HI Paolo, >>>>>>>>> >>>>>>>>> could this patchset be related? >>>>>>>> Uh oh, yes it should. Jason, any ways to fix it? I suppose we >>>>>>>> need to >>>>>>>> disable UFO in the newest machine types, but do we also have to do >>>>>>>> (software) UFO in vhost-net and QEMU for migration compatibility? >>>>>>> Any news on this? >>>>>>> >>>>>>> Greets, >>>>>> Since we probe UFO support, it will be disabled automatically on >>>>>> device >>>>>> startup. >>>>>> >>>>>> For the issue of migration, I think the only way is trying to fix >>>>>> it in >>>>>> kernel. >>>>> But 4.14 is around the corner and the patchset is already included. >>>>> Shouldn't this get fixed before 4.14 is released? >>>> We will try to seek a solution as soon as possible. If we can't catch >>>> 4.14, we will do it for stable for sure. >>> Jason, can you write to netdev and Cc Linus and me? >>> >>> Thanks, >>> >>> Paolo >> >> Paolo, see this https://www.spinics.net/lists/netdev/msg465454.html >> >> Just notice this today since I'm not on the cc list. >> >> Thanks >> >> > > An update: > > After some discussions on netdev, we will try to bring UFO back > partially for just TAP. Willem promise to fix this. 4.14 is too late > consider the fix is not trivial, it will go through stable tree. OK is it save to just revert the UFO patchset for my local branch? Greets, Stefan > > Thanks
Re: [Qemu-devel] kvm: virtio-net: saved image requires TUN_F_UFO support
Am 08.11.2017 um 08:54 schrieb Jason Wang: > > > On 2017年11月08日 15:41, Stefan Priebe - Profihost AG wrote: >> Hi Paolo, >> >> Am 06.11.2017 um 12:27 schrieb Paolo Bonzini: >>> On 06/11/2017 12:09, Stefan Priebe - Profihost AG wrote: >>>> HI Paolo, >>>> >>>> could this patchset be related? >>> Uh oh, yes it should. Jason, any ways to fix it? I suppose we need to >>> disable UFO in the newest machine types, but do we also have to do >>> (software) UFO in vhost-net and QEMU for migration compatibility? >> Any news on this? >> >> Greets, > > Since we probe UFO support, it will be disabled automatically on device > startup. > > For the issue of migration, I think the only way is trying to fix it in > kernel. But 4.14 is around the corner and the patchset is already included. Shouldn't this get fixed before 4.14 is released? Stefan > > Thanks
Re: [Qemu-devel] kvm: virtio-net: saved image requires TUN_F_UFO support
Hi Paolo, Am 06.11.2017 um 12:27 schrieb Paolo Bonzini: > On 06/11/2017 12:09, Stefan Priebe - Profihost AG wrote: >> HI Paolo, >> >> could this patchset be related? > > Uh oh, yes it should. Jason, any ways to fix it? I suppose we need to > disable UFO in the newest machine types, but do we also have to do > (software) UFO in vhost-net and QEMU for migration compatibility? Any news on this? Greets, Stefan > Thanks, > > Paolo > >> Greets, >> Stefan >> >> Am 06.11.2017 um 10:52 schrieb Stefan Priebe - Profihost AG: >>> Hi Paolo, >>> >>> Am 06.11.2017 um 10:49 schrieb Paolo Bonzini: >>>> On 06/11/2017 10:48, Stefan Priebe - Profihost AG wrote: >>>>> Hi Paolo, >>>>> >>>>> Am 06.11.2017 um 10:40 schrieb Paolo Bonzini: >>>>>> On 06/11/2017 10:38, Stefan Priebe - Profihost AG wrote: >>>>>>> Hello, >>>>>>> >>>>>>> i've upgraded some servers from kernel 4.4 to 4.12 - both running Qemu >>>>>>> 2.9.1. >>>>>>> >>>>>>> If i migrate a VM from a host running kernel 4.4 to a host running 4.12 >>>>>>> i get: >>>>>>> >>>>>>> kvm: virtio-net: saved image requires TUN_F_UFO support >>>>>>> kvm: Failed to load virtio-net-device:tmp >>>>>>> kvm: Failed to load virtio-net:virtio >>>>>>> kvm: error while loading state for instance 0x0 of device >>>>>>> ':00:12.0/virtio-net' >>>>>>> kvm: load of migration failed: Invalid argument >>>>>>> >>>>>>> >>>>>>> while migrating from 4.12 to 4.4 works fine. >>>>>>> >>>>>>> Can anybody help? Is this expected? >>>>>> >>>>>> Can you check why peer_has_ufo failed (in hw/net/virtio-net.c)? >>>>> >>>>> May be - how can i archieve this? Patching the code is not a problem if >>>>> you can give me a hint. >>>>> >>>>>> Also, did this ioctl fail when the tap device was set up on the 4.12 >>>>>> destination? 
>>>>>> int tap_probe_has_ufo(int fd) >>>>>> { >>>>>> unsigned offload; >>>>>> >>>>>> offload = TUN_F_CSUM | TUN_F_UFO; >>>>>> >>>>>> if (ioctl(fd, TUNSETOFFLOAD, offload) < 0) >>>>>> return 0; >>>>>> >>>>>> return 1; >>>>>> } >>>>> >>>>> Should there be any kernel output or how can i detect / check it? >>>> >>>> For both, the simplest answer is probably just using printf. >>> >>> arg i missed an important part. The kernel is an opensuse SLE15 one. >>> >>> I've seen it contains the following patchset: >>> https://www.spinics.net/lists/netdev/msg443821.html >>> >>> Greets, >>> Stefan >>> >
Re: [Qemu-devel] kvm: virtio-net: saved image requires TUN_F_UFO support
HI Paolo, could this patchset be related? Greets, Stefan Am 06.11.2017 um 10:52 schrieb Stefan Priebe - Profihost AG: > Hi Paolo, > > Am 06.11.2017 um 10:49 schrieb Paolo Bonzini: >> On 06/11/2017 10:48, Stefan Priebe - Profihost AG wrote: >>> Hi Paolo, >>> >>> Am 06.11.2017 um 10:40 schrieb Paolo Bonzini: >>>> On 06/11/2017 10:38, Stefan Priebe - Profihost AG wrote: >>>>> Hello, >>>>> >>>>> i've upgraded some servers from kernel 4.4 to 4.12 - both running Qemu >>>>> 2.9.1. >>>>> >>>>> If i migrate a VM from a host running kernel 4.4 to a host running 4.12 >>>>> i get: >>>>> >>>>> kvm: virtio-net: saved image requires TUN_F_UFO support >>>>> kvm: Failed to load virtio-net-device:tmp >>>>> kvm: Failed to load virtio-net:virtio >>>>> kvm: error while loading state for instance 0x0 of device >>>>> ':00:12.0/virtio-net' >>>>> kvm: load of migration failed: Invalid argument >>>>> >>>>> >>>>> while migrating from 4.12 to 4.4 works fine. >>>>> >>>>> Can anybody help? Is this expected? >>>> >>>> Can you check why peer_has_ufo failed (in hw/net/virtio-net.c)? >>> >>> May be - how can i archieve this? Patching the code is not a problem if >>> you can give me a hint. >>> >>>> Also, did this ioctl fail when the tap device was set up on the 4.12 >>>> destination? >>>> int tap_probe_has_ufo(int fd) >>>> { >>>> unsigned offload; >>>> >>>> offload = TUN_F_CSUM | TUN_F_UFO; >>>> >>>> if (ioctl(fd, TUNSETOFFLOAD, offload) < 0) >>>> return 0; >>>> >>>> return 1; >>>> } >>> >>> Should there be any kernel output or how can i detect / check it? >> >> For both, the simplest answer is probably just using printf. > > arg i missed an important part. The kernel is an opensuse SLE15 one. > > I've seen it contains the following patchset: > https://www.spinics.net/lists/netdev/msg443821.html > > Greets, > Stefan >
Re: [Qemu-devel] kvm: virtio-net: saved image requires TUN_F_UFO support
Hi Paolo, Am 06.11.2017 um 10:49 schrieb Paolo Bonzini: > On 06/11/2017 10:48, Stefan Priebe - Profihost AG wrote: >> Hi Paolo, >> >> Am 06.11.2017 um 10:40 schrieb Paolo Bonzini: >>> On 06/11/2017 10:38, Stefan Priebe - Profihost AG wrote: >>>> Hello, >>>> >>>> i've upgraded some servers from kernel 4.4 to 4.12 - both running Qemu >>>> 2.9.1. >>>> >>>> If i migrate a VM from a host running kernel 4.4 to a host running 4.12 >>>> i get: >>>> >>>> kvm: virtio-net: saved image requires TUN_F_UFO support >>>> kvm: Failed to load virtio-net-device:tmp >>>> kvm: Failed to load virtio-net:virtio >>>> kvm: error while loading state for instance 0x0 of device >>>> ':00:12.0/virtio-net' >>>> kvm: load of migration failed: Invalid argument >>>> >>>> >>>> while migrating from 4.12 to 4.4 works fine. >>>> >>>> Can anybody help? Is this expected? >>> >>> Can you check why peer_has_ufo failed (in hw/net/virtio-net.c)? >> >> May be - how can i archieve this? Patching the code is not a problem if >> you can give me a hint. >> >>> Also, did this ioctl fail when the tap device was set up on the 4.12 >>> destination? >>> int tap_probe_has_ufo(int fd) >>> { >>> unsigned offload; >>> >>> offload = TUN_F_CSUM | TUN_F_UFO; >>> >>> if (ioctl(fd, TUNSETOFFLOAD, offload) < 0) >>> return 0; >>> >>> return 1; >>> } >> >> Should there be any kernel output or how can i detect / check it? > > For both, the simplest answer is probably just using printf. arg i missed an important part. The kernel is an opensuse SLE15 one. I've seen it contains the following patchset: https://www.spinics.net/lists/netdev/msg443821.html Greets, Stefan
Re: [Qemu-devel] kvm: virtio-net: saved image requires TUN_F_UFO support
Hi Paolo, Am 06.11.2017 um 10:40 schrieb Paolo Bonzini: > On 06/11/2017 10:38, Stefan Priebe - Profihost AG wrote: >> Hello, >> >> i've upgraded some servers from kernel 4.4 to 4.12 - both running Qemu >> 2.9.1. >> >> If i migrate a VM from a host running kernel 4.4 to a host running 4.12 >> i get: >> >> kvm: virtio-net: saved image requires TUN_F_UFO support >> kvm: Failed to load virtio-net-device:tmp >> kvm: Failed to load virtio-net:virtio >> kvm: error while loading state for instance 0x0 of device >> ':00:12.0/virtio-net' >> kvm: load of migration failed: Invalid argument >> >> >> while migrating from 4.12 to 4.4 works fine. >> >> Can anybody help? Is this expected? > > Can you check why peer_has_ufo failed (in hw/net/virtio-net.c)? May be - how can i archieve this? Patching the code is not a problem if you can give me a hint. > Also, did this ioctl fail when the tap device was set up on the 4.12 > destination? > int tap_probe_has_ufo(int fd) > { > unsigned offload; > > offload = TUN_F_CSUM | TUN_F_UFO; > > if (ioctl(fd, TUNSETOFFLOAD, offload) < 0) > return 0; > > return 1; > } Should there be any kernel output or how can i detect / check it? Greets, Stefan > Thanks, > > Paolo >
[Qemu-devel] kvm: virtio-net: saved image requires TUN_F_UFO support
Hello, I've upgraded some servers from kernel 4.4 to 4.12 - both running Qemu 2.9.1. If I migrate a VM from a host running kernel 4.4 to a host running 4.12, I get: kvm: virtio-net: saved image requires TUN_F_UFO support kvm: Failed to load virtio-net-device:tmp kvm: Failed to load virtio-net:virtio kvm: error while loading state for instance 0x0 of device ':00:12.0/virtio-net' kvm: load of migration failed: Invalid argument while migrating from 4.12 to 4.4 works fine. Can anybody help? Is this expected? Greets, Stefan
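A quick pre-migration sanity check can be scripted from the kernel version alone. This is only a heuristic drawn from this thread: the SLE15 4.12 kernel here carries the patchset that removes UDP fragmentation offload, which mainline did around 4.14, so the 4.12 threshold below is an assumption, not an authoritative rule:

```shell
#!/bin/sh
# Warn when a migration destination runs a kernel that may no longer
# expose TUN_F_UFO (the 4.12 threshold is an assumption from this thread;
# mainline removed UFO around 4.14).
ufo_migration_warn() {
    ver=$1
    maj=${ver%%.*}
    rest=${ver#*.}
    min=${rest%%.*}
    if [ "$maj" -gt 4 ] || { [ "$maj" -eq 4 ] && [ "$min" -ge 12 ]; }; then
        echo "WARN: kernel $ver may lack TUN_F_UFO; migration from older hosts can fail"
    else
        echo "OK: kernel $ver predates the UFO removal"
    fi
}

ufo_migration_warn "$(uname -r)"
```

Run it on the destination host before starting the migration; a WARN means the incoming guest state must not carry the UFO feature bit.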
Re: [Qemu-devel] virtio: network: lost tcp/ip packets
Hello Stefan, On 30.08.2017 at 19:17, Stefan Hajnoczi wrote: > On Fri, Aug 18, 2017 at 04:40:36PM +0200, Stefan Priebe - Profihost AG wrote: >> i've a problem with two VMs running on the SAME host machine using qemu >> 2.7.1 or 2.9.0 and vhost_net + virtio. >> >> Sometimes TCP packets going from machine a to machine b are simply lost. >> I see them in VM A using tcpdump going out but they never come in on >> machine B. Both machines have iptables and stuff turned off. >> >> On the host both VMs network interfaces are connected to the same bridge. >> >> Any ideas? May be a known bug? >> >> Host and Guest Kernel is an OpenSuSE 42.3 kernel based on vanilla 4.4.82. > > Have you tcpdumped the tap interfaces on the host? > Currently I can't reproduce the issue - even though I tried for two days. I've no idea what happened and will report back if it happens again. Thanks! Greets, Stefan
Re: [Qemu-devel] virtio: network: lost tcp/ip packets
Hello, does nobody have an idea? Greets, Stefan On 18.08.2017 at 16:40, Stefan Priebe - Profihost AG wrote: > Hello, > > i've a problem with two VMs running on the SAME host machine using qemu > 2.7.1 or 2.9.0 and vhost_net + virtio. > > Sometimes TCP packets going from machine a to machine b are simply lost. > I see them in VM A using tcpdump going out but they never come in on > machine B. Both machines have iptables and stuff turned off. > > On the host both VMs network interfaces are connected to the same bridge. > > Any ideas? May be a known bug? > > Host and Guest Kernel is an OpenSuSE 42.3 kernel based on vanilla 4.4.82. > > Thanks! > > Greets, > Stefan >
[Qemu-devel] virtio: network: lost tcp/ip packets
Hello, I have a problem with two VMs running on the SAME host machine using qemu 2.7.1 or 2.9.0 and vhost_net + virtio. Sometimes TCP packets going from machine A to machine B are simply lost. I see them leaving VM A using tcpdump, but they never arrive on machine B. Both machines have iptables and the like turned off. On the host, both VMs' network interfaces are connected to the same bridge. Any ideas? Maybe a known bug? Host and guest kernel is an openSUSE 42.3 kernel based on vanilla 4.4.82. Thanks! Greets, Stefan
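One way to narrow this down when it reappears is to capture on both tap devices (and optionally the bridge itself) with the same filter, then diff the captures to see where a given TCP segment was last seen. A sketch - the interface names and filter are placeholders, and the helper only prints the commands unless RUN=1, since tcpdump needs root:

```shell
#!/bin/sh
# Build matching tcpdump invocations for the sender tap, receiver tap,
# and bridge, so a lost segment can be localized to one hop on the host.
capture_pair() {
    filt=$1; shift
    for ifc in "$@"; do
        cmd="tcpdump -ni $ifc -w /tmp/$ifc.pcap $filt"
        if [ "${RUN:-0}" = 1 ]; then
            $cmd &            # real capture in the background (root only)
        else
            echo "$cmd"       # dry run: just show what would be executed
        fi
    done
}

capture_pair "tcp port 22" tapA0 tapB0 vmbr0
```

If the segment appears in the sender's tap capture but not in the bridge or receiver captures, the loss is on the host forwarding path rather than inside either guest.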
Re: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?
Am 19.12.2016 um 12:03 schrieb Stefan Hajnoczi: > On Fri, Dec 16, 2016 at 10:00:36PM +0100, Stefan Priebe - Profihost AG wrote: >> >> Am 15.12.2016 um 07:46 schrieb Alexandre DERUMIER: >>> does rollbacking the kernel to previous version fix the problem ? >> >> The culprit is the used tuned agent from Redhat >> (https://github.com/redhat-performance/tuned). The used profile >> virtual-host results in these problems. Stopping tuned or using another >> profile like throughput-performance everything is fine again. > > Interesting discovery. Have you filed a bug report about it? As we're not using RHEL, nor CentOS i didn't expect this to make sense ;-) I just believed that something in the kernel changed. Greets, Stefan >> after upgrading a cluster OS, Qemu, ... i'm experiencing slow and >> volatile network speeds inside my VMs. >> >> Currently I've no idea what causes this but it's related to the host >> upgrades. Before i was running Qemu 2.6.2. >> >> I'm using virtio for the network cards. > > Stefan >
Re: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?
Am 15.12.2016 um 07:46 schrieb Alexandre DERUMIER: > does rollbacking the kernel to previous version fix the problem ? The culprit is the used tuned agent from Redhat (https://github.com/redhat-performance/tuned). The used profile virtual-host results in these problems. Stopping tuned or using another profile like throughput-performance everything is fine again. Geets, Stefan > > i'm not sure if "perf" could give you some hints > - Mail original - > De: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> > À: "aderumier" <aderum...@odiso.com> > Cc: "qemu-devel" <qemu-devel@nongnu.org> > Envoyé: Mercredi 14 Décembre 2016 21:36:23 > Objet: Re: [Qemu-devel] any known virtio-net regressions in Qemu 2.7? > > Am 14.12.2016 um 16:33 schrieb Alexandre DERUMIER: >> Hi Stefan, >> >> do you have upgraded kernel ? > > Yes sure. But i'm out of ideas how to debug. Sometimes it gives me > constant 80MB/s, sometimes 125 and sometimes only 6. While on the host > the cards are not busy. > > Greets, > Stefan > >> >> maybe it could be related to vhost-net module too. >> >> >> - Mail original - >> De: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> >> À: "qemu-devel" <qemu-devel@nongnu.org> >> Envoyé: Mercredi 14 Décembre 2016 16:04:08 >> Objet: [Qemu-devel] any known virtio-net regressions in Qemu 2.7? >> >> Hello, >> >> after upgrading a cluster OS, Qemu, ... i'm experiencing slow and >> volatile network speeds inside my VMs. >> >> Currently I've no idea what causes this but it's related to the host >> upgrades. Before i was running Qemu 2.6.2. >> >> I'm using virtio for the network cards. >> >> Greets, >> Stefan >> >
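For anyone hitting the same symptom, the check-and-switch described here amounts to something like the following. A hedged sketch - the `tuned-adm` commands follow the tuned project's CLI, and profile names may vary by version:

```shell
#!/bin/sh
# Report the active tuned profile; if "virtual-host" is active, switching
# to "throughput-performance" (or disabling tuned) is what resolved the
# volatile network speeds in this thread.
tuned_check() {
    if command -v tuned-adm >/dev/null 2>&1; then
        tuned-adm active
    else
        echo "tuned-adm not installed; tuned cannot be the culprit"
    fi
}

tuned_check
# To actually switch (as root):
#   tuned-adm profile throughput-performance
# or disable tuning entirely:
#   tuned-adm off
```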
Re: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?
Am 14.12.2016 um 16:33 schrieb Alexandre DERUMIER: > Hi Stefan, > > do you have upgraded kernel ? Yes sure. But i'm out of ideas how to debug. Sometimes it gives me constant 80MB/s, sometimes 125 and sometimes only 6. While on the host the cards are not busy. Greets, Stefan > > maybe it could be related to vhost-net module too. > > > - Mail original ----- > De: "Stefan Priebe, Profihost AG" <s.pri...@profihost.ag> > À: "qemu-devel" <qemu-devel@nongnu.org> > Envoyé: Mercredi 14 Décembre 2016 16:04:08 > Objet: [Qemu-devel] any known virtio-net regressions in Qemu 2.7? > > Hello, > > after upgrading a cluster OS, Qemu, ... i'm experiencing slow and > volatile network speeds inside my VMs. > > Currently I've no idea what causes this but it's related to the host > upgrades. Before i was running Qemu 2.6.2. > > I'm using virtio for the network cards. > > Greets, > Stefan >
[Qemu-devel] any known virtio-net regressions in Qemu 2.7?
Hello, after upgrading a cluster (OS, Qemu, ...) I'm experiencing slow and volatile network speeds inside my VMs. Currently I've no idea what causes this, but it's related to the host upgrades. Before, I was running Qemu 2.6.2. I'm using virtio for the network cards. Greets, Stefan
Re: [Qemu-devel] Qemu 2.6 => Qemu 2.7 migration: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-balloon'
Am 15.11.2016 um 12:07 schrieb Ladi Prosek: > Hi, > > On Tue, Nov 15, 2016 at 11:37 AM, Stefan Priebe - Profihost AG > <s.pri...@profihost.ag> wrote: >> Hello, >> >> Am 15.11.2016 um 11:30 schrieb Dr. David Alan Gilbert: >>> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: >>>> Hello, >>>> >>>> today i did a first live migration from Qemu 2.6.2 to Qemu 2.7.0. The VM >>>> is running windows and virtio-balloon and with machine type pc-i440fx-2.5. >>>> >>>> The output of the target qemu process was: >>>> kvm_apic_post_load: Yeh >>>> kvm_apic_post_load: Yeh >>>> kvm_apic_post_load: Yeh >>>> kvm_apic_post_load: Yeh >>>> kvm: VQ 2 size 0x80 < last_avail_idx 0x1 - used_idx 0x4 >>>> kvm: error while loading state for instance 0x0 of device >>>> ':00:03.0/virtio-balloon' >>>> kvm: load of migration failed: Operation not permitted >>> >>> Yes that's a known bug; only seems to affect windows guests, and I believe >>> doesn't even need to cross versions. >>> >>> There's a bunch of fixes that Stefan applied to virtio code >>> that I think fix this; I see that he cc'd qemu-stable. >>> I think it's 4b7f91ed, but I'm not sure if there are others needed. >> >> thanks for pointing to that commit. >> >> Stefan can you tell me whether it's enough to cherry-pick 4b7f91ed into >> 2.7.0 ? > > I don't believe that 4b7f91ed will help here (no device reset on > migration). We've seen this error with QEMU running without: > > commit 4eae2a657d1ff5ada56eb9b4966eae0eff333b0b > Author: Ladi Prosek <lpro...@redhat.com> > Date: Tue Mar 1 12:14:03 2016 +0100 > > balloon: fix segfault and harden the stats queue > > > Is it possible that the VM has run on such a QEMU, then was > live-migrated to 2.6.2, and then to 2.7.0? Hi, yes, it was started under Qemu 2.5.0. Was then migrated to 2.6.2 and then to 2.7.0. Greets, Stefan > > Thanks, > Ladi > >> Greets, >> Stefan >> >>> >>> Dave >>>> >>>> Greets, >>>> Stefan >>>> >>> -- >>> Dr. 
David Alan Gilbert / dgilb...@redhat.com / Manchester, UK >>> >>
Re: [Qemu-devel] Qemu 2.6 => Qemu 2.7 migration: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-balloon'
Hello, Am 15.11.2016 um 11:30 schrieb Dr. David Alan Gilbert: > * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: >> Hello, >> >> today i did a first live migration from Qemu 2.6.2 to Qemu 2.7.0. The VM >> is running windows and virtio-balloon and with machine type pc-i440fx-2.5. >> >> The output of the target qemu process was: >> kvm_apic_post_load: Yeh >> kvm_apic_post_load: Yeh >> kvm_apic_post_load: Yeh >> kvm_apic_post_load: Yeh >> kvm: VQ 2 size 0x80 < last_avail_idx 0x1 - used_idx 0x4 >> kvm: error while loading state for instance 0x0 of device >> ':00:03.0/virtio-balloon' >> kvm: load of migration failed: Operation not permitted > > Yes that's a known bug; only seems to affect windows guests, and I believe > doesn't even need to cross versions. > > There's a bunch of fixes that Stefan applied to virtio code > that I think fix this; I see that he cc'd qemu-stable. > I think it's 4b7f91ed, but I'm not sure if there are others needed. thanks for pointing to that commit. Stefan can you tell me whether it's enough to cherry-pick 4b7f91ed into 2.7.0 ? Greets, Stefan > > Dave >> >> Greets, >> Stefan >> > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK >
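For reference, backporting the commit Dave names onto a 2.7.0 tree looks roughly like this. This is a sketch under the assumption of a plain upstream checkout (distro packaging differs), and whether 4b7f91ed alone is sufficient is exactly the open question in this thread - so it defaults to a dry run that only prints the commands:

```shell
#!/bin/sh
# Cherry-pick the suggested fix onto a v2.7.0 branch (dry run by default;
# set DRY_RUN=0 to execute against a real checkout in $QEMU_SRC).
QEMU_SRC=${QEMU_SRC:-/usr/src/qemu}
FIX=${FIX:-4b7f91ed}            # commit id named by Dr. Gilbert above

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run git -C "$QEMU_SRC" checkout -b v2.7.0-balloon-fix v2.7.0
run git -C "$QEMU_SRC" cherry-pick -x "$FIX"
```

After building, test-migrating a Windows guest from 2.6.2 would confirm whether the VQ size error is gone - which, per Ladi's follow-up, may also depend on where the VM was originally started.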
[Qemu-devel] Qemu 2.6 => Qemu 2.7 migration: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-balloon'
Hello, today I did a first live migration from Qemu 2.6.2 to Qemu 2.7.0. The VM runs Windows with virtio-balloon and machine type pc-i440fx-2.5. The output of the target qemu process was: kvm_apic_post_load: Yeh kvm_apic_post_load: Yeh kvm_apic_post_load: Yeh kvm_apic_post_load: Yeh kvm: VQ 2 size 0x80 < last_avail_idx 0x1 - used_idx 0x4 kvm: error while loading state for instance 0x0 of device ':00:03.0/virtio-balloon' kvm: load of migration failed: Operation not permitted Greets, Stefan
Re: [Qemu-devel] [PATCH] net: fix qemu_announce_self not emitting packets
Am 07.06.2016 um 09:37 schrieb Peter Lieven: > commit fefe2a78 accidently dropped the code path for injecting > raw packets. This feature is needed for sending gratuitous ARPs > after an incoming migration has completed. The result is increased > network downtime for vservers where the network card is not virtio-net > with the VIRTIO_NET_F_GUEST_ANNOUNCE feature. > > Fixes: fefe2a78abde932e0f340b21bded2c86def1d242 > Cc: qemu-sta...@nongnu.org > Cc: yan...@cn.fujitsu.com > Signed-off-by: Peter Lieven> --- > net/net.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/net/net.c b/net/net.c > index 5f3e5a9..d5834ea 100644 > --- a/net/net.c > +++ b/net/net.c > @@ -722,7 +722,11 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender, > return 0; > } > > -if (nc->info->receive_iov) { > +if (flags & QEMU_NET_PACKET_FLAG_RAW && iovcnt == 1 && > +nc->info->receive_raw) { > +/* this is required for qemu_announce_self() */ > +ret = nc->info->receive_raw(nc, iov[0].iov_base, iov[0].iov_len); > +} else if (nc->info->receive_iov) { > ret = nc->info->receive_iov(nc, iov, iovcnt); > } else { > ret = nc_sendv_compat(nc, iov, iovcnt, flags); > Thanks for the patch. Sadly it does not fix our problem. So we still have another Problem. Thanks! Greets, Stefan
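When testing a fix like this, the announce traffic itself can be observed on the destination host. A hedged sketch - the bridge name is a placeholder, and my understanding of this era of the code is that `qemu_announce_self()` emits RARP frames (gratuitous-ARP-like announcements), so watching ARP/RARP on the bridge while the migration completes should show whether they go out. The helper only prints the command unless RUN=1, since tcpdump needs root:

```shell
#!/bin/sh
# Print (or run, with RUN=1) a capture that should show the self-announce
# frames right after an incoming migration completes.
watch_announce() {
    br=$1
    cmd="tcpdump -eni $br arp or rarp -c 5"
    if [ "${RUN:-0}" = 1 ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

watch_announce vmbr0
```

Seeing no announce frames at all after migration would reproduce the symptom this patch addresses, independently of guest behavior.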
Re: [Qemu-devel] [Qemu-stable] Broken live Migration in Qemu 2.5.1.1?
Am 07.06.2016 um 09:38 schrieb Peter Lieven: > Am 06.06.2016 um 18:13 schrieb Stefan Priebe - Profihost AG: >> We're most probably seeing the same while migrating a machine running >> balanceng but haven't thought this might be a qemu bug. Instead we're >> investigating with balanceng people. >> >> Waiting for your further results. > > Can you try the patch > > net: fix qemu_announce_self not emitting packets Thx - will wait for a new patch based on Paolo's comments - if that's ok. Stefan > I just send to the mailing list. > > Thanks, > Peter >
Re: [Qemu-devel] [Qemu-stable] Broken live Migration in Qemu 2.5.1.1?
We're most probably seeing the same while migrating a machine running balanceng but haven't thought this might be a qemu bug. Instead we're investigating with balanceng people. Waiting for your further results. Greets, Stefan Excuse my typo sent from my mobile phone. > Am 06.06.2016 um 17:51 schrieb Peter Lieven: > >> Am 06.06.2016 um 15:32 schrieb Peter Lieven: >> Hi, >> >> during internal testing of Qemu 2.5.1.1 I found a vServer running Ubuntu >> 12.04 (kernel 3.13) and a slave SQL server to >> stop replicating from the master. This seems to be reproducible. It is >> possible to continue replication when issuing a slave stop / slave start. >> There is no error visible on the vServer. >> >> Has anyone a fix in mind that could be related to such an issue? >> >> Host kernel in Linux 4.4, Guest kernel 3.13. Guest driver is virtio-blk via >> iSCSI. Emulated vCPU is Westmere. > > After a lot of testing I found out that obviously thats no block driver > problem, but a regression in the virtio-net or the network stack. > > qemu_announce_self() is generating packets for all NICs, but it seems they > are no longer emitted. This worked at least in qemu-2.2.0 with > the same guest kernel and host kernel. > > I will continue debugging tomorrow why this happens. > > Peter >
Re: [Qemu-devel] drive-backup
Am 22.02.2016 um 23:08 schrieb John Snow: > > > On 02/22/2016 03:21 PM, Stefan Priebe wrote: >> Hello, >> >> is there any chance or hack to work with a bigger cluster size for the >> drive backup job? >> >> See: >> http://git.qemu.org/?p=qemu.git;a=blob;f=block/backup.c;h=16105d40b193be9bb40346027bdf58e62b956a96;hb=98d2c6f2cd80afaa2dc10091f5e35a97c181e4f5 >> >> >> This is very slow with ceph - may be due to the 64k block size. I would >> like to check whether this is faster with ceph's native block size of 4mb. >> >> Greets, >> Stefan >> > > It's hardcoded to 64K at the moment, but I am checking in a patch to > round up the cluster size to be the bigger of (64k, > $target_cluster_size) in order to make sure that incremental backups in > particular never copy a fraction of a cluster. As a side-effect, the > same round-up will happen for all modes (sync=top,none,full). > > If QEMU is aware of the target cluster size of 4MB, this would > immediately jump the copy-size up to 4MB clusters for you. > > See: https://lists.nongnu.org/archive/html/qemu-devel/2016-02/msg02839.html Thanks for your patches and thanks for your great answer. But our problem is not the target but the source ;-) The target has a local cache and doesn't care about the cluster size, but the source does not. But it works fine if we change the default cluster size to 4MB. So it has pointed us in the right direction. Stefan
> - Cluster size of source file (For qcow2, defaults to 64k) > - Cluster size of target file > - Cluster size of backup routine (Currently always 64K) > > I think that LCM(source_cluster_size, target_cluster_size, > backup_cluster_size) = MAX(A, B, C) will always be a safe minimum. > > Bitmap granularity is more flexible, and it is more efficient when the > bitmap granularity matches the backup cluster size, but it can cope with > mismatches. If the bitmap is smaller or larger than the backup cluster > size, it generally means that more data that is clean is going to be > transferred across the pipe. > > (Hmm, I wonder if it's worth checking in code to adjust a bitmap > granularity after it has been created so people can easily experiment > with performance tweaking here.) > > > Let me know if any of my ramble sounds interesting :) > --John >
Re: [Qemu-devel] trace in arch/x86/kernel/apic/apic.c:1309 setup_local_APIC
Am 26.01.2016 um 11:13 schrieb Yang Zhang: > On 2016/1/26 15:22, Stefan Priebe - Profihost AG wrote: >> Hi, >> >> Am 26.01.2016 um 02:46 schrieb Han, Huaitong: >>> What is the host kernel version and host dmesg information? And does >>> the problem exist when you use latest kernel and QEMU to replace old >>> binary file? >> >> Guest and Host is both 4.1.15. You mean the complete dmesg output from >> host? >> >> What do you mean with replace old binary file? I haven't tested Kernel >> 4.4 as we use 4.1 as it is a long term stable kernel release. > > Have you seen this before? I mean use the old KVM like 3.10? Guest or host? To test with a guest would be quite easy. Downgrading the host is very difficult not sure if the hw is supported. > >> >> Stefan >> >>> On Mon, 2016-01-25 at 14:51 +0100, Stefan Priebe - Profihost AG wrote: >>>> Hi, >>>> >>>> while running qemu 2.4 on whestmere CPUs i'm pretty often getting >>>> this >>>> one while booting: >>>> [0.811645] Switched APIC routing to physical x2apic. >>>> [1.835678] [ cut here ] >>>> [1.835704] WARNING: CPU: 0 PID: 1 at >>>> arch/x86/kernel/apic/apic.c:1309 setup_local_APIC+0x284/0x340() >>>> [1.835714] Modules linked in: >>>> [1.835724] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.1.15+72-ph >>>> #1 >>>> [1.835731] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >>>> BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 >>>> [1.835743] b69ffcea 88042d5e3d68 b669c37b >>>> 0918 >>>> [1.835754] 88042d5e3da8 b6080d67 >>>> 88042d5e3da8 >>>> [1.835765] 0001 8000 >>>> >>>> [1.835777] Call Trace: >>>> [1.835789] [] dump_stack+0x45/0x57 >>>> [1.835799] [] warn_slowpath_common+0x97/0xe0 >>>> [1.835806] [] warn_slowpath_null+0x1a/0x20 >>>> [1.835813] [] setup_local_APIC+0x284/0x340 >>>> [1.835824] [] apic_bsp_setup+0x5b/0xb0 >>>> [1.835832] [] >>>> native_smp_prepare_cpus+0x23b/0x295 >>>> [1.835842] [] kernel_init_freeable+0xc7/0x20f >>>> [1.835853] [] ? 
rest_init+0x80/0x80 >>>> [1.835860] [] kernel_init+0xe/0xf0 >>>> [1.835870] [] ret_from_fork+0x42/0x70 >>>> [1.835877] [] ? rest_init+0x80/0x80 >>>> [1.835891] ---[ end trace bdbe630a8de2832c ]--- >>>> [1.837613] Spurious LAPIC timer interrupt on cpu 0 >>>> [1.837957] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 >>>> [1.939574] smpboot: CPU0: Intel Westmere E56xx/L56xx/X56xx >>>> (Nehalem-C) (fam: 06, model: 2c, stepping: 01) >>>> [1.939630] Performance Events: unsupported p6 CPU model 44 no PMU >>>> driver, software events only. >>>> [1.950868] KVM setup paravirtual spinlock >>>> >>>> Greets, >>>> Stefan >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe kvm" in >>>> the body of a message to majord...@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >
Re: [Qemu-devel] trace in arch/x86/kernel/apic/apic.c:1309 setup_local_APIC
Am 26.01.2016 um 12:39 schrieb Yang Zhang: > On 2016/1/26 18:43, Stefan Priebe - Profihost AG wrote: >> >> Am 26.01.2016 um 11:13 schrieb Yang Zhang: >>> On 2016/1/26 15:22, Stefan Priebe - Profihost AG wrote: >>>> Hi, >>>> >>>> Am 26.01.2016 um 02:46 schrieb Han, Huaitong: >>>>> What is the host kernel version and host dmesg information? And does >>>>> the problem exist when you use latest kernel and QEMU to replace old >>>>> binary file? >>>> >>>> Guest and Host is both 4.1.15. You mean the complete dmesg output from >>>> host? >>>> >>>> What do you mean with replace old binary file? I haven't tested Kernel >>>> 4.4 as we use 4.1 as it is a long term stable kernel release. >>> >>> Have you seen this before? I mean use the old KVM like 3.10? >> >> Guest or host? To test with a guest would be quite easy. Downgrading the >> host is very difficult not sure if the hw is supported. > > Host. Does the issue only exist on the Westmere CPU? Yes. All E5 Xeons v1, v2, v3 are working fine and i've never seen this message. Stefan > >> >>> >>>> >>>> Stefan >>>> >>>>> On Mon, 2016-01-25 at 14:51 +0100, Stefan Priebe - Profihost AG wrote: >>>>>> Hi, >>>>>> >>>>>> while running qemu 2.4 on whestmere CPUs i'm pretty often getting >>>>>> this >>>>>> one while booting: >>>>>> [0.811645] Switched APIC routing to physical x2apic. 
>>>>>> [1.835678] [ cut here ] >>>>>> [1.835704] WARNING: CPU: 0 PID: 1 at >>>>>> arch/x86/kernel/apic/apic.c:1309 setup_local_APIC+0x284/0x340() >>>>>> [1.835714] Modules linked in: >>>>>> [1.835724] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.1.15+72-ph >>>>>> #1 >>>>>> [1.835731] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >>>>>> BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 >>>>>> [1.835743] b69ffcea 88042d5e3d68 b669c37b >>>>>> 0918 >>>>>> [1.835754] 88042d5e3da8 b6080d67 >>>>>> 88042d5e3da8 >>>>>> [1.835765] 0001 8000 >>>>>> >>>>>> [1.835777] Call Trace: >>>>>> [1.835789] [] dump_stack+0x45/0x57 >>>>>> [1.835799] [] warn_slowpath_common+0x97/0xe0 >>>>>> [1.835806] [] warn_slowpath_null+0x1a/0x20 >>>>>> [1.835813] [] setup_local_APIC+0x284/0x340 >>>>>> [1.835824] [] apic_bsp_setup+0x5b/0xb0 >>>>>> [1.835832] [] >>>>>> native_smp_prepare_cpus+0x23b/0x295 >>>>>> [1.835842] [] kernel_init_freeable+0xc7/0x20f >>>>>> [1.835853] [] ? rest_init+0x80/0x80 >>>>>> [1.835860] [] kernel_init+0xe/0xf0 >>>>>> [1.835870] [] ret_from_fork+0x42/0x70 >>>>>> [1.835877] [] ? rest_init+0x80/0x80 >>>>>> [1.835891] ---[ end trace bdbe630a8de2832c ]--- >>>>>> [1.837613] Spurious LAPIC timer interrupt on cpu 0 >>>>>> [1.837957] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 >>>>>> [1.939574] smpboot: CPU0: Intel Westmere E56xx/L56xx/X56xx >>>>>> (Nehalem-C) (fam: 06, model: 2c, stepping: 01) >>>>>> [1.939630] Performance Events: unsupported p6 CPU model 44 no PMU >>>>>> driver, software events only. 
>>>>>> [1.950868] KVM setup paravirtual spinlock >>>>>> >>>>>> Greets, >>>>>> Stefan >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in >>>>>> the body of a message to majord...@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe kvm" in >>>> the body of a message to majord...@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> >>> > >
Re: [Qemu-devel] trace in arch/x86/kernel/apic/apic.c:1309 setup_local_APIC
Hi, Am 26.01.2016 um 02:46 schrieb Han, Huaitong: > What is the host kernel version and host dmesg information? And does > the problem exist when you use latest kernel and QEMU to replace old > binary file? Guest and Host is both 4.1.15. You mean the complete dmesg output from host? What do you mean with replace old binary file? I haven't tested Kernel 4.4 as we use 4.1 as it is a long term stable kernel release. Stefan > On Mon, 2016-01-25 at 14:51 +0100, Stefan Priebe - Profihost AG wrote: >> Hi, >> >> while running qemu 2.4 on whestmere CPUs i'm pretty often getting >> this >> one while booting: >> [0.811645] Switched APIC routing to physical x2apic. >> [1.835678] [ cut here ] >> [1.835704] WARNING: CPU: 0 PID: 1 at >> arch/x86/kernel/apic/apic.c:1309 setup_local_APIC+0x284/0x340() >> [1.835714] Modules linked in: >> [1.835724] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.1.15+72-ph >> #1 >> [1.835731] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >> BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 >> [1.835743] b69ffcea 88042d5e3d68 b669c37b >> 0918 >> [1.835754] 88042d5e3da8 b6080d67 >> 88042d5e3da8 >> [1.835765] 0001 8000 >> >> [1.835777] Call Trace: >> [1.835789] [] dump_stack+0x45/0x57 >> [1.835799] [] warn_slowpath_common+0x97/0xe0 >> [1.835806] [] warn_slowpath_null+0x1a/0x20 >> [1.835813] [] setup_local_APIC+0x284/0x340 >> [1.835824] [] apic_bsp_setup+0x5b/0xb0 >> [1.835832] [] >> native_smp_prepare_cpus+0x23b/0x295 >> [1.835842] [] kernel_init_freeable+0xc7/0x20f >> [1.835853] [] ? rest_init+0x80/0x80 >> [1.835860] [] kernel_init+0xe/0xf0 >> [1.835870] [] ret_from_fork+0x42/0x70 >> [1.835877] [] ? 
rest_init+0x80/0x80 >> [1.835891] ---[ end trace bdbe630a8de2832c ]--- >> [1.837613] Spurious LAPIC timer interrupt on cpu 0 >> [1.837957] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 >> [1.939574] smpboot: CPU0: Intel Westmere E56xx/L56xx/X56xx >> (Nehalem-C) (fam: 06, model: 2c, stepping: 01) >> [1.939630] Performance Events: unsupported p6 CPU model 44 no PMU >> driver, software events only. >> [1.950868] KVM setup paravirtual spinlock >> >> Greets, >> Stefan >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html
[Qemu-devel] trace in arch/x86/kernel/apic/apic.c:1309 setup_local_APIC
Hi, while running qemu 2.4 on whestmere CPUs i'm pretty often getting this one while booting: [0.811645] Switched APIC routing to physical x2apic. [1.835678] [ cut here ] [1.835704] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/apic/apic.c:1309 setup_local_APIC+0x284/0x340() [1.835714] Modules linked in: [1.835724] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.1.15+72-ph #1 [1.835731] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 [1.835743] b69ffcea 88042d5e3d68 b669c37b 0918 [1.835754] 88042d5e3da8 b6080d67 88042d5e3da8 [1.835765] 0001 8000 [1.835777] Call Trace: [1.835789] [] dump_stack+0x45/0x57 [1.835799] [] warn_slowpath_common+0x97/0xe0 [1.835806] [] warn_slowpath_null+0x1a/0x20 [1.835813] [] setup_local_APIC+0x284/0x340 [1.835824] [] apic_bsp_setup+0x5b/0xb0 [1.835832] [] native_smp_prepare_cpus+0x23b/0x295 [1.835842] [] kernel_init_freeable+0xc7/0x20f [1.835853] [] ? rest_init+0x80/0x80 [1.835860] [] kernel_init+0xe/0xf0 [1.835870] [] ret_from_fork+0x42/0x70 [1.835877] [] ? rest_init+0x80/0x80 [1.835891] ---[ end trace bdbe630a8de2832c ]--- [1.837613] Spurious LAPIC timer interrupt on cpu 0 [1.837957] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 [1.939574] smpboot: CPU0: Intel Westmere E56xx/L56xx/X56xx (Nehalem-C) (fam: 06, model: 2c, stepping: 01) [1.939630] Performance Events: unsupported p6 CPU model 44 no PMU driver, software events only. [1.950868] KVM setup paravirtual spinlock Greets, Stefan
Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever
> - Original Message - >> From: "Alexandre DERUMIER">> To: "ceph-devel" >> Cc: "qemu-devel" , jdur...@redhat.com >> Sent: Monday, November 9, 2015 5:48:45 AM >> Subject: Re: [Qemu-devel] qemu : rbd block driver internal snapshot and >> vm_stop is hanging forever >> >> adding to ceph.conf >> >> [client] >> rbd_non_blocking_aio = false >> >> >> fix the problem for me (with rbd_cache=false) >> >> >> (@cc jdur...@redhat.com) +1 same to me. Stefan >> >> >> >> - Mail original - >> De: "Denis V. Lunev" >> À: "aderumier" , "ceph-devel" >> , "qemu-devel" >> Envoyé: Lundi 9 Novembre 2015 08:22:34 >> Objet: Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop >> is hanging forever >> >> On 11/09/2015 10:19 AM, Denis V. Lunev wrote: >>> On 11/09/2015 06:10 AM, Alexandre DERUMIER wrote: Hi, with qemu (2.4.1), if I do an internal snapshot of an rbd device, then I pause the vm with vm_stop, the qemu process is hanging forever monitor commands to reproduce: # snapshot_blkdev_internal drive-virtio0 yoursnapname # stop I don't see this with qcow2 or sheepdog block driver for example. Regards, Alexandre >>> this could look like the problem I have recenty trying to >>> fix with dataplane enabled. Patch series is named as >>> >>> [PATCH for 2.5 v6 0/10] dataplane snapshot fixes >>> >>> Den >> >> anyway, even if above will not help, can you collect gdb >> traces from all threads in QEMU process. May be I'll be >> able to give a hit. >> >> Den >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [Qemu-devel] [Qemu-stable] [PULL 0/3] Cve 2015 5154 patches
Am 27.07.2015 um 14:01 schrieb John Snow: The following changes since commit f793d97e454a56d17e404004867985622ca1a63b: Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2015-07-24 13:07:10 +0100) are available in the git repository at: https://github.com/jnsnow/qemu.git tags/cve-2015-5154-pull-request Any details on this CVE? Is RCE possible? Only if IDE is used? Stefan for you to fetch changes up to cb72cba83021fa42719e73a5249c12096a4d1cfc: ide: Clear DRQ after handling all expected accesses (2015-07-26 23:42:53 -0400) Kevin Wolf (3): ide: Check array bounds before writing to io_buffer (CVE-2015-5154) ide/atapi: Fix START STOP UNIT command completion ide: Clear DRQ after handling all expected accesses hw/ide/atapi.c | 1 + hw/ide/core.c | 32 2 files changed, 29 insertions(+), 4 deletions(-)
Re: [Qemu-devel] [Qemu-stable] [PULL 0/3] Cve 2015 5154 patches
Am 27.07.2015 um 14:28 schrieb John Snow: On 07/27/2015 08:10 AM, Stefan Priebe - Profihost AG wrote: Am 27.07.2015 um 14:01 schrieb John Snow: The following changes since commit f793d97e454a56d17e404004867985622ca1a63b: Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2015-07-24 13:07:10 +0100) are available in the git repository at: https://github.com/jnsnow/qemu.git tags/cve-2015-5154-pull-request Any details on this CVE? Is RCE possible? Only if IDE is used? Stefan It's a heap overflow. The most likely outcome is a segfault, but the guest is allowed to continue writing past the end of the PIO buffer at its leisure. This makes it similar to CVE-2015-3456. Unlike CVE-2015-3456, this CVE can be mitigated by just removing the CD-ROM drive until the patch can be applied. Thanks. The seclist article explicitly references Xen. So it does not apply to qemu/kvm? Sorry for asking maybe stupid questions. Stefan for you to fetch changes up to cb72cba83021fa42719e73a5249c12096a4d1cfc: ide: Clear DRQ after handling all expected accesses (2015-07-26 23:42:53 -0400) Kevin Wolf (3): ide: Check array bounds before writing to io_buffer (CVE-2015-5154) ide/atapi: Fix START STOP UNIT command completion ide: Clear DRQ after handling all expected accesses hw/ide/atapi.c | 1 + hw/ide/core.c | 32 2 files changed, 29 insertions(+), 4 deletions(-)
[Qemu-devel] Query CPU model / type
Hi, is there a way to query the current CPU model / type of a running qemu machine? I mean host, kvm64, qemu64, ... Stefan
Re: [Qemu-devel] Qemu 2.2.1 black screen of death in windows 2012 R2
Am 24.03.2015 um 11:45 schrieb Paolo Bonzini: On 24/03/2015 11:39, Stefan Priebe - Profihost AG wrote: after upgrading Qemu from 2.2.0 to 2.2.1 Windows 2012 R2 works after installing. But after applying 72 updates it breaks with a black screen of death. Can you bisect it? I'll have to try; it might be possible. Linking to this KB: https://support.microsoft.com/en-us/kb/2939259 That KB, I think, refers to running a hypervisor (e.g. VMware Workstation) _inside_ Windows. Screenshot of the Windows screen attached. This is just a plain Windows 2012 R2 with virtio 94 drivers installed. Stefan
[Qemu-devel] Qemu 2.2.1 black screen of death in windows 2012 R2
Hi, after upgrading Qemu from 2.2.0 to 2.2.1 Windows 2012 R2 works after installing. But after applying 72 updates it breaks with a black screen of death. Linking to this KB: https://support.microsoft.com/en-us/kb/2939259 It works fine with Qemu 2.2.0 Greets, Stefan
Re: [Qemu-devel] live migration fails after host kernel upgrade (3.12 -> 3.18)
Hi, Thanks. I fixed it - there is already a patch series in 4.0 to fix this. It will be backported to 3.18.10 or 3.18.11. Stefan Am 23.03.2015 um 12:54 schrieb Stefan Hajnoczi: On Sun, Mar 15, 2015 at 09:59:25AM +0100, Stefan Priebe wrote: after upgrading the host kernel from 3.12 to 3.18 live migration fails with the following qemu output (guest running on a host with 3.12 -> host with 3.18): kvm: Features 0x30afffe3 unsupported. Allowed features: 0x79bfbbe7 qemu: warning: error while loading state for instance 0x0 of device ':00:12.0/virtio-net' kvm: load of migration failed: Operation not permitted But i can't Hi Stefan, I haven't checked the exact feature bits but it might be the UFO feature problem. The tun driver was changed to drop a feature, which broke live migration. Take a look at https://lists.linuxfoundation.org/pipermail/virtualization/2015-February/029217.html Stefan
Re: [Qemu-devel] slow speed for virtio-scsi since qemu 2.2
Hi, Am 16.02.2015 um 13:24 schrieb Paolo Bonzini: On 15/02/2015 19:46, Stefan Priebe wrote: While I get a constant random 4k i/o write speed of 20.000 iops with qemu 2.1.0 or 2.1.3, I get jumping speeds with qemu 2.2 (jumping between 500 and 15.000 iop/s). If I use virtio instead of virtio-scsi, speed is the same between 2.2 and 2.1. http://wiki.qemu.org/Contribute/ReportABug What is your command line? Paolo Could it be that this is a result of compiling qemu with --enable-debug to get debugging symbols? Stefan
Re: [Qemu-devel] slow speed for virtio-scsi since qemu 2.2
Hi, Am 16.02.2015 um 15:44 schrieb Paolo Bonzini: On 16/02/2015 15:43, Stefan Priebe - Profihost AG wrote: Hi, Am 16.02.2015 um 13:24 schrieb Paolo Bonzini: On 15/02/2015 19:46, Stefan Priebe wrote: While I get a constant random 4k i/o write speed of 20.000 iops with qemu 2.1.0 or 2.1.3, I get jumping speeds with qemu 2.2 (jumping between 500 and 15.000 iop/s). If I use virtio instead of virtio-scsi, speed is the same between 2.2 and 2.1. http://wiki.qemu.org/Contribute/ReportABug What is your command line? Could it be that this is a result of compiling qemu with --enable-debug to get debugging symbols? Yes. *urg* my fault - sorry. Is there a way to get debugging symbols without using --enable-debug / getting slower performance? Greets, Stefan
Re: [Qemu-devel] slow speed for virtio-scsi since qemu 2.2
Hi, Am 16.02.2015 um 15:58 schrieb Andrey Korolyov and...@xdel.ru: On Mon, Feb 16, 2015 at 5:47 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hi, Am 16.02.2015 um 15:44 schrieb Paolo Bonzini: On 16/02/2015 15:43, Stefan Priebe - Profihost AG wrote: Hi, Am 16.02.2015 um 13:24 schrieb Paolo Bonzini: On 15/02/2015 19:46, Stefan Priebe wrote: While I get a constant random 4k i/o write speed of 20.000 iops with qemu 2.1.0 or 2.1.3, I get jumping speeds with qemu 2.2 (jumping between 500 and 15.000 iop/s). If I use virtio instead of virtio-scsi, speed is the same between 2.2 and 2.1. http://wiki.qemu.org/Contribute/ReportABug What is your command line? Could it be that this is a result of compiling qemu with --enable-debug to get debugging symbols? Yes. *urg* my fault - sorry. Is there a way to get debugging symbols without using --enable-debug / getting slower performance? Hi Stefan, splitdebug will help you there. Also this is a standard way to ship debugging symbols in most distributions, so I wonder if you may use just a generic build skeleton for this task... I'm using --enable-debug with dh_strip from Debian Stefan
Re: [Qemu-devel] slow speed for virtio-scsi since qemu 2.2
Am 16.02.2015 um 15:49 schrieb Paolo Bonzini pbonz...@redhat.com: On 16/02/2015 15:47, Stefan Priebe - Profihost AG wrote: Could it be that this is a result of compiling qemu with --enable-debug to get debugging symbols? Yes. *urg* my fault - sorry. Is there a way to get debugging symbols without using --enable-debug / getting slower performance? Yes, just do nothing (--enable-debug-info is the default; --enable-debug enables debug info _and_ turns off optimization). If I do not pass --enable-debug, dh_strip does not extract any debugging symbols. Stefan Paolo
Re: [Qemu-devel] [PATCH] aio: fix qemu_bh_schedule() bh-ctx race condition
Tested-by: Stefan Priebe s.pri...@profihost.ag Am 03.06.2014 11:21, schrieb Stefan Hajnoczi: qemu_bh_schedule() is supposed to be thread-safe at least the first time it is called. Unfortunately this is not quite true: bh-scheduled = 1; aio_notify(bh-ctx); Since another thread may run the BH callback once it has been scheduled, there is a race condition if the callback frees the BH before aio_notify(bh-ctx) has a chance to run. Reported-by: Stefan Priebe s.pri...@profihost.ag Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- async.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/async.c b/async.c index 6930185..5b6fe6b 100644 --- a/async.c +++ b/async.c @@ -117,15 +117,21 @@ void qemu_bh_schedule_idle(QEMUBH *bh) void qemu_bh_schedule(QEMUBH *bh) { +AioContext *ctx; + if (bh-scheduled) return; +ctx = bh-ctx; bh-idle = 0; -/* Make sure that idle any writes needed by the callback are done - * before the locations are read in the aio_bh_poll. +/* Make sure that: + * 1. idle any writes needed by the callback are done before the + *locations are read in the aio_bh_poll. + * 2. ctx is loaded before scheduled is set and the callback has a chance + *to execute. */ -smp_wmb(); +smp_mb(); bh-scheduled = 1; -aio_notify(bh-ctx); +aio_notify(ctx); }
Re: [Qemu-devel] qemu 2.0 segfaults in event notifier
Am 02.06.2014 um 15:40 schrieb Stefan Hajnoczi stefa...@gmail.com: On Fri, May 30, 2014 at 04:10:39PM +0200, Stefan Priebe wrote: even with +From 271c0f68b4eae72691721243a1c37f46a3232d61 Mon Sep 17 00:00:00 2001 +From: Fam Zheng f...@redhat.com +Date: Wed, 21 May 2014 10:42:13 +0800 +Subject: [PATCH] aio: Fix use-after-free in cancellation path applied i saw today segfault with the following backtrace: Program terminated with signal 11, Segmentation fault. #0 0x7f9dd633343f in event_notifier_set (e=0x124) at util/event_notifier-posix.c:97 97 util/event_notifier-posix.c: No such file or directory. (gdb) bt #0 0x7f9dd633343f in event_notifier_set (e=0x124) at util/event_notifier-posix.c:97 #1 0x7f9dd5f4eafc in aio_notify (ctx=0x0) at async.c:246 #2 0x7f9dd5f4e697 in qemu_bh_schedule (bh=0x7f9b98eeeb30) at async.c:128 #3 0x7f9dd5fa2c44 in rbd_finish_aiocb (c=0x7f9dd9069ad0, rcb=0x7f9dd85f1770) at block/rbd.c:585 Hi Stefan, Please print the QEMUBH: (gdb) p *(QEMUBH*)0x7f9b98eeeb30 It would also be interesting to print out the qemu_aio_context-first_bh linked list of QEMUBH structs to check whether 0x7f9b98eeeb30 is on the list. The aio_bh_new() and aio_bh_schedule() APIs are supposed to be thread-safe. In theory the rbd.c code is fine. But maybe there is a race condition somewhere. If you want to debug interactively, ping me on #qemu on irc.oftc.net. Hi, that would be great what's your username? On trip right now. Will be on irc in 4-5 hours or tomorrow in 16 hours. Greets, Stefan
Re: [Qemu-devel] Migration from older Qemu to Qemu 2.0.0 does not work
Hi, i now was able to catch the error. It is: Length mismatch: :00:12.0/virtio-net-pci.rom: 4 in != 1 qemu: warning: error while loading state for instance 0x0 of device 'ram' load of migration failed Stefan Am 09.05.2014 19:05, schrieb Paolo Bonzini: Il 09/05/2014 15:13, Stefan Priebe - Profihost AG ha scritto: I see no output at the monitor of Qemu 2.0. # migrate -d tcp:a.b.c.d:6000 # info migrate capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: on zero-blocks: off Migration status: failed total time: 0 milliseconds The target machine is still running at this point with no output. Anything on its stdout? Another test you could do, in addition to changing the devices, is this: 1) try with commit 6141f3bd6904df7cf9519c6444a14a608b9874c4 on the destination (the next one caused a migration problem that was fixed later). If it passes, go to step 1a. If it fails, go to step 2. 1a) try with commit c01a71c1a56fa27f43449ff59e5d03b2483658a2. If it passes, go to step 1b. If it fails, report it here. 1b) bisect between v2.0.0 (known-bad) and c01a71c1a56fa27f43449ff59e5d03b2483658a2 (known-good) to find the culprit. Report results. 2) change the source to v1.7.0 and bisect between v1.7.0 and 6141f3bd6904df7cf9519c6444a14a608b9874c4. Report results. Thanks! Paolo
Re: [Qemu-devel] Migration from older Qemu to Qemu 2.0.0 does not work
Am 14.05.2014 10:11, schrieb Paolo Bonzini: Il 14/05/2014 09:17, Stefan Priebe - Profihost AG ha scritto: i now was able to catch the error. It is: Length mismatch: :00:12.0/virtio-net-pci.rom: 4 in != 1 qemu: warning: error while loading state for instance 0x0 of device 'ram' load of migration failed This is a bug of your distribution. The file sizes for ROMs should never change. In particular, if you round the sizes up to the next power of 2 you should always have: 128k for bios.bin 256k for bios-256k.bin 64k for pxe-*.rom 256k for efi-*.rom 64k for vgabios-* Unfortunately, most distribution get pxe-*.rom sizes wrong, because at some point iPXE grew more features and didn't fit in 64k anymore with the default configuration. I know at least Fedora does. The solution is to copy the binaries from the QEMU git repository (directory pc-bios/) to /usr/share/qemu. Hi, i compile qemu on my own. I have the rom files under /usr/share/kvm and they look like this: ls -la /usr/share/kvm/*.rom -rw-r--r-- 1 root root 173568 May 14 09:39 /usr/share/kvm/efi-e1000.rom -rw-r--r-- 1 root root 174592 May 14 09:39 /usr/share/kvm/efi-eepro100.rom -rw-r--r-- 1 root root 173056 May 14 09:39 /usr/share/kvm/efi-ne2k_pci.rom -rw-r--r-- 1 root root 173056 May 14 09:39 /usr/share/kvm/efi-pcnet.rom -rw-r--r-- 1 root root 176640 May 14 09:39 /usr/share/kvm/efi-rtl8139.rom -rw-r--r-- 1 root root 171008 May 14 09:39 /usr/share/kvm/efi-virtio.rom -rw-r--r-- 1 root root 67072 May 14 09:39 /usr/share/kvm/pxe-e1000.rom -rw-r--r-- 1 root root 61440 May 14 09:39 /usr/share/kvm/pxe-eepro100.rom -rw-r--r-- 1 root root 61440 May 14 09:39 /usr/share/kvm/pxe-ne2k_pci.rom -rw-r--r-- 1 root root 61440 May 14 09:39 /usr/share/kvm/pxe-pcnet.rom -rw-r--r-- 1 root root 61440 May 14 09:39 /usr/share/kvm/pxe-rtl8139.rom -rw-r--r-- 1 root root 60416 May 14 09:39 /usr/share/kvm/pxe-virtio.rom currently i don't know what's wrong. Stefan
Re: [Qemu-devel] Migration from older Qemu to Qemu 2.0.0 does not work
Am 14.05.2014 10:36, schrieb Paolo Bonzini: Il 14/05/2014 10:29, Stefan Priebe - Profihost AG ha scritto: Hi, i compile qemu on my own. I have the rom files under /usr/share/kvm and they look like this: ls -la /usr/share/kvm/*.rom -rw-r--r-- 1 root root 173568 May 14 09:39 /usr/share/kvm/efi-e1000.rom -rw-r--r-- 1 root root 174592 May 14 09:39 /usr/share/kvm/efi-eepro100.rom -rw-r--r-- 1 root root 173056 May 14 09:39 /usr/share/kvm/efi-ne2k_pci.rom -rw-r--r-- 1 root root 173056 May 14 09:39 /usr/share/kvm/efi-pcnet.rom -rw-r--r-- 1 root root 176640 May 14 09:39 /usr/share/kvm/efi-rtl8139.rom -rw-r--r-- 1 root root 171008 May 14 09:39 /usr/share/kvm/efi-virtio.rom -rw-r--r-- 1 root root 67072 May 14 09:39 /usr/share/kvm/pxe-e1000.rom -rw-r--r-- 1 root root 61440 May 14 09:39 /usr/share/kvm/pxe-eepro100.rom -rw-r--r-- 1 root root 61440 May 14 09:39 /usr/share/kvm/pxe-ne2k_pci.rom -rw-r--r-- 1 root root 61440 May 14 09:39 /usr/share/kvm/pxe-pcnet.rom -rw-r--r-- 1 root root 61440 May 14 09:39 /usr/share/kvm/pxe-rtl8139.rom -rw-r--r-- 1 root root 60416 May 14 09:39 /usr/share/kvm/pxe-virtio.rom currently i don't know what's wrong. What about the source machine? Currently it has the same as i already updated the package there too. So you mean i had done a mistake compiling the old package - so it had wrong sizes? Greets. Stefan
Re: [Qemu-devel] Migration from older Qemu to Qemu 2.0.0 does not work
On 14.05.2014 11:00, Paolo Bonzini wrote:
> On 14/05/2014 10:38, Stefan Priebe - Profihost AG wrote:
>> Currently it has the same, as I already updated the package there too.
>> So you mean I made a mistake compiling the old package - so it had wrong sizes?
>
> Yes, probably. Can you do an "info mtree" for a machine that's running on the source?
>
> Paolo

Here it is:

# info mtree
memory
  -7ffe (prio 0, RW): system
    -dfff (prio 0, RW): alias ram-below-4g @pc.ram -dfff
    000a-000b (prio 1, RW): alias smram-region @pci 000a-000b
    000c-000c3fff (prio 1, R-): alias pam-rom @pc.ram 000c-000c3fff
    000c4000-000c7fff (prio 1, R-): alias pam-rom @pc.ram 000c4000-000c7fff
    000c8000-000cbfff (prio 1, R-): alias pam-rom @pc.ram 000c8000-000cbfff
    000ca000-000ccfff (prio 1000, RW): alias kvmvapic-rom @pc.ram 000ca000-000ccfff
    000cc000-000c (prio 1, R-): alias pam-rom @pc.ram 000cc000-000c
    000d-000d3fff (prio 1, RW): alias pam-ram @pc.ram 000d-000d3fff
    000d4000-000d7fff (prio 1, RW): alias pam-ram @pc.ram 000d4000-000d7fff
    000d8000-000dbfff (prio 1, RW): alias pam-ram @pc.ram 000d8000-000dbfff
    000dc000-000d (prio 1, RW): alias pam-ram @pc.ram 000dc000-000d
    000e-000e3fff (prio 1, RW): alias pam-ram @pc.ram 000e-000e3fff
    000e4000-000e7fff (prio 1, RW): alias pam-ram @pc.ram 000e4000-000e7fff
    000e8000-000ebfff (prio 1, RW): alias pam-ram @pc.ram 000e8000-000ebfff
    000ec000-000e (prio 1, RW): alias pam-ram @pc.ram 000ec000-000e
    000f-000f (prio 1, R-): alias pam-rom @pc.ram 000f-000f
    e000- (prio 0, RW): alias pci-hole @pci e000-
    fec0-fec00fff (prio 0, RW): kvm-ioapic
    fed0-fed003ff (prio 0, RW): hpet
    fee0-feef (prio 4096, RW): icc-apic-container
      fee0-feef (prio 0, RW): kvm-apic-msi
    0001-00021fff (prio 0, RW): alias ram-above-4g @pc.ram e000-0001
    00022000-40021fff (prio 0, RW): alias pci-hole64 @pci 00022000-40021fff
I/O
  - (prio 0, RW): io
    -0007 (prio 0, RW): dma-chan
    0008-000f (prio 0, RW): dma-cont
    0020-0021 (prio 0, RW): kvm-pic
    0040-0043 (prio 0, RW): kvm-pit
    0060-0060 (prio 0, RW): i8042-data
    0061-0061 (prio 0, RW): elcr
    0064-0064 (prio 0, RW): i8042-cmd
    0070-0071 (prio 0, RW): rtc
    007e-007f (prio 0, RW): kvmvapic
    0080-0080 (prio 0, RW): ioport80
    0081-0083 (prio 0, RW): alias dma-page @dma-page 0081-0083
    0087-0087 (prio 0, RW): alias dma-page @dma-page 0087-0087
    0089-008b (prio 0, RW): alias dma-page @dma-page 0089-008b
    008f-008f (prio 0, RW): alias dma-page @dma-page 008f-008f
    0092-0092 (prio 0, RW): port92
    00a0-00a1 (prio 0, RW): kvm-pic
    00b2-00b3 (prio 0, RW): apm-io
    00c0-00cf (prio 0, RW): dma-chan
    00d0-00df (prio 0, RW): dma-cont
    00f0-00f0 (prio 0, RW): ioportF0
    0170-0177 (prio 0, RW): alias ide @ide 0170-0177
    01f0-01f7 (prio 0, RW): alias ide @ide 01f0-01f7
    0376-0376 (prio 0, RW): alias ide @ide 0376-0376
    03b0-03df (prio 0, RW): cirrus-io
    03f1-03f5 (prio 0, RW): alias fdc @fdc 03f1-03f5
    03f6-03f6 (prio 0, RW): alias ide @ide 03f6-03f6
    03f7-03f7 (prio 0, RW): alias fdc @fdc 03f7-03f7
    04d0-04d0 (prio 0, RW): kvm-elcr
    04d1-04d1 (prio 0, RW): kvm-elcr
    0505-0505 (prio 0, RW): pvpanic
[Qemu-devel] Migration from older Qemu to Qemu 2.0.0 does not work
Hello list,

I was trying to migrate from older QEMU versions (1.5 and 1.7.2) to a machine running QEMU 2.0.0. I started the target machine with -machine type=pc-i440fx-1.5 / -machine type=pc-i440fx-1.7 to match the source, but the migration simply fails. Migrating from QEMU 2.0 to QEMU 2.0 succeeds. I see no output on the monitor of the QEMU 2.0 target.

# migrate -d tcp:a.b.c.d:6000
# info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: on zero-blocks: off
Migration status: failed
total time: 0 milliseconds

The target machine is still running at this point, with no output.

Stefan
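As a general sanity check (my own sketch, not taken from the thread; paths and addresses are invented): the destination QEMU must be started with the same machine type as the source and an -incoming option before the migrate command is issued on the source monitor.

```shell
# On the target host (QEMU 2.0): same machine type as the source, plus -incoming.
qemu-system-x86_64 -machine type=pc-i440fx-1.5 -m 1024 \
    -drive file=/path/to/disk.img,if=virtio \
    -incoming tcp:0.0.0.0:6000

# On the source host (QEMU 1.5), in the HMP monitor:
# (qemu) migrate -d tcp:a.b.c.d:6000
# (qemu) info migrate
```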
Re: [Qemu-devel] Migration from older Qemu to Qemu 2.0.0 does not work
On 09.05.2014 15:41, Dr. David Alan Gilbert dgilb...@redhat.com wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> Hello list, i was trying to migrate older Qemu (1.5 and 1.7.2) to a machine running Qemu 2.0. I started the target machine with: -machine type=pc-i440fx-1.5 / -machine type=pc-i440fx-1.7
>
> I'd expect you to have to run with the same machine type on both sides. I ran some simple virt-test migrate tests (just the basic one) and got v1.5.3 -> v1.6.2, v1.5.3 -> v1.7.1 and v1.5.3 -> v2.0.0-rc1 working for most machine types; pc-i440fx-1.5 passed with all of those. Note that's only the simplest test.

My test involved a USB controller, virtio network cards and RBD-backed virtio-scsi drives. Can you give me your simple start line, so I can test whether it works for me while adding some more arguments?

Thanks!
Stefan

> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 00/35] pc: ACPI memory hotplug
Hello Igor,

while testing your patchset I came across a very stupid problem. I wanted to test migration, and it turned out that migration after plugging in memory works fine only if I run the target VM without the -daemonize option. If I enable -daemonize, the target VM tries to read from non-readable memory. /proc maps shows:

7f9334021000-7f933800 ---p 00:00 0

which is where it tries to read from. The memory layout is also different in daemonized mode than in non-daemonized mode.

Stefan

On 04.04.2014 15:36, Igor Mammedov wrote:

What's new since v7:
* Per Andreas' suggestion, dropped the DIMMBus concept.
* Added hotplug binding for bus-less devices
* The DIMM device is split into backend and frontend. The following commands/options were added to support this:
  For the memory-ram backend, CLI: -object-add memory-ram, with options 'id' and 'size'
  For the dimm frontend, the option size became read-only, pulling its size from the attached backend; added option memdev for specifying a backend by 'id'
* dropped support for 32-bit guests
* a failed hotplug action doesn't consume a slot anymore
* various fixes addressing reviewers' comments, most of them in the ACPI part
---
This series allows to hotplug 'arbitrary' DIMM devices at runtime, specifying size, NUMA node mapping (guest side), slot and the address where to map it. Due to an ACPI limitation, the number of possible DIMM devices must be specified up front. For this task the -m option was extended to support the following format:

-m [mem=]RamSize[,slots=N,maxmem=M]

To allow memory hotplug the user must specify a pair of additional parameters: 'slots' - the number of possible increments, and 'maxmem' - the maximum total memory size QEMU is allowed to use, including RamSize.
Minimal monitor command syntax to hotplug a DIMM device:

object_add memory-ram,id=memX,size=1G
device_add dimm,id=dimmX,memdev=memX

The DIMM device provides the following properties that can be used with device_add / -device to alter the default behavior:

id    - unique string identifying the device [mandatory]
slot  - number in range [0-slots) [optional]; if not specified, the first free slot is used
node  - NUMA node id [optional] (default: 0)
size  - amount of memory to add; read-only, derived from the backing memdev
start - guest physical address where to plug the DIMM [optional]; if not specified, the first gap in the hotplug memory region that fits the DIMM is used

The -device option can be used for adding potentially hot-unpluggable DIMMs, and also for specifying hotplugged DIMMs in the migration case.

Tested guests:
- RHEL 6 x64
- Windows 2012DC x64
- Windows 2008DC x64

Known limitations/bugs/TODOs:
- hot-remove is not supported yet
- the max number of supported DIMM devices is 255 (due to the ACPI object name limit); it could be increased by creating several containers and putting DIMMs there (exercise for the future)
- the e820 table doesn't include DIMM devices added with -device (or, after reboot, devices added with device_add)
- Windows 2008 remembers the DIMM configuration, so if a DIMM with another start/size is added into the same slot, it refuses to use it, insisting on the old mapping.

A QEMU git tree for testing is available at:
https://github.com/imammedo/qemu/commits/memory-hotplug-v8

Example QEMU cmd line:

qemu-system-x86_64 -enable-kvm -monitor unix:/tmp/mon,server,nowait \
-m 4096,slots=4,maxmem=8G guest.img

PS: A Windows guest requires an SRAT table for hotplug to work, so add an extra option: -numa node to the QEMU command line.
Igor Mammedov (34):
  vl: convert -m to QemuOpts
  object_add: allow completion handler to get canonical path
  add memdev backend infrastructure
  vl.c: extend -m option to support options for memory hotplug
  add pc-{i440fx,q35}-2.1 machine types
  pc: create custom generic PC machine type
  qdev: hotplug for buss-less devices
  qdev: expose DeviceState.hotplugged field as a property
  dimm: implement dimm device abstraction
  memory: add memory_region_is_mapped() API
  dimm: do not allow to set already busy memdev
  pc: initialize memory hotplug address space
  pc: exit QEMU if slots 256
  pc: add 'etc/reserved-memory-end' fw_cfg interface for SeaBIOS
  pc: add memory hotplug handler to PC_MACHINE
  dimm: add busy address check and address auto-allocation
  dimm: add busy slot check and slot auto-allocation
  acpi: rename cpu_hotplug_defs.h to acpi_defs.h
  acpi: memory hotplug ACPI hardware implementation
  trace: add acpi memory hotplug IO region events
  trace: add DIMM slot address allocation for target-i386
  acpi:piix4: make plug/unlug callbacks generic
  acpi:piix4: add memory hotplug handling
  pc: ich9 lpc: make it work with global/compat properties
  acpi:ich9: add memory hotplug handling
  pc: migrate piix4 ich9 MemHotplugState
  pc: propagate memory hotplug event to ACPI device
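Tying the pieces of this series together, the migration case it mentions (pre-declaring a hotplugged DIMM on the destination) might look like the following sketch. IDs and sizes are invented, and the option spellings follow this patchset and may differ from what was eventually merged into QEMU.

```shell
# Source guest, started with hotplug headroom and later given one DIMM:
qemu-system-x86_64 -enable-kvm -m 4096,slots=4,maxmem=8G guest.img
# (qemu) object_add memory-ram,id=mem1,size=1G
# (qemu) device_add dimm,id=dimm1,memdev=mem1

# Destination for migration: the already-hotplugged DIMM must be declared
# up front with -device so the memory layout matches the source:
qemu-system-x86_64 -enable-kvm -m 4096,slots=4,maxmem=8G guest.img \
    -object memory-ram,id=mem1,size=1G \
    -device dimm,id=dimm1,memdev=mem1 \
    -incoming tcp:0:6000
```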
[Qemu-devel] multiple USB Serial devices qemu load
Hello,

I have 4 USB serial devices and one HID device that I want to pass through to a guest. The passthrough itself works fine, but while the guest has zero load and CPU usage, the qemu process itself has around 40% CPU usage on a single 3.2 GHz E3 Xeon core. I already tried xhci, but it doesn't change anything. The latency of the USB devices is also very high and fluctuates.

QEMU version: 1.7.1

Inside the guest:

# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
    |__ Port 1: Dev 2, If 0, Class=HID, Driver=usbhid, 480M
    |__ Port 2: Dev 3, If 0, Class=HID, Driver=usbfs, 12M
    |__ Port 3: Dev 4, If 0, Class=vend., Driver=ftdi_sio, 12M
    |__ Port 4: Dev 5, If 0, Class=hub, Driver=hub/8p, 12M
        |__ Port 1: Dev 6, If 0, Class=vend., Driver=ftdi_sio, 12M
        |__ Port 2: Dev 7, If 0, Class=comm., Driver=cdc_acm, 12M
        |__ Port 2: Dev 7, If 1, Class=data, Driver=cdc_acm, 12M
        |__ Port 3: Dev 8, If 0, Class=vend., Driver=ftdi_sio, 12M

QEMU command line:

qemu -pidfile /var/run/qemu-server/102.pid -daemonize -name test \
  -smp sockets=1,cores=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 \
  -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -k de -m 1024 \
  -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0x1.0x2 \
  -device usb-tablet,id=tablet,bus=xhci.0,port=1 \
  -device usb-host,hostbus=2,hostport=1.2 \
  -device usb-host,hostbus=2,hostport=1.3 \
  -device usb-host,hostbus=2,hostport=1.4 \
  -device usb-host,hostbus=2,hostport=1.1.1 \
  -device usb-host,hostbus=2,hostport=1.1.4 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
  -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 \
  -drive file=/ssdstor/images/102/vm-102-disk-1.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=writeback,discard=on,aio=native \
  -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0 \
  -netdev type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,vhost=on \
  -device virtio-net-pci,mac=3A:53:02:E3:76:59,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300

Does anybody have an idea how to lower the CPU usage?

Greets,
Stefan
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Am 11.02.2014 14:32, schrieb Orit Wasserman: On 02/08/2014 09:23 PM, Stefan Priebe wrote: i could fix it by explicitly disable xbzrle - it seems its automatically on if i do not set the migration caps to false. So it seems to be a xbzrle bug. XBZRLE is disabled by default (actually all capabilities are off by default) What version of QEMU are you using that you need to disable it explicitly? Maybe you run migration with XBZRLE and canceled it, so it stays on? No real idea why this happens - but yes this seems to be a problem for me. But the bug in XBZRLE is still there ;-) Stefan Orit Stefan Am 07.02.2014 21:10, schrieb Stefan Priebe: Am 07.02.2014 21:02, schrieb Dr. David Alan Gilbert: * Stefan Priebe (s.pri...@profihost.ag) wrote: anything i could try or debug? to help to find the problem? I think the most useful would be to see if the problem is a new problem in the 1.7 you're using or has existed for a while; depending on the machine type you used, it might be possible to load that image on an earlier (or newer) qemu and try the same test, however if the problem doesn't repeat reliably it can be hard. I've seen this first with Qemu 1.5 but was not able to reproduce it for month. 1.4 was working fine. If you have any way of simplifying the configuration of the VM it would be good; e.g. if you could get a failure on something without graphics (-nographic) and USB. Sadly not ;-( Dave Stefan Am 07.02.2014 14:45, schrieb Stefan Priebe - Profihost AG: it's always the same pattern there are too many 0 instead of X. only seen: read:0x ... expected:0x or read:0x ... expected:0x or read:0xbf00bf00 ... expected:0xbfffbfff or read:0x ... expected:0xb5b5b5b5b5b5b5b5 no idea if this helps. Stefan Am 07.02.2014 14:39, schrieb Stefan Priebe - Profihost AG: Hi, Am 07.02.2014 14:19, schrieb Paolo Bonzini: Il 07/02/2014 14:04, Stefan Priebe - Profihost AG ha scritto: first of all i've now a memory image of a VM where i can reproduce it. 
You mean you start that VM with -incoming 'exec:cat /path/to/vm.img'? But google stress test doesn't report any error until you start migration _and_ it finishes? Sorry no i meant i have a VM where i saved the memory to disk - so i don't need to wait hours until i can reproduce as it does not happen with a fresh started VM. So it's a state file i think. Another test: - start the VM with -S, migrate, do errors appear on the destination? I started with -S and the errors appear AFTER resuming/unpause the VM. So it is fine until i resume it on the new host. Stefan -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
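For anyone hitting the same symptom: the migration capability state can be inspected and forced off from the HMP monitor. A sketch (command names as they exist in QEMU of this era, not taken from the thread):

```shell
# On the source QEMU's HMP monitor:
# (qemu) info migrate_capabilities      # shows on/off state of each capability
# (qemu) migrate_set_capability xbzrle off
# (qemu) info migrate                   # after/while migrating, check for
#                                       # "cache size" lines that indicate xbzrle use
```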
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Am 11.02.2014 14:45, schrieb Orit Wasserman: On 02/11/2014 03:33 PM, Stefan Priebe - Profihost AG wrote: Am 11.02.2014 14:32, schrieb Orit Wasserman: On 02/08/2014 09:23 PM, Stefan Priebe wrote: i could fix it by explicitly disable xbzrle - it seems its automatically on if i do not set the migration caps to false. So it seems to be a xbzrle bug. XBZRLE is disabled by default (actually all capabilities are off by default) What version of QEMU are you using that you need to disable it explicitly? Maybe you run migration with XBZRLE and canceled it, so it stays on? No real idea why this happens - but yes this seems to be a problem for me. I checked upstream QEMU and it is still off by default (always been) May be i had it on in the past and the VM was still running from an older migration. But the bug in XBZRLE is still there ;-) We need to understand the exact scenario in order to understand the problem. What exact version of Qemu are you using? Qemu 1.7.0 Can you try with the latest upstream version, there were some fixes to the XBZRLE code? Sadly not - i have some custom patches (not related to xbzrle) which won't apply to current upstream. But i could cherry-pick the ones you have in mind - if you give me the commit ids. Stefan Stefan Orit Stefan Am 07.02.2014 21:10, schrieb Stefan Priebe: Am 07.02.2014 21:02, schrieb Dr. David Alan Gilbert: * Stefan Priebe (s.pri...@profihost.ag) wrote: anything i could try or debug? to help to find the problem? I think the most useful would be to see if the problem is a new problem in the 1.7 you're using or has existed for a while; depending on the machine type you used, it might be possible to load that image on an earlier (or newer) qemu and try the same test, however if the problem doesn't repeat reliably it can be hard. I've seen this first with Qemu 1.5 but was not able to reproduce it for month. 1.4 was working fine. If you have any way of simplifying the configuration of the VM it would be good; e.g. 
if you could get a failure on something without graphics (-nographic) and USB. Sadly not ;-( Dave Stefan Am 07.02.2014 14:45, schrieb Stefan Priebe - Profihost AG: it's always the same pattern there are too many 0 instead of X. only seen: read:0x ... expected:0x or read:0x ... expected:0x or read:0xbf00bf00 ... expected:0xbfffbfff or read:0x ... expected:0xb5b5b5b5b5b5b5b5 no idea if this helps. Stefan Am 07.02.2014 14:39, schrieb Stefan Priebe - Profihost AG: Hi, Am 07.02.2014 14:19, schrieb Paolo Bonzini: Il 07/02/2014 14:04, Stefan Priebe - Profihost AG ha scritto: first of all i've now a memory image of a VM where i can reproduce it. You mean you start that VM with -incoming 'exec:cat /path/to/vm.img'? But google stress test doesn't report any error until you start migration _and_ it finishes? Sorry no i meant i have a VM where i saved the memory to disk - so i don't need to wait hours until i can reproduce as it does not happen with a fresh started VM. So it's a state file i think. Another test: - start the VM with -S, migrate, do errors appear on the destination? I started with -S and the errors appear AFTER resuming/unpause the VM. So it is fine until i resume it on the new host. Stefan -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] memory allocation of migration changed?
Hello,

in the past (QEMU 1.5) a migration failed right at the start if there was not enough memory available on the target host. Now with QEMU 1.7 I've seen migrations succeed, but then the kernel OOM killer kills qemu processes. So the migration seems to take place without there being enough memory on the target machine? Could this be?

Greets,
Stefan
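Since the destination process allocates guest RAM lazily as pages arrive, a migration can start even on a host that cannot hold the whole guest. One hedged workaround (my own sketch, not a QEMU feature) is a pre-flight check in the management script; it assumes a Linux kernel new enough to expose MemAvailable in /proc/meminfo:

```shell
#!/bin/sh
# Hypothetical pre-flight check: refuse to start the incoming side of a
# migration if the target host lacks enough memory for the guest's RAM.
GUEST_RAM_MB=${1:-16}   # guest memory size in MB (example default)

# MemAvailable is the kernel's estimate of memory usable without swapping.
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
avail_mb=$((avail_kb / 1024))

if [ "$avail_mb" -lt "$GUEST_RAM_MB" ]; then
    echo "refusing migration: only ${avail_mb} MB available, need ${GUEST_RAM_MB} MB"
    exit 1
fi
echo "ok: ${avail_mb} MB available, guest needs ${GUEST_RAM_MB} MB"
```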
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
On 07.02.2014 09:15, Alexandre DERUMIER wrote:
> do you use xbzrle for live migration ?

No - I'm really stuck with this right now. The biggest problem is that I can't reproduce it with test machines ;-(

Stefan

----- Original message -----
From: Stefan Priebe s.pri...@profihost.ag
To: Dr. David Alan Gilbert dgilb...@redhat.com
Cc: Alexandre DERUMIER aderum...@odiso.com, qemu-devel qemu-devel@nongnu.org
Sent: Thursday, 6 February 2014 21:00:27
Subject: Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry

Hi,

On 06.02.2014 20:51, Dr. David Alan Gilbert wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> some more things which happen during migration:
>> php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
>> php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000]
>> cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000]
>
> OK, so lets just assume some part of memory (or CPU state, or memory loaded off disk...) You said before that it was happening on a 32GB image - is it *only* happening on a 32GB or bigger VM, or is it just more likely?

Not image, memory. I've only seen this with VMs having more than 16GB or 32GB of memory. But maybe this just indicates that the migration takes longer.

> I think you also said you were using 1.7; have you tried an older version - i.e. is this a regression in 1.7, or don't we know?

Don't know. Sadly I cannot reproduce this with test VMs, only with production ones.

Stefan

> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Am 07.02.2014 10:15, schrieb Dr. David Alan Gilbert: * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: Am 07.02.2014 09:15, schrieb Alexandre DERUMIER: do you use xbzrle for live migration ? no - i'm really stucked right now with this. Biggest problem i can't reproduce with test machines ;-( Only being able to test on your production VMs isn't fun; is it possible or you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test? You mean to reproduce? I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM but this works fine. I also tried running mysql bench while migrating on a test vm and this works too ;-( Stefan Dave Stefan - Mail original - De: Stefan Priebe s.pri...@profihost.ag À: Dr. David Alan Gilbert dgilb...@redhat.com Cc: Alexandre DERUMIER aderum...@odiso.com, qemu-devel qemu-devel@nongnu.org Envoyé: Jeudi 6 Février 2014 21:00:27 Objet: Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry Hi, Am 06.02.2014 20:51, schrieb Dr. David Alan Gilbert: * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: some more things which happen during migration: php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000] php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000] cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000] OK, so lets just assume some part of memory (or CPU state, or memory loaded off disk...) You said before that it was happening on a 32GB image - is it *only* happening on a 32GB or bigger VM, or is it just more likely? Not image, memory. I've only seen this with vms having more than 16GB or 32GB memory. But maybe this also indicates that just the migration takes longer. I think you also said you were using 1.7; have you tried an older version - i.e. 
is this a regression in 1.7 or don't we know? Don't know. Sadly i cannot reproduce this with test VMs only with production ones. Stefan Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Hi,

On 07.02.2014 10:29, Marcin Gibuła wrote:
>>>>> do you use xbzrle for live migration ?
>>>> no - I'm really stuck right now with this. The biggest problem is I can't reproduce it with test machines ;-(
>>> Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test?
>> You mean to reproduce? I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM, but this works fine. I also tried running a mysql bench while migrating on a test VM, and this works too ;-(
>
> Have you tried to let the test VM run idle for some time before migrating (like 18-24 hours)? Having the same (or a very similar) problem, I had better luck reproducing it by not using freshly started VMs.

No, I haven't tried this; will do so soon.

Stefan
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Am 07.02.2014 10:31, schrieb Dr. David Alan Gilbert: * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: Am 07.02.2014 10:15, schrieb Dr. David Alan Gilbert: * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: Am 07.02.2014 09:15, schrieb Alexandre DERUMIER: do you use xbzrle for live migration ? no - i'm really stucked right now with this. Biggest problem i can't reproduce with test machines ;-( Only being able to test on your production VMs isn't fun; is it possible or you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test? You mean to reproduce? I'm more interested in seeing what type of corruption is happening; if you've got a test VM that corrupts memory and we can run a program in that vm that writes a known pattern into memory and checks it then see what changed after migration, it might give a clue. But obviously this would only be of any use if run on the VM that actually fails. Right that makes sense - sadly i still don't know how to reproduce? Any app ideas i can try? I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM but this works fine. I also tried running mysql bench while migrating on a test vm and this works too ;-( Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
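For the record, the guest-side workload discussed here is Google's stressapptest, which writes known patterns into memory and verifies them. A sketch of running it inside the guest while the migration is in flight (flag meanings are my assumptions; check stressapptest's help output):

```shell
# Inside the guest: exercise most of guest RAM for one hour, continuously
# checking patterns; any miscompare after the migration switch-over is
# printed and counted.
stressapptest -s 3600 -m 8 -i 8 -C 8
# -s: runtime in seconds, -m: memory copy threads,
# -i: invert threads, -C: CPU-stress threads
```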
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa000d0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa000d8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00100(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00108(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00110(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00118(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00140(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00148(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00150(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00158(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00180(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00188(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00190(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa00198(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa001c0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 572s
Page Error: miscompare on CPU 7(0x) at 0x7f4d8aa001c8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Log: Thread 61 found 85568 hardware incidents
Log: Thread 62 found 169344 hardware incidents
Log: Thread 63 found 44544 hardware incidents
Log: Thread 64 found 149504 hardware incidents
Log: Thread 65 found 131968 hardware incidents
Log: Thread 66 found 150528 hardware incidents
Log: Thread 67 found 144384 hardware incidents
Log: Thread 68 found 149888 hardware incidents
Stats: Found 1025728 hardware incidents
Stats: Completed: 9176812.00M in 524.63s 17491.92MB/s, with 1025728 hardware incidents, 0 errors
Stats: Memory Copy: 4890244.00M at 9402.74MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 4286568.00M at 8242.44MB/s
Stats: Disk: 0.00M at 0.00MB/s
Status: FAIL - test discovered HW problems
---
Stefan

On 07.02.2014 10:37, Stefan Priebe - Profihost AG wrote:
> On 07.02.2014 10:31, Dr. David Alan Gilbert wrote:
>> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>>> On 07.02.2014 10:15, Dr. David Alan Gilbert wrote:
>>>> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>>>>> On 07.02.2014 09:15, Alexandre DERUMIER wrote:
>>>>>> do you use xbzrle for live migration ?
>>>>> no - I'm really stuck right now with this. The biggest problem is I can't reproduce it with test machines ;-(
>>>> Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test?
>>> You mean to reproduce?
>> I'm more interested in seeing what type of corruption is happening; if you've got a test VM that corrupts memory
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Hi,

On 07.02.2014 13:21, Dr. David Alan Gilbert wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> Hi, i was able to reproduce with a longer running test VM running the google stress test.
>
> Hmm that's quite a fun set of differences; I think I'd like to understand whether the pattern is related to the pattern of what the test is doing. Can you just give an explanation of exactly how you ran that test? What you installed, how exactly you ran it.

While migrating I still have no reliable way to reproduce, but I'll try to. I can force the problem without migration when starting with:

bin/stressapptest -s 3600 -m 20 -i 20 -C 20 --force_errors

(--force_errors = inject false errors to test error handling)

Stefan

> Then Marcin and I can try and replicate it.
> Dave

And it happens exactly when the migration finishes; it does not happen while the migration is running. The google stress output displays memory errors:

Page Error: miscompare on CPU 5(0x) at 0x7f52431341c0(0x0:DIMM Unknown): read:0x004000bf, reread:0x004000bf expected:0x0040ffbf
Report Error: miscompare : DIMM Unknown : 1 : 571s
Page Error: miscompare on CPU 5(0x) at 0x7f52431341c8(0x0:DIMM Unknown): read:0x002000df, reread:0x002000df expected:0x0020ffdf
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34020(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34028(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34060(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34068(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c340a0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c340a8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c340e0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c340e8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34120(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34128(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34160(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34168(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c341a0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c341a8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c341e0(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c341e8(0x0:DIMM Unknown): read:0x, reread:0x expected:0x
Report Error: miscompare : DIMM Unknown : 1 : 571s
Hardware Error: miscompare on CPU 3(0x) at 0x7f4fd5c34220(0x0:DIMM Unknown): read:0x, reread:0x expected
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Hi,

On 07.02.2014 13:44, Paolo Bonzini wrote:
> On 07/02/2014 13:30, Stefan Priebe - Profihost AG wrote:
>>> i was able to reproduce with a longer running test VM running the google stress test.
>> Hmm that's quite a fun set of differences; I think I'd like to understand whether the pattern is related to the pattern of what the test is doing.
>
> Stefan, can you try to reproduce it:

First of all, I now have a memory image of a VM where I can reproduce it. Reproducing does NOT work if I boot the VM freshly; I need to let it run for some hours. Then, just when the migration finishes, there is a short time frame where the google stress app reports memory errors; once the migration has finished it runs fine again. It seems to me it is related to pause and unpause/resume?

> - with Unix migration between two QEMUs on the same host
now tested = same issue

> - with different hosts
already tested = same issue

> - with a different network (e.g. just a cross cable between two machines)
already tested = same issue

Greets,
Stefan
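The "two QEMUs on the same host" variant is handy for bisecting because it removes the network entirely. A sketch of how such a test can be wired up (paths invented, other guest options elided):

```shell
# Destination on the same host, same machine type, listening on a unix socket:
qemu-system-x86_64 -machine type=pc-i440fx-1.7 -m 4096 guest.img \
    -incoming unix:/tmp/mig.sock

# Then on the source guest's HMP monitor:
# (qemu) migrate -d unix:/tmp/mig.sock
# (qemu) info migrate
```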
Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
Hi,

On 07.02.2014 14:08, Dr. David Alan Gilbert wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> Hi,
>> On 07.02.2014 13:44, Paolo Bonzini wrote:
>>> On 07/02/2014 13:30, Stefan Priebe - Profihost AG wrote:
>>>>> i was able to reproduce with a longer running test VM running the google stress test.
>>>> Hmm that's quite a fun set of differences; I think I'd like to understand whether the pattern is related to the pattern of what the test is doing.
>>> Stefan, can you try to reproduce it:
>> first of all i've now a memory image of a VM where i can reproduce it. reproducing does NOT work if i boot the VM freshly, i need to let it run for some hours. Then just when the migration finishes there is a short time frame where the google stress app reports memory errors; once the migration has finished it runs fine again. It seems to me it is related to pause and unpause/resume?
>
> But do you have to pause/resume it to cause the error? Have you got cases where you boot it and then leave it running for a few hours, and then it fails if you migrate it?

Yes, but isn't migration always a pause/unpause at the end? I thought migration_downtime is the value for which a very small pause/unpause is allowed.

Stefan
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
Hi,

On 07.02.2014 14:15, Dr. David Alan Gilbert wrote:
> * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote:
>> Yes, but isn't migration always a pause/unpause at the end? I thought migration_downtime is the value for which a very small pause/unpause is allowed.
>
> There's a heck of a lot of other stuff that goes on in migration, and that downtime isn't quite the same. If it can be reproduced with just suspend/resume stuff, then that's a different place to start looking than if it's migration only.

Ah, OK, now I get it. No, I can't reproduce it with suspend/resume. But while migrating, it happens directly at the end, when the switch from host A to B happens.

> Dave
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
Hi,

On 07.02.2014 14:19, Paolo Bonzini wrote:
> On 07.02.2014 14:04, Stefan Priebe - Profihost AG wrote:
>> First of all, I now have a memory image of a VM where I can reproduce it.
>
> You mean you start that VM with -incoming 'exec:cat /path/to/vm.img'? But the google stress test doesn't report any error until you start migration _and_ it finishes?

Sorry, no: I meant I have a VM where I saved the memory to disk, so I don't need to wait hours until I can reproduce, as it does not happen with a freshly started VM. So it's a state file, I think.

> Another test:
> - start the VM with -S, migrate; do errors appear on the destination?

I started it with -S, and the errors appear AFTER resuming/unpausing the VM. So it is fine until I resume it on the new host.

Stefan
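Paolo's suggested experiment (start the destination with -S, migrate, then resume only after migration completes) can be driven over QMP. The sketch below is not from the thread; it only composes the JSON command lines a client would send, and the command ids, downtime value, and destination URI are placeholders:

```python
import json

# Hypothetical helper: serialize one QMP command as a single JSON line.
def qmp_line(execute, cmd_id, **arguments):
    msg = {"execute": execute, "id": cmd_id}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)

# Source-side sequence: negotiate capabilities, cap downtime, start migration.
steps = [
    qmp_line("qmp_capabilities", 1),
    qmp_line("migrate_set_downtime", 2, value=0.1),    # seconds
    qmp_line("migrate", 3, uri="tcp:dest-host:4444"),  # placeholder host/port
]
# Destination side (started with -S): resume only once migration has finished.
steps.append(qmp_line("cont", 4))

for line in steps:
    print(line)
```

If the memory errors only appear after the final `cont`, that matches Stefan's observation that the destination is fine until it is resumed.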
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
It's always the same pattern: there are too many 0s instead of the expected X. I have only seen:

read:0x ... expected:0x
read:0x ... expected:0x
read:0xbf00bf00 ... expected:0xbfffbfff
read:0x ... expected:0xb5b5b5b5b5b5b5b5

No idea if this helps.

Stefan

On 07.02.2014 14:39, Stefan Priebe - Profihost AG wrote:
> Sorry, no: I meant I have a VM where I saved the memory to disk, so I don't need to wait hours until I can reproduce, as it does not happen with a freshly started VM. So it's a state file, I think.
>
>> Another test:
>> - start the VM with -S, migrate; do errors appear on the destination?
>
> I started it with -S, and the errors appear AFTER resuming/unpausing the VM. So it is fine until I resume it on the new host.
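For what it's worth, mismatches like the ones above can be classified mechanically. This is a hypothetical helper, not part of the google stress tool; the sample values are taken from the patterns reported in this message:

```python
# Classify (read, expected) value pairs from a memory stress run:
# count mismatches that came back as all-zero versus other corruption.
def classify_mismatches(pairs):
    zeroed = flipped = 0
    for read, expected in pairs:
        if read == expected:
            continue
        if read == 0 and expected != 0:
            zeroed += 1   # data dropped to zeros, the dominant pattern here
        else:
            flipped += 1  # some other corruption
    return {"zeroed": zeroed, "flipped": flipped}

sample = [
    (0x0000000000000000, 0xB5B5B5B5B5B5B5B5),  # zeros instead of the pattern
    (0xBF00BF00, 0xBFFFBFFF),                  # partial zeroing of bytes
    (0x1234, 0x1234),                          # intact
]
print(classify_mismatches(sample))  # {'zeroed': 1, 'flipped': 1}
```

A high "zeroed" count would point at pages being transferred or mapped as zero pages rather than at random bit flips.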
Re: [Qemu-devel] QEMU Live Migration - swap_free: Bad swap file entry
On 06.02.2014 11:22, Orit Wasserman wrote:
> On 02/06/2014 09:20 AM, Stefan Priebe - Profihost AG wrote:
>> On 05.02.2014 21:15, Dr. David Alan Gilbert wrote:
>>> * Stefan Priebe (s.pri...@profihost.ag) wrote:
>>>> Hello, after live migrating machines with a lot of memory (32GB, 48GB, ...) I pretty often see services crashing after migration, and the guest kernel prints:
>>>>
>>>> [1707620.031806] swap_free: Bad swap file entry 00377410
>>>> [1707620.031806] swap_free: Bad swap file entry 00593c48
>>>> [1707620.031807] swap_free: Bad swap file entry 03201430
>>>> [1707620.031807] swap_free: Bad swap file entry 01bc5900
>>>> [1707620.031807] swap_free: Bad swap file entry 0173ce40
>>>> [1707620.031808] swap_free: Bad swap file entry 011c0270
>>>> [1707620.031808] swap_free: Bad swap file entry 03c58ae8
>>>> [1707660.749059] BUG: Bad rss-counter state mm:88064d09f380 idx:1 val:1536
>>>> [1707660.749937] BUG: Bad rss-counter state mm:88064d09f380 idx:2 val:-1536
>>>>
>>>> Qemu is 1.7. Does anybody know a fix?
>
> Is this live migration with shared storage? What kind of shared storage?

Yes, I'm using ceph / rbd.

> Does this happen with smaller guests?

Never seen that. Always with guests having 16GB memory.

>>> I don't, but some more information about:
>>> 1) What guest you're running
>> A Linux guest; the output above is also from the guest. Kernel 3.10.26.
>>> 2) The configuration of your hosts
>> What do you mean by that?
>>> 3) The command line (or XML if you're running libvirt) for your qemu, so we can see what devices you're running.
>> qemu -chardev socket,id=qmp,path=/var/run/qemu-server/179.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/179.vnc,x509,password -pidfile /var/run/qemu-server/179.pid -daemonize -name K31953 -smp sockets=1,cores=16 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -k de -m 32768 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -drive if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=rbd:...,if=none,id=drive-scsi0,iops_rd=1000,iops_wr=500,bps_rd=314572800,bps_wr=209715200,aio=native,discard=on -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap179i0,script=/var/lib/qemu-server/pve-bridge,vhost=on -device virtio-net-pci,mac=CA:CA:23:AC:2D:C5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc base=localtime -machine type=pc-i440fx-1.7
>>> Do you get any messages on either the source or destination qemu during the migrate?
>> No.

Stefan
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
On 06.02.2014 12:14, Alexandre DERUMIER wrote:
> Do you force rbd_cache=true in ceph.conf?

No.

> If yes, do you use cache=writeback?

Yes. So this should be safe.

PS: all my guests do not even have SWAP!

# free | grep Swap
Swap:            0          0          0

Stefan

> According to the ceph doc (http://ceph.com/docs/next/rbd/qemu-rbd/):
> "Important: If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, filesystems on top of rbd can be corrupted."
>
> ----- Original Message -----
> From: Stefan Priebe s.pri...@profihost.ag
> To: pve-de...@pve.proxmox.com, qemu-devel qemu-devel@nongnu.org
> Sent: Wednesday, 5 February 2014 18:51:15
> Subject: [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
>
> Hello, after live migrating machines with a lot of memory (32GB, 48GB, ...) I pretty often see services crashing after migration, and the guest kernel prints:
>
> [1707620.031806] swap_free: Bad swap file entry 00377410
> [1707620.031806] swap_free: Bad swap file entry 00593c48
> [1707620.031807] swap_free: Bad swap file entry 03201430
> [1707620.031807] swap_free: Bad swap file entry 01bc5900
> [1707620.031807] swap_free: Bad swap file entry 0173ce40
> [1707620.031808] swap_free: Bad swap file entry 011c0270
> [1707620.031808] swap_free: Bad swap file entry 03c58ae8
> [1707660.749059] BUG: Bad rss-counter state mm:88064d09f380 idx:1 val:1536
> [1707660.749937] BUG: Bad rss-counter state mm:88064d09f380 idx:2 val:-1536
>
> Qemu is 1.7. Does anybody know a fix?
>
> Greets, Stefan
>
> ___
> pve-devel mailing list
> pve-de...@pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
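For reference, the pair of settings the quoted ceph warning is about looks roughly like this. The pool, image, and drive ids are placeholders, not taken from this thread:

```
# /etc/ceph/ceph.conf, client side
[client]
rbd cache = true

# Matching QEMU drive option; without cache=writeback, QEMU sends no
# flush requests to librbd, so an unclean QEMU exit risks corrupting
# filesystems on top of rbd:
#   -drive file=rbd:pool/image,if=none,id=drive-scsi0,cache=writeback
```

The point of the pairing is that the guest's flushes must reach librbd's write cache; writeback mode is what makes QEMU forward them.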
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
Maybe; sadly I've no idea. I'm only using a 3.10 kernel with XFS.

Stefan

On 06.02.2014 12:40, Alexandre DERUMIER wrote:
>> PS: all my guests do not even have SWAP!
>
> Not sure it is related to a swap file. I found a similar problem here, triggered by suspend/resume on ext4: http://lkml.indiana.edu/hypermail/linux/kernel/1106.3/01340.html
> Maybe it is a guest kernel bug?
> [...]
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
Some more things which happen during migration:

php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000]
cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000]

Stefan

On 06.02.2014 13:10, Stefan Priebe - Profihost AG wrote:
> Maybe; sadly I've no idea. I'm only using a 3.10 kernel with XFS.
> [...]
Re: [Qemu-devel] [Qemu-stable] [QEMU PATCH v3] qdev: fix get_fw_dev_path to support to add nothing to fw_dev_path
On 31.05.2013 13:02, Amos Kong wrote:
> ...

Thanks for this great explanation. I've done what you said, but it still does not work. Here is the output of the seabios debug log where you see the loop: http://pastebin.com/raw.php?i=e53rdW2b

| found virtio-scsi at 0:5
| Searching bootorder for: /pci@i0cf8/*@5/*@0/*@0,0
| virtio-scsi vendor='QEMU' product='QEMU HARDDISK' rev='1.5.' type=0 removable=0
| virtio-scsi blksize=512 sectors=104857600

> It means the fixed fw_dev_path can be identified. Your problem is an unrelated issue. I didn't see handle_18 before the restart, which means the guest succeeds in booting from the second nic. What causes the restart?

No, it definitely does not succeed. Only the first nic gets a reply from a tftp server. It shows a menu and then does "localboot -1", which causes it to go to the next boot device (pxelinux.cfg). It then tries to boot from the 2nd nic. But there it only gets an IP through DHCP, and no tftp details or even an answer.

PS: this was working fine with Qemu 1.4.2.

> Please assign only two nics to the guest; let's see if the restart occurs.

With one nic I correctly see "no bootable device - restart in 1 sec". With only two nics the screen just turns black, and nothing happens at all after trying PXE from the 2nd nic. No message and no restart.

> Did you configure a pxe+tftp service for the second nic? Did you set some rom that just reboots the system?

DHCP yes, tftp service no.

Stefan
Re: [Qemu-devel] [Qemu-stable] [QEMU PATCH v3] qdev: fix get_fw_dev_path to support to add nothing to fw_dev_path
On 31.05.2013 00:51, Amos Kong wrote:
> On Thu, May 30, 2013 at 10:30:21PM +0200, Stefan Priebe wrote:
>> On 30.05.2013 15:13, Amos Kong wrote:
>>> On Thu, May 30, 2013 at 02:09:25PM +0200, Stefan Priebe - Profihost AG wrote:
>>>> On 29.05.2013 09:56, Amos Kong wrote:
>>>>> Recent virtio refactoring in QEMU made virtio-bus become the parent bus of scsi-bus, and virtio-bus doesn't have a get_fw_dev_path implementation, so the typename is added to fw_dev_path by default; the new fw_dev_path could not be identified by seabios. This causes the bootindex parameter of scsi devices not to work.
>>>>>
>>>>> This patch implements get_fw_dev_path() in BusClass. It will be called if a bus doesn't implement the method, and the typename will be added to fw_dev_path. If the implemented method returns NULL, nothing will be added to fw_dev_path.
>>>>>
>>>>> It also implements virtio_bus_get_fw_dev_path() to return NULL, so QEMU will still pass the original style of fw_dev_path to seabios.
>>>>>
>>>>> Signed-off-by: Amos Kong ak...@redhat.com
>>>>> --
>>>>> v2: only add nothing to fw_dev_path when get_fw_dev_path() is implemented and returns NULL; then it will not affect other devices that don't have a get_fw_dev_path() implementation.
>>>>> v3: implement default get_fw_dev_path() in BusClass
>>>>> ---
>>>>>  hw/core/qdev.c         | 10 +++++++++-
>>>>>  hw/virtio/virtio-bus.c |  6 ++++++
>>>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
>>>>> index 6985ad8..9190a7e 100644
>>>>> --- a/hw/core/qdev.c
>>>>> +++ b/hw/core/qdev.c
>>>>> @@ -515,7 +515,7 @@ static int qdev_get_fw_dev_path_helper(DeviceState *dev, char *p, int size)
>>>>>              l += snprintf(p + l, size - l, "%s", d);
>>>>>              g_free(d);
>>>>>          } else {
>>>>> -            l += snprintf(p + l, size - l, "%s", object_get_typename(OBJECT(dev)));
>>>>> +            return l;
>>>>>          }
>>>>>      }
>>>>>      l += snprintf(p + l , size - l, "/");
>>>>> @@ -867,9 +867,17 @@ static void qbus_initfn(Object *obj)
>>>>>      QTAILQ_INIT(&bus->children);
>>>>>  }
>>>>>
>>>>> +static char *default_bus_get_fw_dev_path(DeviceState *dev)
>>>>> +{
>>>>> +    return g_strdup(object_get_typename(OBJECT(dev)));
>>>>> +}
>>>>> +
>>>>>  static void bus_class_init(ObjectClass *class, void *data)
>>>>>  {
>>>>> +    BusClass *bc = BUS_CLASS(class);
>>>>> +
>>>>>      class->unparent = bus_unparent;
>>>>> +    bc->get_fw_dev_path = default_bus_get_fw_dev_path;
>>>>>  }
>>>>>
>>>>>  static void qbus_finalize(Object *obj)
>>>>> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
>>>>> index ea2e11a..6849a01 100644
>>>>> --- a/hw/virtio/virtio-bus.c
>>>>> +++ b/hw/virtio/virtio-bus.c
>>>>> @@ -161,10 +161,16 @@ static char *virtio_bus_get_dev_path(DeviceState *dev)
>>>>>      return qdev_get_dev_path(proxy);
>>>>>  }
>>>>>
>>>>> +static char *virtio_bus_get_fw_dev_path(DeviceState *dev)
>>>>> +{
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>>  static void virtio_bus_class_init(ObjectClass *klass, void *data)
>>>>>  {
>>>>>      BusClass *bus_class = BUS_CLASS(klass);
>>>>>      bus_class->get_dev_path = virtio_bus_get_dev_path;
>>>>> +    bus_class->get_fw_dev_path = virtio_bus_get_fw_dev_path;
>>>>>  }
>>>>>
>>>>>  static const TypeInfo virtio_bus_info = {
>>>> To me, booting VMs with more than one SCSI disk does still not work.
>>> Hi Stefan, can you provide your full command lines?
>> net: bootindex=100, scsi0: bootindex=201 does not work; this one works fine: net: bootindex=200, scsi0: bootindex=101.
> For me, they all work (1. check the bootindex string, 2. check the boot menu by entering F12, 3. check by waiting).

Thanks for your reply. Oh, it only does NOT work if I have TWO network cards. It never seems to try to boot from scsi0. It tries PXE on net0, then net1, and then it restarts.

> Something is wrong here. With '-boot menu=on' the guest could not restart if there is no available boot device; it will also try to boot from other unselected devices (DVD, floppy). With '-boot menu=on,strict=on,reboot-timeout=1000' it boots from net0, net1, disk1, then restarts ... It seems to be a problem of your bios.bin or the rbd device.

I've also updated to seabios 1.7.2.2.

> I'm using seabios (pc-bios/bios.bin) from the qemu repo.

The latest seabios from seabios.org.

Example command line:

qemu -chardev socket,id=qmp,path=/var/run/qemu-server/155.qmp,server,nowait -mon chardev=qmp,mode=control -pidfile /var/run/qemu-server/155.pid -daemonize -name TTT -smp sockets=1,cores=4 -nodefaults -boot menu=on -vga cirrus -k de -m 4096 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=rbd:stor/vmdisk-1:mon_host=10.255.0.100\:6789\;10.255.0.101\:6789\;10.255.0.102\:6789\;:auth_supported=none,if=none,id=drive-scsi0,iops_rd=215,iops_wr=155,bps_rd=136314880,bps_wr=94371840,aio=native,discard=on -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=200 -drive if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2 -netdev type=tap,id=net0,ifname=tap155i0,script=/var/lib/qemu-server/pve-bridge,vhost
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
On 30.05.2013 00:29, mdroth wrote:
> On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote:
>> On Mon, 27 May 2013 12:59:25 -0500, mdroth mdr...@linux.vnet.ibm.com wrote:
>>> On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
>>>> Luiz Capitulino lcapitul...@redhat.com writes:
>>>>> On Sun, 26 May 2013 10:33:39 -0500, Michael Roth mdr...@linux.vnet.ibm.com wrote:
>>>>>> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processed at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior to any client data being read by tcp/unix socket server backends.
>>>>>>
>>>>>> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and it also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior.
>>>>>>
>>>>>> In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, in which 'qmp_capabilities' has already been processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered.
>>>>>>
>>>>>> This can be reproduced with the following command, generally within 50 or so iterations:
>>>>>>
>>>>>> mdroth@loki:~$ cat cmds
>>>>>> {'execute':'qmp_capabilities'}
>>>>>> {'execute':'query-cpus'}
>>>>>> mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock <cmds | grep CommandNotFound >/dev/null; then echo failed; break; else echo ok; fi; done;
>>>>>> ok
>>>>>> ok
>>>>>> failed
>>>>>> mdroth@loki:~$
>>>>>>
>>>>>> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions.
>>>>>>
>>>>>> This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, or whether we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but it fixes QMP for the time being.
>>>>>>
>>>>>> Reported-by: Stefan Priebe s.pri...@profihost.ag
>>>>>> Cc: qemu-sta...@nongnu.org
>>>>>> Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
>>>>>
>>>>> Thanks for debugging this, Michael. I'm going to apply your patch to the qmp branch because we can't leave this broken (esp. in -stable), but this is papering over the real bug...
>>>>>
>>>>> Do we really need OPENED to happen in a bottom half? Shouldn't the open event handlers register the callback instead if they need it?
>>>
>>> I think that's the more general fix, but I wasn't sure if there were historical reasons why we didn't do this in the first place.
>
> Looking at it more closely, it does seem like we need a general fix sooner rather than later, and I don't see any reason why we can't move BH registration into the actual OPENED hooks as you suggest:
>
> hw/usb/ccid-card-passthru.c
>  - currently affected? no (debug hook only)
>  - needs OPENED BH? no
>
> hw/usb/redirect.c
>  - currently affected? yes: the can_read handler checks for dev->parser != NULL, which may be true if the CLOSED BH hasn't been executed yet. In the past, OPENED quiesced outstanding CLOSED events prior to us reading client data.
>  - needs OPENED BH? possibly: it seems to be needed only for CLOSED events, which can be issued by USBDevice, so we do teardown/usb_detach in a BH. For OPENED this may not apply; if we do issue a BH, we'd need to modify the can_read handler to avoid the race.
>
> hw/usb/dev-serial.c
>  - currently affected? no: can_read checks for dev.attached != NULL, which is set to NULL synchronously on CLOSED; it will not accept reads until OPENED gets called and sets dev.attached
>  - needs OPENED BH? no
>
> hw/char/sclpconsole.c
>  - currently affected? no (guarded by can_read handler)
>  - needs OPENED BH? no
>
> hw/char/ipoctal232.c
>  - currently affected? no (debug hook only)
>  - needs OPENED BH? no
>
> hw/char/virtio-console.c
>  - currently affected? no: OPENED/CLOSED map to vq events handled asynchronously; can_read checks for guest_connected state rather than anything set by OPENED
>  - needs OPENED BH? no
>
> qtest.c
>  - currently affected? yes: the can_read handler doesn't check for qtest_opened == true, so it can read data before the OPENED event is processed
>  - needs OPENED BH? no
>
> monitor.c
>  - currently affected? yes: negotiated session state can temporarily leak into a new session
>  - needs OPENED BH? no
>
> gdbstub.c
>  - currently affected? yes: can fail to pause the machine prior to starting gdb
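The monitor.c case (negotiated session state leaking into a new session) can be illustrated with a toy event loop. This is a simulation, not QEMU code: the deferred "opened" reset is queued behind already-received client data, so the second session's qmp_capabilities is judged against the first session's stale state:

```python
from collections import deque

# Toy monitor: qmp_capabilities is rejected once capabilities are already
# negotiated, mirroring the CommandNotFound seen in the repro above.
class ToyMonitor:
    def __init__(self):
        self.caps_negotiated = False
        self.log = []

    def on_opened(self):
        self.caps_negotiated = False  # per-session reset

    def on_command(self, cmd):
        if cmd == "qmp_capabilities":
            if self.caps_negotiated:
                self.log.append("CommandNotFound")  # rejected: stale state
            else:
                self.caps_negotiated = True
                self.log.append("ok")

mon = ToyMonitor()
# Session 1 negotiates capabilities, then disconnects.
mon.on_opened()
mon.on_command("qmp_capabilities")

# Session 2: the OPENED reset was deferred to an idle bottom half, so the
# client's data is processed first; this queue models that wrong ordering.
events = deque(["data:qmp_capabilities", "opened"])
while events:
    ev = events.popleft()
    if ev == "opened":
        mon.on_opened()
    else:
        mon.on_command(ev.split(":", 1)[1])

print(mon.log)  # ['ok', 'CommandNotFound']
```

Swapping the two queue entries (reset first, data second) makes both sessions log "ok", which is exactly what moving the reset to a synchronously delivered hook achieves.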
Re: [Qemu-devel] [Qemu-stable] [QEMU PATCH v3] qdev: fix get_fw_dev_path to support to add nothing to fw_dev_path
On 29.05.2013 09:56, Amos Kong wrote:
> Recent virtio refactoring in QEMU made virtio-bus become the parent bus of scsi-bus, and virtio-bus doesn't have a get_fw_dev_path implementation, so the typename is added to fw_dev_path by default; the new fw_dev_path could not be identified by seabios. This causes the bootindex parameter of scsi devices not to work.
>
> This patch implements get_fw_dev_path() in BusClass. It will be called if a bus doesn't implement the method, and the typename will be added to fw_dev_path. If the implemented method returns NULL, nothing will be added to fw_dev_path.
>
> It also implements virtio_bus_get_fw_dev_path() to return NULL, so QEMU will still pass the original style of fw_dev_path to seabios.
>
> Signed-off-by: Amos Kong ak...@redhat.com
> --
> v2: only add nothing to fw_dev_path when get_fw_dev_path() is implemented and returns NULL; then it will not affect other devices that don't have a get_fw_dev_path() implementation.
> v3: implement default get_fw_dev_path() in BusClass
> ---
>  hw/core/qdev.c         | 10 +++++++++-
>  hw/virtio/virtio-bus.c |  6 ++++++
>  2 files changed, 15 insertions(+), 1 deletion(-)
> [...]

To me, booting VMs with more than one SCSI disk does still not work.

net: bootindex=100, scsi0: bootindex=201 does not work; this one works fine: net: bootindex=200, scsi0: bootindex=101.

Stefan
Re: [Qemu-devel] qmp commands get rejected
On 24.05.2013 at 15:23, Luiz Capitulino lcapitul...@redhat.com wrote:
> On Fri, 24 May 2013 07:50:33 +0200, Stefan Priebe s.pri...@profihost.ag wrote:
>> Hello list,
>>
>> since upgrading from qemu 1.4.1 to 1.5.0 I've had problems with qmp commands. With Qemu 1.5 I see the following socket communication:
>>
>> '{"execute":"qmp_capabilities","id":"12125:1","arguments":{}}'
>> '{"return": {}, "id": "12125:1"}'
>> '{"execute":"qom-set","id":"12125:2","arguments":{"value":2,"path":"machine/peripheral/balloon0","property":"guest-stats-polling-interval"}}'
>> '{"QMP": {"version": {"qemu": {"micro": 0, "minor": 5, "major": 1}, "package": ""}, "capabilities": []}}'
>> '{"id": "12125:2", "error": {"class": "CommandNotFound", "desc": "The command qom-set has not been found"}}'
>>
>> It seems that the command mode (qmp_capabilities) gets reset by the welcome banner?
>
> It looks like you got disconnected before qom-set was issued.

No, it's the same socket connection. No disconnect had happened.

> Can you share more details on how those commands are being issued?

They're sent through the socket with a perl script. What do you need?

Stefan
Re: [Qemu-devel] qmp commands get rejected
On 24.05.2013 at 16:02, Luiz Capitulino lcapitul...@redhat.com wrote:
> On Fri, 24 May 2013 15:57:59 +0200 Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote:
>> On 24.05.2013 at 15:23, Luiz Capitulino lcapitul...@redhat.com wrote:
>>> On Fri, 24 May 2013 07:50:33 +0200 Stefan Priebe s.pri...@profihost.ag wrote:
>>>> Hello list,
>>>>
>>>> since upgrading from qemu 1.4.1 to 1.5.0 I've had problems with qmp commands. With Qemu 1.5 I see the following socket communication:
>>>>
>>>> '{"execute":"qmp_capabilities","id":"12125:1","arguments":{}}'
>>>> '{"return": {}, "id": "12125:1"}'
>>>> '{"execute":"qom-set","id":"12125:2","arguments":{"value":2,"path":"machine/peripheral/balloon0","property":"guest-stats-polling-interval"}}'
>>>> '{"QMP": {"version": {"qemu": {"micro": 0, "minor": 5, "major": 1}, "package": ""}, "capabilities": []}}'
>>>> '{"id": "12125:2", "error": {"class": "CommandNotFound", "desc": "The command qom-set has not been found"}}'
>>>>
>>>> It seems that the command mode (qmp_capabilities) gets reset by the welcome banner?
>>> It looks like you got disconnected before qom-set was issued.
>> No, it's the same socket connection. No disconnect happened.
>>> Can you share more details on how those commands are being issued?
>> They're sent through the socket by a perl script. What do you need?
>
> That perl script maybe? I can't reproduce the problem.

I'll try to create a small example script. Can this be due to the fact that I don't wait for the welcome banner right now?

Stefan
Re: [Qemu-devel] segfault in aio_bh_poll async.c:80 WAS: Re: kvm process disappears
Hi Josh, hi Stefan,

On 14.05.2013 17:05, Stefan Hajnoczi wrote:
On Tue, May 14, 2013 at 4:29 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote:
On 10.05.2013 13:09, Stefan Hajnoczi wrote:
On Fri, May 10, 2013 at 11:07 AM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote:
On 10.05.2013 09:42, Stefan Hajnoczi wrote:
On Fri, May 10, 2013 at 08:12:39AM +0200, Stefan Priebe - Profihost AG wrote:

> 3. Either use gdb or an LD_PRELOAD library that catches exit(3) and _exit(2) and dumps core using abort(3). Make sure core dumps are enabled.

This time I had a segfault with Qemu 1.4.1 plus http://git.qemu.org/?p=qemu.git;a=commitdiff;h=dc7588c1eb3008bda53dde1d6b890cd299758155, in aio_bh_poll at async.c:80.

Code:

    for (bh = ctx->first_bh; bh; bh = next) {
        next = bh->next;
        if (!bh->deleted && bh->scheduled) {
            bh->scheduled = 0;
            if (!bh->idle)
                ret = 1;
            bh->idle = 0;
            bh->cb(bh->opaque);
        }
    }

    ctx->walking_bh--;

    /* remove deleted bhs */
    if (!ctx->walking_bh) {
        bhp = &ctx->first_bh;
        while (*bhp) {
            bh = *bhp;
            if (bh->deleted) {        /* <= THIS IS THE SEGFAULT LINE */
                *bhp = bh->next;
                g_free(bh);
            } else {
                bhp = &bh->next;
            }
        }
    }
    return ret;

> Interesting crash. Do you have the output of "thread apply all bt"? I would try looking at the AioContext using "p *ctx", and print out the ctx->first_bh linked list.

No, as I can't reproduce it ;-( I just saw the kernel segfault message and used addr2line and a qemu dbg package to get the code line.

I've now seen this again two or three times. It always happens when we do an fstrim inside the guest. And I've seen this first since Josh's async rbd patch.

Stefan
Re: [Qemu-devel] segfault in aio_bh_poll async.c:80 WAS: Re: kvm process disappears
On 22.05.2013 at 10:41, Paolo Bonzini pbonz...@redhat.com wrote:
> On 22/05/2013 08:26, Stefan Priebe - Profihost AG wrote:
>> Hi, as I can't reproduce it ;-( I just saw the kernel segfault message and used addr2line and a qemu dbg package to get the code line. I've now seen this again two or three times. It always happens when we do an fstrim inside the guest. And I've seen this first since Josh's async rbd patch.
>
> This one? commit dc7588c1eb3008bda53dde1d6b890cd299758155

Yes. But I'm not sure whether this is coincidence.

> Do you see it even with -drive discard=off?

I use discard / trim for thin provisioning and need it. This is a production system, so I can't test without it. I use virtio-scsi with discard_granularity=512.

Stefan

> Paolo
[Qemu-devel] segfault in aio_bh_poll async.c:80 WAS: Re: kvm process disappears
On 10.05.2013 13:09, Stefan Hajnoczi wrote:
On Fri, May 10, 2013 at 11:07 AM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote:
On 10.05.2013 09:42, Stefan Hajnoczi wrote:
On Fri, May 10, 2013 at 08:12:39AM +0200, Stefan Priebe - Profihost AG wrote:

> 3. Either use gdb or an LD_PRELOAD library that catches exit(3) and _exit(2) and dumps core using abort(3). Make sure core dumps are enabled.

This time I had a segfault with Qemu 1.4.1 plus http://git.qemu.org/?p=qemu.git;a=commitdiff;h=dc7588c1eb3008bda53dde1d6b890cd299758155, in aio_bh_poll at async.c:80.

Code:

    for (bh = ctx->first_bh; bh; bh = next) {
        next = bh->next;
        if (!bh->deleted && bh->scheduled) {
            bh->scheduled = 0;
            if (!bh->idle)
                ret = 1;
            bh->idle = 0;
            bh->cb(bh->opaque);
        }
    }

    ctx->walking_bh--;

    /* remove deleted bhs */
    if (!ctx->walking_bh) {
        bhp = &ctx->first_bh;
        while (*bhp) {
            bh = *bhp;
            if (bh->deleted) {        /* <= THIS IS THE SEGFAULT LINE */
                *bhp = bh->next;
                g_free(bh);
            } else {
                bhp = &bh->next;
            }
        }
    }
    return ret;

Greets,
Stefan