Re: [Qemu-devel] [PATCH] SMM: disable smram region if smm is disabled
> On 16/05/2017 22:22, Xu, Anthony wrote: > >> On 16/05/2017 03:21, Anthony Xu wrote: > >>> when smm is disabled, smram is not used, so disable it > >>> > >>> Signed-off-by: Anthony Xu <anthony...@intel.com> > >> > >> What is the benefit? > > > > This patch removes 1 memory region for i440 platform and 3 memory > regions > > for q35 platform. That makes functions which iterates memory region tree > > a little bit fast even the memory regions are disabled. > > Does it translate to anything measurable in benchmarks? Yes , we see boot time improvement with this patch in our setup (skip guest BIOS, disable guest PAM). >Could you leave > the regions there, but skip the creation of the SMRAM address space in > register_smram_listener when the machine doesn't have SMM enabled? Sounds like you have concerns on removing smram regions. What are your concerns? -Anthony
Re: [Qemu-devel] [PATCH] Memory: use memory address space for cpu-memory
> On 18/05/2017 23:48, Xu, Anthony wrote: > >> It should be called. Alternatively you could try adding a new function > >> to mark address_space_memory as a never-destroyed AddressSpace: > >> > > This patch would do it, could you please submit this patch? > > If you have tested it (together with the change in the initialization of > address_space_memory), I can do that. > Based on your patch, I added the change in the initialization of address_space_memory. It works well in my setup, cpu-memory address space doesn't show up as we expected. Anthony diff --git a/exec.c b/exec.c index 96e3ac9..746dbbc 100644 --- a/exec.c +++ b/exec.c @@ -2712,7 +2712,7 @@ static void memory_map_init(void) system_memory = g_malloc(sizeof(*system_memory)); memory_region_init(system_memory, NULL, "system", UINT64_MAX); -address_space_init(_space_memory, system_memory, "memory"); +address_space_init_static(_space_memory, system_memory, "memory"); system_io = g_malloc(sizeof(*system_io)); memory_region_init_io(system_io, NULL, _io_ops, NULL, "io", diff --git a/include/exec/memory.h b/include/exec/memory.h index b27b288..6f44b79 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1395,6 +1395,17 @@ MemTxResult memory_region_dispatch_write(MemoryRegion *mr, void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name); /** + * address_space_init_static: initializes an static address space + * + * @as: an uninitialized #AddressSpace + * @root: a #MemoryRegion that routes addresses for the address space + * @name: an address space name. The name is only used for debugging + *output. + */ +void address_space_init_static(AddressSpace *as, MemoryRegion *root, + const char *name); + +/** * address_space_init_shareable: return an address space for a memory region, * creating it if it does not already exist * diff --git a/memory.c b/memory.c index 190cd3d..6c933d8 100644 --- a/memory.c +++ b/memory.c @@ -2461,7 +2461,8 @@ static void do_address_space_destroy(AddressSpace *as) } } -void address_space_init_static(AddressSpace *as, MemoryRegion *root, const char *name) +void address_space_init_static(AddressSpace *as, MemoryRegion *root, + const char *name) { address_space_init(as, root, name); as->shared = true;
Re: [Qemu-devel] [PATCH] Memory: use memory address space for cpu-memory
> >>> -AddressSpace *as = address_space_init_shareable(cpu->memory, > >>> -"cpu-memory"); > >>> +AddressSpace *as; > >>> +if (cpu->memory == address_space_memory.root) { > >>> +address_space_memory.ref_count++; > >> probably this would cause reference leak when vcpu is destroyed > > I thought address_space_destroy is called when vcpu is unplugged, > > seems that's not the case, then ref_count++ is not needed. > > It should be called. Alternatively you could try adding a new function > to mark address_space_memory as a never-destroyed AddressSpace: > This patch would do it, could you please submit this patch? address_space_destroy is not called when vcpu is unplugged, that likely causes memory leak. I will take a look when I have time. If someone can take a look now, that'd be great. Thanks, Anthony > diff --git a/include/exec/memory.h b/include/exec/memory.h > index 99e0f54d86..b27b288c8f 100644 > --- a/include/exec/memory.h > +++ b/include/exec/memory.h > @@ -290,6 +290,7 @@ struct AddressSpace { > MemoryRegion *root; > int ref_count; > bool malloced; > +bool shared; > > /* Accessed via RCU. */ > struct FlatView *current_map; > diff --git a/memory.c b/memory.c > index b727f5ec0e..190cd3d5ce 100644 > --- a/memory.c > +++ b/memory.c > @@ -2432,6 +2432,7 @@ void address_space_init(AddressSpace *as, > MemoryRegion *root, const char *name) > as->ref_count = 1; > as->root = root; > as->malloced = false; > +as->shared = false; > as->current_map = g_new(FlatView, 1); > flatview_init(as->current_map); > as->ioeventfd_nb = 0; > @@ -2460,12 +2461,18 @@ static void > do_address_space_destroy(AddressSpace *as) > } > } > > +void address_space_init_static(AddressSpace *as, MemoryRegion *root, > const char *name) > +{ > +address_space_init(as, root, name); > +as->shared = true; > +} > + > AddressSpace *address_space_init_shareable(MemoryRegion *root, const > char *name) > { > AddressSpace *as; > > QTAILQ_FOREACH(as, _spaces, address_spaces_link) { > -if (root == as->root && as->malloced) { > +if (root == as->root && as->shared) { > as->ref_count++; > return as; > } > @@ -2474,6 +2481,7 @@ AddressSpace > *address_space_init_shareable(MemoryRegion *root, const char *name) > as = g_malloc0(sizeof *as); > address_space_init(as, root, name); > as->malloced = true; > +as->shared = true; > return as; > } > > @@ -2485,6 +2493,8 @@ void address_space_destroy(AddressSpace *as) > if (as->ref_count) { > return; > } > +assert(!as->shared || as->malloced); > + > /* Flush out anything from MemoryListeners listening in on this */ > memory_region_transaction_begin(); > as->root = NULL; > > > then CPUs can keep using address_space_init_shareable. > > Paolo
Re: [Qemu-devel] [PATCH] Memory: use memory address space for cpu-memory
> > If cpu-memory address space is same as memory address space, > > use memory address space for cpu-memory address space. > > > > any memory region change causeaddress space to rebuild PhysPageMap, > > rebuilding PhysPageMap is very expensive. > > > > removing cpu-memory address space reduces the guest boot time and > > memory usage. > > > > Signed-off-by: Anthony Xu <anthony...@intel.com> > > --- > > cpus.c | 9 +++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/cpus.c b/cpus.c > > index 740b8dc..15c7a6a 100644 > > --- a/cpus.c > > +++ b/cpus.c > > @@ -1748,8 +1748,13 @@ void qemu_init_vcpu(CPUState *cpu) > > /* If the target cpu hasn't set up any address spaces itself, > > * give it the default one. > > */ > > -AddressSpace *as = address_space_init_shareable(cpu->memory, > > -"cpu-memory"); > > +AddressSpace *as; > > +if (cpu->memory == address_space_memory.root) { > > +address_space_memory.ref_count++; > probably this would cause reference leak when vcpu is destroyed I thought address_space_destroy is called when vcpu is unplugged, seems that's not the case, then ref_count++ is not needed. Any other comments? Thanks, Anthony
Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode
> > > > With the above change, I got below data > > > > > > > > Platformaccel count of restoring A20 to 0 > > > > Q35 kvm 96 > > > > Q35 tcg 271 > > > > PC kvm 3 > > > > PC tcg 3 > > > > > > Okay, thanks. I think the number of a20 switches is due to > > > differences in option rom execution interacting with the fact that > > > some mode switches were occurring before SeaBIOS set > > > call16_override(). > > > > > > > But I still see a lot of PORT_A20 accesses in QEMU as I expected > > > > > > Yes, but it should be possible to significantly reduce the number of > > > outb() calls by limiting them to when A20 changes. This should also > > > be useful to reduce the number of outb() calls needed to disable NMIs. > > > I sent a patch series to the seabios mailing list to demonstrate the > > > idea. > > > > If both TCG and KVM work by ignoring A20, why not remove all PORT_A20 > > access in SeaBios when CONFIG_DISABLE_A20 is not defined? > > Do you see any impact? > > The SeaBIOS CONFIG_DISABLE_A20 build option does not mean "disable > support for A20"; it means "start the initial operating system > bootloader with A20 disabled". CONFIG_DISABLE_A20=y is a > pessimization, not an optimization. Make sense, Thanks for explanation, > > As for adding a new SeaBIOS build option to compile out support for > A20 - that seems like a very small optimization that would risk memory > corruption and hard to diagnose crashes. SeaBIOS runs natively on > real hardware (with coreboot and as a CSM on UEFI) as well as on > QEMU/KVM. I heard new platform doesn't support A20. What's the hardware SeaBIOS runs natively on needs A20 support? It is just a build option, we can disable A20 support for QEMU/KVM and enable A20 support for real hardware. Any concerns here? BTW QEMU/KVM ignores A20 even SeaBIOS supports A20. Or, we can add some logic in SeaBIOS to check if the platform supports A20, if it doesn't support A20, SeaBIOS won't access PORT_A20 anymore. The check logic is like, write 0x55 to 0x00:0xeff0 (or other unused address) disable A20 write 0xaa to 0x:0xf000 read from 0x00:0xeff0, If the return value is 0x55, A20 is not supported by this platform. Anthony
Re: [Qemu-devel] [PATCH] SMM: disable smram region if smm is disabled
> On 16/05/2017 03:21, Anthony Xu wrote: > > when smm is disabled, smram is not used, so disable it > > > > Signed-off-by: Anthony Xu <anthony...@intel.com> > > What is the benefit? This patch removes 1 memory region for i440 platform and 3 memory regions for q35 platform. That makes functions which iterates memory region tree a little bit fast even the memory regions are disabled. Anthony
Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode
> On Sat, May 13, 2017 at 01:24:30AM +0000, Xu, Anthony wrote: > > I think it is related to accel and platform, the result I gave before is > > for q35 > tcg, > > > > With the above change, I got below data > > > > Platformaccel count of restoring A20 to 0 > > Q35 kvm 96 > > Q35 tcg 271 > > PC kvm 3 > > PC tcg 3 > > Okay, thanks. I think the number of a20 switches is due to > differences in option rom execution interacting with the fact that > some mode switches were occurring before SeaBIOS set > call16_override(). > > > But I still see a lot of PORT_A20 accesses in QEMU as I expected > > Yes, but it should be possible to significantly reduce the number of > outb() calls by limiting them to when A20 changes. This should also > be useful to reduce the number of outb() calls needed to disable NMIs. > I sent a patch series to the seabios mailing list to demonstrate the > idea. If both TCG and KVM work by ignoring A20, why not remove all PORT_A20 access in SeaBios when CONFIG_DISABLE_A20 is not defined? Do you see any impact? -Anthony
Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode
> -Original Message- > From: Kevin O'Connor [mailto:ke...@koconnor.net] > Sent: Friday, May 12, 2017 5:02 PM > To: Xu, Anthony <anthony...@intel.com> > Cc: Paolo Bonzini <pbonz...@redhat.com>; qemu-devel@nongnu.org > Subject: Re: [PATCH] target/i386: enable A20 automatically in system > management mode > > On Fri, May 12, 2017 at 11:19:00PM +, Xu, Anthony wrote: > > > SeaBIOS defaults to enabling A20 and it's a rare beast that disables > > > it. One could change x86.h:set_a20 and romlayout.S:transition32 to > > > only issue the outb() if the inb() indicates a change is needed. That > > > would likely eliminate half the accesses. > > > > The 350 port 92 access is for write operation only. > > If include the inb(), it would be 700, and every time it actually has a > > change > > To be precise, It is about 175 switches from 32 bit to 16 bit, then back to > > 32 > bit. > > call16 is called 175 times during Seabios boot without any option rom, > > It would be more if some option roms are included. > > > > > > I think A20 is disabled by default in SeaBios. > > I don't know why you think that. One can check with: > > --- a/src/stacks.c > +++ b/src/stacks.c > @@ -99,6 +99,8 @@ call32_post(void) > if (cr0_caching) > cr0_mask(CR0_CD|CR0_NW, cr0_caching); > } > +if (!get_a20()) > +dprintf(1, "a20=0\n"); > > // Restore cmos index register > outb(GET_LOW(Call16Data.cmosindex), PORT_CMOS_INDEX); > > With the above I only see a handful of cases where SeaBIOS has to > restore a20 to a disabled state. I think it is related to accel and platform, the result I gave before is for q35 tcg, With the above change, I got below data Platformaccel count of restoring A20 to 0 Q35 kvm 96 Q35 tcg 271 PC kvm 3 PC tcg 3 A lot of A20 restoring happen when SeaBIOS scans AHCI links. > > The handful I do see are due to cases where yield() is called prior to > option rom initialization. Those handful are eliminated for me with > the following fix: > > --- a/src/stacks.c > +++ b/src/stacks.c > @@ -496,6 +496,7 @@ void > thread_setup(void) > { > CanInterrupt = 1; > +call16_override(1); > if (! CONFIG_THREADS) > return; > ThreadControl = romfile_loadint("etc/threads", 1); But I still see a lot of PORT_A20 accesses in QEMU as I expected > > What OS / bootloader are you running? /x86_64-softmmu/qemu-system-x86_64 -bios /home/root/git/seabios/out/bios.bin -smp 1 -machine q35,accel=tcg -m 1G -drive format=raw,file=/home/root/images/centos7.2.img,if=ide,index=0 -nographic -nodefaults -serial stdio -monitor pty -Anthony
Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode
wrote: > > On 12/05/2017 20:55, Xu, Anthony wrote: > > > If that's the case, QEMU/TCG should work with SeaBios even with > ignoring A20. > > > > > > During SeaBios boot, there are >350 port 92 access, if we don't need to > handle A20, > > > we can make A20 configurable in Seabios, It may reduce SeaBios boot > time. > > > > Yes, that's a good idea. > > SeaBIOS defaults to enabling A20 and it's a rare beast that disables > it. One could change x86.h:set_a20 and romlayout.S:transition32 to > only issue the outb() if the inb() indicates a change is needed. That > would likely eliminate half the accesses. The 350 port 92 access is for write operation only. If include the inb(), it would be 700, and every time it actually has a change To be precise, It is about 175 switches from 32 bit to 16 bit, then back to 32 bit. call16 is called 175 times during Seabios boot without any option rom, It would be more if some option roms are included. I think A20 is disabled by default in SeaBios. Call16Data.a20 is initialized to 0, call16_override may set Call16Data.a20 to 1, call16 called before call16_override would disable and enable A20. BTW, A20 is enabled in QEMU by default. > > I'd be surprised if it would impact the overall boot time though. > SeaBIOS only touches the port on a cpu mode switch and I would have > thought that was heavier than an IO port access. Maybe that is skewed > on KVM though. Maybe not, but if we have more these kind of optimizations , we may see the impact. A20 is already configurable in SeaBios, CONFIG_DISABLE_A20. Then the change is very small. No PORT_A20 access after the change. -Anthony diff --git a/src/x86.h b/src/x86.h index a770e6f..8efb94a 100644 --- a/src/x86.h +++ b/src/x86.h @@ -21,6 +21,7 @@ #ifndef __ASSEMBLY__ #include "types.h" // u32 +#include "../out/autoconf.h" static inline void irq_disable(void) { @@ -254,13 +255,19 @@ static inline void lgdt(struct descloc_s *desc) { } static inline u8 get_a20(void) { -return (inb(PORT_A20) & A20_ENABLE_BIT) != 0; +if (CONFIG_DISABLE_A20) { +return (inb(PORT_A20) & A20_ENABLE_BIT) != 0; +} +return 1; } static inline u8 set_a20(u8 cond) { -u8 val = inb(PORT_A20); -outb((val & ~A20_ENABLE_BIT) | (cond ? A20_ENABLE_BIT : 0), PORT_A20); -return (val & A20_ENABLE_BIT) != 0; +if (CONFIG_DISABLE_A20) { +u8 val = inb(PORT_A20); +outb((val & ~A20_ENABLE_BIT) | (cond ? A20_ENABLE_BIT : 0), PORT_A20); +return (val & A20_ENABLE_BIT) != 0; +} +return 1; }
Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode
> On 12/05/2017 01:55, Xu, Anthony wrote: > > Hi Paolo, > > > > In KVM mode, seems A20 is ignored. > > Do you see any potential issue here? > > No; recent processors don't have A20 at all. I mean A20 in guest, not A20 in host. Guest is running on old platform, it tries to control A20 through port 92 like what SeaBios does. QEMU/KVM does handle port 92 access to set correct env->a20_mask, but QEMU/KVM ignores A20 status when handling guest memory access. Since QEMU/KVM works well with SeaBios, does that imply SeaBios doesn't generate address larger than 0x10 in real mode? If that's the case, QEMU/TCG should work with SeaBios even with ignoring A20. During SeaBios boot, there are >350 port 92 access, if we don't need to handle A20, we can make A20 configurable in Seabios, It may reduce SeaBios boot time. Anthony
Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode
Hi Paolo, In KVM mode, seems A20 is ignored. Do you see any potential issue here? Anthony > -Original Message- > From: Kevin O'Connor [mailto:ke...@koconnor.net] > Sent: Thursday, May 11, 2017 9:35 AM > To: Paolo Bonzini <pbonz...@redhat.com> > Cc: qemu-devel@nongnu.org; Xu, Anthony <anthony...@intel.com> > Subject: Re: [PATCH] target/i386: enable A20 automatically in system > management mode > > On Thu, May 11, 2017 at 05:32:47PM +0200, Paolo Bonzini wrote: > > On 11/05/2017 16:53, Kevin O'Connor wrote: > > > On Thu, May 11, 2017 at 01:35:28PM +0200, Paolo Bonzini wrote: > > >> Ignore env->a20_mask when running in system management mode. > > > > > > Thanks Paolo. I don't think this patch will help SeaBIOS though. The > > > SeaBIOS SMM handler doesn't do much - it doesn't even access ram > above > > > 1MiB. See SeaBIOS' code in src/fw/smm.c:handle_smi(). > > > > > > Instead, the SeaBIOS code does a cpu state backup/restore to switch > > > into 32bit mode. I thought the A20 state would be part of that cpu > > > backup/restore. However, looking at the Intel SDM docs now, it's not > > > really clear to me how the processor "inhibits" A20 when in SMM mode - > > > does it save/restore that state on SMI/RSM or does it have special > > > logic to ignore A20 while in SMM mode? > > > > There isn't any documented place for A20 in the state save map (I checked > > AMD's BIOS/Kernel Developer Guide which is pretty comprehensive), so I > > think the latter is more plausible. What I'm doing in this patch is > > ignoring A20 while in SMM mode. > > Okay. > > > Then you would have to add an A20 save/restore in handle_smi; since > > CALL32SMM_ENTERID should not nest, I think you can just do this: > > Yes, that should be fine. > > > --- a/src/fw/smm.c > > +++ b/src/fw/smm.c > > @@ -54,7 +54,8 @@ struct smm_layout { > > struct smm_state backup2; > > u8 stack[0x7c00]; > > u64 codeentry; > > -u8 pad_8008[0x7df8]; > > +u8 a20; > > +u8 pad_8009[0x7df7]; > > struct smm_state cpu; > > }; > > In order to avoid mixing code and data in the same cache line we could > do this instead: > > struct smm_layout { > struct smm_state backup1; > struct smm_state backup2; > -u8 stack[0x7c00]; > +u32 backup_a20; > +u8 stack[0x8000 - sizeof(struct smm_state)*2 - sizeof(u32)]; > u64 codeentry; > u8 pad_8008[0x7df8]; > struct smm_state cpu; > > Thanks, > -Kevin
Re: [Qemu-devel] [PATCH] trace: add sanity check
> Please post steps for reproducing the abort. I cannot reproduce this > with qemu-system-x86_64. The steps to reproduce the issue, ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu gdb -args ./x86_64-softmmu/qemu-system-x86_64 -bios /home/root/guest/seabios.bin -smp 1 -machine q35,accel=kvm -m 1G -drive format=raw,file=/home/root/images/centos7.2.img,if=ide,index=0 -nographic -nodefaults -serial stdio -monitor pty (gdb) bt #0 0x704e25f7 in raise () from /lib64/libc.so.6 #1 0x704e3ce8 in abort () from /lib64/libc.so.6 #2 0x559de905 in bitmap_new (nbits=) at /home/root/git/qemu2.git/include/qemu/bitmap.h:96 #3 cpu_common_initfn (obj=0x56621d30) at qom/cpu.c:399 #4 0x55a11869 in object_init_with_type (obj=0x56621d30, ti=0x5656bbb0) at qom/object.c:341 #5 0x55a11869 in object_init_with_type (obj=0x56621d30, ti=0x5656bd30) at qom/object.c:341 #6 0x55a11efc in object_initialize_with_type (data=data@entry=0x56621d30, size=76560, type=type@entry=0x5656bd30) at qom/object.c:376 #7 0x55a12061 in object_new_with_type (type=0x5656bd30) at qom/object.c:484 #8 0x55a121c5 in object_new (typename=typename@entry=0x56550340 "qemu64-x86_64-cpu") at qom/object.c:494 #9 0x557f6e3d in pc_new_cpu (typename=typename@entry=0x56550340 "qemu64-x86_64-cpu", apic_id=0, errp=errp@entry=0x565391b0 ) at /home/root/git/qemu2.git/hw/i386/pc.c:1101 #10 0x557fa33e in pc_cpus_init (pcms=pcms@entry=0x565f9690) at /home/root/git/qemu2.git/hw/i386/pc.c:1184 #11 0x557fe0f6 in pc_q35_init (machine=0x565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121 #12 0x5574fbad in main (argc=, argv=, envp=) at vl.c:4562 Anthony > > > diff --git a/qom/cpu.c b/qom/cpu.c > > index f02e9c0..f9111a0 100644 > > --- a/qom/cpu.c > > +++ b/qom/cpu.c > > @@ -382,6 +382,7 @@ static void cpu_common_unrealizefn(DeviceState > *dev, Error **errp) > > > > static void cpu_common_initfn(Object *obj) > > { > > +uint32_t count; > > CPUState *cpu = CPU(obj); > > CPUClass *cc = CPU_GET_CLASS(obj); > > > > @@ -396,7 +397,10 @@ static void cpu_common_initfn(Object *obj) > > QTAILQ_INIT(>breakpoints); > > QTAILQ_INIT(>watchpoints); > > > > -cpu->trace_dstate = bitmap_new(trace_get_vcpu_event_count()); > > +count = trace_get_vcpu_event_count(); > > +if (count) { > > +cpu->trace_dstate = bitmap_new(count); > > +} > > > > cpu_exec_initfn(cpu); > > } > > -- > > 1.8.3.1 > > > >
Re: [Qemu-devel] centos 7.2 guest doesn't boot without kvmvapic in TCG mode
+ Richard Henderson Anthony > -Original Message- > From: Xu, Anthony > Sent: Tuesday, April 4, 2017 5:19 PM > To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org> > Cc: 'Paolo Bonzini' <pbonz...@redhat.com>; 'Stefan Hajnoczi' > <stefa...@gmail.com> > Subject: [Qemu-devel] centos 7.2 guest doesn't boot without kvmvapic in > TCG mode > > I disabled kvmvapic by > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c > index c3829e3..52be2b0 100644 > --- a/hw/intc/apic_common.c > +++ b/hw/intc/apic_common.c > @@ -440,7 +440,7 @@ static const VMStateDescription > vmstate_apic_common = { > static Property apic_properties_common[] = { > DEFINE_PROP_UINT8("version", APICCommonState, version, 0x14), > DEFINE_PROP_BIT("vapic", APICCommonState, vapic_control, > VAPIC_ENABLE_BIT, > -true), > +false), > DEFINE_PROP_BOOL("legacy-instance-id", APICCommonState, > legacy_instance_id, > false), > DEFINE_PROP_END_OF_LIST(), > > > Tried to boot centos guest with this command > > ./x86_64-softmmu/qemu-system-x86_64 -bios > /home/root/guest/seabios.bin > -machine q35 -m 1G -drive > format=raw,file=/home/root/images/centos7.2.img,if=ide,index=0 > -nographic -nodefaults -serial pty -monitor pty > -chardev stdio,id=seabios -device isa- > debugcon,iobase=0x402,chardev=seabios > > > Seabios complained > > Seabios output > " > Booting from Hard Disk... > WARNING - Timeout at ahci_command:154! > Boot failed: could not read the boot disk > > enter handle_18: > NULL > Booting from Floppy... > Boot failed: could not read the boot disk > > enter handle_18: > NULL > No bootable device. > " > > It can boot with kvmvapic enabled by using the same command. > > Seabios output > " > Booting from Hard Disk... > Booting from :7c00 > " > > In the original commit, > " Therefore, the VAPI is enabled by default, even in TCG mode. Here, no > speedup can be achieved as the emulation overhead of the TPR register is > marginal compared to instruction emulation." > > Seems no performance benefit for TCG mode > What's the other benefit of using kvmvapic in TCG mode? > > Should we make TCG mode work without kvmvapic? > > If we don't need to use kvmvapic in TCG mode, > we can make kvmvapic code clearer by removing all TCG related code in > kvmvapic.c. > > > Anthony
Re: [Qemu-devel] [PATCH 4/4] pam: setup pc.bios
> I think this is wrong, the high copy should remain read-only or pflash > stops working when you remove PAM. I tried to set pc.bios as read-only and isa.bios as read, it doesn't work. render_memory_region doesn't honor readonly field of alias MemoryRegion. Two FlatRanges created for pc.bios and isa.bios point to the same MemoryRegion pc.bios. Both get readonly from pc.bios. Is this a bug or by design? Pc.bios and isa.bios are backed by the same memory block, so it may cause CPU TLB alias, any issue here? > > The comment only explains the "what" but not the "why" and the "why" is > not in the commit message. See also here: > Will do. Anthony
Re: [Qemu-devel] [PATCH 1/4] pam:refactor PAM related code
> On 08/04/2017 08:45, Anthony Xu wrote: > > split PAM SMRAM functions in piix.c > > create mch_init_pam in q35.c > > Could you further move > > MemoryRegion *ram_memory; > MemoryRegion *system_memory; > MemoryRegion *pci_address_space; > PAMMemoryRegion pam_regions[13]; > > from the northbridge devices up to PCMachineState, and move the common > code for PIIX and Q35 to hw/pci-host/pam.c? > > It looks like you can define a better API than what pam.c currently > provides: > > void pc_init_pam(PCMachineState *pc_machine, DeviceState *d); > void pc_update_pam(PCMachineState *pc_machine, uint8_t *pam_config); > > or even remove "DeviceState *d" because the owner of the memory regions > can be the PCMachineState. > Good idea! Will do. Anthony
Re: [Qemu-devel] [PATCH]Enable kvmvapic only when host doesn't support VAPIC capability in KVM mode
> I think we shouldn't read it like that. It seems that KVM is always > returning the VAPIC capability except when the CPU is providing a > special acceleration [0]. > > I would say you can't really refer yourself at this bit to enable or > not kvmapic in QEMU. > > Does that make sense? > > [0] > https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?id=774ead3ad9bc > bc05ef6aaebb9bdf8b4c3126923b >From this patch and commit message, "Disable vapic support on Intel machines with FlexPriority" If the CPU has accelerated tpr, vapic should be disabled. The function pointer has the name cpu_has_accelerated_tpr. kvmvapic was created for cpu which doesn't have accelerated tpr, https://lists.nongnu.org/archive/html/qemu-devel/2012-02/msg00515.html from the commit message, I can infer VAPIC means kvmvapic. My understanding is they are referring to the same thing, though the return value of KVM_CAP_VAPIC is confusing. Didn't find explanation about the return value of KVM_CAP_VAPIC, We can interpret it as following from QEMU perspective, If KVM_CAP_VAPIC return 1, QEMU should enable VAPIC(kvmvapic). If KVM_CAP_VAPIC return 0, QEMU should disable VAPIC(kvmvapic). Anthony
Re: [Qemu-devel] [PATCH]Enable kvmvapic only when host doesn't support VAPIC capability in KVM mode
> > --- a/kvm-all.c > > +++ b/kvm-all.c > > @@ -2232,6 +2232,10 @@ int kvm_has_sync_mmu(void) > > return kvm_check_extension(kvm_state, KVM_CAP_SYNC_MMU); > > } > > > > +int kvm_has_vapic(void){ > > +return !kvm_check_extension(kvm_state, KVM_CAP_VAPIC); > > +} > > + > > Is that function shouldn't return true if KVM is providing VAPIC > capability? > It should, but it doesn't. see below KVM kernel code segment, it returns false if KVM provides VAPIC. Thanks for your comments, Will resend the patch. Anthony arch/x86/kvm/x86.c case KVM_CAP_VAPIC: r = !kvm_x86_ops->cpu_has_accelerated_tpr(); break;
Re: [Qemu-devel] [PATCH 1/3] move xen-common.c to hw/xen/
Hi Eric, Will resend the patch. Thanks, Anthony > -Original Message- > From: Eric Blake [mailto:ebl...@redhat.com] > Sent: Wednesday, April 5, 2017 11:50 AM > To: Xu, Anthony <anthony...@intel.com>; 'qemu-devel@nongnu.org' > <qemu-devel@nongnu.org> > Cc: anthony.per...@citrix.com; 'Paolo Bonzini' <pbonz...@redhat.com>; > sstabell...@kernel.org > Subject: Re: [Qemu-devel] [PATCH 1/3] move xen-common.c to hw/xen/ > > On 04/05/2017 01:39 PM, Xu, Anthony wrote: > > move xen-common.c to hw/xen/ > > > > Your message failed to set the 'In-Reply-To:' header, and was therefore > not threaded to your 0/3 cover letter. Please see if you can fix that > before posting your next series. > > > > > Signed-off -by: Anthony Xu <anthony...@intel.com> > > > > > > > > --- > > hw/xen/xen-common.c | 158 > > > stubs/xen-common.c | 14 + > > xen-common-stub.c | 14 - > > xen-common.c| 158 > > > > 4 files changed, 172 insertions(+), 172 deletions(-) > > Please run 'git config diff.renames true' so that patches like this will > be a LOT smaller on the mailing list, and will more easily focus on the > (few) things that have to change due to the new name rather than being a > bulk delete/insert operation that hides the changes. > > -- > Eric Blake eblake redhat com+1-919-301-3266 > Libvirt virtualization library http://libvirt.org
[Qemu-devel] [PATCH 3/3] move xen-mapcache.c to hw/i386/xen/
move xen-mapcache.c to hw/i386/xen/ Signed-off -by: Anthony Xu <anthony...@intel.com> --- Makefile.target| 3 - default-configs/i386-softmmu.mak | 1 - default-configs/x86_64-softmmu.mak | 1 - hw/i386/xen/Makefile.objs | 2 +- hw/i386/xen/trace-events | 6 + hw/i386/xen/xen-mapcache.c | 459 + trace-events | 5 - xen-mapcache.c | 459 - 8 files changed, 466 insertions(+), 470 deletions(-) create mode 100644 hw/i386/xen/xen-mapcache.c delete mode 100644 xen-mapcache.c diff --git a/Makefile.target b/Makefile.target index d5ff0c7..a535980 100644 --- a/Makefile.target +++ b/Makefile.target @@ -149,9 +149,6 @@ obj-y += dump.o obj-y += migration/ram.o migration/savevm.o LIBS := $(libs_softmmu) $(LIBS) -# xen support -obj-$(CONFIG_XEN_I386) += xen-mapcache.o - # Hardware support ifeq ($(TARGET_NAME), sparc64) obj-y += hw/sparc64/ diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak index 029e952..d2ab2f6 100644 --- a/default-configs/i386-softmmu.mak +++ b/default-configs/i386-softmmu.mak @@ -39,7 +39,6 @@ CONFIG_TPM_TIS=$(CONFIG_TPM) CONFIG_MC146818RTC=y CONFIG_PCI_PIIX=y CONFIG_WDT_IB700=y -CONFIG_XEN_I386=$(CONFIG_XEN) CONFIG_ISA_DEBUG=y CONFIG_ISA_TESTDEV=y CONFIG_VMPORT=y diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak index d1d7432..9bde2f1 100644 --- a/default-configs/x86_64-softmmu.mak +++ b/default-configs/x86_64-softmmu.mak @@ -39,7 +39,6 @@ CONFIG_TPM_TIS=$(CONFIG_TPM) CONFIG_MC146818RTC=y CONFIG_PCI_PIIX=y CONFIG_WDT_IB700=y -CONFIG_XEN_I386=$(CONFIG_XEN) CONFIG_ISA_DEBUG=y CONFIG_ISA_TESTDEV=y CONFIG_VMPORT=y diff --git a/hw/i386/xen/Makefile.objs b/hw/i386/xen/Makefile.objs index daf4f53..be9d10c 100644 --- a/hw/i386/xen/Makefile.objs +++ b/hw/i386/xen/Makefile.objs @@ -1 +1 @@ -obj-y += xen_platform.o xen_apic.o xen_pvdevice.o xen-hvm.o +obj-y += xen_platform.o xen_apic.o xen_pvdevice.o xen-hvm.o xen-mapcache.o diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events index f25d622..547438d 100644 --- a/hw/i386/xen/trace-events +++ b/hw/i386/xen/trace-events @@ -15,3 +15,9 @@ cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64 cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) "I/O=%p pio read reg data=%#"PRIx64" port=%#"PRIx64" size=%d" cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) "I/O=%p pio write reg data=%#"PRIx64" port=%#"PRIx64" size=%d" cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d" + +# xen-mapcache.c +xen_map_cache(uint64_t phys_addr) "want %#"PRIx64 +xen_remap_bucket(uint64_t index) "index %#"PRIx64 +xen_map_cache_return(void* ptr) "%p" + diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c new file mode 100644 index 000..31debdf --- /dev/null +++ b/hw/i386/xen/xen-mapcache.c @@ -0,0 +1,459 @@ +/* + * Copyright (C) 2011 Citrix Ltd. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + */ + +#include "qemu/osdep.h" + +#include + +#include "hw/xen/xen_backend.h" +#include "sysemu/blockdev.h" +#include "qemu/bitmap.h" + +#include + +#include "sysemu/xen-mapcache.h" +#include "trace.h" + + +//#define MAPCACHE_DEBUG + +#ifdef MAPCACHE_DEBUG +# define DPRINTF(fmt, ...) do { \ +fprintf(stderr, "xen_mapcache: " fmt, ## __VA_ARGS__); \ +} while (0) +#else +# define DPRINTF(fmt, ...) do { } while (0) +#endif + +#if HOST_LONG_BITS == 32 +# define MCACHE_BUCKET_SHIFT 16 +# define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */ +#else +# define MCACHE_BUCKET_SHIFT 20 +# define MCACHE_MAX_SIZE (1UL<<35) /* 32GB Cap */ +#endif +#define MCACHE_BUCKET_SIZE (1UL << MCACHE_BUCKET_SHIFT) + +/* This is the size of the virtual address space reserve to QEMU that will not + * be use by MapCache. + * From empirical tests I observed that qemu use 75MB more than the + * max_mcache_size. + */ +#define NON_MCACHE_MEMORY_SIZE (80 * 1024 * 1024) + +typedef struct MapCacheEntry { +hwaddr paddr_index; +uint8_t *vaddr_base; +unsigned long *valid_mapping; +uint8_t lock; +hwaddr size; +struct MapCacheEntry *next; +} MapCacheEntry; + +typedef struct MapCacheRev { +uint8_t *vaddr_req; +hwaddr p
[Qemu-devel] [PATCH 1/3] move xen-common.c to hw/xen/
move xen-common.c to hw/xen/ Signed-off -by: Anthony Xu <anthony...@intel.com> --- hw/xen/xen-common.c | 158 stubs/xen-common.c | 14 + xen-common-stub.c | 14 - xen-common.c| 158 4 files changed, 172 insertions(+), 172 deletions(-) create mode 100644 hw/xen/xen-common.c create mode 100644 stubs/xen-common.c delete mode 100644 xen-common-stub.c delete mode 100644 xen-common.c diff --git a/hw/xen/xen-common.c b/hw/xen/xen-common.c new file mode 100644 index 000..fd2c928 --- /dev/null +++ b/hw/xen/xen-common.c @@ -0,0 +1,158 @@ +/* + * Copyright (C) 2014 Citrix Systems UK Ltd. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + */ + +#include "qemu/osdep.h" +#include "hw/xen/xen_backend.h" +#include "qmp-commands.h" +#include "sysemu/char.h" +#include "sysemu/accel.h" +#include "migration/migration.h" + +//#define DEBUG_XEN + +#ifdef DEBUG_XEN +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, "xen: " fmt, ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) \ +do { } while (0) +#endif + +static int store_dev_info(int domid, Chardev *cs, const char *string) +{ +struct xs_handle *xs = NULL; +char *path = NULL; +char *newpath = NULL; +char *pts = NULL; +int ret = -1; + +/* Only continue if we're talking to a pty. */ +if (strncmp(cs->filename, "pty:", 4)) { +return 0; +} +pts = cs->filename + 4; + +/* We now have everything we need to set the xenstore entry. */ +xs = xs_open(0); +if (xs == NULL) { +fprintf(stderr, "Could not contact XenStore\n"); +goto out; +} + +path = xs_get_domain_path(xs, domid); +if (path == NULL) { +fprintf(stderr, "xs_get_domain_path() error\n"); +goto out; +} +newpath = realloc(path, (strlen(path) + strlen(string) + +strlen("/tty") + 1)); +if (newpath == NULL) { +fprintf(stderr, "realloc error\n"); +goto out; +} +path = newpath; + +strcat(path, string); +strcat(path, "/tty"); +if (!xs_write(xs, XBT_NULL, path, pts, strlen(pts))) { +fprintf(stderr, "xs_write for '%s' fail", string); +goto out; +} +ret = 0; + +out: +free(path); +xs_close(xs); + +return ret; +} + +void xenstore_store_pv_console_info(int i, Chardev *chr) +{ +if (i == 0) { +store_dev_info(xen_domid, chr, "/console"); +} else { +char buf[32]; +snprintf(buf, sizeof(buf), "/device/console/%d", i); +store_dev_info(xen_domid, chr, buf); +} +} + + +static void xenstore_record_dm_state(struct xs_handle *xs, const char *state) +{ +char path[50]; + +if (xs == NULL) { +fprintf(stderr, "xenstore connection not initialized\n"); +exit(1); +} + +snprintf(path, sizeof (path), "device-model/%u/state", xen_domid); +if (!xs_write(xs, XBT_NULL, path, state, strlen(state))) { +fprintf(stderr, "error recording dm state\n"); +exit(1); +} +} + + +static void xen_change_state_handler(void *opaque, int running, + RunState state) +{ +if (running) { +/* record state running */ +xenstore_record_dm_state(xenstore, "running"); +} +} + +static int xen_init(MachineState *ms) +{ +xen_xc = xc_interface_open(0, 0, 0); +if (xen_xc == NULL) { +xen_pv_printf(NULL, 0, "can't open xen interface\n"); +return -1; +} +xen_fmem = xenforeignmemory_open(0, 0); +if (xen_fmem == NULL) { +xen_pv_printf(NULL, 0, "can't open xen fmem interface\n"); +xc_interface_close(xen_xc); +return -1; +} +qemu_add_vm_change_state_handler(xen_change_state_handler, NULL); + +global_state_set_optional(); +savevm_skip_configuration(); +savevm_skip_section_footers(); + +return 0; +} + +static void xen_accel_class_init(ObjectClass *oc, void *data) +{ +AccelClass *ac = ACCEL_CLASS(oc); +ac->name = "Xen"; +ac->init_machine = xen_init; +ac->allowed = _allowed; +} + +#define TYPE_XEN_ACCEL ACCEL_CLASS_NAME("xen") + +static const TypeInfo xen_accel_type = { +.name = TYPE_XEN_ACCEL, +.parent = TYPE_ACCEL, +.class_init = xen_accel_class_init, +}; + +static void xen_type_init(void) +{ +type_register_static(_accel_type); +} + +type_init(xen_type_init); diff --git a/stubs/xen-common.c b/stubs/xen-commo
[Qemu-devel] [PATCH 2/3] move xen-hvm.c to hw/i386/xen/
move xen-hvm.c to hw/i386/xen/ Signed-off -by: Anthony Xu <anthony...@intel.com> --- Makefile.target |5 +- hw/i386/xen/Makefile.objs |2 +- hw/i386/xen/trace-events | 11 + hw/i386/xen/xen-hvm.c | 1422 + hw/xen/Makefile.objs |2 +- stubs/Makefile.objs |2 + stubs/xen-hvm.c | 63 ++ trace-events | 11 - xen-hvm-stub.c| 63 -- xen-hvm.c | 1422 - 10 files changed, 1501 insertions(+), 1502 deletions(-) create mode 100644 hw/i386/xen/xen-hvm.c create mode 100644 stubs/xen-hvm.c delete mode 100644 xen-hvm-stub.c delete mode 100644 xen-hvm.c diff --git a/Makefile.target b/Makefile.target index 7df2b8c..d5ff0c7 100644 --- a/Makefile.target +++ b/Makefile.target @@ -150,10 +150,7 @@ obj-y += migration/ram.o migration/savevm.o LIBS := $(libs_softmmu) $(LIBS) # xen support -obj-$(CONFIG_XEN) += xen-common.o -obj-$(CONFIG_XEN_I386) += xen-hvm.o xen-mapcache.o -obj-$(call lnot,$(CONFIG_XEN)) += xen-common-stub.o -obj-$(call lnot,$(CONFIG_XEN_I386)) += xen-hvm-stub.o +obj-$(CONFIG_XEN_I386) += xen-mapcache.o # Hardware support ifeq ($(TARGET_NAME), sparc64) diff --git a/hw/i386/xen/Makefile.objs b/hw/i386/xen/Makefile.objs index 801a68d..daf4f53 100644 --- a/hw/i386/xen/Makefile.objs +++ b/hw/i386/xen/Makefile.objs @@ -1 +1 @@ -obj-y += xen_platform.o xen_apic.o xen_pvdevice.o +obj-y += xen_platform.o xen_apic.o xen_pvdevice.o xen-hvm.o diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events index 321fe60..f25d622 100644 --- a/hw/i386/xen/trace-events +++ b/hw/i386/xen/trace-events @@ -4,3 +4,14 @@ xen_platform_log(char *s) "xen platform: %s" # hw/i386/xen/xen_pvdevice.c xen_pv_mmio_read(uint64_t addr) "WARNING: read from Xen PV Device MMIO space (address %"PRIx64")" xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space (address %"PRIx64")" + +# xen-hvm.c +xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx" +xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i" +handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d" +handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d" +handle_ioreq_write(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p write type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d" +cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p pio dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d" +cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) "I/O=%p pio read reg data=%#"PRIx64" port=%#"PRIx64" size=%d" +cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) "I/O=%p pio write reg data=%#"PRIx64" port=%#"PRIx64" size=%d" +cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d" diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c new file mode 100644 index 000..0892361 --- /dev/null +++ b/hw/i386/xen/xen-hvm.c @@ -0,0 +1,1422 @@ +/* + * Copyright (C) 2010 Citrix Ltd. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + */ + +#include "qemu/osdep.h" + +#include "cpu.h" +#include "hw/pci/pci.h" +#include "hw/i386/pc.h" +#include "hw/i386/apic-msidef.h" +#include "hw/xen/xen_common.h" +#include "hw/xen/xen_backend.h" +#include "qmp-commands.h" + +#include "sysemu/char.h" +#include "qemu/error-report.h" +#include "qemu/range.h" +#include "sysemu/xen-mapcache.h" +#include "trace.h" +#include "exec/address-spaces.h" + +#include +#include +#include
[Qemu-devel] [PATCH 0/3] move xen related files to corresponding xen directory
move xen related files to corresponding xen directory. move xen-common.c to hw/xen/ move xen-hvm.c to hw/i386/xen/ move xen-mapcache.c to hw/i386/xen/ Signed-off -by: Anthony Xu <anthony...@intel.com> Makefile.target|6 - default-configs/i386-softmmu.mak |1 - default-configs/x86_64-softmmu.mak |1 - hw/i386/xen/Makefile.objs |2 +- hw/i386/xen/trace-events | 17 + hw/i386/xen/xen-hvm.c | 1422 hw/i386/xen/xen-mapcache.c | 459 hw/xen/Makefile.objs |2 +- hw/xen/xen-common.c| 158 stubs/Makefile.objs|2 + stubs/xen-common.c | 14 + stubs/xen-hvm.c| 63 ++ trace-events | 16 - xen-common-stub.c | 14 - xen-common.c | 158 xen-hvm-stub.c | 63 -- xen-hvm.c | 1422 xen-mapcache.c | 459 18 files changed, 2137 insertions(+), 2142 deletions(-) create mode 100644 hw/i386/xen/xen-hvm.c create mode 100644 hw/i386/xen/xen-mapcache.c create mode 100644 hw/xen/xen-common.c create mode 100644 stubs/xen-common.c create mode 100644 stubs/xen-hvm.c delete mode 100644 xen-common-stub.c delete mode 100644 xen-common.c delete mode 100644 xen-hvm-stub.c delete mode 100644 xen-hvm.c delete mode 100644 xen-mapcache.c
[Qemu-devel] [PATCH]Enable kvmvapic only when host doesn't support VAPIC capability in KVM mode
In KVM mode, enable kvmvapic only when host doesn't support VAPIC capability. Save the time to set up kvmvapic in some hosts. Signed-off -by: Anthony Xu <anthony...@intel.com> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index c3829e3..d5c53af 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -317,8 +317,11 @@ static void apic_common_realize(DeviceState *dev, Error **errp) info = APIC_COMMON_GET_CLASS(s); info->realize(dev, errp); -/* Note: We need at least 1M to map the VAPIC option ROM */ +/* Note: We need at least 1M to map the VAPIC option ROM, + if it is KVM, enable kvmvapic only when KVM doesn't have + VAPIC capability*/ if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK && +(!kvm_enabled() || (kvm_enabled() && !kvm_has_vapic())) && !hax_enabled() && ram_size >= 1024 * 1024) { vapic = sysbus_create_simple("kvmvapic", -1, NULL); } diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 24281fc..43e0e4c 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -215,6 +215,7 @@ extern KVMState *kvm_state; bool kvm_has_free_slot(MachineState *ms); int kvm_has_sync_mmu(void); +int kvm_has_vapic(void); int kvm_has_vcpu_events(void); int kvm_has_robust_singlestep(void); int kvm_has_debugregs(void); diff --git a/kvm-all.c b/kvm-all.c index 90b8573..edcb6ea 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -2232,6 +2232,10 @@ int kvm_has_sync_mmu(void) return kvm_check_extension(kvm_state, KVM_CAP_SYNC_MMU); } +int kvm_has_vapic(void){ +return !kvm_check_extension(kvm_state, KVM_CAP_VAPIC); +} + int kvm_has_vcpu_events(void) { return kvm_state->vcpu_events;
[Qemu-devel] centos 7.2 guest doesn't boot without kvmvapic in TCG mode
I disabled kvmvapic by diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index c3829e3..52be2b0 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -440,7 +440,7 @@ static const VMStateDescription vmstate_apic_common = { static Property apic_properties_common[] = { DEFINE_PROP_UINT8("version", APICCommonState, version, 0x14), DEFINE_PROP_BIT("vapic", APICCommonState, vapic_control, VAPIC_ENABLE_BIT, -true), +false), DEFINE_PROP_BOOL("legacy-instance-id", APICCommonState, legacy_instance_id, false), DEFINE_PROP_END_OF_LIST(), Tried to boot centos guest with this command ./x86_64-softmmu/qemu-system-x86_64 -bios /home/root/guest/seabios.bin -machine q35 -m 1G -drive format=raw,file=/home/root/images/centos7.2.img,if=ide,index=0 -nographic -nodefaults -serial pty -monitor pty -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios Seabios complained Seabios output " Booting from Hard Disk... WARNING - Timeout at ahci_command:154! Boot failed: could not read the boot disk enter handle_18: NULL Booting from Floppy... Boot failed: could not read the boot disk enter handle_18: NULL No bootable device. " It can boot with kvmvapic enabled by using the same command. Seabios output " Booting from Hard Disk... Booting from :7c00 " In the original commit, " Therefore, the VAPI is enabled by default, even in TCG mode. Here, no speedup can be achieved as the emulation overhead of the TPR register is marginal compared to instruction emulation." Seems no performance benefit for TCG mode What's the other benefit of using kvmvapic in TCG mode? Should we make TCG mode work without kvmvapic? If we don't need to use kvmvapic in TCG mode, we can make kvmvapic code clearer by removing all TCG related code in kvmvapic.c. Anthony
[Qemu-devel] [PATCH] Put all trace.o into libqemuutil.a
Put all trace.o into libqemuutil.a Currently all trace.o are linked into qemu-system, qemu-img, qemu-nbd, qemu-io etc., even the corresponding components are not included. Put all trace.o into libqemuutil.a that the linker would only pull in .o files containing symbols that are actually referenced by the program. Signed-off -by: Anthony Xu <anthony...@intel.com> diff --git a/Makefile b/Makefile index 6c359b2..31d41a7 100644 --- a/Makefile +++ b/Makefile @@ -346,7 +346,7 @@ dtc/%: mkdir -p $@ $(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \ - $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) $(trace-obj-y) + $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS)) # Only keep -O and -g cflags @@ -366,11 +366,11 @@ Makefile: $(version-obj-y) # Build libraries libqemustub.a: $(stub-obj-y) -libqemuutil.a: $(util-obj-y) +libqemuutil.a: $(util-obj-y) $(trace-obj-y) ## -COMMON_LDADDS = $(trace-obj-y) libqemuutil.a libqemustub.a +COMMON_LDADDS = libqemuutil.a libqemustub.a qemu-img.o: qemu-img-cmds.h diff --git a/Makefile.target b/Makefile.target index d5ff0c7..69239e0 100644 --- a/Makefile.target +++ b/Makefile.target @@ -185,8 +185,7 @@ dummy := $(call unnest-vars,.., \ qom-obj-y \ io-obj-y \ common-obj-y \ - common-obj-m \ - trace-obj-y) + common-obj-m) target-obj-y := $(target-obj-y-save) all-obj-y += $(common-obj-y) all-obj-y += $(target-obj-y) @@ -198,7 +197,7 @@ all-obj-$(CONFIG_SOFTMMU) += $(io-obj-y) $(QEMU_PROG_BUILD): config-devices.mak -COMMON_LDADDS = $(trace-obj-y) ../libqemuutil.a ../libqemustub.a +COMMON_LDADDS = ../libqemuutil.a ../libqemustub.a # build either PROG or PROGW $(QEMU_PROG_BUILD): $(all-obj-y) $(COMMON_LDADDS) diff --git a/tests/Makefile.include b/tests/Makefile.include index f3de81f..579ec07 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -519,7 +519,7 @@ QEMU_CFLAGS += -I$(SRC_PATH)/tests # Deps that are common to various different sets of tests below -test-util-obj-y = $(trace-obj-y) libqemuutil.a libqemustub.a +test-util-obj-y = libqemuutil.a libqemustub.a test-qom-obj-y = $(qom-obj-y) $(test-util-obj-y) test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \ tests/test-qapi-event.o tests/test-qmp-introspect.o \
Re: [Qemu-devel] [PATCH] Create libqemutrace.a for all trace.o
> >>> ./trace.o, ./qapi/trace.o and ./util/trace.o are added into > >>> libqemuutil.a to avoid recursive dependencies between > >>> libqemuutil.a and libqemutrace.a. > >> Why would libqemutrace.a depend on libqemuutil.a? > > Each trace.c calls trace_event_register_group to register events, > > trace_event_register_group is defined in trace/control.c , which > > is linked into libqemuutil.a. > > Ah: > > util-obj-$(CONFIG_TRACE_SIMPLE) += simple.o > util-obj-$(CONFIG_TRACE_FTRACE) += ftrace.o > util-obj-y += control.o > util-obj-y += qmp.o > > With the introduction of libqemutrace.a, I believe these should be moved > into libqemutrace.a. Agreed, But it doesn't solve infinite recursion issue. register_module_init is needed by libqemutrace.a, which is defined util/module.c. it is hard to remove libqemutrace.a dependency on libqemuutil.a. Removing libqemuutil.a dependency on libqemutrace.a is feasible. Just like what I did in this patch, include all util related trace.o to libqemuutila. The other simple way is to include all trace.o into libqemuutil.a What's your opinion? Thanks Anthony
Re: [Qemu-devel] [PATCH] Create libqemutrace.a for all trace.o
> > ./trace.o, ./qapi/trace.o and ./util/trace.o are added into > > libqemuutil.a to avoid recursive dependencies between > > libqemuutil.a and libqemutrace.a. > > Why would libqemutrace.a depend on libqemuutil.a? Each trace.c calls trace_event_register_group to register events, trace_event_register_group is defined in trace/control.c , which is linked into libqemuutil.a. > Tracing code shouldn't call other QEMU code. That would could create > infinite recursion when a trace event is fired. If all trace.o needed by libqemuutil.a are linked into libqemuutil.a, libqemuutil.a will not depend on libqemutrace.a. This is what this patch take to break the infinite recursion. Or we can link trace/*.o to libqemutrace.a, hope it breaks the infinite recursion. But trace/*.o may still depend on libqemuutil.a Or we can just link all trace.o to libqemuutil.a. Anthony
[Qemu-devel] [PATCH] Create libqemutrace.a for all trace.o
Create libqemutrace.a for all trace.o Currently all trace.o are linked into qemu-system, qemu-img, qemu-nbd, qemu-io etc., even the corresponding components are not included. Create a libqemutrace.a that the linker would only pull in .o files containing symbols that are actually referenced by the program. ./trace.o, ./qapi/trace.o and ./util/trace.o are added into libqemuutil.a to avoid recursive dependencies between libqemuutil.a and libqemutrace.a. Signed-off -by: Anthony Xu <anthony...@intel.com> diff --git a/Makefile b/Makefile index 6c359b2..565f5c7 100644 --- a/Makefile +++ b/Makefile @@ -345,8 +345,8 @@ subdir-dtc:dtc/libfdt dtc/tests dtc/%: mkdir -p $@ -$(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \ - $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) $(trace-obj-y) +$(SUBDIR_RULES): libqemutrace.a libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \ + $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS)) # Only keep -O and -g cflags @@ -367,10 +367,11 @@ Makefile: $(version-obj-y) libqemustub.a: $(stub-obj-y) libqemuutil.a: $(util-obj-y) +libqemutrace.a: $(trace-obj-y) ## -COMMON_LDADDS = $(trace-obj-y) libqemuutil.a libqemustub.a +COMMON_LDADDS = libqemutrace.a libqemuutil.a libqemustub.a qemu-img.o: qemu-img-cmds.h diff --git a/Makefile.objs b/Makefile.objs index 6167e7b..4289ef9 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -3,6 +3,7 @@ stub-obj-y = stubs/ crypto/ util-obj-y = util/ qobject/ qapi/ util-obj-y += qmp-introspect.o qapi-types.o qapi-visit.o qapi-event.o +util-obj-y += $(trace-root-obj-y) chardev-obj-y = chardev/ @@ -167,8 +168,9 @@ trace-events-subdirs += qapi trace-events-files = $(SRC_PATH)/trace-events $(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events) -trace-obj-y = trace-root.o +trace-root-obj-y = trace-root.o +trace-root-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o +trace-obj-y = $(trace-root-obj-y) trace-obj-y += $(trace-events-subdirs:%=%/trace.o) trace-obj-$(CONFIG_TRACE_UST) += trace-ust-all.o -trace-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o trace-obj-$(CONFIG_TRACE_DTRACE) += $(trace-events-subdirs:%=%/trace-dtrace.o) diff --git a/Makefile.target b/Makefile.target index 7df2b8c..6f508c8 100644 --- a/Makefile.target +++ b/Makefile.target @@ -201,7 +201,7 @@ all-obj-$(CONFIG_SOFTMMU) += $(io-obj-y) $(QEMU_PROG_BUILD): config-devices.mak -COMMON_LDADDS = $(trace-obj-y) ../libqemuutil.a ../libqemustub.a +COMMON_LDADDS = ../libqemutrace.a ../libqemuutil.a ../libqemustub.a # build either PROG or PROGW $(QEMU_PROG_BUILD): $(all-obj-y) $(COMMON_LDADDS) diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs index 33906ff..d543d56 100644 --- a/qapi/Makefile.objs +++ b/qapi/Makefile.objs @@ -4,3 +4,5 @@ util-obj-y += string-input-visitor.o string-output-visitor.o util-obj-y += opts-visitor.o qapi-clone-visitor.o util-obj-y += qmp-event.o util-obj-y += qapi-util.o +util-obj-y += trace.o +util-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace.o diff --git a/tests/Makefile.include b/tests/Makefile.include index f3de81f..6cbd602 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -519,7 +519,7 @@ QEMU_CFLAGS += -I$(SRC_PATH)/tests # Deps that are common to various different sets of tests below -test-util-obj-y = $(trace-obj-y) libqemuutil.a libqemustub.a +test-util-obj-y = libqemutrace.a libqemuutil.a libqemustub.a test-qom-obj-y = $(qom-obj-y) $(test-util-obj-y) test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \ tests/test-qapi-event.o tests/test-qmp-introspect.o \ diff --git a/util/Makefile.objs b/util/Makefile.objs index c6205eb..e38b91e 100644 --- a/util/Makefile.objs +++ b/util/Makefile.objs @@ -43,3 +43,5 @@ util-obj-y += qdist.o util-obj-y += qht.o util-obj-y += range.o util-obj-y += systemd.o +util-obj-y += trace.o +util-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace.o
Re: [Qemu-devel] [PATCH] only link current target arch traces to qemu-system
> Perhaps all trace.o files should be put into their own .a instead of > being added directly to the linker line: > > COMMON_LDADDS = $(trace-obj-y) libqemuutil.a libqemustub.a > > I think the linker would only pull in .o files containing symbols that are > actually referenced by the program. Hi Stefan, That's a good idea! Below patch creates libqemutrace.a. ./trace.o, ./qapi/trace.o and ./util/trace.o are added into libqemuutil.a to avoid recursive dependencies between libqemuutil.a and libqemutrace.a. Anthony diff --git a/Makefile b/Makefile index 6c359b2..565f5c7 100644 --- a/Makefile +++ b/Makefile @@ -345,8 +345,8 @@ subdir-dtc:dtc/libfdt dtc/tests dtc/%: mkdir -p $@ -$(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \ - $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) $(trace-obj-y) +$(SUBDIR_RULES): libqemutrace.a libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \ + $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS)) # Only keep -O and -g cflags @@ -367,10 +367,11 @@ Makefile: $(version-obj-y) libqemustub.a: $(stub-obj-y) libqemuutil.a: $(util-obj-y) +libqemutrace.a: $(trace-obj-y) ## -COMMON_LDADDS = $(trace-obj-y) libqemuutil.a libqemustub.a +COMMON_LDADDS = libqemutrace.a libqemuutil.a libqemustub.a qemu-img.o: qemu-img-cmds.h diff --git a/Makefile.objs b/Makefile.objs index 6167e7b..4289ef9 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -3,6 +3,7 @@ stub-obj-y = stubs/ crypto/ util-obj-y = util/ qobject/ qapi/ util-obj-y += qmp-introspect.o qapi-types.o qapi-visit.o qapi-event.o +util-obj-y += $(trace-root-obj-y) chardev-obj-y = chardev/ @@ -167,8 +168,9 @@ trace-events-subdirs += qapi trace-events-files = $(SRC_PATH)/trace-events $(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events) -trace-obj-y = trace-root.o +trace-root-obj-y = trace-root.o +trace-root-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o +trace-obj-y = $(trace-root-obj-y) trace-obj-y += $(trace-events-subdirs:%=%/trace.o) trace-obj-$(CONFIG_TRACE_UST) += trace-ust-all.o -trace-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o trace-obj-$(CONFIG_TRACE_DTRACE) += $(trace-events-subdirs:%=%/trace-dtrace.o) diff --git a/Makefile.target b/Makefile.target index 7df2b8c..6f508c8 100644 --- a/Makefile.target +++ b/Makefile.target @@ -201,7 +201,7 @@ all-obj-$(CONFIG_SOFTMMU) += $(io-obj-y) $(QEMU_PROG_BUILD): config-devices.mak -COMMON_LDADDS = $(trace-obj-y) ../libqemuutil.a ../libqemustub.a +COMMON_LDADDS = ../libqemutrace.a ../libqemuutil.a ../libqemustub.a # build either PROG or PROGW $(QEMU_PROG_BUILD): $(all-obj-y) $(COMMON_LDADDS) diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs index 33906ff..d543d56 100644 --- a/qapi/Makefile.objs +++ b/qapi/Makefile.objs @@ -4,3 +4,5 @@ util-obj-y += string-input-visitor.o string-output-visitor.o util-obj-y += opts-visitor.o qapi-clone-visitor.o util-obj-y += qmp-event.o util-obj-y += qapi-util.o +util-obj-y += trace.o +util-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace.o diff --git a/tests/Makefile.include b/tests/Makefile.include index f3de81f..6cbd602 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -519,7 +519,7 @@ QEMU_CFLAGS += -I$(SRC_PATH)/tests # Deps that are common to various different sets of tests below -test-util-obj-y = $(trace-obj-y) libqemuutil.a libqemustub.a +test-util-obj-y = libqemutrace.a libqemuutil.a libqemustub.a test-qom-obj-y = $(qom-obj-y) $(test-util-obj-y) test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \ tests/test-qapi-event.o tests/test-qmp-introspect.o \ diff --git a/util/Makefile.objs b/util/Makefile.objs index c6205eb..e38b91e 100644 --- a/util/Makefile.objs +++ b/util/Makefile.objs @@ -43,3 +43,5 @@ util-obj-y += qdist.o util-obj-y += qht.o util-obj-y += range.o util-obj-y += systemd.o +util-obj-y += trace.o +util-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace.o
Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> So please send it to the list with Signed-off-by line. Thanks, > > DPRINTF("handle_mmio\n"); > /* Called outside BQL */ > address_space_rw(_space_memory, > run->mmio.phys_addr, attrs, > run->mmio.data, > run->mmio.len, > run->mmio.is_write); > > and adding a simple assert() would have quickly disproved your theory. You are right here. If it is a PCI bar write, a memory region operation(del/add) may be called inside address_space_rw, and memory_region_transaction_commit will be called. If address_space_rw is called without the global lock, memory_region_transaction_commit is called with the global lock. It conflicts with what you said before. >No, you don't need to check it. Most functions (in this case, >memory_region_transaction_commit) can only be called under the global lock. And if two vcpus program the different PCI bars at the same time (it is unlikely, but QEMU should not assume it), without the global lock, region operations may be called at the same time. Are memory_region_del_subregion and memory_region_add_subregion_overlap thread-safe? > It's not a fix if the code is not thread-safe anymore! But I think you > have the answer now as to why you cannot use synchronize_rcu. Can you elaborate why the code is not thread-safe? Thanks Anthony
[Qemu-devel] [PATCH] clear pending status before calling memory commit
clear pending status before calling memory commit. Otherwise when memory_region_finalize is called, memory_region_transaction_depth is 0 and memory_region_update_pending is true. That's wrong. Signed-off -by: Anthony Xu <anthony...@intel.com> diff --git a/memory.c b/memory.c index 64b0a60..4c95aaf 100644 --- a/memory.c +++ b/memory.c @@ -906,12 +906,6 @@ void memory_region_transaction_begin(void) ++memory_region_transaction_depth; } -static void memory_region_clear_pending(void) -{ -memory_region_update_pending = false; -ioeventfd_update_pending = false; -} - void memory_region_transaction_commit(void) { AddressSpace *as; @@ -927,14 +921,14 @@ void memory_region_transaction_commit(void) QTAILQ_FOREACH(as, _spaces, address_spaces_link) { address_space_update_topology(as); } - +memory_region_update_pending = false; MEMORY_LISTENER_CALL_GLOBAL(commit, Forward); } else if (ioeventfd_update_pending) { QTAILQ_FOREACH(as, _spaces, address_spaces_link) { address_space_update_ioeventfds(as); } +ioeventfd_update_pending = false; } -memory_region_clear_pending(); } }
[Qemu-devel] [PATCH] only link current target arch traces to qemu-system
When building target x86_64-softmmu, all other architectures' trace.o are linked into x86_64-softmmu/qemu-system-x86_64, like hw/arm/trace.o, hw/mips/trace.o etc., that is not necessary. Same thing happens when building other targets. Only current target arch traces should be linked into qemu-system. Signed-off -by: Anthony Xu <anthony...@intel.com> diff --git a/Makefile.target b/Makefile.target index 7df2b8c..638e044 100644 --- a/Makefile.target +++ b/Makefile.target @@ -177,6 +177,41 @@ block-obj-y := common-obj-y := chardev-obj-y := include $(SRC_PATH)/Makefile.objs + +# remove all arch related trace +trace-obj-y := $(filter-out hw/alpha/trace.o hw/alpha/trace-dtrace.o,$(trace-obj-y)) +trace-obj-y := $(filter-out target/alpha/trace.o target/alpha/trace-dtrace.o,$(trace-obj-y)) + +trace-obj-y := $(filter-out hw/arm/trace.o hw/arm/trace-dtrace.o,$(trace-obj-y)) +trace-obj-y := $(filter-out target/arm/trace.o target/arm/trace-dtrace.o,$(trace-obj-y)) + +trace-obj-y := $(filter-out hw/i386/trace.o hw/i386/trace-dtrace.o,$(trace-obj-y)) +trace-obj-y := $(filter-out target/i386/trace.o target/i386/trace-dtrace.o,$(trace-obj-y)) + +trace-obj-y := $(filter-out hw/mips/trace.o hw/mips/trace-dtrace.o,$(trace-obj-y)) +trace-obj-y := $(filter-out target/mips/trace.o target/mips/trace-dtrace.o,$(trace-obj-y)) + +trace-obj-y := $(filter-out hw/sparc/trace.o hw/sparc/trace-dtrace.o,$(trace-obj-y)) +trace-obj-y := $(filter-out target/sparc/trace.o target/sparc/trace-dtrace.o,$(trace-obj-y)) + +trace-obj-y := $(filter-out hw/s390x/trace.o hw/s390x/trace-dtrace.o,$(trace-obj-y)) +trace-obj-y := $(filter-out target/s390x/trace.o target/s390x/trace-dtrace.o,$(trace-obj-y)) + +trace-obj-y := $(filter-out hw/ppc/trace.o hw/ppc/trace-dtrace.o,$(trace-obj-y)) +trace-obj-y := $(filter-out target/ppc/trace.o target/ppc/trace-dtrace.o,$(trace-obj-y)) + +# add current arch related trace +carch := $(TARGET_BASE_ARCH) +ifneq ($(wildcard $(SRC_PATH)/hw/$(carch)/trace-events),) +trace-obj-y += hw/$(carch)/trace.o +trace-obj-$(CONFIG_TRACE_DTRACE) += hw/$(carch)/trace-dtrace.o +endif + +ifneq ($(wildcard $(SRC_PATH)/target/$(carch)/trace-events),) +trace-obj-y += target/$(carch)/trace.o +trace-obj-$(CONFIG_TRACE_DTRACE) += target/$(carch)/trace-dtrace.o +endif + dummy := $(call unnest-vars,,target-obj-y) target-obj-y-save := $(target-obj-y) dummy := $(call unnest-vars,.., \
Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> > memory_region_finalize. > > Let me know if you think otherwise. > > Yes, you can replace memory_region_del_subregion in > memory_region_finalize > with special code that does > > assert(!mr->enabled); > assert(subregion->container == mr); > subregion->container = NULL; > QTAILQ_REMOVE(>subregions, subregion, subregions_link); > memory_region_unref(subregion); > > The last four lines are shared with memory_region_del_subregion, so please > factor them in a new function, for example > memory_region_del_subregion_internal. After adding synchronize_rcu, I saw an infinite recursive call, mem_commit-> memory_region_finalize-> mem_commit-> memory_region_finalize-> .. it caused a segment fault, because 8M stack space is used up, and found when memory_region_finalize is called, memory_region_transaction_depth is 0 and memory_region_update_pending is true. That's not normal! As you mentioned in your previous email, that should not happen. >" But if memory_region_transaction_depth is == 0, there should be no >update pending because the loop has never run" The root cause is, MEMORY_LISTENER_CALL_GLOBAL(commit, Forward) is called before memory_region_clear_pending() in memory_region_transaction_commit. so when MEMORY_LISTENER_CALL_GLOBAL(commit, Forward) is called, memory_region_transaction_depth is 0 and memory_region_update_pending is true. mem_commit may call memory_region_finalize, it causes infinite recursive call. my previous fix for this is not complete. We may clear memory_region_update_pending before calling MEMORY_LISTENER_CALL_GLOBAL(commit, Forward) Please review below patch diff --git a/memory.c b/memory.c index 64b0a60..4c95aaf 100644 --- a/memory.c +++ b/memory.c @@ -906,12 +906,6 @@ void memory_region_transaction_begin(void) ++memory_region_transaction_depth; } -static void memory_region_clear_pending(void) -{ -memory_region_update_pending = false; -ioeventfd_update_pending = false; -} - void memory_region_transaction_commit(void) { AddressSpace *as; @@ -927,14 +921,14 @@ void memory_region_transaction_commit(void) QTAILQ_FOREACH(as, _spaces, address_spaces_link) { address_space_update_topology(as); } - +memory_region_update_pending = false; MEMORY_LISTENER_CALL_GLOBAL(commit, Forward); } else if (ioeventfd_update_pending) { QTAILQ_FOREACH(as, _spaces, address_spaces_link) { address_space_update_ioeventfds(as); } +ioeventfd_update_pending = false; } -memory_region_clear_pending(); } } > > It is not slow, the contention is not high. Yes we can test. > > It's not a matter of contention. It's a matter of how long an RCU > critical section can be---not just at startup, but at any point > during QEMU execution. The thing is, seems both address_space_translate and address_space_dispatch_free are called under the global lock. When synchronize_rcu is called, no other threads are in RCU critical section. Seems RCU is not that useful for address space. > > Have you tried tcmalloc or jemalloc? They use the brk region > less and sometimes are more aggressive in releasing mmap-ed memory > areas. They may be aggressive. But if memory are freed afterward, it may not help in some cases, for example, starting a lot of VMs at the same time. > > > Please review below patch, MemoryRegion is protected by RCU. > > Removing transaction begin/commit in memory_region_finalize makes > little sense because memory_region_del_subregion calls those two > functions again. See above for an alternative. Apart from this, > the patch is at least correct, though I'm not sure it's a good > idea (synchronize_rcu needs testing). > I'm not sure I understand the > address_space_write change). After adding synchronize_rcu, we noticed a RCU dead loop. synchronize_rcu is called inside RCU critical section. It happened when guest OS programmed the PCI bar. The call trace is like, address_space_write-> pci_host_config_write_common -> memory_region_transaction_commit ->mem_commit-> synchronize_rcu pci_host_config_write_common is called inside RCU critical section. The address_space_write change fixed this issue. Thanks, Anthony
Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> The first unref is done after as->current_map is overwritten. > as->current_map is accessed under RCU, so it needs call_rcu. It > balances the initial reference that is present since flatview_init. Got it, thanks for explanation. > > but it is not clear to me, is this a bug or by design? Is flatview_destroy > > called > in current thread > > or RCU thread? > > You cannot know, because there are also other callers of > address_space_get_flatview. Usually it's from the RCU thread. If it is from current thread, do we need to check if the global lock is acquired? As you mentioned flatview_unref may call memory_region_unref. > - let memory_region_finalize remove subregions (commit 2e2b8eb, > "memory: > allow destroying a non-empty MemoryRegion", 2015-10-09) > > - let memory_region_finalize disable coalesced I/O (in fact there are no > callers of memory_region_clear_coalescing outside memory.c) Okay, It may have sub regions, In the comments of memory_region_finalize " /* We know the region is not visible in any address space (it * does not have a container and cannot be a root either because * it has no references, " We know the memory region is not a root region, and the memory region has already been removed from address space even it has sub regions. memory_region_transaction_commit should be called when the memory region is removed from address space. memory_region_transaction_commit seems not be needed in memory_region_finalize. Let me know if you think otherwise. > > How about fall back to synchronize_rcu? > > I'm afraid it would be too slow, but you can test. It is not slow, the contention is not high. Yes we can test. > Nope. The RCU read lock protects all MemoryRegions through > flatview_unref: RCU keeps the FlatView alive, and the FlatView keeps > MemoryRegions alive. I see, RCU needs to protect MemoryRegions as well. Please review below patch, MemoryRegion is protected by RCU. Thanks, Anthony diff --git a/exec.c b/exec.c index a22f5a0..6631668 100644 --- a/exec.c +++ b/exec.c @@ -2521,7 +2521,8 @@ static void mem_commit(MemoryListener *listener) atomic_rcu_set(>dispatch, next); if (cur) { -call_rcu(cur, address_space_dispatch_free, rcu); +synchronize_rcu(); +address_space_dispatch_free(cur); } } @@ -2567,7 +2568,8 @@ void address_space_destroy_dispatch(AddressSpace *as) atomic_rcu_set(>dispatch, NULL); if (d) { -call_rcu(d, address_space_dispatch_free, rcu); +synchronize_rcu(); +address_space_dispatch_free(d); } } @@ -2712,19 +2714,22 @@ static bool prepare_mmio_access(MemoryRegion *mr) return release_lock; } -/* Called within RCU critical section. */ -static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr, -MemTxAttrs attrs, -const uint8_t *buf, -int len, hwaddr addr1, -hwaddr l, MemoryRegion *mr) +MemTxResult address_space_write(AddressSpace *as, hwaddr addr, + MemTxAttrs attrs, const uint8_t *buf, int len) { uint8_t *ptr; uint64_t val; +hwaddr l=len; +hwaddr addr1; +MemoryRegion *mr; MemTxResult result = MEMTX_OK; bool release_lock = false; for (;;) { +rcu_read_lock(); +mr = address_space_translate(as, addr, , , true); +memory_region_ref(mr); +rcu_read_unlock(); if (!memory_access_is_direct(mr, true)) { release_lock |= prepare_mmio_access(mr); l = memory_access_size(mr, l, addr1); @@ -2764,7 +2769,7 @@ static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr, memcpy(ptr, buf, l); invalidate_and_set_dirty(mr, addr1, l); } - +memory_region_unref(mr); if (release_lock) { qemu_mutex_unlock_iothread(); release_lock = false; @@ -2779,27 +2784,6 @@ static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr, } l = len; -mr = address_space_translate(as, addr, , , true); -} - -return result; -} - -MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs attrs, -const uint8_t *buf, int len) -{ -hwaddr l; -hwaddr addr1; -MemoryRegion *mr; -MemTxResult result = MEMTX_OK; - -if (len > 0) { -rcu_read_lock(); -l = len; -mr = address_space_translate(as, addr, , , true); -result = address_space_write_continue(as, addr, attrs, buf, len, - addr1, l, mr); -rcu_read_unlock(); } return result; diff --git a/memory.c b/memory.c index 64b0a60..d12437c 100644 --- a/memory.c +++ b/memory.c @@ -1505,13 +1505,10 @@ static void
Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Tuesday, March 14, 2017 3:15 AM > To: Xu, Anthony <anthony...@intel.com> > Cc: Zhong, Yang <yang.zh...@intel.com>; qemu-devel@nongnu.org; Peng, > Chao P <chao.p.p...@intel.com> > Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from > 12252kB to 2752KB > > > > On 14/03/2017 06:14, Xu, Anthony wrote: > > Below functions are registered in RCU thread > > address_space_dispatch_free, > > do_address_space_destroy > > flatview_unref > > reclaim_ramblock, > > qht_map_destroy, > > migration_bitmap_free > > > > first three are address space related, should work without global lock per > above analysis. > > The rest are very simple, seems doesn't need global lock. > > flatview_unref can call object_unref and thus reach: Okay, flatview_unref is the one you worried about, Flatview_unref is registered as a RCU callback only in address_space_update_topology, Strangely, it is registered as a RCU callback, and is called directly in this function. Basially flatview_unref is called twice for old view. There are some comments, but it is not clear to me, is this a bug or by design? Is flatview_destroy called in current thread or RCU thread? static void address_space_update_topology(AddressSpace *as) { FlatView *old_view = address_space_get_flatview(as); FlatView *new_view = generate_memory_topology(as->root); address_space_update_topology_pass(as, old_view, new_view, false); address_space_update_topology_pass(as, old_view, new_view, true); /* Writes are protected by the BQL. */ atomic_rcu_set(>current_map, new_view); call_rcu(old_view, flatview_unref, rcu); /* Note that all the old MemoryRegions are still alive up to this * point. This relieves most MemoryListeners from the need to * ref/unref the MemoryRegions they get---unless they use them * outside the iothread mutex, in which case precise reference * counting is necessary. */ flatview_unref(old_view); address_space_update_ioeventfds(as); } Let me split the patch, Do you think below patch is correct? --- a/memory.c +++ b/memory.c @@ -1503,15 +1503,9 @@ static void memory_region_finalize(Object *obj) * and cause an infinite loop. */ mr->enabled = false; -memory_region_transaction_begin(); -while (!QTAILQ_EMPTY(>subregions)) { -MemoryRegion *subregion = QTAILQ_FIRST(>subregions); -memory_region_del_subregion(mr, subregion); -} -memory_region_transaction_commit(); - +assert(QTAILQ_EMPTY(>subregions)); mr->destructor(mr); -memory_region_clear_coalescing(m +assert(QTAILQ_EMPTY(>coalesced)); g_free((char *)mr->name); g_free(mr->ioeventfds); } Then let us take a look at flatview_unref, memory_region_unref may call any QOM finalize through Object_unref(mr->owner), that's your concern, I got it. It is way too complicated to look at each QOM object release callbacks and each QOM property release callbacks. I gave up this path. How about fall back to synchronize_rcu? As for address space, the RCU read lock is used to protect PhysPageMap, but not the regular MemoryRegions, because that are not allocated in PhysPageMap build. Subpage MemoryRegion is an exception, because that is allocated in PhysPageMap build. The MemoryRegions returned from address_space_translate are regular MemoryRegions, so address_space_write_continue and address_space_read_continue don't need RCU read lock. Below patch reclaim address space in synchronized way, it reduces heap size from ~12MB to ~3MB. You need to apply this patch with above patch. And when address_space_dispatch_free is called it already holds the global lock, we don't need to acquire the global lock in address_space_dispatch_free. Please review this patch. Thanks, Anthony diff --git a/cputlb.c b/cputlb.c index 6c39927..98bd21f 100644 --- a/cputlb.c +++ b/cputlb.c @@ -347,6 +347,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr, tlb_add_large_page(env, vaddr, size); } +rcu_read_lock(); sz = size; section = address_space_translate_for_iotlb(cpu, asidx, paddr, , ); assert(sz >= TARGET_PAGE_SIZE); @@ -406,6 +407,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr, } else { te->addr_write = -1; } +rcu_read_unlock(); } /* Add a new TLB entry, but without specifying the memory diff --git a/exec.c b/exec.c index 6fa337b..446d622 100644 --- a/exec.c +++ b/exec.c @@ -455,7 +455,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, IOMMUTLBEntry iotlb = {0}; MemoryRegionSection *section; MemoryRegion *mr; - +rcu_read_lock(); for (;;)
Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> > > Subpages never have subregions, so the loop never runs. The > begin/commit > > > pair then becomes: > > > > > > ++memory_region_transaction_depth; > > > --memory_region_transaction_depth; > > > if (!memory_region_transaction_depth) { > > > if (memory_region_update_pending) { > > > ... > > > } else if (ioeventfd_update_pending) { > > > ... > > > } > > > // memory_region_clear_pending() > > > memory_region_update_pending = false; > > > ioeventfd_update_pending = false; > > >} > > > > > > If memory_region_transaction_depth is > 0 the begin/commit pair does > > > nothing. > > > > > > But if memory_region_transaction_depth is == 0, there should be no > update > > > pending because the loop has never run. So I don't see what your patch > can > > > change. > > > > As I mentioned in PATCH1, this patch is used to fix an issue after we > remove > > the global lock in RCU callback. After global lock is removed, other thread > > may set up update pending, so memory_region_transaction_commit > > may try to rebuild PhysPageMap even the loop doesn’t run, other thread > may > > try to rebuild PhysPageMap at the same time, it is a race condition. > > subpage MemoryRegion is a specific MemoryRegion, it doesn't belong to > any > > address space, it is only used to handle subpage. We may use a new > structure > > other than MemoryRegion to handle subpage to make the logic more > clearer. After > > the change, RCU callback will not free any MemoryRegion. > > This is not true. Try hot-unplugging a device. You are right, hot-unplugging does cause memory region(not for subpage) freed in RCU thread. I tried pci device hot plug/unplug, When plug a device, handle_hmp_command->qmp_device_add->qdev_device_add-> virtio_pci_dc_realize->pci_qdev_realize->virtio_pci_realize-> virtio_device_realize->virtio_bus_device_plugged->virtio_pci_device_plugged are called. when unplug a device, kvm_cpu_exec->memory_region_write_accessor->pci_write-> virtio_device_unrealize->virtio_pci_device_unplugged are called. memory region addition and remove happen in virtio_pci_device_plugged and virtio_pci_device_unplugged respectively, memory region operation needs to acquire the global lock, but none of them happens in RCU thread. When memory_region_finalize is called, the memory region has been removed from the address space (removed in virtio_pci_device_unplugged), both mr->subregions and mr->coalesced are empty. It makes sense to me, when memory_region_finalize is called, means this memory region is not used any more. Please correct me if I'm wrong here, I only tried pci device hot plug/unplug. If above assumption is correct, seems we don't need the global lock for memory region reclamation in RCU thread. Please let me know if other memory reclamation in RCU thread need the global lock. Under the assumption, I have below patch to remove the global lock in RCU thread, I tested vm boot ,reboot, shutdown and pci device hot plug/unplug. Please review the patch, If no further issue, I will send out an official patch later. Thanks, Anthony diff --git a/memory.c b/memory.c index 6c58373..43e06e9 100644 --- a/memory.c +++ b/memory.c @@ -1503,15 +1503,9 @@ static void memory_region_finalize(Object *obj) * and cause an infinite loop. */ mr->enabled = false; -memory_region_transaction_begin(); -while (!QTAILQ_EMPTY(>subregions)) { -MemoryRegion *subregion = QTAILQ_FIRST(>subregions); -memory_region_del_subregion(mr, subregion); -} -memory_region_transaction_commit(); - +assert(QTAILQ_EMPTY(>subregions)); mr->destructor(mr); -memory_region_clear_coalescing(mr); +assert(QTAILQ_EMPTY(>coalesced)); g_free((char *)mr->name); g_free(mr->ioeventfds); } diff --git a/util/rcu.c b/util/rcu.c index 9adc5e4..51e0248 100644 --- a/util/rcu.c +++ b/util/rcu.c @@ -254,24 +254,20 @@ static void *call_rcu_thread(void *opaque) atomic_sub(_call_count, n); synchronize_rcu(); -qemu_mutex_lock_iothread(); while (n > 0) { node = try_dequeue(); while (!node) { -qemu_mutex_unlock_iothread(); qemu_event_reset(_call_ready_event); node = try_dequeue(); if (!node) { qemu_event_wait(_call_ready_event); node = try_dequeue(); } -qemu_mutex_lock_iothread(); } n--; node->func(node); } -qemu_mutex_unlock_iothread(); } abort(); } > > I'm all for reducing the scope of the global QEMU lock, Thanks, >but this needs a plan > and a careful analysis of the involved data structures across _all_ > instance_finalize > implementations. Agreed, Below functions are registered in RCU thread address_space_dispatch_free,
Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> > Ideally, freeing QOM object should not require a global lock. > > If you see any other QOM requiring a global lock, please let us know, we > are willing to fix it. > > All of them. When unplugging a device, the device object will be freed > from an RCU callback. > Thanks for your reply, Some objects may not need lock at all to be freed, Some objects may just need a small lock to be freed, Should we let RCU callbacks to decide which lock they need instead of enforcing the global lock in RCU thread? As for the device object, can we get the global lock inside the object free/destroy function for now? If we can remove the global lock inside RCU thread, we can save 9MB heap memory, that's a lot! Please share with us if you have other idea to do this. Thanks Anthony
Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Friday, March 10, 2017 1:14 AM > To: Zhong, Yang <yang.zh...@intel.com>; qemu-devel@nongnu.org > Cc: Xu, Anthony <anthony...@intel.com>; Peng, Chao P > <chao.p.p...@intel.com> > Subject: Re: [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to > 2752KB > > > > On 10/03/2017 16:14, Yang Zhong wrote: > > There is no need to delete subregion and do memory begin/commit for > > subpage in memory_region_finalize(). > > > > This patch is from Anthony Xu <anthony...@intel.com>. > > > > Signed-off-by: Yang Zhong <yang.zh...@intel.com> > > --- > > memory.c | 13 +++-- > > 1 file changed, 7 insertions(+), 6 deletions(-) > > > > diff --git a/memory.c b/memory.c > > index 284894b..3e9bfff 100644 > > --- a/memory.c > > +++ b/memory.c > > @@ -1505,13 +1505,14 @@ static void memory_region_finalize(Object > *obj) > > * and cause an infinite loop. > > */ > > mr->enabled = false; > > -memory_region_transaction_begin(); > > -while (!QTAILQ_EMPTY(>subregions)) { > > -MemoryRegion *subregion = QTAILQ_FIRST(>subregions); > > -memory_region_del_subregion(mr, subregion); > > +if (!mr->subpage) { > > +memory_region_transaction_begin(); > > +while (!QTAILQ_EMPTY(>subregions)) { > > +MemoryRegion *subregion = QTAILQ_FIRST(>subregions); > > +memory_region_del_subregion(mr, subregion); > > +} > > +memory_region_transaction_commit(); > > Subpages never have subregions, so the loop never runs. The begin/commit > pair then becomes: > > ++memory_region_transaction_depth; > --memory_region_transaction_depth; > if (!memory_region_transaction_depth) { > if (memory_region_update_pending) { > ... > } else if (ioeventfd_update_pending) { > ... > } > // memory_region_clear_pending() > memory_region_update_pending = false; > ioeventfd_update_pending = false; >} > > If memory_region_transaction_depth is > 0 the begin/commit pair does > nothing. > > But if memory_region_transaction_depth is == 0, there should be no update > pending because the loop has never run. So I don't see what your patch can > change. As I mentioned in PATCH1, this patch is used to fix an issue after we remove the global lock in RCU callback. After global lock is removed, other thread may set up update pending, so memory_region_transaction_commit may try to rebuild PhysPageMap even the loop doesn’t run, other thread may try to rebuild PhysPageMap at the same time, it is a race condition. subpage MemoryRegion is a specific MemoryRegion, it doesn't belong to any address space, it is only used to handle subpage. We may use a new structure other than MemoryRegion to handle subpage to make the logic more clearer. After the change, RCU callback will not free any MemoryRegion. Anthony > > Of course there could be an update pending because of a bug elsewhere, > but I tried adding this patch: > > diff --git a/memory.c b/memory.c > index 284894b..2208a21 100644 > --- a/memory.c > +++ b/memory.c > @@ -903,6 +903,10 @@ static void > address_space_update_topology(AddressSpace *as) void > memory_region_transaction_begin(void) > { > qemu_flush_coalesced_mmio_buffer(); > +if (!memory_region_transaction_depth) { > +assert(!memory_region_update_pending); > +assert(!ioeventfd_update_pending); > +} > ++memory_region_transaction_depth; > } > > and at least a basic qemu-system-x86_64 run started just fine. So why does > this patch make a difference? > > Paolo > > > } > > -memory_region_transaction_commit(); > > -
Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Friday, March 10, 2017 12:42 AM > To: Zhong, Yang <yang.zh...@intel.com>; qemu-devel@nongnu.org > Cc: Xu, Anthony <anthony...@intel.com>; Peng, Chao P > <chao.p.p...@intel.com> > Subject: Re: [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to > 2752KB > > > > On 10/03/2017 16:14, Yang Zhong wrote: > > There are a lot of memory allocation during the qemu bootup, which are > > freed later by RCU thread,which means the heap size becomes biger and > > biger when allocation happens, but the heap may not be shrinked even > > after release happens,because some memory blocks in top of heap are > > still being used.Decreasing the sleep and removing > > qemu_mutex_unlock_iothread() lock, which make call_rcu_thread()thread > response the free memory in time. > > This patch will reduce heap Rss around 10M. > > > > This patch is from Anthony xu <anthony...@intel.com>. > > > > Signed-off-by: Yang Zhong <yang.zh...@intel.com> > > --- > > util/rcu.c | 8 ++-- > > 1 file changed, 2 insertions(+), 6 deletions(-) > > > > diff --git a/util/rcu.c b/util/rcu.c > > index 9adc5e4..c5c373c 100644 > > --- a/util/rcu.c > > +++ b/util/rcu.c > > @@ -167,7 +167,7 @@ void synchronize_rcu(void) } > > > > > > -#define RCU_CALL_MIN_SIZE30 > > +#define RCU_CALL_MIN_SIZE5 > > > > /* Multi-producer, single-consumer queue based on > urcu/static/wfqueue.h > > * from liburcu. Note that head is only used by the consumer. > > @@ -241,7 +241,7 @@ static void *call_rcu_thread(void *opaque) > > * added before synchronize_rcu() starts. > > */ > > while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) { > > -g_usleep(1); > > +g_usleep(100); > > if (n == 0) { > > qemu_event_reset(_call_ready_event); > > n = atomic_read(_call_count); @@ -254,24 +254,20 > > @@ static void *call_rcu_thread(void *opaque) > > > > atomic_sub(_call_count, n); > > synchronize_rcu(); > > -qemu_mutex_lock_iothread(); > > while (n > 0) { > > node = try_dequeue(); > > while (!node) { > > -qemu_mutex_unlock_iothread(); > > qemu_event_reset(_call_ready_event); > > node = try_dequeue(); > > if (!node) { > > qemu_event_wait(_call_ready_event); > > node = try_dequeue(); > > } > > -qemu_mutex_lock_iothread(); > > } > > > > n--; > > node->func(node); > > } > > -qemu_mutex_unlock_iothread(); > > This is wrong. RCU callbacks currently need the "big QEMU lock", because > they can free arbitrary QOM objects (including MemoryRegions). Using "big QEMU lock" in RCU callbacks is a little bit heavy, we'd like to remove it. We noticed an issue related to MemoryRegion popping up after we removed the global lock. The root cause is in MemoryRegion destruction memory_region_transaction_commit is called, which may cause issue. Freeing MemoryRegion seems not cause any issue. Patch2 in this series fixes this issue, We found out only subpage MemoryRegion is freed in RCU callbacks, and memory_region_transaction_commit is not needed for subpage MemoryRegion because it doesn't have sub regions. Ideally, freeing QOM object should not require a global lock. If you see any other QOM requiring a global lock, please let us know, we are willing to fix it. Thanks, Anthony > > Paolo > > > } > > abort(); > > } > >