Re: [Qemu-devel] [PATCH] SMM: disable smram region if smm is disabled

2017-05-18 Thread Xu, Anthony
> On 16/05/2017 22:22, Xu, Anthony wrote:
> >> On 16/05/2017 03:21, Anthony Xu wrote:
> >>> when smm is disabled, smram is not used, so disable it
> >>>
> >>> Signed-off-by: Anthony Xu <anthony...@intel.com>
> >>
> >> What is the benefit?
> >
> > This patch removes 1 memory region for i440 platform and 3 memory
> regions
> > for q35 platform. That makes functions which iterates memory region tree
> > a little bit fast even the memory regions are disabled.
> 
> Does it translate to anything measurable in benchmarks?  
Yes , we see boot time improvement with this patch in our setup (skip guest 
BIOS, 
disable guest PAM).

>Could you leave
> the regions there, but skip the creation of the SMRAM address space in
> register_smram_listener when the machine doesn't have SMM enabled?
Sounds like you have concerns on removing smram regions.
What are your concerns?

-Anthony


Re: [Qemu-devel] [PATCH] Memory: use memory address space for cpu-memory

2017-05-18 Thread Xu, Anthony
> On 18/05/2017 23:48, Xu, Anthony wrote:
> >> It should be called.  Alternatively you could try adding a new function
> >> to mark address_space_memory as a never-destroyed AddressSpace:
> >>
> > This patch would do it, could you please submit this patch?
> 
> If you have tested it (together with the change in the initialization of
> address_space_memory), I can do that.
> 

Based on your patch, I added the change in the initialization of 
address_space_memory. It works well in my setup, cpu-memory
address space doesn't show up as we expected.

Anthony

diff --git a/exec.c b/exec.c
index 96e3ac9..746dbbc 100644
--- a/exec.c
+++ b/exec.c
@@ -2712,7 +2712,7 @@ static void memory_map_init(void)
 system_memory = g_malloc(sizeof(*system_memory));

 memory_region_init(system_memory, NULL, "system", UINT64_MAX);
-address_space_init(_space_memory, system_memory, "memory");
+address_space_init_static(_space_memory, system_memory, "memory");

 system_io = g_malloc(sizeof(*system_io));
 memory_region_init_io(system_io, NULL, _io_ops, NULL, "io",
diff --git a/include/exec/memory.h b/include/exec/memory.h
index b27b288..6f44b79 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1395,6 +1395,17 @@ MemTxResult memory_region_dispatch_write(MemoryRegion 
*mr,
 void address_space_init(AddressSpace *as, MemoryRegion *root, const char 
*name);

 /**
+ * address_space_init_static: initializes an static address space
+ *
+ * @as: an uninitialized #AddressSpace
+ * @root: a #MemoryRegion that routes addresses for the address space
+ * @name: an address space name.  The name is only used for debugging
+ *output.
+ */
+void address_space_init_static(AddressSpace *as, MemoryRegion *root,
+   const char *name);
+
+/**
  * address_space_init_shareable: return an address space for a memory region,
  *   creating it if it does not already exist
  *
diff --git a/memory.c b/memory.c
index 190cd3d..6c933d8 100644
--- a/memory.c
+++ b/memory.c
@@ -2461,7 +2461,8 @@ static void do_address_space_destroy(AddressSpace *as)
 }
 }

-void address_space_init_static(AddressSpace *as, MemoryRegion *root, const 
char *name)
+void address_space_init_static(AddressSpace *as, MemoryRegion *root,
+   const char *name)
 {
 address_space_init(as, root, name);
 as->shared = true;



Re: [Qemu-devel] [PATCH] Memory: use memory address space for cpu-memory

2017-05-18 Thread Xu, Anthony
> >>> -AddressSpace *as = address_space_init_shareable(cpu->memory,
> >>> -"cpu-memory");
> >>> +AddressSpace *as;
> >>> +if (cpu->memory == address_space_memory.root) {
> >>> +address_space_memory.ref_count++;
> >> probably this would cause reference leak when vcpu is destroyed
> > I thought address_space_destroy is called when vcpu is unplugged,
> > seems that's not the case, then ref_count++ is not needed.
> 
> It should be called.  Alternatively you could try adding a new function
> to mark address_space_memory as a never-destroyed AddressSpace:
> 
This patch would do it, could you please submit this patch?

address_space_destroy is not called when vcpu is unplugged, that likely
causes memory leak. I will take a look when I have time.
If someone can take a look now, that'd be great.


Thanks,
Anthony



> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 99e0f54d86..b27b288c8f 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -290,6 +290,7 @@ struct AddressSpace {
>  MemoryRegion *root;
>  int ref_count;
>  bool malloced;
> +bool shared;
> 
>  /* Accessed via RCU.  */
>  struct FlatView *current_map;
> diff --git a/memory.c b/memory.c
> index b727f5ec0e..190cd3d5ce 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -2432,6 +2432,7 @@ void address_space_init(AddressSpace *as,
> MemoryRegion *root, const char *name)
>  as->ref_count = 1;
>  as->root = root;
>  as->malloced = false;
> +as->shared = false;
>  as->current_map = g_new(FlatView, 1);
>  flatview_init(as->current_map);
>  as->ioeventfd_nb = 0;
> @@ -2460,12 +2461,18 @@ static void
> do_address_space_destroy(AddressSpace *as)
>  }
>  }
> 
> +void address_space_init_static(AddressSpace *as, MemoryRegion *root,
> const char *name)
> +{
> +address_space_init(as, root, name);
> +as->shared = true;
> +}
> +
>  AddressSpace *address_space_init_shareable(MemoryRegion *root, const
> char *name)
>  {
>  AddressSpace *as;
> 
>  QTAILQ_FOREACH(as, _spaces, address_spaces_link) {
> -if (root == as->root && as->malloced) {
> +if (root == as->root && as->shared) {
>  as->ref_count++;
>  return as;
>  }
> @@ -2474,6 +2481,7 @@ AddressSpace
> *address_space_init_shareable(MemoryRegion *root, const char *name)
>  as = g_malloc0(sizeof *as);
>  address_space_init(as, root, name);
>  as->malloced = true;
> +as->shared = true;
>  return as;
>  }
> 
> @@ -2485,6 +2493,8 @@ void address_space_destroy(AddressSpace *as)
>  if (as->ref_count) {
>  return;
>  }
> +assert(!as->shared || as->malloced);
> +
>  /* Flush out anything from MemoryListeners listening in on this */
>  memory_region_transaction_begin();
>  as->root = NULL;
> 
> 
> then CPUs can keep using address_space_init_shareable.
> 
> Paolo


Re: [Qemu-devel] [PATCH] Memory: use memory address space for cpu-memory

2017-05-17 Thread Xu, Anthony

> > If cpu-memory address space is same as memory address space,
> > use memory address space for cpu-memory address space.
> >
> > any memory region change causeaddress space to rebuild PhysPageMap,
> > rebuilding PhysPageMap is very expensive.
> >
> > removing cpu-memory address space reduces the guest boot time and
> > memory usage.
> >
> > Signed-off-by: Anthony Xu <anthony...@intel.com>
> > ---
> >  cpus.c | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/cpus.c b/cpus.c
> > index 740b8dc..15c7a6a 100644
> > --- a/cpus.c
> > +++ b/cpus.c
> > @@ -1748,8 +1748,13 @@ void qemu_init_vcpu(CPUState *cpu)
> >  /* If the target cpu hasn't set up any address spaces itself,
> >   * give it the default one.
> >   */
> > -AddressSpace *as = address_space_init_shareable(cpu->memory,
> > -"cpu-memory");
> > +AddressSpace *as;
> > +if (cpu->memory == address_space_memory.root) {
> > +address_space_memory.ref_count++;
> probably this would cause reference leak when vcpu is destroyed

I thought address_space_destroy is called when vcpu is unplugged,
seems that's not the case, then ref_count++ is not needed.

Any other comments?


Thanks,
Anthony










Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-16 Thread Xu, Anthony
> > > > With the above change,   I got below data
> > > >
> > > > Platformaccel   count of restoring A20 to 0
> > > > Q35 kvm 96
> > > > Q35 tcg 271
> > > > PC  kvm 3
> > > > PC  tcg 3
> > >
> > > Okay, thanks.  I think the number of a20 switches is due to
> > > differences in option rom execution interacting with the fact that
> > > some mode switches were occurring before SeaBIOS set
> > > call16_override().
> > >
> > > > But I still see a lot of PORT_A20 accesses in QEMU as I expected
> > >
> > > Yes, but it should be possible to significantly reduce the number of
> > > outb() calls by limiting them to when A20 changes.  This should also
> > > be useful to reduce the number of outb() calls needed to disable NMIs.
> > > I sent a patch series to the seabios mailing list to demonstrate the
> > > idea.
> >
> > If both TCG and KVM work by ignoring A20,  why not remove all PORT_A20
> > access in SeaBios when CONFIG_DISABLE_A20 is not defined?
> > Do you see any impact?
> 
> The SeaBIOS CONFIG_DISABLE_A20 build option does not mean "disable
> support for A20"; it means "start the initial operating system
> bootloader with A20 disabled".  CONFIG_DISABLE_A20=y is a
> pessimization, not an optimization.
Make sense, Thanks for explanation, 

> 
> As for adding a new SeaBIOS build option to compile out support for
> A20 - that seems like a very small optimization that would risk memory
> corruption and hard to diagnose crashes.  SeaBIOS runs natively on
> real hardware (with coreboot and as a CSM on UEFI) as well as on
> QEMU/KVM.
I heard new platform doesn't support A20.
What's the hardware SeaBIOS runs natively on needs A20 support?

It is just a build option, we can disable A20 support for QEMU/KVM and
enable A20 support for real hardware. Any concerns here?

BTW QEMU/KVM ignores A20 even SeaBIOS supports A20.

Or, we can add some logic in SeaBIOS to check if the platform supports A20,
if it doesn't support A20, SeaBIOS won't access PORT_A20 anymore.

The check logic is like,
write 0x55 to 0x00:0xeff0 (or other unused address)
disable A20
write 0xaa to 0x:0xf000
read from 0x00:0xeff0,
If the return value is 0x55, A20 is not supported by this platform.


Anthony




Re: [Qemu-devel] [PATCH] SMM: disable smram region if smm is disabled

2017-05-16 Thread Xu, Anthony
> On 16/05/2017 03:21, Anthony Xu wrote:
> > when smm is disabled, smram is not used, so disable it
> >
> > Signed-off-by: Anthony Xu <anthony...@intel.com>
> 
> What is the benefit?
This patch removes 1 memory region for i440 platform and 3 memory regions
for q35 platform. That makes functions which iterates memory region tree
a little bit fast even the memory regions are disabled.


Anthony


Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-16 Thread Xu, Anthony

> On Sat, May 13, 2017 at 01:24:30AM +0000, Xu, Anthony wrote:
> > I think it is related to accel and platform, the result I gave before is 
> > for q35
> tcg,
> >
> > With the above change,   I got below data
> >
> > Platformaccel   count of restoring A20 to 0
> > Q35 kvm 96
> > Q35 tcg 271
> > PC  kvm 3
> > PC  tcg 3
> 
> Okay, thanks.  I think the number of a20 switches is due to
> differences in option rom execution interacting with the fact that
> some mode switches were occurring before SeaBIOS set
> call16_override().
> 
> > But I still see a lot of PORT_A20 accesses in QEMU as I expected
> 
> Yes, but it should be possible to significantly reduce the number of
> outb() calls by limiting them to when A20 changes.  This should also
> be useful to reduce the number of outb() calls needed to disable NMIs.
> I sent a patch series to the seabios mailing list to demonstrate the
> idea.

If both TCG and KVM work by ignoring A20,  why not remove all PORT_A20
access in SeaBios when CONFIG_DISABLE_A20 is not defined?
Do you see any impact?


-Anthony





Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-12 Thread Xu, Anthony
> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Friday, May 12, 2017 5:02 PM
> To: Xu, Anthony <anthony...@intel.com>
> Cc: Paolo Bonzini <pbonz...@redhat.com>; qemu-devel@nongnu.org
> Subject: Re: [PATCH] target/i386: enable A20 automatically in system
> management mode
> 
> On Fri, May 12, 2017 at 11:19:00PM +, Xu, Anthony wrote:
> > > SeaBIOS defaults to enabling A20 and it's a rare beast that disables
> > > it.  One could change x86.h:set_a20 and romlayout.S:transition32 to
> > > only issue the outb() if the inb() indicates a change is needed.  That
> > > would likely eliminate half the accesses.
> >
> > The 350 port 92 access is for write operation only.
> > If include the inb(), it would be 700, and every time it actually has a 
> > change
> > To be precise, It is about 175 switches from 32 bit to 16 bit, then back to 
> > 32
> bit.
> > call16 is called 175 times during Seabios boot without any option rom,
> > It would be more if some option roms are included.
> >
> >
> > I think A20 is disabled by default in SeaBios.
> 
> I don't know why you think that.  One can check with:
> 
> --- a/src/stacks.c
> +++ b/src/stacks.c
> @@ -99,6 +99,8 @@ call32_post(void)
>  if (cr0_caching)
>  cr0_mask(CR0_CD|CR0_NW, cr0_caching);
>  }
> +if (!get_a20())
> +dprintf(1, "a20=0\n");
> 
>  // Restore cmos index register
>  outb(GET_LOW(Call16Data.cmosindex), PORT_CMOS_INDEX);
> 
> With the above I only see a handful of cases where SeaBIOS has to
> restore a20 to a disabled state.

I think it is related to accel and platform, the result I gave before is for 
q35 tcg,

With the above change,   I got below data

Platformaccel   count of restoring A20 to 0
Q35 kvm 96
Q35 tcg 271
PC  kvm 3
PC  tcg 3

A lot of A20 restoring happen when SeaBIOS scans AHCI links.


> 
> The handful I do see are due to cases where yield() is called prior to
> option rom initialization.  Those handful are eliminated for me with
> the following fix:
> 
> --- a/src/stacks.c
> +++ b/src/stacks.c
> @@ -496,6 +496,7 @@ void
>  thread_setup(void)
>  {
>  CanInterrupt = 1;
> +call16_override(1);
>  if (! CONFIG_THREADS)
>  return;
>  ThreadControl = romfile_loadint("etc/threads", 1);

But I still see a lot of PORT_A20 accesses in QEMU as I expected


> 
> What OS / bootloader are you running?

/x86_64-softmmu/qemu-system-x86_64 -bios /home/root/git/seabios/out/bios.bin 
-smp 1 
-machine q35,accel=tcg -m 1G -drive 
format=raw,file=/home/root/images/centos7.2.img,if=ide,index=0 
-nographic  -nodefaults  -serial stdio -monitor pty

-Anthony



Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-12 Thread Xu, Anthony
wrote:
> > On 12/05/2017 20:55, Xu, Anthony wrote:
> > > If that's the case,  QEMU/TCG should work with SeaBios even with
> ignoring A20.
> > >
> > > During SeaBios boot, there are >350 port 92 access, if we don't need to
> handle A20,
> > > we can make A20 configurable in Seabios, It may reduce SeaBios boot
> time.
> >
> > Yes, that's a good idea.
> 
> SeaBIOS defaults to enabling A20 and it's a rare beast that disables
> it.  One could change x86.h:set_a20 and romlayout.S:transition32 to
> only issue the outb() if the inb() indicates a change is needed.  That
> would likely eliminate half the accesses.

The 350 port 92 access is for write operation only.
If include the inb(), it would be 700, and every time it actually has a change
To be precise, It is about 175 switches from 32 bit to 16 bit, then back to 32 
bit.
call16 is called 175 times during Seabios boot without any option rom,
It would be more if some option roms are included.


I think A20 is disabled by default in SeaBios.
Call16Data.a20 is initialized to 0,
call16_override may set Call16Data.a20 to 1,
call16 called before call16_override would disable and enable A20.

BTW, A20 is enabled in QEMU by default.



> 
> I'd be surprised if it would impact the overall boot time though.
> SeaBIOS only touches the port on a cpu mode switch and I would have
> thought that was heavier than an IO port access.  Maybe that is skewed
> on KVM though.

Maybe not, but if we have more these kind  of optimizations , we may see the 
impact.
A20 is already configurable in SeaBios,  CONFIG_DISABLE_A20.
Then the change is very small. No PORT_A20 access after the change.


-Anthony


diff --git a/src/x86.h b/src/x86.h
index a770e6f..8efb94a 100644
--- a/src/x86.h
+++ b/src/x86.h
@@ -21,6 +21,7 @@
 #ifndef __ASSEMBLY__

 #include "types.h" // u32
+#include "../out/autoconf.h"

 static inline void irq_disable(void)
 {
@@ -254,13 +255,19 @@ static inline void lgdt(struct descloc_s *desc) {
 }

 static inline u8 get_a20(void) {
-return (inb(PORT_A20) & A20_ENABLE_BIT) != 0;
+if (CONFIG_DISABLE_A20) {
+return (inb(PORT_A20) & A20_ENABLE_BIT) != 0;
+}
+return 1;
 }

 static inline u8 set_a20(u8 cond) {
-u8 val = inb(PORT_A20);
-outb((val & ~A20_ENABLE_BIT) | (cond ? A20_ENABLE_BIT : 0), PORT_A20);
-return (val & A20_ENABLE_BIT) != 0;
+if (CONFIG_DISABLE_A20) {
+u8 val = inb(PORT_A20);
+outb((val & ~A20_ENABLE_BIT) | (cond ? A20_ENABLE_BIT : 0), PORT_A20);
+return (val & A20_ENABLE_BIT) != 0;
+}
+return 1;
 }





Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-12 Thread Xu, Anthony
 
> On 12/05/2017 01:55, Xu, Anthony wrote:
> > Hi Paolo,
> >
> > In KVM mode, seems A20 is ignored.
> > Do you see any potential issue here?
> 
> No; recent processors don't have A20 at all.

I mean A20 in guest, not A20 in host.  
Guest is running on old platform, it tries to control A20  through port 92 like 
what SeaBios does.
QEMU/KVM does handle port 92 access to set correct env->a20_mask,
but QEMU/KVM ignores A20 status when handling guest memory access.

Since QEMU/KVM works well with SeaBios, does that imply SeaBios doesn't 
generate address
larger than 0x10 in real mode?

If that's the case,  QEMU/TCG should work with SeaBios even with ignoring A20.

During SeaBios boot, there are >350 port 92 access, if we don't need to handle 
A20, 
we can make A20 configurable in Seabios, It may reduce SeaBios boot time.


Anthony





Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-11 Thread Xu, Anthony
Hi Paolo,

In KVM mode, seems A20 is ignored.
Do you see any potential issue here?


Anthony 


> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Thursday, May 11, 2017 9:35 AM
> To: Paolo Bonzini <pbonz...@redhat.com>
> Cc: qemu-devel@nongnu.org; Xu, Anthony <anthony...@intel.com>
> Subject: Re: [PATCH] target/i386: enable A20 automatically in system
> management mode
> 
> On Thu, May 11, 2017 at 05:32:47PM +0200, Paolo Bonzini wrote:
> > On 11/05/2017 16:53, Kevin O'Connor wrote:
> > > On Thu, May 11, 2017 at 01:35:28PM +0200, Paolo Bonzini wrote:
> > >> Ignore env->a20_mask when running in system management mode.
> > >
> > > Thanks Paolo.  I don't think this patch will help SeaBIOS though.  The
> > > SeaBIOS SMM handler doesn't do much - it doesn't even access ram
> above
> > > 1MiB.  See SeaBIOS' code in src/fw/smm.c:handle_smi().
> > >
> > > Instead, the SeaBIOS code does a cpu state backup/restore to switch
> > > into 32bit mode.  I thought the A20 state would be part of that cpu
> > > backup/restore.  However, looking at the Intel SDM docs now, it's not
> > > really clear to me how the processor "inhibits" A20 when in SMM mode -
> > > does it save/restore that state on SMI/RSM or does it have special
> > > logic to ignore A20 while in SMM mode?
> >
> > There isn't any documented place for A20 in the state save map (I checked
> > AMD's BIOS/Kernel Developer Guide which is pretty comprehensive), so I
> > think the latter is more plausible.  What I'm doing in this patch is
> > ignoring A20 while in SMM mode.
> 
> Okay.
> 
> > Then you would have to add an A20 save/restore in handle_smi; since
> > CALL32SMM_ENTERID should not nest, I think you can just do this:
> 
> Yes, that should be fine.
> 
> > --- a/src/fw/smm.c
> > +++ b/src/fw/smm.c
> > @@ -54,7 +54,8 @@ struct smm_layout {
> >  struct smm_state backup2;
> >  u8 stack[0x7c00];
> >  u64 codeentry;
> > -u8 pad_8008[0x7df8];
> > +u8 a20;
> > +u8 pad_8009[0x7df7];
> >  struct smm_state cpu;
> >  };
> 
> In order to avoid mixing code and data in the same cache line we could
> do this instead:
> 
>  struct smm_layout {
>  struct smm_state backup1;
>  struct smm_state backup2;
> -u8 stack[0x7c00];
> +u32 backup_a20;
> +u8 stack[0x8000 - sizeof(struct smm_state)*2 - sizeof(u32)];
>  u64 codeentry;
>  u8 pad_8008[0x7df8];
>  struct smm_state cpu;
> 
> Thanks,
> -Kevin



Re: [Qemu-devel] [PATCH] trace: add sanity check

2017-05-10 Thread Xu, Anthony
> Please post steps for reproducing the abort.  I cannot reproduce this
> with qemu-system-x86_64.

The steps to reproduce the issue,

./configure --enable-trace-backend=nop --target-list=x86_64-softmmu

gdb -args ./x86_64-softmmu/qemu-system-x86_64 -bios 
/home/root/guest/seabios.bin -smp 1 -machine q35,accel=kvm
   -m 1G -drive format=raw,file=/home/root/images/centos7.2.img,if=ide,index=0 
-nographic  -nodefaults
  -serial stdio -monitor pty


(gdb) bt
#0  0x704e25f7 in raise () from /lib64/libc.so.6
#1  0x704e3ce8 in abort () from /lib64/libc.so.6
#2  0x559de905 in bitmap_new (nbits=)
at /home/root/git/qemu2.git/include/qemu/bitmap.h:96
#3  cpu_common_initfn (obj=0x56621d30) at qom/cpu.c:399
#4  0x55a11869 in object_init_with_type (obj=0x56621d30, 
ti=0x5656bbb0) at qom/object.c:341
#5  0x55a11869 in object_init_with_type (obj=0x56621d30, 
ti=0x5656bd30) at qom/object.c:341
#6  0x55a11efc in object_initialize_with_type 
(data=data@entry=0x56621d30, size=76560,
type=type@entry=0x5656bd30) at qom/object.c:376
#7  0x55a12061 in object_new_with_type (type=0x5656bd30) at 
qom/object.c:484
#8  0x55a121c5 in object_new (typename=typename@entry=0x56550340 
"qemu64-x86_64-cpu")
at qom/object.c:494
#9  0x557f6e3d in pc_new_cpu (typename=typename@entry=0x56550340 
"qemu64-x86_64-cpu", apic_id=0,
errp=errp@entry=0x565391b0 ) at 
/home/root/git/qemu2.git/hw/i386/pc.c:1101
#10 0x557fa33e in pc_cpus_init (pcms=pcms@entry=0x565f9690)
at /home/root/git/qemu2.git/hw/i386/pc.c:1184
#11 0x557fe0f6 in pc_q35_init (machine=0x565f9690) at 
/home/root/git/qemu2.git/hw/i386/pc_q35.c:121
#12 0x5574fbad in main (argc=, argv=, 
envp=) at vl.c:4562


Anthony


> 
> > diff --git a/qom/cpu.c b/qom/cpu.c
> > index f02e9c0..f9111a0 100644
> > --- a/qom/cpu.c
> > +++ b/qom/cpu.c
> > @@ -382,6 +382,7 @@ static void cpu_common_unrealizefn(DeviceState
> *dev, Error **errp)
> >
> >  static void cpu_common_initfn(Object *obj)
> >  {
> > +uint32_t count;
> >  CPUState *cpu = CPU(obj);
> >  CPUClass *cc = CPU_GET_CLASS(obj);
> >
> > @@ -396,7 +397,10 @@ static void cpu_common_initfn(Object *obj)
> >  QTAILQ_INIT(>breakpoints);
> >  QTAILQ_INIT(>watchpoints);
> >
> > -cpu->trace_dstate = bitmap_new(trace_get_vcpu_event_count());
> > +count = trace_get_vcpu_event_count();
> > +if (count) {
> > +cpu->trace_dstate = bitmap_new(count);
> > +}
> >
> >  cpu_exec_initfn(cpu);
> >  }
> > --
> > 1.8.3.1
> >
> >



Re: [Qemu-devel] centos 7.2 guest doesn't boot without kvmvapic in TCG mode

2017-04-10 Thread Xu, Anthony
+ Richard Henderson

Anthony

> -Original Message-
> From: Xu, Anthony
> Sent: Tuesday, April 4, 2017 5:19 PM
> To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
> Cc: 'Paolo Bonzini' <pbonz...@redhat.com>; 'Stefan Hajnoczi'
> <stefa...@gmail.com>
> Subject: [Qemu-devel] centos 7.2 guest doesn't boot without kvmvapic in
> TCG mode
> 
> I disabled kvmvapic by
> 
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index c3829e3..52be2b0 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -440,7 +440,7 @@ static const VMStateDescription
> vmstate_apic_common = {
>  static Property apic_properties_common[] = {
>  DEFINE_PROP_UINT8("version", APICCommonState, version, 0x14),
>  DEFINE_PROP_BIT("vapic", APICCommonState, vapic_control,
> VAPIC_ENABLE_BIT,
> -true),
> +false),
>  DEFINE_PROP_BOOL("legacy-instance-id", APICCommonState,
> legacy_instance_id,
>   false),
>  DEFINE_PROP_END_OF_LIST(),
> 
> 
> Tried to boot centos guest with this command
> 
> ./x86_64-softmmu/qemu-system-x86_64 -bios
> /home/root/guest/seabios.bin
> -machine q35 -m 1G -drive
> format=raw,file=/home/root/images/centos7.2.img,if=ide,index=0
> -nographic  -nodefaults  -serial pty -monitor pty
> -chardev stdio,id=seabios -device isa-
> debugcon,iobase=0x402,chardev=seabios
> 
> 
> Seabios complained
> 
> Seabios output
> "
> Booting from Hard Disk...
> WARNING - Timeout at ahci_command:154!
> Boot failed: could not read the boot disk
> 
> enter handle_18:
>   NULL
> Booting from Floppy...
> Boot failed: could not read the boot disk
> 
> enter handle_18:
>   NULL
> No bootable device.
> "
> 
> It can boot with kvmvapic enabled by using the same command.
> 
> Seabios output
> "
> Booting from Hard Disk...
> Booting from :7c00
> "
> 
> In the original commit,
> " Therefore, the VAPI is enabled by default, even in TCG mode. Here, no
> speedup can be achieved as the emulation overhead of the TPR register is
> marginal compared to instruction emulation."
> 
> Seems no performance benefit for TCG mode
> What's the other benefit of using kvmvapic in TCG mode?
> 
> Should we make TCG mode work without kvmvapic?
> 
> If we don't need to use kvmvapic in TCG mode,
> we can make kvmvapic code clearer by removing all TCG related code in
> kvmvapic.c.
> 
> 
> Anthony



Re: [Qemu-devel] [PATCH 4/4] pam: setup pc.bios

2017-04-10 Thread Xu, Anthony
> I think this is wrong, the high copy should remain read-only or pflash
> stops working when you remove PAM.

I tried to set pc.bios as read-only and isa.bios as read,
it doesn't work. render_memory_region doesn't honor readonly
field of alias MemoryRegion. 

Two FlatRanges created for pc.bios and isa.bios point to the same
MemoryRegion pc.bios. Both get readonly from pc.bios.

Is this a bug or by design?

Pc.bios and isa.bios are backed by the same memory block,
so it may cause CPU TLB alias, any issue here?


> 
> The comment only explains the "what" but not the "why" and the "why" is
> not in the commit message.  See also here:
> 
Will do.


Anthony


Re: [Qemu-devel] [PATCH 1/4] pam:refactor PAM related code

2017-04-10 Thread Xu, Anthony
> On 08/04/2017 08:45, Anthony Xu wrote:
> > split PAM SMRAM functions in piix.c
> > create mch_init_pam in q35.c
> 
> Could you further move
> 
> MemoryRegion *ram_memory;
> MemoryRegion *system_memory;
> MemoryRegion *pci_address_space;
> PAMMemoryRegion pam_regions[13];
> 
> from the northbridge devices up to PCMachineState, and move the common
> code for PIIX and Q35 to hw/pci-host/pam.c?
> 
> It looks like you can define a better API than what pam.c currently
> provides:
> 
> void pc_init_pam(PCMachineState *pc_machine, DeviceState *d);
> void pc_update_pam(PCMachineState *pc_machine, uint8_t *pam_config);
> 
> or even remove "DeviceState *d" because the owner of the memory regions
> can be the PCMachineState.
> 

Good idea!
Will do.

Anthony


Re: [Qemu-devel] [PATCH]Enable kvmvapic only when host doesn't support VAPIC capability in KVM mode

2017-04-10 Thread Xu, Anthony
 
> I think we shouldn't read it like that. It seems that KVM is always
> returning the VAPIC capability except when the CPU is providing a
> special acceleration [0].
> 
> I would say you can't really refer yourself at this bit to enable or
> not kvmapic in QEMU.
> 
> Does that make sense?
> 
> [0]
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?id=774ead3ad9bc
> bc05ef6aaebb9bdf8b4c3126923b


>From this patch and commit message,

"Disable vapic support on Intel machines with FlexPriority"

If the CPU has accelerated tpr, vapic should be disabled.
The function pointer has the name cpu_has_accelerated_tpr.

kvmvapic was created for cpu which doesn't have accelerated tpr,
 https://lists.nongnu.org/archive/html/qemu-devel/2012-02/msg00515.html

from the commit message, I can infer VAPIC means kvmvapic.

My understanding is they are referring to the same thing, though
the return value of KVM_CAP_VAPIC is confusing.

Didn't find explanation about the return value of KVM_CAP_VAPIC,
We can interpret it as following from QEMU perspective,
If KVM_CAP_VAPIC return 1, QEMU should enable VAPIC(kvmvapic).
If KVM_CAP_VAPIC return 0, QEMU should disable VAPIC(kvmvapic).


Anthony






Re: [Qemu-devel] [PATCH]Enable kvmvapic only when host doesn't support VAPIC capability in KVM mode

2017-04-06 Thread Xu, Anthony
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -2232,6 +2232,10 @@ int kvm_has_sync_mmu(void)
> >  return kvm_check_extension(kvm_state, KVM_CAP_SYNC_MMU);
> >  }
> >
> > +int kvm_has_vapic(void){
> > +return !kvm_check_extension(kvm_state, KVM_CAP_VAPIC);
> > +}
> > +
> 
> Is that function shouldn't return true if KVM is providing VAPIC
> capability?
> 

It should, but it doesn't. see below KVM kernel code segment,
it returns false if KVM provides VAPIC.

Thanks for your comments,

Will resend the patch.


Anthony



arch/x86/kvm/x86.c

case KVM_CAP_VAPIC:
r = !kvm_x86_ops->cpu_has_accelerated_tpr();
break;





Re: [Qemu-devel] [PATCH 1/3] move xen-common.c to hw/xen/

2017-04-05 Thread Xu, Anthony
Hi Eric,

Will resend the patch.

Thanks,
Anthony

> -Original Message-
> From: Eric Blake [mailto:ebl...@redhat.com]
> Sent: Wednesday, April 5, 2017 11:50 AM
> To: Xu, Anthony <anthony...@intel.com>; 'qemu-devel@nongnu.org'
> <qemu-devel@nongnu.org>
> Cc: anthony.per...@citrix.com; 'Paolo Bonzini' <pbonz...@redhat.com>;
> sstabell...@kernel.org
> Subject: Re: [Qemu-devel] [PATCH 1/3] move xen-common.c to hw/xen/
> 
> On 04/05/2017 01:39 PM, Xu, Anthony wrote:
> > move xen-common.c to hw/xen/
> >
> 
> Your message failed to set the 'In-Reply-To:' header, and was therefore
> not threaded to your 0/3 cover letter.  Please see if you can fix that
> before posting your next series.
> 
> >
> > Signed-off -by: Anthony Xu <anthony...@intel.com>
> >
> >
> >
> > ---
> >  hw/xen/xen-common.c | 158
> 
> >  stubs/xen-common.c  |  14 +
> >  xen-common-stub.c   |  14 -
> >  xen-common.c| 158 
> > 
> >  4 files changed, 172 insertions(+), 172 deletions(-)
> 
> Please run 'git config diff.renames true' so that patches like this will
> be a LOT smaller on the mailing list, and will more easily focus on the
> (few) things that have to change due to the new name rather than being a
> bulk delete/insert operation that hides the changes.
> 
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org



[Qemu-devel] [PATCH 3/3] move xen-mapcache.c to hw/i386/xen/

2017-04-05 Thread Xu, Anthony
move xen-mapcache.c to hw/i386/xen/

Signed-off -by: Anthony Xu <anthony...@intel.com>



---
 Makefile.target|   3 -
 default-configs/i386-softmmu.mak   |   1 -
 default-configs/x86_64-softmmu.mak |   1 -
 hw/i386/xen/Makefile.objs  |   2 +-
 hw/i386/xen/trace-events   |   6 +
 hw/i386/xen/xen-mapcache.c | 459 +
 trace-events   |   5 -
 xen-mapcache.c | 459 -
 8 files changed, 466 insertions(+), 470 deletions(-)
 create mode 100644 hw/i386/xen/xen-mapcache.c
 delete mode 100644 xen-mapcache.c

diff --git a/Makefile.target b/Makefile.target
index d5ff0c7..a535980 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -149,9 +149,6 @@ obj-y += dump.o
 obj-y += migration/ram.o migration/savevm.o
 LIBS := $(libs_softmmu) $(LIBS)
 
-# xen support
-obj-$(CONFIG_XEN_I386) += xen-mapcache.o
-
 # Hardware support
 ifeq ($(TARGET_NAME), sparc64)
 obj-y += hw/sparc64/
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 029e952..d2ab2f6 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -39,7 +39,6 @@ CONFIG_TPM_TIS=$(CONFIG_TPM)
 CONFIG_MC146818RTC=y
 CONFIG_PCI_PIIX=y
 CONFIG_WDT_IB700=y
-CONFIG_XEN_I386=$(CONFIG_XEN)
 CONFIG_ISA_DEBUG=y
 CONFIG_ISA_TESTDEV=y
 CONFIG_VMPORT=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index d1d7432..9bde2f1 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -39,7 +39,6 @@ CONFIG_TPM_TIS=$(CONFIG_TPM)
 CONFIG_MC146818RTC=y
 CONFIG_PCI_PIIX=y
 CONFIG_WDT_IB700=y
-CONFIG_XEN_I386=$(CONFIG_XEN)
 CONFIG_ISA_DEBUG=y
 CONFIG_ISA_TESTDEV=y
 CONFIG_VMPORT=y
diff --git a/hw/i386/xen/Makefile.objs b/hw/i386/xen/Makefile.objs
index daf4f53..be9d10c 100644
--- a/hw/i386/xen/Makefile.objs
+++ b/hw/i386/xen/Makefile.objs
@@ -1 +1 @@
-obj-y += xen_platform.o xen_apic.o xen_pvdevice.o xen-hvm.o
+obj-y += xen_platform.o xen_apic.o xen_pvdevice.o xen-hvm.o xen-mapcache.o
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index f25d622..547438d 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -15,3 +15,9 @@ cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, uint32_t 
data_is_ptr, uint64
 cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) 
"I/O=%p pio read reg data=%#"PRIx64" port=%#"PRIx64" size=%d"
 cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t 
size) "I/O=%p pio write reg data=%#"PRIx64" port=%#"PRIx64" size=%d"
 cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy 
dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
+
+# xen-mapcache.c
+xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
+xen_remap_bucket(uint64_t index) "index %#"PRIx64
+xen_map_cache_return(void* ptr) "%p"
+
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
new file mode 100644
index 000..31debdf
--- /dev/null
+++ b/hw/i386/xen/xen-mapcache.c
@@ -0,0 +1,459 @@
+/*
+ * Copyright (C) 2011   Citrix Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+
+#include 
+
+#include "hw/xen/xen_backend.h"
+#include "sysemu/blockdev.h"
+#include "qemu/bitmap.h"
+
+#include 
+
+#include "sysemu/xen-mapcache.h"
+#include "trace.h"
+
+
+//#define MAPCACHE_DEBUG
+
+#ifdef MAPCACHE_DEBUG
+#  define DPRINTF(fmt, ...) do { \
+fprintf(stderr, "xen_mapcache: " fmt, ## __VA_ARGS__); \
+} while (0)
+#else
+#  define DPRINTF(fmt, ...) do { } while (0)
+#endif
+
+#if HOST_LONG_BITS == 32
+#  define MCACHE_BUCKET_SHIFT 16
+#  define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */
+#else
+#  define MCACHE_BUCKET_SHIFT 20
+#  define MCACHE_MAX_SIZE (1UL<<35) /* 32GB Cap */
+#endif
+#define MCACHE_BUCKET_SIZE (1UL << MCACHE_BUCKET_SHIFT)
+
+/* This is the size of the virtual address space reserve to QEMU that will not
+ * be use by MapCache.
+ * From empirical tests I observed that qemu use 75MB more than the
+ * max_mcache_size.
+ */
+#define NON_MCACHE_MEMORY_SIZE (80 * 1024 * 1024)
+
+typedef struct MapCacheEntry {
+hwaddr paddr_index;
+uint8_t *vaddr_base;
+unsigned long *valid_mapping;
+uint8_t lock;
+hwaddr size;
+struct MapCacheEntry *next;
+} MapCacheEntry;
+
+typedef struct MapCacheRev {
+uint8_t *vaddr_req;
+hwaddr p

[Qemu-devel] [PATCH 1/3] move xen-common.c to hw/xen/

2017-04-05 Thread Xu, Anthony
move xen-common.c to hw/xen/


Signed-off -by: Anthony Xu <anthony...@intel.com>



---
 hw/xen/xen-common.c | 158 
 stubs/xen-common.c  |  14 +
 xen-common-stub.c   |  14 -
 xen-common.c| 158 
 4 files changed, 172 insertions(+), 172 deletions(-)
 create mode 100644 hw/xen/xen-common.c
 create mode 100644 stubs/xen-common.c
 delete mode 100644 xen-common-stub.c
 delete mode 100644 xen-common.c

diff --git a/hw/xen/xen-common.c b/hw/xen/xen-common.c
new file mode 100644
index 000..fd2c928
--- /dev/null
+++ b/hw/xen/xen-common.c
@@ -0,0 +1,158 @@
+/*
+ * Copyright (C) 2014   Citrix Systems UK Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/xen/xen_backend.h"
+#include "qmp-commands.h"
+#include "sysemu/char.h"
+#include "sysemu/accel.h"
+#include "migration/migration.h"
+
+//#define DEBUG_XEN
+
+#ifdef DEBUG_XEN
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, "xen: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+static int store_dev_info(int domid, Chardev *cs, const char *string)
+{
+struct xs_handle *xs = NULL;
+char *path = NULL;
+char *newpath = NULL;
+char *pts = NULL;
+int ret = -1;
+
+/* Only continue if we're talking to a pty. */
+if (strncmp(cs->filename, "pty:", 4)) {
+return 0;
+}
+pts = cs->filename + 4;
+
+/* We now have everything we need to set the xenstore entry. */
+xs = xs_open(0);
+if (xs == NULL) {
+fprintf(stderr, "Could not contact XenStore\n");
+goto out;
+}
+
+path = xs_get_domain_path(xs, domid);
+if (path == NULL) {
+fprintf(stderr, "xs_get_domain_path() error\n");
+goto out;
+}
+newpath = realloc(path, (strlen(path) + strlen(string) +
+strlen("/tty") + 1));
+if (newpath == NULL) {
+fprintf(stderr, "realloc error\n");
+goto out;
+}
+path = newpath;
+
+strcat(path, string);
+strcat(path, "/tty");
+if (!xs_write(xs, XBT_NULL, path, pts, strlen(pts))) {
+fprintf(stderr, "xs_write for '%s' fail", string);
+goto out;
+}
+ret = 0;
+
+out:
+free(path);
+xs_close(xs);
+
+return ret;
+}
+
+void xenstore_store_pv_console_info(int i, Chardev *chr)
+{
+if (i == 0) {
+store_dev_info(xen_domid, chr, "/console");
+} else {
+char buf[32];
+snprintf(buf, sizeof(buf), "/device/console/%d", i);
+store_dev_info(xen_domid, chr, buf);
+}
+}
+
+
+static void xenstore_record_dm_state(struct xs_handle *xs, const char *state)
+{
+char path[50];
+
+if (xs == NULL) {
+fprintf(stderr, "xenstore connection not initialized\n");
+exit(1);
+}
+
+snprintf(path, sizeof (path), "device-model/%u/state", xen_domid);
+if (!xs_write(xs, XBT_NULL, path, state, strlen(state))) {
+fprintf(stderr, "error recording dm state\n");
+exit(1);
+}
+}
+
+
+static void xen_change_state_handler(void *opaque, int running,
+ RunState state)
+{
+if (running) {
+/* record state running */
+xenstore_record_dm_state(xenstore, "running");
+}
+}
+
+static int xen_init(MachineState *ms)
+{
+xen_xc = xc_interface_open(0, 0, 0);
+if (xen_xc == NULL) {
+xen_pv_printf(NULL, 0, "can't open xen interface\n");
+return -1;
+}
+xen_fmem = xenforeignmemory_open(0, 0);
+if (xen_fmem == NULL) {
+xen_pv_printf(NULL, 0, "can't open xen fmem interface\n");
+xc_interface_close(xen_xc);
+return -1;
+}
+qemu_add_vm_change_state_handler(xen_change_state_handler, NULL);
+
+global_state_set_optional();
+savevm_skip_configuration();
+savevm_skip_section_footers();
+
+return 0;
+}
+
+static void xen_accel_class_init(ObjectClass *oc, void *data)
+{
+AccelClass *ac = ACCEL_CLASS(oc);
+ac->name = "Xen";
+ac->init_machine = xen_init;
+ac->allowed = _allowed;
+}
+
+#define TYPE_XEN_ACCEL ACCEL_CLASS_NAME("xen")
+
+static const TypeInfo xen_accel_type = {
+.name = TYPE_XEN_ACCEL,
+.parent = TYPE_ACCEL,
+.class_init = xen_accel_class_init,
+};
+
+static void xen_type_init(void)
+{
+type_register_static(_accel_type);
+}
+
+type_init(xen_type_init);
diff --git a/stubs/xen-common.c b/stubs/xen-commo

[Qemu-devel] [PATCH 2/3] move xen-hvm.c to hw/i386/xen/

2017-04-05 Thread Xu, Anthony
move xen-hvm.c to hw/i386/xen/


Signed-off -by: Anthony Xu <anthony...@intel.com>


---
 Makefile.target   |5 +-
 hw/i386/xen/Makefile.objs |2 +-
 hw/i386/xen/trace-events  |   11 +
 hw/i386/xen/xen-hvm.c | 1422 +
 hw/xen/Makefile.objs  |2 +-
 stubs/Makefile.objs   |2 +
 stubs/xen-hvm.c   |   63 ++
 trace-events  |   11 -
 xen-hvm-stub.c|   63 --
 xen-hvm.c | 1422 -
 10 files changed, 1501 insertions(+), 1502 deletions(-)
 create mode 100644 hw/i386/xen/xen-hvm.c
 create mode 100644 stubs/xen-hvm.c
 delete mode 100644 xen-hvm-stub.c
 delete mode 100644 xen-hvm.c

diff --git a/Makefile.target b/Makefile.target
index 7df2b8c..d5ff0c7 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -150,10 +150,7 @@ obj-y += migration/ram.o migration/savevm.o
 LIBS := $(libs_softmmu) $(LIBS)
 
 # xen support
-obj-$(CONFIG_XEN) += xen-common.o
-obj-$(CONFIG_XEN_I386) += xen-hvm.o xen-mapcache.o
-obj-$(call lnot,$(CONFIG_XEN)) += xen-common-stub.o
-obj-$(call lnot,$(CONFIG_XEN_I386)) += xen-hvm-stub.o
+obj-$(CONFIG_XEN_I386) += xen-mapcache.o
 
 # Hardware support
 ifeq ($(TARGET_NAME), sparc64)
diff --git a/hw/i386/xen/Makefile.objs b/hw/i386/xen/Makefile.objs
index 801a68d..daf4f53 100644
--- a/hw/i386/xen/Makefile.objs
+++ b/hw/i386/xen/Makefile.objs
@@ -1 +1 @@
-obj-y += xen_platform.o xen_apic.o xen_pvdevice.o
+obj-y += xen_platform.o xen_apic.o xen_pvdevice.o xen-hvm.o
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 321fe60..f25d622 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -4,3 +4,14 @@ xen_platform_log(char *s) "xen platform: %s"
 # hw/i386/xen/xen_pvdevice.c
 xen_pv_mmio_read(uint64_t addr) "WARNING: read from Xen PV Device MMIO space 
(address %"PRIx64")"
 xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space 
(address %"PRIx64")"
+
+# xen-hvm.c
+xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, 
size %#lx"
+xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) 
"%#"PRIx64" size %#lx, log_dirty %i"
+handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) 
"I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d 
size=%d"
+handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read 
type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
+handle_ioreq_write(void *req, uint32_t type, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) 
"I/O=%p write type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d 
size=%d"
+cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p pio dir=%d 
df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
+cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) 
"I/O=%p pio read reg data=%#"PRIx64" port=%#"PRIx64" size=%d"
+cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t 
size) "I/O=%p pio write reg data=%#"PRIx64" port=%#"PRIx64" size=%d"
+cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy 
dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
new file mode 100644
index 000..0892361
--- /dev/null
+++ b/hw/i386/xen/xen-hvm.c
@@ -0,0 +1,1422 @@
+/*
+ * Copyright (C) 2010   Citrix Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+
+#include "cpu.h"
+#include "hw/pci/pci.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/apic-msidef.h"
+#include "hw/xen/xen_common.h"
+#include "hw/xen/xen_backend.h"
+#include "qmp-commands.h"
+
+#include "sysemu/char.h"
+#include "qemu/error-report.h"
+#include "qemu/range.h"
+#include "sysemu/xen-mapcache.h"
+#include "trace.h"
+#include "exec/address-spaces.h"
+
+#include 
+#include 
+#include 

[Qemu-devel] [PATCH 0/3] move xen related files to corresponding xen directory

2017-04-05 Thread Xu, Anthony
move xen related files to corresponding xen directory.

move xen-common.c to hw/xen/
move xen-hvm.c to hw/i386/xen/
move xen-mapcache.c to hw/i386/xen/


Signed-off -by: Anthony Xu <anthony...@intel.com>



Makefile.target|6 -
 default-configs/i386-softmmu.mak   |1 -
 default-configs/x86_64-softmmu.mak |1 -
 hw/i386/xen/Makefile.objs  |2 +-
 hw/i386/xen/trace-events   |   17 +
 hw/i386/xen/xen-hvm.c  | 1422 
 hw/i386/xen/xen-mapcache.c |  459 
 hw/xen/Makefile.objs   |2 +-
 hw/xen/xen-common.c|  158 
 stubs/Makefile.objs|2 +
 stubs/xen-common.c |   14 +
 stubs/xen-hvm.c|   63 ++
 trace-events   |   16 -
 xen-common-stub.c  |   14 -
 xen-common.c   |  158 
 xen-hvm-stub.c |   63 --
 xen-hvm.c  | 1422 
 xen-mapcache.c |  459 
 18 files changed, 2137 insertions(+), 2142 deletions(-)
 create mode 100644 hw/i386/xen/xen-hvm.c
 create mode 100644 hw/i386/xen/xen-mapcache.c
 create mode 100644 hw/xen/xen-common.c
 create mode 100644 stubs/xen-common.c
 create mode 100644 stubs/xen-hvm.c
 delete mode 100644 xen-common-stub.c
 delete mode 100644 xen-common.c
 delete mode 100644 xen-hvm-stub.c
 delete mode 100644 xen-hvm.c
 delete mode 100644 xen-mapcache.c






[Qemu-devel] [PATCH]Enable kvmvapic only when host doesn't support VAPIC capability in KVM mode

2017-04-04 Thread Xu, Anthony
In KVM mode, enable kvmvapic only when host doesn't support 
VAPIC capability.

Save the time to set up kvmvapic in some hosts.


Signed-off -by: Anthony Xu <anthony...@intel.com>


diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index c3829e3..d5c53af 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -317,8 +317,11 @@ static void apic_common_realize(DeviceState *dev, Error 
**errp)
 info = APIC_COMMON_GET_CLASS(s);
 info->realize(dev, errp);
 
-/* Note: We need at least 1M to map the VAPIC option ROM */
+/* Note: We need at least 1M to map the VAPIC option ROM,
+   if it is KVM, enable kvmvapic only when KVM doesn't have 
+   VAPIC capability*/ 
 if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK &&
+(!kvm_enabled() || (kvm_enabled() && !kvm_has_vapic())) &&
 !hax_enabled() && ram_size >= 1024 * 1024) {
 vapic = sysbus_create_simple("kvmvapic", -1, NULL);
 }
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 24281fc..43e0e4c 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -215,6 +215,7 @@ extern KVMState *kvm_state;
 
 bool kvm_has_free_slot(MachineState *ms);
 int kvm_has_sync_mmu(void);
+int kvm_has_vapic(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
 int kvm_has_debugregs(void);
diff --git a/kvm-all.c b/kvm-all.c
index 90b8573..edcb6ea 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2232,6 +2232,10 @@ int kvm_has_sync_mmu(void)
 return kvm_check_extension(kvm_state, KVM_CAP_SYNC_MMU);
 }
 
+int kvm_has_vapic(void){
+return !kvm_check_extension(kvm_state, KVM_CAP_VAPIC);
+}
+
 int kvm_has_vcpu_events(void)
 {
 return kvm_state->vcpu_events;



[Qemu-devel] centos 7.2 guest doesn't boot without kvmvapic in TCG mode

2017-04-04 Thread Xu, Anthony
I disabled kvmvapic by 

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index c3829e3..52be2b0 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -440,7 +440,7 @@ static const VMStateDescription vmstate_apic_common = {
 static Property apic_properties_common[] = {
 DEFINE_PROP_UINT8("version", APICCommonState, version, 0x14),
 DEFINE_PROP_BIT("vapic", APICCommonState, vapic_control, VAPIC_ENABLE_BIT,
-true),
+false),
 DEFINE_PROP_BOOL("legacy-instance-id", APICCommonState, legacy_instance_id,
  false),
 DEFINE_PROP_END_OF_LIST(),


Tried to boot centos guest with this command

./x86_64-softmmu/qemu-system-x86_64 -bios /home/root/guest/seabios.bin 
-machine q35 -m 1G -drive 
format=raw,file=/home/root/images/centos7.2.img,if=ide,index=0 
-nographic  -nodefaults  -serial pty -monitor pty 
-chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios


Seabios complained 

Seabios output
"
Booting from Hard Disk...
WARNING - Timeout at ahci_command:154!
Boot failed: could not read the boot disk

enter handle_18:
  NULL
Booting from Floppy...
Boot failed: could not read the boot disk

enter handle_18:
  NULL
No bootable device.
"

It can boot with kvmvapic enabled by using the same command.

Seabios output
"
Booting from Hard Disk...
Booting from :7c00
"

In the original commit,
" Therefore, the VAPI is enabled by default, even in TCG mode. Here, no
speedup can be achieved as the emulation overhead of the TPR register is
marginal compared to instruction emulation."

Seems no performance benefit for TCG mode
What's the other benefit of using kvmvapic in TCG mode?

Should we make TCG mode work without kvmvapic?

If we don't need to use kvmvapic in TCG mode, 
we can make kvmvapic code clearer by removing all TCG related code in 
kvmvapic.c.


Anthony



[Qemu-devel] [PATCH] Put all trace.o into libqemuutil.a

2017-04-04 Thread Xu, Anthony
Put all trace.o into libqemuutil.a

Currently all trace.o are linked into qemu-system, qemu-img, 
qemu-nbd, qemu-io etc., even the corresponding components 
are not included.
Put all trace.o into libqemuutil.a that the linker would only pull in .o 
files containing symbols that are actually referenced by the 
program.


Signed-off -by: Anthony Xu <anthony...@intel.com>



diff --git a/Makefile b/Makefile
index 6c359b2..31d41a7 100644
--- a/Makefile
+++ b/Makefile
@@ -346,7 +346,7 @@ dtc/%:
mkdir -p $@

 $(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \
-   $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) $(trace-obj-y)
+   $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY))

 ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS))
 # Only keep -O and -g cflags
@@ -366,11 +366,11 @@ Makefile: $(version-obj-y)
 # Build libraries

 libqemustub.a: $(stub-obj-y)
-libqemuutil.a: $(util-obj-y)
+libqemuutil.a: $(util-obj-y) $(trace-obj-y)

 ##

-COMMON_LDADDS = $(trace-obj-y) libqemuutil.a libqemustub.a
+COMMON_LDADDS = libqemuutil.a libqemustub.a

 qemu-img.o: qemu-img-cmds.h

diff --git a/Makefile.target b/Makefile.target
index d5ff0c7..69239e0 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -185,8 +185,7 @@ dummy := $(call unnest-vars,.., \
qom-obj-y \
io-obj-y \
common-obj-y \
-   common-obj-m \
-   trace-obj-y)
+   common-obj-m)
 target-obj-y := $(target-obj-y-save)
 all-obj-y += $(common-obj-y)
 all-obj-y += $(target-obj-y)
@@ -198,7 +197,7 @@ all-obj-$(CONFIG_SOFTMMU) += $(io-obj-y)

 $(QEMU_PROG_BUILD): config-devices.mak

-COMMON_LDADDS = $(trace-obj-y) ../libqemuutil.a ../libqemustub.a
+COMMON_LDADDS = ../libqemuutil.a ../libqemustub.a

 # build either PROG or PROGW
 $(QEMU_PROG_BUILD): $(all-obj-y) $(COMMON_LDADDS)
diff --git a/tests/Makefile.include b/tests/Makefile.include
index f3de81f..579ec07 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -519,7 +519,7 @@ QEMU_CFLAGS += -I$(SRC_PATH)/tests


 # Deps that are common to various different sets of tests below
-test-util-obj-y = $(trace-obj-y) libqemuutil.a libqemustub.a
+test-util-obj-y = libqemuutil.a libqemustub.a
 test-qom-obj-y = $(qom-obj-y) $(test-util-obj-y)
 test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
tests/test-qapi-event.o tests/test-qmp-introspect.o \



Re: [Qemu-devel] [PATCH] Create libqemutrace.a for all trace.o

2017-03-28 Thread Xu, Anthony
> >>> ./trace.o, ./qapi/trace.o and ./util/trace.o are added into
> >>> libqemuutil.a  to avoid recursive dependencies between
> >>> libqemuutil.a and libqemutrace.a.
> >> Why would libqemutrace.a depend on libqemuutil.a?
> > Each trace.c calls trace_event_register_group to register events,
> > trace_event_register_group is defined in trace/control.c , which
> > is linked into libqemuutil.a.
> 
> Ah:
> 
> util-obj-$(CONFIG_TRACE_SIMPLE) += simple.o
> util-obj-$(CONFIG_TRACE_FTRACE) += ftrace.o
> util-obj-y += control.o
> util-obj-y += qmp.o
> 
> With the introduction of libqemutrace.a, I believe these should be moved
> into libqemutrace.a.

Agreed,
But it doesn't solve infinite recursion issue. register_module_init is 
needed by libqemutrace.a, which is defined util/module.c.

it is hard to remove libqemutrace.a dependency on libqemuutil.a.

Removing libqemuutil.a dependency on libqemutrace.a is feasible.
Just like what I did in this patch, include all util related trace.o 
to libqemuutila.

The other simple way is to include all trace.o into libqemuutil.a

What's your opinion?

Thanks
Anthony



Re: [Qemu-devel] [PATCH] Create libqemutrace.a for all trace.o

2017-03-27 Thread Xu, Anthony
> > ./trace.o, ./qapi/trace.o and ./util/trace.o are added into
> > libqemuutil.a  to avoid recursive dependencies between
> > libqemuutil.a and libqemutrace.a.
> 
> Why would libqemutrace.a depend on libqemuutil.a?

Each trace.c calls trace_event_register_group to register events, 
trace_event_register_group is defined in trace/control.c , which 
is linked into libqemuutil.a.


> Tracing code shouldn't call other QEMU code.  That would could create
> infinite recursion when a trace event is fired.

If all trace.o needed by libqemuutil.a are linked into libqemuutil.a, 
libqemuutil.a will not depend on libqemutrace.a. This is what this patch 
take to break the infinite recursion.

Or we can link trace/*.o to libqemutrace.a, hope it breaks the infinite 
recursion. But trace/*.o may still depend on libqemuutil.a

Or we can just link all trace.o to libqemuutil.a.


Anthony



[Qemu-devel] [PATCH] Create libqemutrace.a for all trace.o

2017-03-24 Thread Xu, Anthony
Create libqemutrace.a for all trace.o
Currently all trace.o are linked into qemu-system, qemu-img, 
qemu-nbd, qemu-io etc., even the corresponding components 
are not included.
Create a libqemutrace.a that the linker would only pull in .o 
files containing symbols that are actually referenced by the 
program.
./trace.o, ./qapi/trace.o and ./util/trace.o are added into 
libqemuutil.a  to avoid recursive dependencies between 
libqemuutil.a and libqemutrace.a.

Signed-off -by: Anthony Xu <anthony...@intel.com>


diff --git a/Makefile b/Makefile
index 6c359b2..565f5c7 100644
--- a/Makefile
+++ b/Makefile
@@ -345,8 +345,8 @@ subdir-dtc:dtc/libfdt dtc/tests
 dtc/%:
mkdir -p $@

-$(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \
-   $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) $(trace-obj-y)
+$(SUBDIR_RULES): libqemutrace.a libqemuutil.a libqemustub.a $(common-obj-y) 
$(chardev-obj-y) \
+   $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY))

 ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS))
 # Only keep -O and -g cflags
@@ -367,10 +367,11 @@ Makefile: $(version-obj-y)

 libqemustub.a: $(stub-obj-y)
 libqemuutil.a: $(util-obj-y)
+libqemutrace.a: $(trace-obj-y)

 ##

-COMMON_LDADDS = $(trace-obj-y) libqemuutil.a libqemustub.a
+COMMON_LDADDS = libqemutrace.a libqemuutil.a libqemustub.a

 qemu-img.o: qemu-img-cmds.h

diff --git a/Makefile.objs b/Makefile.objs
index 6167e7b..4289ef9 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -3,6 +3,7 @@
 stub-obj-y = stubs/ crypto/
 util-obj-y = util/ qobject/ qapi/
 util-obj-y += qmp-introspect.o qapi-types.o qapi-visit.o qapi-event.o
+util-obj-y += $(trace-root-obj-y)

 chardev-obj-y = chardev/

@@ -167,8 +168,9 @@ trace-events-subdirs += qapi

 trace-events-files = $(SRC_PATH)/trace-events 
$(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events)

-trace-obj-y = trace-root.o
+trace-root-obj-y = trace-root.o
+trace-root-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o
+trace-obj-y = $(trace-root-obj-y)
 trace-obj-y += $(trace-events-subdirs:%=%/trace.o)
 trace-obj-$(CONFIG_TRACE_UST) += trace-ust-all.o
-trace-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o
 trace-obj-$(CONFIG_TRACE_DTRACE) += $(trace-events-subdirs:%=%/trace-dtrace.o)
diff --git a/Makefile.target b/Makefile.target
index 7df2b8c..6f508c8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -201,7 +201,7 @@ all-obj-$(CONFIG_SOFTMMU) += $(io-obj-y)

 $(QEMU_PROG_BUILD): config-devices.mak

-COMMON_LDADDS = $(trace-obj-y) ../libqemuutil.a ../libqemustub.a
+COMMON_LDADDS = ../libqemutrace.a ../libqemuutil.a ../libqemustub.a

 # build either PROG or PROGW
 $(QEMU_PROG_BUILD): $(all-obj-y) $(COMMON_LDADDS)
diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs
index 33906ff..d543d56 100644
--- a/qapi/Makefile.objs
+++ b/qapi/Makefile.objs
@@ -4,3 +4,5 @@ util-obj-y += string-input-visitor.o string-output-visitor.o
 util-obj-y += opts-visitor.o qapi-clone-visitor.o
 util-obj-y += qmp-event.o
 util-obj-y += qapi-util.o
+util-obj-y +=  trace.o
+util-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace.o
diff --git a/tests/Makefile.include b/tests/Makefile.include
index f3de81f..6cbd602 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -519,7 +519,7 @@ QEMU_CFLAGS += -I$(SRC_PATH)/tests


 # Deps that are common to various different sets of tests below
-test-util-obj-y = $(trace-obj-y) libqemuutil.a libqemustub.a
+test-util-obj-y = libqemutrace.a libqemuutil.a libqemustub.a
 test-qom-obj-y = $(qom-obj-y) $(test-util-obj-y)
 test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
tests/test-qapi-event.o tests/test-qmp-introspect.o \
diff --git a/util/Makefile.objs b/util/Makefile.objs
index c6205eb..e38b91e 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -43,3 +43,5 @@ util-obj-y += qdist.o
 util-obj-y += qht.o
 util-obj-y += range.o
 util-obj-y += systemd.o
+util-obj-y +=  trace.o
+util-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace.o
   





Re: [Qemu-devel] [PATCH] only link current target arch traces to qemu-system

2017-03-23 Thread Xu, Anthony
> Perhaps all trace.o files should be put into their own .a instead of
> being added directly to the linker line:
> 
> COMMON_LDADDS = $(trace-obj-y) libqemuutil.a libqemustub.a
> 
> I think the linker would only pull in .o files containing symbols that are
> actually referenced by the program.

Hi Stefan,

That's a good idea!

Below patch creates libqemutrace.a.  ./trace.o, ./qapi/trace.o and 
./util/trace.o are added into libqemuutil.a  to avoid recursive dependencies 
between libqemuutil.a and libqemutrace.a.


Anthony




diff --git a/Makefile b/Makefile
index 6c359b2..565f5c7 100644
--- a/Makefile
+++ b/Makefile
@@ -345,8 +345,8 @@ subdir-dtc:dtc/libfdt dtc/tests
 dtc/%:
mkdir -p $@

-$(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \
-   $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY)) $(trace-obj-y)
+$(SUBDIR_RULES): libqemutrace.a libqemuutil.a libqemustub.a $(common-obj-y) 
$(chardev-obj-y) \
+   $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY))

 ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS))
 # Only keep -O and -g cflags
@@ -367,10 +367,11 @@ Makefile: $(version-obj-y)

 libqemustub.a: $(stub-obj-y)
 libqemuutil.a: $(util-obj-y)
+libqemutrace.a: $(trace-obj-y)

 ##

-COMMON_LDADDS = $(trace-obj-y) libqemuutil.a libqemustub.a
+COMMON_LDADDS = libqemutrace.a libqemuutil.a libqemustub.a

 qemu-img.o: qemu-img-cmds.h

diff --git a/Makefile.objs b/Makefile.objs
index 6167e7b..4289ef9 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -3,6 +3,7 @@
 stub-obj-y = stubs/ crypto/
 util-obj-y = util/ qobject/ qapi/
 util-obj-y += qmp-introspect.o qapi-types.o qapi-visit.o qapi-event.o
+util-obj-y += $(trace-root-obj-y)

 chardev-obj-y = chardev/

@@ -167,8 +168,9 @@ trace-events-subdirs += qapi

 trace-events-files = $(SRC_PATH)/trace-events 
$(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events)

-trace-obj-y = trace-root.o
+trace-root-obj-y = trace-root.o
+trace-root-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o
+trace-obj-y = $(trace-root-obj-y)
 trace-obj-y += $(trace-events-subdirs:%=%/trace.o)
 trace-obj-$(CONFIG_TRACE_UST) += trace-ust-all.o
-trace-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o
 trace-obj-$(CONFIG_TRACE_DTRACE) += $(trace-events-subdirs:%=%/trace-dtrace.o)
diff --git a/Makefile.target b/Makefile.target
index 7df2b8c..6f508c8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -201,7 +201,7 @@ all-obj-$(CONFIG_SOFTMMU) += $(io-obj-y)

 $(QEMU_PROG_BUILD): config-devices.mak

-COMMON_LDADDS = $(trace-obj-y) ../libqemuutil.a ../libqemustub.a
+COMMON_LDADDS = ../libqemutrace.a ../libqemuutil.a ../libqemustub.a

 # build either PROG or PROGW
 $(QEMU_PROG_BUILD): $(all-obj-y) $(COMMON_LDADDS)
diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs
index 33906ff..d543d56 100644
--- a/qapi/Makefile.objs
+++ b/qapi/Makefile.objs
@@ -4,3 +4,5 @@ util-obj-y += string-input-visitor.o string-output-visitor.o
 util-obj-y += opts-visitor.o qapi-clone-visitor.o
 util-obj-y += qmp-event.o
 util-obj-y += qapi-util.o
+util-obj-y +=  trace.o
+util-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace.o
diff --git a/tests/Makefile.include b/tests/Makefile.include
index f3de81f..6cbd602 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -519,7 +519,7 @@ QEMU_CFLAGS += -I$(SRC_PATH)/tests


 # Deps that are common to various different sets of tests below
-test-util-obj-y = $(trace-obj-y) libqemuutil.a libqemustub.a
+test-util-obj-y = libqemutrace.a libqemuutil.a libqemustub.a
 test-qom-obj-y = $(qom-obj-y) $(test-util-obj-y)
 test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
tests/test-qapi-event.o tests/test-qmp-introspect.o \
diff --git a/util/Makefile.objs b/util/Makefile.objs
index c6205eb..e38b91e 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -43,3 +43,5 @@ util-obj-y += qdist.o
 util-obj-y += qht.o
 util-obj-y += range.o
 util-obj-y += systemd.o
+util-obj-y +=  trace.o
+util-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace.o
   




Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-22 Thread Xu, Anthony
> So please send it to the list with Signed-off-by line.
Thanks, 

> 
> DPRINTF("handle_mmio\n");
> /* Called outside BQL */
> address_space_rw(_space_memory,
>  run->mmio.phys_addr, attrs,
>  run->mmio.data,
>  run->mmio.len,
>  run->mmio.is_write);
> 
> and adding a simple assert() would have quickly disproved your theory.
You are right here.
If it is a PCI bar write, a memory region operation(del/add) may 
be called inside address_space_rw, and 
memory_region_transaction_commit will be called.
If address_space_rw is called  without the global lock, 
memory_region_transaction_commit is called with the global lock.
 It conflicts with what you said before.

>No, you don't need to check it.  Most functions (in this case,
>memory_region_transaction_commit) can only be called under the global lock.

And if two vcpus program the different PCI bars at the same time 
(it is unlikely, but QEMU should not assume it), without the global lock, 
region operations may be called at the same time. 
Are memory_region_del_subregion and 
memory_region_add_subregion_overlap thread-safe?

> It's not a fix if the code is not thread-safe anymore!  But I think you
> have the answer now as to why you cannot use synchronize_rcu.

Can you elaborate why the code is not thread-safe?

Thanks
Anthony


[Qemu-devel] [PATCH] clear pending status before calling memory commit

2017-03-22 Thread Xu, Anthony
clear pending status before calling memory commit.
Otherwise when memory_region_finalize is called, 
memory_region_transaction_depth is 0 and 
memory_region_update_pending is true. 
That's wrong.


Signed-off -by: Anthony Xu <anthony...@intel.com>



diff --git a/memory.c b/memory.c
index 64b0a60..4c95aaf 100644
--- a/memory.c
+++ b/memory.c
@@ -906,12 +906,6 @@ void memory_region_transaction_begin(void)
 ++memory_region_transaction_depth;
 }

-static void memory_region_clear_pending(void)
-{
-memory_region_update_pending = false;
-ioeventfd_update_pending = false;
-}
-
 void memory_region_transaction_commit(void)
 {
 AddressSpace *as;
@@ -927,14 +921,14 @@ void memory_region_transaction_commit(void)
 QTAILQ_FOREACH(as, _spaces, address_spaces_link) {
 address_space_update_topology(as);
 }
-
+memory_region_update_pending = false;
 MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
 } else if (ioeventfd_update_pending) {
 QTAILQ_FOREACH(as, _spaces, address_spaces_link) {
 address_space_update_ioeventfds(as);
 }
+ioeventfd_update_pending = false;
 }
-memory_region_clear_pending();
}
 }




[Qemu-devel] [PATCH] only link current target arch traces to qemu-system

2017-03-21 Thread Xu, Anthony
When building target x86_64-softmmu, all other architectures' trace.o are 
linked into 
x86_64-softmmu/qemu-system-x86_64, like hw/arm/trace.o, hw/mips/trace.o etc., 
that is not necessary.
 Same thing happens when building other targets.

Only current target arch traces should be linked into qemu-system.

Signed-off -by: Anthony Xu <anthony...@intel.com>




diff --git a/Makefile.target b/Makefile.target
index 7df2b8c..638e044 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -177,6 +177,41 @@ block-obj-y :=
 common-obj-y :=
 chardev-obj-y :=
 include $(SRC_PATH)/Makefile.objs
+
+# remove all arch related trace
+trace-obj-y := $(filter-out hw/alpha/trace.o 
hw/alpha/trace-dtrace.o,$(trace-obj-y))
+trace-obj-y := $(filter-out target/alpha/trace.o 
target/alpha/trace-dtrace.o,$(trace-obj-y))
+
+trace-obj-y := $(filter-out hw/arm/trace.o 
hw/arm/trace-dtrace.o,$(trace-obj-y))
+trace-obj-y := $(filter-out target/arm/trace.o 
target/arm/trace-dtrace.o,$(trace-obj-y))
+
+trace-obj-y := $(filter-out hw/i386/trace.o 
hw/i386/trace-dtrace.o,$(trace-obj-y))
+trace-obj-y := $(filter-out target/i386/trace.o 
target/i386/trace-dtrace.o,$(trace-obj-y))
+
+trace-obj-y := $(filter-out hw/mips/trace.o 
hw/mips/trace-dtrace.o,$(trace-obj-y))
+trace-obj-y := $(filter-out target/mips/trace.o 
target/mips/trace-dtrace.o,$(trace-obj-y))
+
+trace-obj-y := $(filter-out hw/sparc/trace.o 
hw/sparc/trace-dtrace.o,$(trace-obj-y))
+trace-obj-y := $(filter-out target/sparc/trace.o 
target/sparc/trace-dtrace.o,$(trace-obj-y))
+
+trace-obj-y := $(filter-out hw/s390x/trace.o 
hw/s390x/trace-dtrace.o,$(trace-obj-y))
+trace-obj-y := $(filter-out target/s390x/trace.o 
target/s390x/trace-dtrace.o,$(trace-obj-y))
+
+trace-obj-y := $(filter-out hw/ppc/trace.o 
hw/ppc/trace-dtrace.o,$(trace-obj-y))
+trace-obj-y := $(filter-out target/ppc/trace.o 
target/ppc/trace-dtrace.o,$(trace-obj-y))
+
+# add current arch related trace
+carch := $(TARGET_BASE_ARCH)
+ifneq ($(wildcard $(SRC_PATH)/hw/$(carch)/trace-events),)
+trace-obj-y += hw/$(carch)/trace.o
+trace-obj-$(CONFIG_TRACE_DTRACE) += hw/$(carch)/trace-dtrace.o
+endif
+
+ifneq ($(wildcard $(SRC_PATH)/target/$(carch)/trace-events),)
+trace-obj-y += target/$(carch)/trace.o
+trace-obj-$(CONFIG_TRACE_DTRACE) += target/$(carch)/trace-dtrace.o
+endif
+
 dummy := $(call unnest-vars,,target-obj-y)
 target-obj-y-save := $(target-obj-y)
 dummy := $(call unnest-vars,.., \



Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-16 Thread Xu, Anthony
> > memory_region_finalize.
> > Let me know if you think otherwise.
> 
> Yes, you can replace memory_region_del_subregion in
> memory_region_finalize
> with special code that does
> 
> assert(!mr->enabled);
> assert(subregion->container == mr);
> subregion->container = NULL;
> QTAILQ_REMOVE(>subregions, subregion, subregions_link);
> memory_region_unref(subregion);
> 
> The last four lines are shared with memory_region_del_subregion, so please
> factor them in a new function, for example
> memory_region_del_subregion_internal.


After adding synchronize_rcu, I saw an infinite recursive call,
  mem_commit-> memory_region_finalize-> mem_commit->
memory_region_finalize-> ..
it caused a segment fault, because 8M stack space is used up, and found when 
memory_region_finalize is called, memory_region_transaction_depth is 0 and 
memory_region_update_pending is true. That's not normal!

As you mentioned in your previous email, that should  not happen.
>" But if memory_region_transaction_depth is == 0, there should be no
>update pending because the loop has never run"

The root cause is,
MEMORY_LISTENER_CALL_GLOBAL(commit, Forward) is called before 
memory_region_clear_pending() in memory_region_transaction_commit.
so when MEMORY_LISTENER_CALL_GLOBAL(commit, Forward) is called,
memory_region_transaction_depth is 0 and memory_region_update_pending is true.
mem_commit may call memory_region_finalize, it causes infinite recursive call.

my previous fix for this is not complete.
We may clear memory_region_update_pending before calling 
MEMORY_LISTENER_CALL_GLOBAL(commit, Forward)

Please review below patch

diff --git a/memory.c b/memory.c
index 64b0a60..4c95aaf 100644
--- a/memory.c
+++ b/memory.c
@@ -906,12 +906,6 @@ void memory_region_transaction_begin(void)
 ++memory_region_transaction_depth;
 }

-static void memory_region_clear_pending(void)
-{
-memory_region_update_pending = false;
-ioeventfd_update_pending = false;
-}
-
 void memory_region_transaction_commit(void)
 {
 AddressSpace *as;
@@ -927,14 +921,14 @@ void memory_region_transaction_commit(void)
 QTAILQ_FOREACH(as, _spaces, address_spaces_link) {
 address_space_update_topology(as);
 }
-
+memory_region_update_pending = false;
 MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
 } else if (ioeventfd_update_pending) {
 QTAILQ_FOREACH(as, _spaces, address_spaces_link) {
 address_space_update_ioeventfds(as);
 }
+ioeventfd_update_pending = false;
 }
-memory_region_clear_pending();
}
 }




> > It is not slow, the contention is not high. Yes we can test.
> 
> It's not a matter of contention.  It's a matter of how long an RCU
> critical section can be---not just at startup, but at any point
> during QEMU execution.
The thing is, seems both address_space_translate and 
address_space_dispatch_free 
are called under the global lock. When synchronize_rcu is called, no other 
threads 
are in RCU critical section.

Seems RCU is not that useful for address space.


> 
> Have you tried tcmalloc or jemalloc?  They use the brk region
> less and sometimes are more aggressive in releasing mmap-ed memory
> areas.

They may be aggressive. But if memory are freed afterward, it may not help in 
some cases,
for example, starting a lot of VMs at the same time.


> 
> > Please review below patch, MemoryRegion is protected by RCU.
> 
> Removing transaction begin/commit in memory_region_finalize makes
> little sense because memory_region_del_subregion calls those two
> functions again.  See above for an alternative.  Apart from this,
> the patch is at least correct, though I'm not sure it's a good
> idea (synchronize_rcu needs testing). 
> I'm not sure I understand the
> address_space_write change).

After adding synchronize_rcu, we noticed a RCU dead loop.  synchronize_rcu is 
called 
inside RCU critical section. It happened when guest OS programmed the PCI bar.
The call trace is like,
address_space_write-> pci_host_config_write_common -> 
memory_region_transaction_commit ->mem_commit-> synchronize_rcu
pci_host_config_write_common is called inside RCU critical section.

The address_space_write change fixed this issue.


Thanks,
Anthony


Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-15 Thread Xu, Anthony
> The first unref is done after as->current_map is overwritten.
> as->current_map is accessed under RCU, so it needs call_rcu.  It
> balances the initial reference that is present since flatview_init.

Got it, thanks for explanation.

> > but it is not clear to me, is this a bug or by design? Is flatview_destroy 
> > called
> in current thread
> > or RCU thread?
> 
> You cannot know, because there are also other callers of
> address_space_get_flatview.  Usually it's from the RCU thread.

If it is from current thread, do we need to check if the global lock is 
acquired?
As you mentioned flatview_unref may call memory_region_unref.


> - let memory_region_finalize remove subregions (commit 2e2b8eb,
> "memory:
> allow destroying a non-empty MemoryRegion", 2015-10-09)
> 
> - let memory_region_finalize disable coalesced I/O (in fact there are no
> callers of memory_region_clear_coalescing outside memory.c)

Okay, It may have sub regions,

In the comments of memory_region_finalize
"   /* We know the region is not visible in any address space (it
 * does not have a container and cannot be a root either because
 * it has no references,
"

We know the memory region is not a root region, and the memory region has 
already been removed from address space even it has sub regions.
memory_region_transaction_commit should be called when the 
memory region is removed from address space.
memory_region_transaction_commit seems not be needed in memory_region_finalize.
Let me know if you think otherwise.


> > How about fall back to synchronize_rcu?
> 
> I'm afraid it would be too slow, but you can test.

It is not slow, the contention is not high. Yes we can test.


> Nope.  The RCU read lock protects all MemoryRegions through
> flatview_unref: RCU keeps the FlatView alive, and the FlatView keeps
> MemoryRegions alive.

I see, RCU needs to protect MemoryRegions as well.



Please review below patch, MemoryRegion is protected by RCU.

Thanks,
Anthony



diff --git a/exec.c b/exec.c
index a22f5a0..6631668 100644
--- a/exec.c
+++ b/exec.c
@@ -2521,7 +2521,8 @@ static void mem_commit(MemoryListener *listener)

 atomic_rcu_set(>dispatch, next);
 if (cur) {
-call_rcu(cur, address_space_dispatch_free, rcu);
+synchronize_rcu();
+address_space_dispatch_free(cur);
 }
 }

@@ -2567,7 +2568,8 @@ void address_space_destroy_dispatch(AddressSpace *as)

 atomic_rcu_set(>dispatch, NULL);
 if (d) {
-call_rcu(d, address_space_dispatch_free, rcu);
+synchronize_rcu();
+address_space_dispatch_free(d);
 }
 }

@@ -2712,19 +2714,22 @@ static bool prepare_mmio_access(MemoryRegion *mr)
 return release_lock;
 }

-/* Called within RCU critical section.  */
-static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
-MemTxAttrs attrs,
-const uint8_t *buf,
-int len, hwaddr addr1,
-hwaddr l, MemoryRegion *mr)
+MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
+  MemTxAttrs attrs, const uint8_t *buf, int len)
 {
 uint8_t *ptr;
 uint64_t val;
+hwaddr l=len;
+hwaddr addr1;
+MemoryRegion *mr;
 MemTxResult result = MEMTX_OK;
 bool release_lock = false;

 for (;;) {
+rcu_read_lock();
+mr = address_space_translate(as, addr, , , true);
+memory_region_ref(mr);
+rcu_read_unlock();
 if (!memory_access_is_direct(mr, true)) {
 release_lock |= prepare_mmio_access(mr);
 l = memory_access_size(mr, l, addr1);
@@ -2764,7 +2769,7 @@ static MemTxResult 
address_space_write_continue(AddressSpace *as, hwaddr addr,
 memcpy(ptr, buf, l);
 invalidate_and_set_dirty(mr, addr1, l);
 }
-
+memory_region_unref(mr);
 if (release_lock) {
 qemu_mutex_unlock_iothread();
 release_lock = false;
@@ -2779,27 +2784,6 @@ static MemTxResult 
address_space_write_continue(AddressSpace *as, hwaddr addr,
 }

 l = len;
-mr = address_space_translate(as, addr, , , true);
-}
-
-return result;
-}
-
-MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs 
attrs,
-const uint8_t *buf, int len)
-{
-hwaddr l;
-hwaddr addr1;
-MemoryRegion *mr;
-MemTxResult result = MEMTX_OK;
-
-if (len > 0) {
-rcu_read_lock();
-l = len;
-mr = address_space_translate(as, addr, , , true);
-result = address_space_write_continue(as, addr, attrs, buf, len,
-  addr1, l, mr);
-rcu_read_unlock();
 }

 return result;
diff --git a/memory.c b/memory.c
index 64b0a60..d12437c 100644
--- a/memory.c
+++ b/memory.c
@@ -1505,13 +1505,10 @@ static void 

Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-14 Thread Xu, Anthony
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, March 14, 2017 3:15 AM
> To: Xu, Anthony <anthony...@intel.com>
> Cc: Zhong, Yang <yang.zh...@intel.com>; qemu-devel@nongnu.org; Peng,
> Chao P <chao.p.p...@intel.com>
> Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from
> 12252kB to 2752KB
> 
> 
> 
> On 14/03/2017 06:14, Xu, Anthony wrote:
> > Below functions are registered in RCU thread
> > address_space_dispatch_free,
> > do_address_space_destroy
> > flatview_unref
> > reclaim_ramblock,
> > qht_map_destroy,
> > migration_bitmap_free
> >
> > first three are address space related, should work without global lock per
> above analysis.
> > The rest are very simple, seems doesn't need global lock.
> 
> flatview_unref can call object_unref and thus reach:

Okay,  flatview_unref is the one you worried about,

Flatview_unref is registered as a RCU callback only in 
address_space_update_topology,
Strangely, it is registered as a RCU callback, and is called directly in this 
function. 
Basially flatview_unref is called twice for old view. There are some comments, 
but it is not clear to me, is this a bug or by design? Is flatview_destroy 
called in current thread 
or RCU thread?


static void address_space_update_topology(AddressSpace *as)
{
FlatView *old_view = address_space_get_flatview(as);
FlatView *new_view = generate_memory_topology(as->root);

address_space_update_topology_pass(as, old_view, new_view, false);
address_space_update_topology_pass(as, old_view, new_view, true);

/* Writes are protected by the BQL.  */
atomic_rcu_set(>current_map, new_view);
call_rcu(old_view, flatview_unref, rcu);

/* Note that all the old MemoryRegions are still alive up to this
 * point.  This relieves most MemoryListeners from the need to
 * ref/unref the MemoryRegions they get---unless they use them
 * outside the iothread mutex, in which case precise reference
 * counting is necessary.
 */
flatview_unref(old_view);

address_space_update_ioeventfds(as);
}


Let me split the patch, Do you think below patch is correct?

--- a/memory.c
+++ b/memory.c
@@ -1503,15 +1503,9 @@ static void memory_region_finalize(Object *obj)
  * and cause an infinite loop.
  */
 mr->enabled = false;
-memory_region_transaction_begin();
-while (!QTAILQ_EMPTY(>subregions)) {
-MemoryRegion *subregion = QTAILQ_FIRST(>subregions);
-memory_region_del_subregion(mr, subregion);
-}
-memory_region_transaction_commit();
-
+assert(QTAILQ_EMPTY(>subregions));
 mr->destructor(mr);
-memory_region_clear_coalescing(m
+assert(QTAILQ_EMPTY(>coalesced));
 g_free((char *)mr->name);
 g_free(mr->ioeventfds);
 }


Then let us take a look at flatview_unref, memory_region_unref may call any QOM 
finalize through 
Object_unref(mr->owner), that's your concern, I got it. It is way too 
complicated to look at each QOM 
object release callbacks and each QOM property release callbacks.  I gave up 
this path.

How about fall back to synchronize_rcu?
As for address space, the RCU read lock is used to protect PhysPageMap, but not 
the regular MemoryRegions, 
because that are not allocated in PhysPageMap build. Subpage MemoryRegion is an 
exception, because that is 
allocated in PhysPageMap build.

The MemoryRegions returned from address_space_translate are regular 
MemoryRegions, so 
address_space_write_continue and address_space_read_continue don't need RCU 
read lock.

Below patch reclaim address space in synchronized way, it reduces heap size 
from ~12MB to ~3MB.
You need to apply this patch with above patch. And when 
address_space_dispatch_free is called 
it already holds the global lock, we don't need to acquire the global lock in 
address_space_dispatch_free.
Please review this patch.

Thanks,
Anthony


diff --git a/cputlb.c b/cputlb.c
index 6c39927..98bd21f 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -347,6 +347,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong 
vaddr,
 tlb_add_large_page(env, vaddr, size);
 }

+rcu_read_lock();
 sz = size;
 section = address_space_translate_for_iotlb(cpu, asidx, paddr, , );
 assert(sz >= TARGET_PAGE_SIZE);
@@ -406,6 +407,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong 
vaddr,
 } else {
 te->addr_write = -1;
 }
+rcu_read_unlock();
 }

 /* Add a new TLB entry, but without specifying the memory
diff --git a/exec.c b/exec.c
index 6fa337b..446d622 100644
--- a/exec.c
+++ b/exec.c
@@ -455,7 +455,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace 
*as, hwaddr addr,
 IOMMUTLBEntry iotlb = {0};
 MemoryRegionSection *section;
 MemoryRegion *mr;
-
+rcu_read_lock();
 for (;;) 

Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-13 Thread Xu, Anthony
> > > Subpages never have subregions, so the loop never runs.  The
> begin/commit
> > > pair then becomes:
> > >
> > > ++memory_region_transaction_depth;
> > > --memory_region_transaction_depth;
> > > if (!memory_region_transaction_depth) {
> > > if (memory_region_update_pending) {
> > > ...
> > > } else if (ioeventfd_update_pending) {
> > > ...
> > > }
> > > // memory_region_clear_pending()
> > > memory_region_update_pending = false;
> > > ioeventfd_update_pending = false;
> > >}
> > >
> > > If memory_region_transaction_depth is > 0 the begin/commit pair does
> > > nothing.
> > >
> > > But if memory_region_transaction_depth is == 0, there should be no
> update
> > > pending because the loop has never run.  So I don't see what your patch
> can
> > > change.
> >
> > As I mentioned in PATCH1, this patch is used to fix an issue after we
> remove
> > the global lock in RCU callback. After global lock is removed, other thread
> > may set up update pending, so memory_region_transaction_commit
> > may try to rebuild PhysPageMap even the loop doesn’t run, other thread
> may
> > try to rebuild PhysPageMap at the same time, it is a race condition.
> > subpage MemoryRegion is a specific MemoryRegion, it doesn't belong to
> any
> > address space, it is only used to handle subpage. We may use a new
> structure
> > other than MemoryRegion to handle subpage to make the logic more
> clearer. After
> > the change, RCU callback will not free any MemoryRegion.
> 
> This is not true.  Try hot-unplugging a device.

You are right, hot-unplugging does cause memory region(not for subpage) freed 
in RCU thread.

I tried pci device hot plug/unplug,
When plug a device,
handle_hmp_command->qmp_device_add->qdev_device_add->
virtio_pci_dc_realize->pci_qdev_realize->virtio_pci_realize->
virtio_device_realize->virtio_bus_device_plugged->virtio_pci_device_plugged 
are called.
when unplug a device,
kvm_cpu_exec->memory_region_write_accessor->pci_write->
virtio_device_unrealize->virtio_pci_device_unplugged 
are called.

memory region addition and remove happen in virtio_pci_device_plugged 
and virtio_pci_device_unplugged respectively, memory region operation needs
 to acquire the global lock, but none of them happens in RCU thread.

When memory_region_finalize is called, the memory region has been removed 
from the address space (removed in virtio_pci_device_unplugged), both 
mr->subregions 
and mr->coalesced are empty. It makes sense to me, when memory_region_finalize 
is called, 
means this memory region is not used any more.
Please correct me if I'm wrong here, I only tried pci device hot plug/unplug.

If above assumption is correct, seems we don't need the global lock for memory 
region  
reclamation in RCU thread. Please let me know if other  memory reclamation in 
RCU thread 
need the global lock.

Under the assumption, I have below patch to remove the global lock in RCU 
thread, 
I tested vm boot ,reboot, shutdown and pci device hot plug/unplug.
Please review the patch, 
If no further issue, I will send out an official patch later.

Thanks,
Anthony


diff --git a/memory.c b/memory.c
index 6c58373..43e06e9 100644
--- a/memory.c
+++ b/memory.c
@@ -1503,15 +1503,9 @@ static void memory_region_finalize(Object *obj)
  * and cause an infinite loop.
  */
 mr->enabled = false;
-memory_region_transaction_begin();
-while (!QTAILQ_EMPTY(>subregions)) {
-MemoryRegion *subregion = QTAILQ_FIRST(>subregions);
-memory_region_del_subregion(mr, subregion);
-}
-memory_region_transaction_commit();
-
+assert(QTAILQ_EMPTY(>subregions));
 mr->destructor(mr);
-memory_region_clear_coalescing(mr);
+assert(QTAILQ_EMPTY(>coalesced));
 g_free((char *)mr->name);
 g_free(mr->ioeventfds);
 }
diff --git a/util/rcu.c b/util/rcu.c
index 9adc5e4..51e0248 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -254,24 +254,20 @@ static void *call_rcu_thread(void *opaque)

 atomic_sub(_call_count, n);
 synchronize_rcu();
-qemu_mutex_lock_iothread();
 while (n > 0) {
 node = try_dequeue();
 while (!node) {
-qemu_mutex_unlock_iothread();
 qemu_event_reset(_call_ready_event);
 node = try_dequeue();
 if (!node) {
 qemu_event_wait(_call_ready_event);
 node = try_dequeue();
 }
-qemu_mutex_lock_iothread();
 }

 n--;
 node->func(node);
 }
-qemu_mutex_unlock_iothread();
 }
 abort();
 }





> 
> I'm all for reducing the scope of the global QEMU lock, 
Thanks,


>but this needs a plan
> and a careful analysis of the involved data structures across _all_
> instance_finalize
> implementations.
Agreed,

Below functions are registered in RCU thread
address_space_dispatch_free,

Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-10 Thread Xu, Anthony
> > Ideally, freeing QOM object should not require a global lock.
> > If you see any other QOM requiring a global lock, please let us know, we
> are willing to fix it.
> 
> All of them.  When unplugging a device, the device object will be freed
> from an RCU callback.
> 

Thanks for your reply,
Some objects may not need lock at all to be freed,
Some objects may just need a small lock to be freed,

Should we let RCU callbacks to decide which lock they need instead of enforcing 
 
the global lock in RCU thread?

As for the device object, can we get the global lock inside the object 
free/destroy function for now?

If we can remove the global lock inside RCU thread, we can save 9MB heap 
memory, that's a lot!

Please share with us if you have other idea to do this.

Thanks
Anthony





Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-10 Thread Xu, Anthony
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Friday, March 10, 2017 1:14 AM
> To: Zhong, Yang <yang.zh...@intel.com>; qemu-devel@nongnu.org
> Cc: Xu, Anthony <anthony...@intel.com>; Peng, Chao P
> <chao.p.p...@intel.com>
> Subject: Re: [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to
> 2752KB
> 
> 
> 
> On 10/03/2017 16:14, Yang Zhong wrote:
> > There is no need to delete subregion and do memory begin/commit for
> > subpage in memory_region_finalize().
> >
> > This patch is from Anthony Xu <anthony...@intel.com>.
> >
> > Signed-off-by: Yang Zhong <yang.zh...@intel.com>
> > ---
> >  memory.c | 13 +++--
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/memory.c b/memory.c
> > index 284894b..3e9bfff 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1505,13 +1505,14 @@ static void memory_region_finalize(Object
> *obj)
> >   * and cause an infinite loop.
> >   */
> >  mr->enabled = false;
> > -memory_region_transaction_begin();
> > -while (!QTAILQ_EMPTY(>subregions)) {
> > -MemoryRegion *subregion = QTAILQ_FIRST(>subregions);
> > -memory_region_del_subregion(mr, subregion);
> > +if (!mr->subpage) {
> > +memory_region_transaction_begin();
> > +while (!QTAILQ_EMPTY(>subregions)) {
> > +MemoryRegion *subregion = QTAILQ_FIRST(>subregions);
> > +memory_region_del_subregion(mr, subregion);
> > +}
> > +memory_region_transaction_commit();
> 
> Subpages never have subregions, so the loop never runs.  The begin/commit
> pair then becomes:
> 
> ++memory_region_transaction_depth;
> --memory_region_transaction_depth;
> if (!memory_region_transaction_depth) {
> if (memory_region_update_pending) {
> ...
> } else if (ioeventfd_update_pending) {
> ...
> }
> // memory_region_clear_pending()
> memory_region_update_pending = false;
> ioeventfd_update_pending = false;
>}
> 
> If memory_region_transaction_depth is > 0 the begin/commit pair does
> nothing.
> 
> But if memory_region_transaction_depth is == 0, there should be no update
> pending because the loop has never run.  So I don't see what your patch can
> change.

As I mentioned in PATCH1, this patch is used to fix an issue after we remove
the global lock in RCU callback. After global lock is removed, other thread may 
set up update pending, so memory_region_transaction_commit 
may try to rebuild PhysPageMap even the loop doesn’t run, other thread may try 
to 
rebuild PhysPageMap at the same time, it is a race condition.
subpage MemoryRegion is a specific MemoryRegion, it doesn't belong to any 
address 
space, it is only used to handle subpage. We may use a new structure other than 
MemoryRegion to handle subpage to make the logic more clearer. After the 
change, 
RCU callback will not free any MemoryRegion. 


Anthony

> 
> Of course there could be an update pending because of a bug elsewhere,
> but I tried adding this patch:
> 
> diff --git a/memory.c b/memory.c
> index 284894b..2208a21 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -903,6 +903,10 @@ static void
> address_space_update_topology(AddressSpace *as)  void
> memory_region_transaction_begin(void)
>  {
>  qemu_flush_coalesced_mmio_buffer();
> +if (!memory_region_transaction_depth) {
> +assert(!memory_region_update_pending);
> +assert(!ioeventfd_update_pending);
> +}
>  ++memory_region_transaction_depth;
>  }
> 
> and at least a basic qemu-system-x86_64 run started just fine.  So why does
> this patch make a difference?
> 
> Paolo
> 
> >  }
> > -memory_region_transaction_commit();
> > -


Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-10 Thread Xu, Anthony
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Friday, March 10, 2017 12:42 AM
> To: Zhong, Yang <yang.zh...@intel.com>; qemu-devel@nongnu.org
> Cc: Xu, Anthony <anthony...@intel.com>; Peng, Chao P
> <chao.p.p...@intel.com>
> Subject: Re: [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to
> 2752KB
> 
> 
> 
> On 10/03/2017 16:14, Yang Zhong wrote:
> > There are a lot of memory allocation during the qemu bootup, which are
> > freed later by RCU thread,which means the heap size becomes biger and
> > biger when allocation happens, but the heap may not be shrinked even
> > after release happens,because some memory blocks in top of heap are
> > still being used.Decreasing the sleep and removing
> > qemu_mutex_unlock_iothread() lock, which make call_rcu_thread()thread
> response the free memory in time.
> > This patch will reduce heap Rss around 10M.
> >
> > This patch is from Anthony xu <anthony...@intel.com>.
> >
> > Signed-off-by: Yang Zhong <yang.zh...@intel.com>
> > ---
> >  util/rcu.c | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/util/rcu.c b/util/rcu.c
> > index 9adc5e4..c5c373c 100644
> > --- a/util/rcu.c
> > +++ b/util/rcu.c
> > @@ -167,7 +167,7 @@ void synchronize_rcu(void)  }
> >
> >
> > -#define RCU_CALL_MIN_SIZE30
> > +#define RCU_CALL_MIN_SIZE5
> >
> >  /* Multi-producer, single-consumer queue based on
> urcu/static/wfqueue.h
> >   * from liburcu.  Note that head is only used by the consumer.
> > @@ -241,7 +241,7 @@ static void *call_rcu_thread(void *opaque)
> >   * added before synchronize_rcu() starts.
> >   */
> >  while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
> > -g_usleep(1);
> > +g_usleep(100);
> >  if (n == 0) {
> >  qemu_event_reset(_call_ready_event);
> >  n = atomic_read(_call_count); @@ -254,24 +254,20
> > @@ static void *call_rcu_thread(void *opaque)
> >
> >  atomic_sub(_call_count, n);
> >  synchronize_rcu();
> > -qemu_mutex_lock_iothread();
> >  while (n > 0) {
> >  node = try_dequeue();
> >  while (!node) {
> > -qemu_mutex_unlock_iothread();
> >  qemu_event_reset(_call_ready_event);
> >  node = try_dequeue();
> >  if (!node) {
> >  qemu_event_wait(_call_ready_event);
> >  node = try_dequeue();
> >  }
> > -qemu_mutex_lock_iothread();
> >  }
> >
> >  n--;
> >  node->func(node);
> >  }
> > -qemu_mutex_unlock_iothread();
> 
> This is wrong.  RCU callbacks currently need the "big QEMU lock", because
> they can free arbitrary QOM objects (including MemoryRegions).

Using "big QEMU lock" in RCU callbacks is a little bit heavy, we'd like to 
remove it.

We noticed an issue related to MemoryRegion popping up after we removed the 
global lock.
The root cause is in MemoryRegion destruction memory_region_transaction_commit 
is called, which may cause issue.
Freeing MemoryRegion seems not cause any issue.
Patch2 in this series fixes this issue,
We found out only subpage MemoryRegion is freed in RCU callbacks, and 
memory_region_transaction_commit
is not needed for subpage MemoryRegion because it doesn't have sub regions.

Ideally, freeing QOM object should not require a global lock. 
If you see any other QOM requiring a global lock, please let us know, we are 
willing to fix it.


Thanks,
Anthony


> 
> Paolo
> 
> >  }
> >  abort();
> >  }
> >