Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-11 Thread Paolo Bonzini


- Original Message -
> From: "Anthony Xu" 
> To: "Paolo Bonzini" , "Yang Zhong" 
> , qemu-devel@nongnu.org
> Cc: "Chao P Peng" 
> Sent: Friday, March 10, 2017 8:30:06 PM
> Subject: RE: [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB
> 
> > > Ideally, freeing QOM object should not require a global lock.
> > > If you see any other QOM requiring a global lock, please let us know, we
> > are willing to fix it.
> > 
> > All of them.  When unplugging a device, the device object will be freed
> > from an RCU callback.
> > 
> 
> Thanks for your reply,
> Some objects may not need lock at all to be freed,
> Some objects may just need a small lock to be freed,
> 
> Should we let RCU callbacks to decide which lock they need instead of
> enforcing the global lock in RCU thread?

Splitting the global lock and deciding which object requires which lock
is very, very hard.

The problem is that everything calling object_unref potentially needs the
global lock, and that includes the following call chain: object_unref
-> memory_region_unref -> flatview_destroy -> flatview_unref ->
address_space_update_topology -> memory_region_transaction_commit.

Paolo

> As for the device object, can we get the global lock inside the object
> free/destroy function for now?
> 
> If we can remove the global lock inside RCU thread, we can save 9MB heap
> memory, that's a lot!
> 
> Please share with us if you have other idea to do this.
> 
> Thanks
> Anthony
> 
> 
> 
> 



Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-10 Thread Xu, Anthony
> > Ideally, freeing QOM object should not require a global lock.
> > If you see any other QOM requiring a global lock, please let us know, we
> are willing to fix it.
> 
> All of them.  When unplugging a device, the device object will be freed
> from an RCU callback.
> 

Thanks for your reply,
Some objects may not need lock at all to be freed,
Some objects may just need a small lock to be freed,

Should we let RCU callbacks to decide which lock they need instead of enforcing 
 
the global lock in RCU thread?

As for the device object, can we get the global lock inside the object 
free/destroy function for now?

If we can remove the global lock inside RCU thread, we can save 9MB heap 
memory, that's a lot!

Please share with us if you have other idea to do this.

Thanks
Anthony





Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-10 Thread Paolo Bonzini


On 10/03/2017 17:05, Xu, Anthony wrote:
> Ideally, freeing QOM object should not require a global lock. 
> If you see any other QOM requiring a global lock, please let us know, we are 
> willing to fix it.

All of them.  When unplugging a device, the device object will be freed
from an RCU callback.

Paolo



Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-10 Thread Xu, Anthony
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Friday, March 10, 2017 12:42 AM
> To: Zhong, Yang ; qemu-devel@nongnu.org
> Cc: Xu, Anthony ; Peng, Chao P
> 
> Subject: Re: [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to
> 2752KB
> 
> 
> 
> On 10/03/2017 16:14, Yang Zhong wrote:
> > There are a lot of memory allocation during the qemu bootup, which are
> > freed later by RCU thread,which means the heap size becomes biger and
> > biger when allocation happens, but the heap may not be shrinked even
> > after release happens,because some memory blocks in top of heap are
> > still being used.Decreasing the sleep and removing
> > qemu_mutex_unlock_iothread() lock, which make call_rcu_thread()thread
> response the free memory in time.
> > This patch will reduce heap Rss around 10M.
> >
> > This patch is from Anthony xu .
> >
> > Signed-off-by: Yang Zhong 
> > ---
> >  util/rcu.c | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/util/rcu.c b/util/rcu.c
> > index 9adc5e4..c5c373c 100644
> > --- a/util/rcu.c
> > +++ b/util/rcu.c
> > @@ -167,7 +167,7 @@ void synchronize_rcu(void)  }
> >
> >
> > -#define RCU_CALL_MIN_SIZE30
> > +#define RCU_CALL_MIN_SIZE5
> >
> >  /* Multi-producer, single-consumer queue based on
> urcu/static/wfqueue.h
> >   * from liburcu.  Note that head is only used by the consumer.
> > @@ -241,7 +241,7 @@ static void *call_rcu_thread(void *opaque)
> >   * added before synchronize_rcu() starts.
> >   */
> >  while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
> > -g_usleep(1);
> > +g_usleep(100);
> >  if (n == 0) {
> >  qemu_event_reset(_call_ready_event);
> >  n = atomic_read(_call_count); @@ -254,24 +254,20
> > @@ static void *call_rcu_thread(void *opaque)
> >
> >  atomic_sub(_call_count, n);
> >  synchronize_rcu();
> > -qemu_mutex_lock_iothread();
> >  while (n > 0) {
> >  node = try_dequeue();
> >  while (!node) {
> > -qemu_mutex_unlock_iothread();
> >  qemu_event_reset(_call_ready_event);
> >  node = try_dequeue();
> >  if (!node) {
> >  qemu_event_wait(_call_ready_event);
> >  node = try_dequeue();
> >  }
> > -qemu_mutex_lock_iothread();
> >  }
> >
> >  n--;
> >  node->func(node);
> >  }
> > -qemu_mutex_unlock_iothread();
> 
> This is wrong.  RCU callbacks currently need the "big QEMU lock", because
> they can free arbitrary QOM objects (including MemoryRegions).

Using "big QEMU lock" in RCU callbacks is a little bit heavy, we'd like to 
remove it.

We noticed an issue related to MemoryRegion popping up after we removed the 
global lock.
The root cause is in MemoryRegion destruction memory_region_transaction_commit 
is called, which may cause issue.
Freeing MemoryRegion seems not cause any issue.
Patch2 in this series fixes this issue,
We found out only subpage MemoryRegion is freed in RCU callbacks, and 
memory_region_transaction_commit
is not needed for subpage MemoryRegion because it doesn't have sub regions.

Ideally, freeing QOM object should not require a global lock. 
If you see any other QOM requiring a global lock, please let us know, we are 
willing to fix it.


Thanks,
Anthony


> 
> Paolo
> 
> >  }
> >  abort();
> >  }
> >


Re: [Qemu-devel] [PATCH v1 1/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-10 Thread Paolo Bonzini


On 10/03/2017 16:14, Yang Zhong wrote:
> There are a lot of memory allocation during the qemu bootup, which are
> freed later by RCU thread,which means the heap size becomes biger and
> biger when allocation happens, but the heap may not be shrinked even
> after release happens,because some memory blocks in top of heap are still
> being used.Decreasing the sleep and removing qemu_mutex_unlock_iothread()
> lock, which make call_rcu_thread()thread response the free memory in time.
> This patch will reduce heap Rss around 10M.
> 
> This patch is from Anthony xu .
> 
> Signed-off-by: Yang Zhong 
> ---
>  util/rcu.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/util/rcu.c b/util/rcu.c
> index 9adc5e4..c5c373c 100644
> --- a/util/rcu.c
> +++ b/util/rcu.c
> @@ -167,7 +167,7 @@ void synchronize_rcu(void)
>  }
>  
>  
> -#define RCU_CALL_MIN_SIZE30
> +#define RCU_CALL_MIN_SIZE5
>  
>  /* Multi-producer, single-consumer queue based on urcu/static/wfqueue.h
>   * from liburcu.  Note that head is only used by the consumer.
> @@ -241,7 +241,7 @@ static void *call_rcu_thread(void *opaque)
>   * added before synchronize_rcu() starts.
>   */
>  while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
> -g_usleep(1);
> +g_usleep(100);
>  if (n == 0) {
>  qemu_event_reset(_call_ready_event);
>  n = atomic_read(_call_count);
> @@ -254,24 +254,20 @@ static void *call_rcu_thread(void *opaque)
>  
>  atomic_sub(_call_count, n);
>  synchronize_rcu();
> -qemu_mutex_lock_iothread();
>  while (n > 0) {
>  node = try_dequeue();
>  while (!node) {
> -qemu_mutex_unlock_iothread();
>  qemu_event_reset(_call_ready_event);
>  node = try_dequeue();
>  if (!node) {
>  qemu_event_wait(_call_ready_event);
>  node = try_dequeue();
>  }
> -qemu_mutex_lock_iothread();
>  }
>  
>  n--;
>  node->func(node);
>  }
> -qemu_mutex_unlock_iothread();

This is wrong.  RCU callbacks currently need the "big QEMU lock",
because they can free arbitrary QOM objects (including MemoryRegions).

Paolo

>  }
>  abort();
>  }
>