On Thu, Sep 16, 2021 at 02:30:39PM +0200, Jan Beulich wrote:
> On 16.09.2021 13:10, Dmitry Isaikin wrote:
> > From: Dmitry Isaykin <[email protected]>
> > 
> > This significantly speeds up concurrent destruction of multiple domains on 
> > x86.
> 
> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
> improve locking during domain destruction"). There it was found to
> actually improve things; now you're claiming the opposite. It'll
> take more justification, clearly identifying that you actually
> revert an earlier change, and an explanation why then you don't
> revert that change altogether. You will want to specifically also
> consider the cleaning up of huge VMs, where use of the (global)
> domctl lock may hamper progress of other (parallel) operations on
> the system.
> 
> > I identify the place taking the most time:
> > 
> >     do_domctl(case XEN_DOMCTL_destroydomain)
> >       -> domain_kill()
> >            -> domain_relinquish_resources()
> >                 -> relinquish_memory(d, &d->page_list, PGT_l4_page_table)
> > 
> > My reference setup: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Xen 4.14.
> > 
> > I use this command for test:
> > 
> >     for i in $(seq 1 5) ; do xl destroy test-vm-${i} & done
> > 
> > Without holding the lock all calls of `relinquish_memory(d, &d->page_list, 
> > PGT_l4_page_table)`
> > took on my setup (for HVM with 2GB of memory) about 3 seconds for each 
> > destroying domain.
> > 
> > With holding the lock it took only 100 ms.
> 
> I'm further afraid I can't make the connection. Do you have an
> explanation for why there would be such a massive difference?
> What would prevent progress of relinquish_memory() with the
> domctl lock not held?

I would recommend to Dmitry to use lock profiling with and without
this change applied and try to spot which lock is causing the
contention as a starting point. That should be fairly easy and could
share some light.

Regards, Roger.

Reply via email to