Peter Xu <pet...@redhat.com> wrote:
> On Thu, Jun 01, 2023 at 04:55:25PM +0100, Daniel P. Berrangé wrote:
>> On Thu, Jun 01, 2023 at 11:53:17AM -0400, Peter Xu wrote:
>> > On Thu, Jun 01, 2023 at 04:39:48PM +0100, Daniel P. Berrangé wrote:
>> > > On Thu, Jun 01, 2023 at 11:30:10AM -0400, Peter Xu wrote:
>> > > > Thanks for looking into this.. definitely worthwhile.
>> > > > 
>> > > > On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
>> > > > > There are 27 pre-copy live migration scenarios being tested. In all
>> > > > > of these we force non-convergence and run for one iteration, then
>> > > > > let it converge and wait for completion during the second (or
>> > > > > following) iterations. At a 3 mbps bandwidth limit the first
>> > > > > iteration takes a very long time (~30 seconds).
>> > > > > 
>> > > > > While it is important to test the migration passes and convergence
>> > > > > logic, it is overkill to do this for all 27 pre-copy scenarios. The
>> > > > > TLS migration scenarios in particular are merely exercising different
>> > > > > code paths during connection establishment.
>> > > > > 
>> > > > > To optimize the time taken, switch most of the test scenarios to run
>> > > > > non-live (i.e. guest CPUs paused) with no bandwidth limits. This gives
>> > > > > a massive speed-up for most of the test scenarios.
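
(For readers following along, below is a minimal sketch of what "non-live
with no bandwidth limit" means at the QMP level: pause the source CPUs and
lift the throttle before starting the migration.  The libqtest helper
names and the bandwidth value here are assumptions for illustration only,
not the actual patch.)

#include "libqtest.h"   /* sketch only; assumes the qtest helpers below */

/*
 * Rough sketch: run one scenario non-live by pausing the source guest
 * CPUs (so no pages get re-dirtied) and removing the artificial
 * bandwidth cap, so a single RAM pass finishes quickly.
 */
static void start_non_live_migration(QTestState *from, const char *uri)
{
    /* Pause guest CPUs on the source. */
    qtest_qmp_assert_success(from, "{ 'execute': 'stop' }");

    /* Replace the 3 mbps throttle with an effectively unlimited value
     * (the exact number is arbitrary for this sketch). */
    qtest_qmp_assert_success(from,
                             "{ 'execute': 'migrate-set-parameters',"
                             " 'arguments': { 'max-bandwidth': %d } }",
                             1000 * 1000 * 1000);

    /* Start the migration as normal. */
    qtest_qmp_assert_success(from,
                             "{ 'execute': 'migrate',"
                             " 'arguments': { 'uri': %s } }", uri);
}
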
>> > > > > 
>> > > > > For test coverage, the following scenarios are unchanged:
>> > > > 
>> > > > Curious how the ones below were chosen?  I assume..
>> > > 
>> > > Chosen based on whether they exercise code paths that are unique
>> > > and interesting during the RAM transfer phase.
>> > > 
>> > > Essentially the goal is that if we have N% code coverage before this
>> > > patch, then we should still have the same N% code coverage after this
>> > > patch.
>> > > 
>> > > The TLS tests exercise code paths that are unique during the migration
>> > > establishment phase. Once established, they don't exercise anything
>> > > "interesting" during the RAM transfer phase. Thus we don't lose code
>> > > coverage by running TLS tests non-live.
>> > > 
>> > > > 
>> > > > > 
>> > > > >  * Precopy with UNIX sockets
>> > > > 
>> > > > this one verifies the dirty log path.
>> > > > 
>> > > > >  * Precopy with UNIX sockets and dirty ring tracking
>> > > > 
>> > > > ... dirty ring...
>> > > > 
>> > > > >  * Precopy with XBZRLE
>> > > > 
>> > > > ... xbzrle I think needs a diff on old/new, makes sense.
>> > > > 
>> > > > >  * Precopy with UNIX compress
>> > > > >  * Precopy with UNIX compress (nowait)
>> > > > >  * Precopy with multifd
>> > > > 
>> > > > What about the remaining three?  Especially the two compression tests.
>> > > 
>> > > The compress thread logic is unique/interesting during the RAM transfer
>> > > phase, so it benefits from running live. The wait vs non-wait scenarios
>> > > test distinct code paths/logic.
>> > 
>> > I assume you mean e.g. compressing while the guest page is being
>> > modified, and we should survive that rather than crashing the compressor?
>> 
>> No, I mean the compression code has a significant behaviour difference
>> between its two tests, because they toggle:
>> 
>>  @compress-wait-thread: Controls behavior when all compression
>>      threads are currently busy.  If true (default), wait for a free
>>      compression thread to become available; otherwise, send the page
>>      uncompressed.  (Since 3.1)
>> 
>> so we need to exercise the code path that falls back to sending
>> uncompressed, as well as the code path that waits for free threads.
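
(For readers following along, here is a tiny toy model - invented names,
not the QEMU code - of those two paths.  When every compression thread is
busy, compress-wait-thread decides whether we block until a worker is
free or fall back to sending the page raw.)

/* Toy model of the compress-wait-thread fallback; not QEMU code. */
#include <stdbool.h>
#include <stdio.h>

#define N_COMP_THREADS 4

static bool thread_busy[N_COMP_THREADS];

static int find_free_thread(void)
{
    for (int i = 0; i < N_COMP_THREADS; i++) {
        if (!thread_busy[i]) {
            return i;
        }
    }
    return -1;
}

/* Returns true if the page was handed to a compression thread,
 * false if it was sent uncompressed instead. */
static bool queue_page(bool compress_wait_thread)
{
    int idx = find_free_thread();

    if (idx < 0 && !compress_wait_thread) {
        /* The fallback path exercised by the "nowait" scenario. */
        printf("all threads busy: sending page uncompressed\n");
        return false;
    }
    while (idx < 0) {
        /* The wait path exercised by the default scenario; a real
         * implementation would sleep on a condition variable here. */
        idx = find_free_thread();
    }
    thread_busy[idx] = true;
    printf("page queued on compression thread %d\n", idx);
    return true;
}

int main(void)
{
    /* Pretend every worker is busy so the fallback path is visible. */
    for (int i = 0; i < N_COMP_THREADS; i++) {
        thread_busy[i] = true;
    }
    queue_page(false);   /* compress-wait-thread=false: page goes out raw */
    return 0;
}
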
>
> But then the question is: why is live needed?
>
> IIUC whether the wait path triggers has nothing to do with whether the VM
> is live, but with whether all compress threads are busy.  IOW, IIUC all
> compress paths will be tested even if non-live, as long as we feed enough
> pages to the compressor threads.

It is even worse: the compression code is actually wrong.

We never applied the equivalent of this fix to it:

commit 007e179ef0e97eafda4c9ff2a9d665a1947c7c6d
Author: Ilya Leoshkevich <i...@linux.ibm.com>
Date:   Tue Jul 5 22:35:59 2022 +0200

    multifd: Copy pages before compressing them with zlib
    
    zlib_send_prepare() compresses pages of a running VM. zlib does not
    make any thread-safety guarantees with respect to changing deflate()
    input concurrently with deflate() [1].


Not that anyone is going to use any accelerator to run zlib when we are
compressing just 4k at a time.

The Intel AT engine also had to move to 64 pages at a time to make a
difference.  As I said, I can't think of a single scenario where
compression is a good option.

Later, Juan.

