Hi, this is v10 of the multifd patches.
Lots of changes from previous versions:
a - everything is now sent through the multifd channels; nothing is sent
through the main channel
b - locking is brand new. I was digging myself into a hole with the previous
approach. Right now there is a single way to do locking (on both source
and destination):
    main thread: sets a ->sync variable for each thread and wakes it up
    multifd threads: clear the variable and signal (sem) back to the main thread
This is used for each of:
    - all threads have started
    - we need to synchronize after each round through memory
    - all threads have finished
c - I have to use a qio watcher for a thread to wait for ready data to read
d - lots of cleanups
e - to make things easier, I have included the missing test patches in
this round, because this series builds on top of them
f - lots of traces, it is now much easier to follow what is happening
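For reference, the handshake described in (b) could look roughly like the sketch below. It uses plain POSIX threads and semaphores rather than the QEMU primitives, and it is not the code from the series: Channel, sync_all(), run_demo() and N_CHANNELS are names made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

#define N_CHANNELS 4              /* hypothetical number of multifd threads */

typedef struct {
    pthread_t thread;
    pthread_mutex_t mutex;        /* protects sync and quit */
    sem_t sem;                    /* posted by main to wake this thread */
    bool sync;                    /* set by main, cleared by the thread */
    bool quit;
} Channel;

static Channel channels[N_CHANNELS];
static sem_t sem_main;            /* posted by threads to answer main */

static void *channel_thread(void *opaque)
{
    Channel *c = opaque;

    for (;;) {
        bool do_sync, do_quit;

        sem_wait(&c->sem);        /* sleep until main wakes us */
        pthread_mutex_lock(&c->mutex);
        do_sync = c->sync;
        do_quit = c->quit;
        c->sync = false;          /* clean the variable ... */
        pthread_mutex_unlock(&c->mutex);
        if (do_sync) {
            sem_post(&sem_main);  /* ... and signal back to main */
        }
        if (do_quit) {
            return NULL;
        }
    }
}

/* Main thread side: flag every thread, wake it, wait for every answer. */
static int sync_all(void)
{
    int acked = 0;

    for (int i = 0; i < N_CHANNELS; i++) {
        pthread_mutex_lock(&channels[i].mutex);
        channels[i].sync = true;
        pthread_mutex_unlock(&channels[i].mutex);
        sem_post(&channels[i].sem);
    }
    for (int i = 0; i < N_CHANNELS; i++) {
        sem_wait(&sem_main);
        acked++;
    }
    return acked;
}

/* Start the threads, run the three sync points, then stop the threads. */
static int run_demo(void)
{
    int acked = 0;

    sem_init(&sem_main, 0, 0);
    for (int i = 0; i < N_CHANNELS; i++) {
        pthread_mutex_init(&channels[i].mutex, NULL);
        sem_init(&channels[i].sem, 0, 0);
        channels[i].sync = channels[i].quit = false;
        pthread_create(&channels[i].thread, NULL, channel_thread,
                       &channels[i]);
    }
    acked += sync_all();          /* all threads have started */
    acked += sync_all();          /* end of a round through memory */
    acked += sync_all();          /* all threads have finished */
    for (int i = 0; i < N_CHANNELS; i++) {
        pthread_mutex_lock(&channels[i].mutex);
        channels[i].quit = true;
        pthread_mutex_unlock(&channels[i].mutex);
        sem_post(&channels[i].sem);
        pthread_join(channels[i].thread, NULL);
        assert(!channels[i].sync);  /* every flag was cleared by its thread */
    }
    return acked;
}
```

Calling run_demo() returns the number of acknowledgements the main thread collected, one per thread per sync point. The same sync_all() call serves all three cases listed in (b), which is the point of having a single locking path.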
Now, why it is an RFC:
- in the last patch, there is still a race between the watcher, the
->quit of the threads and the last synchronization. Technically they
are done in order, but in practice they sometimes hang.
- I *know* I can optimize the synchronization of the threads by sending
the "we start a new round" message through the multifd channels; I have to
add a flag for that.
- Not having a thread on the incoming side is a mess, I can't block waiting
for things to happen :-(
- When doing the synchronization, I need to optimize the sending of the
"not finished" packet of pages; working on that.
Please take a look and review.
Thanks, Juan.
[v9]
This series is on top of the migration test series I just sent; the only
rejects should be in the test system, though.
On v9 series for you:
- qobject_unref() as requested by dan
Yes, he was right: I had a reference leak for _non_ multifd. I
*thought* he meant for multifd, and it took a while to understand
(and then to find when/where).
- multifd page count: it is dropped for good
- uuid handling: we use the default qemu uuid of ...
- uuid handling: using a struct and sending the struct
* the idea is to add a size field and add more parameters after that
* does anyone have a good idea how to "output" the
migrate_capabilities/parameters json info into a string and how to read it back?
- changed how we test that all threads/channels are already created.
Should be more robust.
- Add multifd tests. Still not ported on top of the migration-tests series
sent earlier;
waiting for review of the ideas there.
- Rebase and remove all the integrated patches (back at 12)
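To make the "size field, then more parameters" idea from the uuid item concrete, here is a minimal sketch of a length-prefixed initial message. The layout (32-bit big-endian size, then a 16-byte uuid, then whatever newer senders append) and the names init_msg_pack()/init_msg_unpack() are assumptions for illustration only; they are not the wire format used by the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN    16
#define INIT_V1_LEN (4 + UUID_LEN)   /* size field + uuid (assumed layout) */

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Pack size + uuid + optional newer parameters.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t init_msg_pack(uint8_t *buf, size_t buf_len,
                            const uint8_t uuid[UUID_LEN],
                            const uint8_t *extra, size_t extra_len)
{
    size_t total;

    if (buf_len < INIT_V1_LEN || extra_len > buf_len - INIT_V1_LEN) {
        return 0;
    }
    total = INIT_V1_LEN + extra_len;
    put_be32(buf, (uint32_t)total);
    memcpy(buf + 4, uuid, UUID_LEN);
    if (extra_len) {
        memcpy(buf + INIT_V1_LEN, extra, extra_len);
    }
    return total;
}

/*
 * Unpack the fields this receiver knows about.
 * Returns the full message size, so the caller can skip parameters it
 * does not understand, or 0 if the message is malformed or truncated.
 */
static size_t init_msg_unpack(const uint8_t *buf, size_t len,
                              uint8_t uuid_out[UUID_LEN])
{
    uint32_t total;

    if (len < INIT_V1_LEN) {
        return 0;
    }
    total = get_be32(buf);
    if (total < INIT_V1_LEN || total > len) {
        return 0;
    }
    memcpy(uuid_out, buf + 4, UUID_LEN);
    return total;
}
```

With this shape, an older receiver can still read the uuid from a message that carries extra trailing parameters, because the size field tells it how many bytes to skip.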
Please, review.
Later, Juan.
[v8]
Things NOT done yet:
- drop x-multifd-page-count? We can use performance to set a default value
- paolo's suggestion of not having a control channel
needs yet more cleanups to be able to have more than one ramstate; trying it.
- performance work still not done, but it has been very stable
On v8:
- use connect_async
- rename multifd-group to multifd-page-count (danp suggestion)
- rename multifd-threads to multifd-channels (danp suggestion)
- use new qio*channel functions
- Address rest of comments left
So, please review.
My idea is to get these changes pulled and continue the performance work
from inside the tree; basically everything is already reviewed.
Thanks, Juan.
On v7:
- tests fixed as danp wanted
- have to revert danp qio_*_all patches, as they break multifd, I have to
investigate why.
- error_abort is gone. After several attempts at handling errors, I ended
up with a single error
protected by a lock, and the first error wins.
- Addressed basically all reviews (see on ToDo)
- Pointers to struct are done now
- fix lots of leaks
- lots of small fixes
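The "single error protected by a lock, first error wins" scheme from the error_abort item can be sketched as follows. This is an illustration with a pthread mutex and a fixed-size message buffer, not the series' code; ErrorSlot, error_slot_init() and error_slot_set() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    pthread_mutex_t lock;     /* protects has_error and msg */
    int has_error;
    char msg[128];
} ErrorSlot;

static void error_slot_init(ErrorSlot *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->has_error = 0;
    s->msg[0] = '\0';
}

/*
 * Record an error unless one is already stored.
 * Returns 1 if this error was recorded, 0 if an earlier one won.
 */
static int error_slot_set(ErrorSlot *s, const char *msg)
{
    int recorded = 0;

    pthread_mutex_lock(&s->lock);
    if (!s->has_error) {
        snprintf(s->msg, sizeof(s->msg), "%s", msg);
        s->has_error = 1;
        recorded = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return recorded;
}
```

Any channel thread can call error_slot_set() when it fails; later failures, which are often just fallout from the first one, are dropped instead of overwriting it.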
[v6]
- Improve migration_ioc_process_incoming
- teach about G_SOURCE_REMOVE/CONTINUE
- Add test for migration_has_all_channels
- use DEFINE_PROP*
- change recv_state to use pointers to parameters
this makes it easier to receive channels out of order
- use g_strdup_printf()
- improve count of threads to know when we have to finish
- report channel id's on errors
- Use last_page parameter for multifd_send_page() sooner
- Improve comments for address
- use g_new0() instead of g_malloc()
- create MULTIFD_CONTINUE instead of using UINT16_MAX
- clear memory used by group of pages
while there, move everything to the global state variables instead of keeping
it local to the function. This way it works if we cancel migration and start
a new one
- Really wait to create the migration_thread until all channels are created
- split initial_bytes setup to make clearer following patches.
- create RAM_SAVE_FLAG_MULTIFD_SYNC macro, to make clear what we are doing
- move setting of need_flush to inside bitmap_sync
- Lots of other small changes & reorderings
Please, comment.
[v5]
- tests from qio functions (a.k.a. make danp happy)
- 1st message from one channel to the other contains:
multifd
This would allow us to create more channels as we want them.
a.k.a. Making dave happy
- Waiting in reception