I am working on a project in which we try to switch a domain's underlying
machine memory (MFNs) for another "chunk" of the same size while the VM is
running. This can be useful, for example, when a domain running a
memory-intensive load experiences performance penalties (e.g. a lot of cache
misses): we switch the domain's memory for a "chunk" of memory that is
allocated (assuming the allocator is able to take into account the relation
between memory addresses and cache lines) such that the number of misses
The implementation is mostly done, however we have some issues that we're
stuck on, so I'm asking for some help.
So, the implementation works in the following way:
S1. Pause domain
S2. Allocate new memory for domain
S3. Set up the 'new P2M' table for the newly allocated memory using the 'old
P2M' table (based on a 1-to-1 mapping of oldP2M[i] and newP2M[i]).
For this purpose the pages a domain owns are split into 3 types: PT, WR and
P2M. P2M pages are in fact also WR pages, but they store the P2M (using the
pfn_to_mfn_frame_list_list field of the data shared between Xen and
domains). Accordingly, WR pages only need to be copied to the corresponding
new page, while for PT and P2M pages each entry/element in the page has to
be looked up in the new P2M and the result written at the same entry
location on the PT or P2M page in the "new" memory.
S4. For each page, copy the old page's metadata (such as count and type
info) to the matching new page.
S5. Update the fields of domain's data structures pointing to MFNs (i.e.:
S6. Release the domain's old memory by using relinquish_memory() in a loop,
in a manner similar to relinquish_resources() but for memory (L4, L3, L2)
S7. Update M2P table to reflect the changes
S8. Assign the new memory pages to domain.
S9. Unpause the domain.
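
To make the PT translation in S3 concrete, here is a minimal sketch of how a
single page-table entry could be rewritten. It is not our actual code: the
names translate_pte() and old_mfn_to_new() and the flat old_p2m/new_p2m arrays
are hypothetical, the address mask assumes the standard x86-64 PTE layout
(frame number in bits 12..51), and the linear search is only there to keep the
example short.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_MFN   (~(uint64_t)0)
#define SKETCH_PTE_PRESENT   0x1ULL
#define SKETCH_PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL /* bits 12..51 (assumed x86-64 layout) */
#define SKETCH_PAGE_SHIFT    12

/* Find the PFN i with old_p2m[i] == old_mfn and return new_p2m[i].
 * Linear scan for clarity; a real implementation would use a reverse map. */
uint64_t old_mfn_to_new(const uint64_t *old_p2m, const uint64_t *new_p2m,
                        size_t nr_pfns, uint64_t old_mfn)
{
    for (size_t i = 0; i < nr_pfns; i++)
        if (old_p2m[i] == old_mfn)
            return new_p2m[i];
    return SKETCH_INVALID_MFN;
}

/* Rewrite one PTE so that it points at the new MFN, keeping all flag bits.
 * Non-present entries are copied verbatim.
 * Returns 0 on success, -1 if the entry references an MFN that is not in
 * the old P2M (i.e. a frame this sketch does not know how to translate). */
int translate_pte(uint64_t old_pte, const uint64_t *old_p2m,
                  const uint64_t *new_p2m, size_t nr_pfns, uint64_t *new_pte)
{
    if (!(old_pte & SKETCH_PTE_PRESENT)) {
        *new_pte = old_pte;
        return 0;
    }

    uint64_t old_mfn = (old_pte & SKETCH_PTE_ADDR_MASK) >> SKETCH_PAGE_SHIFT;
    uint64_t new_mfn = old_mfn_to_new(old_p2m, new_p2m, nr_pfns, old_mfn);
    if (new_mfn == SKETCH_INVALID_MFN)
        return -1;

    *new_pte = (old_pte & ~SKETCH_PTE_ADDR_MASK) |
               ((new_mfn << SKETCH_PAGE_SHIFT) & SKETCH_PTE_ADDR_MASK);
    return 0;
}
```

The -1 return is deliberate: counting how many present entries fail the lookup
is a cheap way to see whether any PT entry points at a frame that the old P2M
does not cover.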
The problems we face:
P1. It seems that translation of PTs and P2Ms works well. We wrote some
test scripts that dump and compare the domain's memory before and after the
memory switch (for this test to work we pause the domain from the console
at the beginning of the test, so the unpause_domain function called in the
implementation should have no effect). However, for some RAW pages, ~1% of
the total number of pages of a 64MB domain, we can see some differences in
The question is, does somebody else touch a domain's memory once it is
It is a 1VCPU domain.
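
For reference, the comparison in P1 boils down to something like the sketch
below. It is an illustration only: diff_pages() is a hypothetical helper, it
assumes both dumps are laid out as flat buffers in PFN order, and it assumes
4 KiB pages.

```c
#include <stddef.h>
#include <string.h>

#define DUMP_PAGE_SIZE 4096 /* assumed page size of the dumps */

/* Compare two memory dumps page by page.
 * Stores the indices of up to max_out differing pages in diff_idx and
 * returns the total number of differing pages (which may exceed max_out). */
size_t diff_pages(const unsigned char *before, const unsigned char *after,
                  size_t nr_pages, size_t *diff_idx, size_t max_out)
{
    size_t n = 0;

    for (size_t i = 0; i < nr_pages; i++) {
        const unsigned char *a = before + i * DUMP_PAGE_SIZE;
        const unsigned char *b = after + i * DUMP_PAGE_SIZE;

        if (memcmp(a, b, DUMP_PAGE_SIZE) != 0) {
            if (n < max_out)
                diff_idx[n] = i;
            n++;
        }
    }
    return n;
}
```

Having the page indices rather than just a count makes it possible to check
whether the differing pages cluster around particular PFNs.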
P2. Some of the old pages (~2-3%) don't seem to be released. It looks
like this happens due to count/type info constraints, with the following
d0v1 Error pfn 42b625: rd = 1, od = 32756 caf = 1c00000000000000
I guess it is in response to a page_get on a page that does not belong to a
domain anymore, but that shouldn't normally happen... or am I wrong?
P3. Trying to connect to the domain's console after it has been unpaused
doesn't work. We also ran ping, but the machine is not reachable. How can we
debug this issue? Are there any changes to be made in the PV OS (Linux)
running in the domain? Our assumption was that as long as the domain has
direct memory access and uses the P2M and M2P tables, the changes will be
visible to the OS.
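
One sanity check that this assumption suggests, sketched below, is to verify
after the switch that the two tables agree for every PFN. This is a
simplified illustration, not Xen code: count_p2m_m2p_mismatches() is a
hypothetical helper and both tables are modelled as plain arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the PFNs for which the P2M and M2P tables disagree, i.e. where
 * p2m[pfn] is out of range or m2p[p2m[pfn]] != pfn.
 * A result of 0 means the two tables are mutually consistent. */
size_t count_p2m_m2p_mismatches(const uint64_t *p2m, size_t nr_pfns,
                                const uint64_t *m2p, size_t nr_mfns)
{
    size_t bad = 0;

    for (size_t pfn = 0; pfn < nr_pfns; pfn++) {
        uint64_t mfn = p2m[pfn];

        if (mfn >= nr_mfns || m2p[mfn] != pfn)
            bad++;
    }
    return bad;
}
```

A non-zero result right after S7 would point at the table update itself
rather than at anything the guest does after being unpaused.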
Thank you in advance.
Xen-devel mailing list