On 27/5/25 19:06, Chenyi Qiang wrote:
On 5/27/2025 3:35 PM, Alexey Kardashevskiy wrote:
On 20/5/25 20:28, Chenyi Qiang wrote:
A new state_change() helper is introduced for RamBlockAttribute
to efficiently notify all registered RamDiscardListeners, including
VFIO listeners, about memory conversion events in guest_memfd. The VFIO
listener can dynamically DMA map/unmap shared pages based on conversion
types:
- For conversions from shared to private, the VFIO system ensures the
discarding of shared mapping from the IOMMU.
- For conversions from private to shared, it triggers the population of
the shared mapping into the IOMMU.
Currently, memory conversion failures cause QEMU to quit instead of
resuming the guest or retrying the operation. It would be a future work
to add more error handling or rollback mechanisms once conversion
failures are allowed. For example, in-place conversion of guest_memfd
could retry the unmap operation during the conversion from shared to
private. However, for now, keep the complex error handling out of the
picture as it is not required:
- If a conversion request is made for a page already in the desired
state, the helper simply returns success.
- For requests involving a range partially in the desired state, there
is no such scenario in practice at present. Simply return error.
- If a conversion request is declined by other systems, such as a
failure from VFIO during notify_to_populated(), the failure is
returned directly. As for notify_to_discard(), VFIO cannot fail
unmap/unpin, so no error is returned.
Note that the bitmap status is updated before callbacks, allowing
listeners to handle memory based on the latest status.
Signed-off-by: Chenyi Qiang <chenyi.qi...@intel.com>
---
Change in v5:
- Move the state_change() back to a helper instead of a callback of
the class since there's no child for the RamBlockAttributeClass.
- Remove the error handling and move them to an individual patch for
simple management.
Changes in v4:
- Add the state_change() callback in PrivateSharedManagerClass
instead of the RamBlockAttribute.
Changes in v3:
- Move the bitmap update before notifier callbacks.
- Call the notifier callbacks directly in notify_discard/populate()
with the expectation that the request memory range is in the
desired attribute.
- For the case that only partial range in the desire status, handle
the range with block_size granularity for ease of rollback
(https://lore.kernel.org/qemu-devel/812768d7-a02d-4b29-95f3-
fb7a125cf...@redhat.com/)
Changes in v2:
- Do the alignment changes due to the rename to
MemoryAttributeManager
- Move the state_change() helper definition in this patch.
---
include/system/ramblock.h | 2 +
system/ram-block-attribute.c | 134 +++++++++++++++++++++++++++++++++++
2 files changed, 136 insertions(+)
diff --git a/include/system/ramblock.h b/include/system/ramblock.h
index 09255e8495..270dffb2f3 100644
--- a/include/system/ramblock.h
+++ b/include/system/ramblock.h
@@ -108,6 +108,8 @@ struct RamBlockAttribute {
QLIST_HEAD(, RamDiscardListener) rdl_list;
};
+int ram_block_attribute_state_change(RamBlockAttribute *attr,
uint64_t offset,
+ uint64_t size, bool to_private);
Not sure about the "to_private" name. I'd think private/shared is
something KVM operates with and here, in RamBlock, it is discarded/
populated.
Make sense. To keep consistent, I will rename it as to_discard.
RamBlockAttribute *ram_block_attribute_create(MemoryRegion *mr);
void ram_block_attribute_destroy(RamBlockAttribute *attr);
diff --git a/system/ram-block-attribute.c b/system/ram-block-
attribute.c
index 8d4a24738c..f12dd4b881 100644
--- a/system/ram-block-attribute.c
+++ b/system/ram-block-attribute.c
@@ -253,6 +253,140 @@ ram_block_attribute_rdm_replay_discard(const
RamDiscardManager *rdm,
ram_block_attribute_rdm_replay_cb);
}
+static bool ram_block_attribute_is_valid_range(RamBlockAttribute
*attr,
+ uint64_t offset,
uint64_t size)
+{
+ MemoryRegion *mr = attr->mr;
+
+ g_assert(mr);
+
+ uint64_t region_size = memory_region_size(mr);
+ int block_size = ram_block_attribute_get_block_size(attr);
It is size_t, not int.
Fixed this and all below. Thanks!
+
+ if (!QEMU_IS_ALIGNED(offset, block_size)) {
Does not the @size have to be aligned too?
Yes. Actually, the "start" and "size" are already do the alignment check
in kvm_convert_memory(). I doubt if we still need it here.
Sure. My point is either check them both or neither.
Anyway, in
case of other users in the future, I'll add it.
Ok.
+ return false;
+ }
+ if (offset + size < offset || !size) {
This could be just (offset + size <= offset).
(these overflow checks always blow up my little brain)
Modified.
+ return false;
+ }
+ if (offset >= region_size || offset + size > region_size) {
Just (offset + size > region_size) should do.
Ditto.
+ return false;
+ }
+ return true;
+}
+
+static void ram_block_attribute_notify_to_discard(RamBlockAttribute
*attr,
+ uint64_t offset,
+ uint64_t size)
+{
+ RamDiscardListener *rdl;
+
+ QLIST_FOREACH(rdl, &attr->rdl_list, next) {
+ MemoryRegionSection tmp = *rdl->section;
+
+ if (!memory_region_section_intersect_range(&tmp, offset,
size)) {
+ continue;
+ }
+ rdl->notify_discard(rdl, &tmp);
+ }
+}
+
+static int
+ram_block_attribute_notify_to_populated(RamBlockAttribute *attr,
+ uint64_t offset, uint64_t size)
+{
+ RamDiscardListener *rdl;
+ int ret = 0;
+
+ QLIST_FOREACH(rdl, &attr->rdl_list, next) {
+ MemoryRegionSection tmp = *rdl->section;
+
+ if (!memory_region_section_intersect_range(&tmp, offset,
size)) {
+ continue;
+ }
+ ret = rdl->notify_populate(rdl, &tmp);
+ if (ret) {
+ break;
+ }
+ }
+
+ return ret;
+}
+
+static bool ram_block_attribute_is_range_populated(RamBlockAttribute
*attr,
+ uint64_t offset,
+ uint64_t size)
+{
+ const int block_size = ram_block_attribute_get_block_size(attr);
size_t.
+ const unsigned long first_bit = offset / block_size;
+ const unsigned long last_bit = first_bit + (size / block_size) - 1;
+ unsigned long found_bit;
+
+ /* We fake a shorter bitmap to avoid searching too far. */
What is "fake" about it? We truthfully check here that every bit in
[first_bit, last_bit] is set.
Aha, you ask this question again :)
(https://lore.kernel.org/qemu-devel/7131b4a3-a836-4efd-bcfc-982a0112e...@intel.com/)
ah sorry :)
If it is really confusing, let me remove this comment in next version.
yes please. Quite obvious if the helper takes the size, then this is what the
caller wants to search within.
+ found_bit = find_next_zero_bit(attr->bitmap, last_bit + 1,
+ first_bit);
+ return found_bit > last_bit;
+}
+
+static bool
+ram_block_attribute_is_range_discard(RamBlockAttribute *attr,
+ uint64_t offset, uint64_t size)
+{
+ const int block_size = ram_block_attribute_get_block_size(attr);
size_t.
+ const unsigned long first_bit = offset / block_size;
+ const unsigned long last_bit = first_bit + (size / block_size) - 1;
+ unsigned long found_bit;
+
+ /* We fake a shorter bitmap to avoid searching too far. */
+ found_bit = find_next_bit(attr->bitmap, last_bit + 1, first_bit);
+ return found_bit > last_bit;
+}
+
+int ram_block_attribute_state_change(RamBlockAttribute *attr,
uint64_t offset,
+ uint64_t size, bool to_private)
+{
+ const int block_size = ram_block_attribute_get_block_size(attr);
size_t.
+ const unsigned long first_bit = offset / block_size;
+ const unsigned long nbits = size / block_size;
+ int ret = 0;
+
+ if (!ram_block_attribute_is_valid_range(attr, offset, size)) {
+ error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+ __func__, offset, size);
+ return -1;
May be -EINVAL?
Modified.
+ }
+
+ /* Already discard/populated */
+ if ((ram_block_attribute_is_range_discard(attr, offset, size) &&
+ to_private) ||
+ (ram_block_attribute_is_range_populated(attr, offset, size) &&
+ !to_private)) {
A tracepoint would be useful here imho.
[...]
+ return 0;
+ }
+
+ /* Unexpected mixture */
+ if ((!ram_block_attribute_is_range_populated(attr, offset, size) &&
+ to_private) ||
+ (!ram_block_attribute_is_range_discard(attr, offset, size) &&
+ !to_private)) {
+ error_report("%s, the range is not all in the desired state: "
+ "(offset 0x%lx, size 0x%lx), %s",
+ __func__, offset, size,
+ to_private ? "private" : "shared");
+ return -1;
-EBUSY?
Maybe also -EINVAL since it is due to the invalid provided mixture
range?
May be, I just prefer them different - might save some time on gdb-ing or
adding printf's. Thanks,
But Anyway, according to the discussion in patch #10, I'll add
the support for this mixture scenario. No need to return the error.
Yeah, chunk from 10/10 should be here really.
+ }
+
+ if (to_private) {
+ bitmap_clear(attr->bitmap, first_bit, nbits);
+ ram_block_attribute_notify_to_discard(attr, offset, size);
+ } else {
+ bitmap_set(attr->bitmap, first_bit, nbits);
+ ret = ram_block_attribute_notify_to_populated(attr, offset,
size);
+ }
and a successful tracepoint here may be?
Good suggestion! I'll add tracepoint in next version.
+
+ return ret;
+}
+
RamBlockAttribute *ram_block_attribute_create(MemoryRegion *mr)
{
uint64_t bitmap_size;
--
Alexey