Avi Kivity wrote:
On 03/16/2010 03:17 PM, Yoshiaki Tamura wrote:
Avi Kivity wrote:
On 03/16/2010 12:53 PM, Yoshiaki Tamura wrote:
Modifies wrapper functions for byte-based phys_ram_dirty bitmap to
bit-based phys_ram_dirty bitmap, and adds more wrapper functions to
prevent
direct access to the phys_ram_dirty bitmap.

+
+static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
+{
+ unsigned long mask;
+ int index = (addr>> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+ int offset = (addr>> TARGET_PAGE_BITS)& (HOST_LONG_BITS - 1);
+ int ret = 0;
+
+ mask = 1UL<< offset;
+ if (phys_ram_vga_dirty[index]& mask)
+ ret |= VGA_DIRTY_FLAG;
+ if (phys_ram_code_dirty[index]& mask)
+ ret |= CODE_DIRTY_FLAG;
+ if (phys_ram_migration_dirty[index]& mask)
+ ret |= MIGRATION_DIRTY_FLAG;
+
+ return ret;
}

static inline int cpu_physical_memory_get_dirty(ram_addr_t addr,
int dirty_flags)
{
- return phys_ram_dirty[addr>> TARGET_PAGE_BITS]& dirty_flags;
+ return cpu_physical_memory_get_dirty_flags(addr)& dirty_flags;
}

This turns one cacheline access into three. If the dirty bitmaps were in
an array, you could do

return dirty_bitmaps[dirty_index][addr >> (TARGET_PAGE_BITS +
BITS_IN_LONG)] & mask;

with one cacheline access.

If I'm understanding the existing code correctly,
int dirty_flags can be combined, like VGA + MIGRATION.
If we only have to worry about a single dirty flag, I agree with your
idea.

 From a quick grep it seems flags are not combined, except for something
strange with CODE_DIRTY_FLAG:

Thanks for checking out.
But the CODE_DIRTY_FLAG makes me really nervous...

static void notdirty_mem_writel(void *opaque, target_phys_addr_t
ram_addr,
uint32_t val)
{
int dirty_flags;
dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
if (!(dirty_flags & CODE_DIRTY_FLAG)) {
#if !defined(CONFIG_USER_ONLY)
tb_invalidate_phys_page_fast(ram_addr, 4);
dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
#endif
}
stl_p(qemu_get_ram_ptr(ram_addr), val);
dirty_flags |= (0xff & ~CODE_DIRTY_FLAG);
phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS] = dirty_flags;
/* we remove the notdirty callback only if the code has been
flushed */
if (dirty_flags == 0xff)
tlb_set_dirty(cpu_single_env, cpu_single_env->mem_io_vaddr);
}

I can't say I understand what it does.

Me neither.
This the reason I had to take naive approach...

On the other hand, qemu seems to require getting combined dirty flags.
If we introduce dirty bitmaps for each type, we need to access each
bitmap to get combined flags. I wasn't sure how to make this more
efficient...

static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
{
- phys_ram_dirty[addr>> TARGET_PAGE_BITS] = 0xff;
+ unsigned long mask;
+ int index = (addr>> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+ int offset = (addr>> TARGET_PAGE_BITS)& (HOST_LONG_BITS - 1);
+
+ mask = 1UL<< offset;
+ phys_ram_vga_dirty[index] |= mask;
+ phys_ram_code_dirty[index] |= mask;
+ phys_ram_migration_dirty[index] |= mask;
+}

This is also three cacheline accesses. I think we should have a master
bitmap which is updated by set_dirty(), and which is or'ed into the
other bitmaps when they are accessed. At least the vga and migration
bitmaps are only read periodically, not randomly, so this would be very
fast. In a way, this is similar to how the qemu bitmap is updated from
the kvm bitmap today.

Sounds good to me.
So we're going to introduce 4 (VGA, CODE, MIGRATION, master) bit-based
bitmaps in total.


Yeah, except CODE doesn't behave like the others. Would be best to
understand what it's requirements are before making the change. Maybe
CODE will need separate handling (so master will only feed VGA and
MIGRATION).

After implementing this patch set, I thought separating the wrapper functions for each dirty flag type might be an option. Unifying everything makes inefficient here. But anyway, do you know somebody who has a strong insight on this CODE_DIRTY_FLAG?



Reply via email to