Re: [PATCH RFC 9/9] KVM: Dirty ring support

Peter Xu Wed, 25 Mar 2020 14:33:17 -0700

On Wed, Mar 25, 2020 at 08:41:44PM +0000, Dr. David Alan Gilbert wrote:

[...]


> > +enum KVMReaperState {
> > +    KVM_REAPER_NONE = 0,
> > +    /* The reaper is sleeping */
> > +    KVM_REAPER_WAIT,
> > +    /* The reaper is reaping for dirty pages */
> > +    KVM_REAPER_REAPING,
> > +};
> 
> That probably needs to be KVMDirtyRingReaperState
> given there are many things that could be reaped.

Sure.

> 
> > +/*
> > + * KVM reaper instance, responsible for collecting the KVM dirty bits
> > + * via the dirty ring.
> > + */
> > +struct KVMDirtyRingReaper {
> > +    /* The reaper thread */
> > +    QemuThread reaper_thr;
> > +    /*
> > +     * Telling the reaper thread to wakeup.  This should be used as a
> > +     * generic interface to kick the reaper thread, like, in vcpu
> > +     * threads where it gets a exit due to ring full.
> > +     */
> > +    EventNotifier reaper_event;
> 
> I think I'd just used a simple semaphore for this type of thing.

I'm actually uncertain on which is cheaper...

At the meantime, I wanted to poll two handles at the same time below
(in kvm_dirty_ring_reaper_thread).  I don't know how to do that with
semaphore.  Could it?

[...]

> > @@ -412,6 +460,18 @@ int kvm_init_vcpu(CPUState *cpu)
> >              (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
> >      }
> >  
> > +    if (s->kvm_dirty_gfn_count) {
> > +        cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_size,
> > +                                   PROT_READ | PROT_WRITE, MAP_SHARED,
> 
> Is the MAP_SHARED required?

Yes it's required.  It's the same when we map the per-vcpu kvm_run.

If we use MAP_PRIVATE, it'll be in a COW fashion - when the userspace
writes to the dirty gfns the 1st time, it'll copy the current dirty
ring page in the kernel and from now on QEMU will never be able to see
what the kernel writes to the dirty gfn pages.  MAP_SHARED means the
userspace and the kernel shares exactly the same page(s).

> 
> > +                                   cpu->kvm_fd,
> > +                                   PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
> > +        if (cpu->kvm_dirty_gfns == MAP_FAILED) {
> > +            ret = -errno;
> > +            DPRINTF("mmap'ing vcpu dirty gfns failed\n");
> 
> Include errno?

Will do.

[...]

> > +static uint64_t kvm_dirty_ring_reap(KVMState *s)
> > +{
> > +    KVMMemoryListener *kml;
> > +    int ret, i, locked_count = s->nr_as;
> > +    CPUState *cpu;
> > +    uint64_t total = 0;
> > +
> > +    /*
> > +     * We need to lock all kvm slots for all address spaces here,
> > +     * because:
> > +     *
> > +     * (1) We need to mark dirty for dirty bitmaps in multiple slots
> > +     *     and for tons of pages, so it's better to take the lock here
> > +     *     once rather than once per page.  And more importantly,
> > +     *
> > +     * (2) We must _NOT_ publish dirty bits to the other threads
> > +     *     (e.g., the migration thread) via the kvm memory slot dirty
> > +     *     bitmaps before correctly re-protect those dirtied pages.
> > +     *     Otherwise we can have potential risk of data corruption if
> > +     *     the page data is read in the other thread before we do
> > +     *     reset below.
> > +     */
> > +    for (i = 0; i < s->nr_as; i++) {
> > +        kml = s->as[i].ml;
> > +        if (!kml) {
> > +            /*
> > +             * This is tricky - we grow s->as[] dynamically now.  Take
> > +             * care of that case.  We also assumed the as[] will fill
> > +             * one by one starting from zero.  Without this, we race
> > +             * with register_smram_listener.
> > +             *
> > +             * TODO: make all these prettier...
> > +             */
> > +            locked_count = i;
> > +            break;
> > +        }
> > +        kvm_slots_lock(kml);
> > +    }
> > +
> > +    CPU_FOREACH(cpu) {
> > +        total += kvm_dirty_ring_reap_one(s, cpu);
> > +    }
> > +
> > +    if (total) {
> > +        ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
> > +        assert(ret == total);
> > +    }
> > +
> > +    /* Unlock whatever locks that we have locked */
> > +    for (i = 0; i < locked_count; i++) {
> > +        kvm_slots_unlock(s->as[i].ml);
> > +    }
> > +
> > +    CPU_FOREACH(cpu) {
> > +        if (cpu->kvm_dirty_ring_full) {
> > +            qemu_sem_post(&cpu->kvm_dirty_ring_avail);
> > +        }
> 
> Why do you need to wait until here - couldn't you release
> each vcpu after you've reaped it?

We probably still need to wait.  Even after we reaped all the dirty
bits we only marked the pages as "collected", the buffers will only be
available again until the kernel re-protect those pages (when the
above KVM_RESET_DIRTY_RINGS completes).  Before that, continuing the
vcpu could let it exit again with the same ring full event.

[...]

> > +static int kvm_dirty_ring_reaper_init(KVMState *s)
> > +{
> > +    struct KVMDirtyRingReaper *r = &s->reaper;
> > +    int ret;
> > +
> > +    ret = event_notifier_init(&r->reaper_event, false);
> > +    assert(ret == 0);
> > +    ret = event_notifier_init(&r->reaper_flush_event, false);
> > +    assert(ret == 0);
> > +    qemu_sem_init(&r->reaper_flush_sem, 0);
> > +
> > +    qemu_thread_create(&r->reaper_thr, "kvm-reaper",
> > +                       kvm_dirty_ring_reaper_thread,
> > +                       s, QEMU_THREAD_JOINABLE);
> 
> That's marked as joinable - does it ever get joined?
> If the reaper thread does exit on error (I can only see the poll
> failure?) - what happens elsewhere - will things still try and kick it?

The reaper thread is not designed to fail for sure. If it fails, it'll
exit without being joined, but otherwise I'll just abort() directly in
the thread which seems to be not anything better...

Regarding to "why not join() it": I think it's simply because we don't
have corresponding operation to AccelClass.init_machine() and
kvm_init(). :-) From that POV, I think I'll still use JOINABLE with
the hope that someday the destroy_machine() hook will be ready and
we'll be able to join it.

Thanks,

-- 
Peter Xu

Re: [PATCH RFC 9/9] KVM: Dirty ring support

Reply via email to