On 06/02, Linus Torvalds wrote:
>
> On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov <o...@redhat.com> wrote:
> >
> > As I said from the very beginning, this code is fine on x86 because
> > atomic ops are fully serialised on x86.
>
> Yes. Other architectures require __smp_mb__{before,after}_atomic for
> the bit setting ops to actually be memory barriers.
>
> We *should* probably have acquire/release versions of the bit test/set
> helpers, but we don't, so they end up being full memory barriers with
> those things. Which isn't optimal, but I doubt it matters on most
> architectures.
>
> So maybe we'll some day have a "test_bit_acquire()" and a
> "set_bit_release()" etc.

In this particular case we need clear_bit_release(), and IIUC it already
exists; it is just named clear_bit_unlock().

So do you agree that vhost_worker() needs either smp_mb__before_atomic()
before clear_bit(), or simply clear_bit_unlock(), to avoid the race with
vhost_work_queue()?
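For illustration only, the two options can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions, not kernel code. The model_* helpers are hypothetical. clear_bit() is modelled as a relaxed atomic_fetch_and(), smp_mb__before_atomic() as a seq_cst fence, clear_bit_unlock() as a release atomic_fetch_and(), and test_and_set_bit() conservatively as a seq_cst atomic_fetch_or().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for clear_bit(): atomic, but unordered. */
static void model_clear_bit(unsigned int nr, _Atomic unsigned long *addr)
{
        assert(nr < MODEL_BITS_PER_LONG);
        atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_relaxed);
}

/* Option 1: smp_mb__before_atomic(); clear_bit();
 * modelled as a full fence followed by the unordered clear. */
static void model_mb_then_clear_bit(unsigned int nr, _Atomic unsigned long *addr)
{
        atomic_thread_fence(memory_order_seq_cst);
        model_clear_bit(nr, addr);
}

/* Option 2: clear_bit_unlock(), i.e. the "clear_bit_release()" we want;
 * modelled as a release read-modify-write. */
static void model_clear_bit_unlock(unsigned int nr, _Atomic unsigned long *addr)
{
        assert(nr < MODEL_BITS_PER_LONG);
        atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_release);
}

/* Hypothetical stand-in for test_and_set_bit(), conservatively seq_cst.
 * Returns the old value of the bit. */
static bool model_test_and_set_bit(unsigned int nr, _Atomic unsigned long *addr)
{
        unsigned long mask;

        assert(nr < MODEL_BITS_PER_LONG);
        mask = 1UL << nr;
        return atomic_fetch_or_explicit(addr, mask, memory_order_seq_cst) & mask;
}
```

In this model both variants order the clearing CPU's earlier loads and stores before the clear, as seen by a CPU whose test_and_set_bit() reads the cleared bit. The release form is typically the cheaper one, since unlike a full fence it does not also order later accesses.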

Let me provide a simplified example:

        #include <linux/bitops.h>
        #include <linux/llist.h>

        struct item {
                struct llist_node       llist;
                unsigned long           flags;  // bit 0: item is queued
        };

        struct llist_head HEAD = {};    // global

        void queue(struct item *item)
        {
                // queue the item only if it was already flushed (bit 0 clear)
                if (!test_and_set_bit(0, &item->flags))
                        llist_add(&item->llist, &HEAD);
        }

        void flush(void)
        {
                struct llist_node *head = llist_del_all(&HEAD);
                struct item *item, *next;

                llist_for_each_entry_safe(item, next, head, llist)
                        clear_bit(0, &item->flags);     // unordered: no barrier before it
        }

I think this code is buggy in that flush() can race with queue(), the same
way vhost_worker() can race with vhost_work_queue().

Once flush() clears bit 0, queue() can run on another CPU, re-queue this
item and change item->llist.next. We need a barrier before clear_bit() to
ensure that the load of item->llist.next which llist_for_each_entry_safe()
uses to compute next completes before the result of clear_bit() becomes
visible to queue().
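A minimal userspace model of flush() with the barrier added, assuming C11 atomics stand in for the kernel primitives. The model_llist_* helpers are hypothetical: they emulate llist_add() and llist_del_all() with a compare-and-swap stack. test_and_set_bit() is modelled as a seq_cst atomic_fetch_or(), and the fix as clear_bit_unlock(), i.e. a release atomic_fetch_and(). The llist_for_each_entry_safe() loop is written out by hand to show where the load of the next pointer sits relative to the clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
        _Atomic(struct item *) next;    /* models item->llist.next */
        _Atomic unsigned long flags;    /* bit 0: item is queued */
};

static _Atomic(struct item *) head;     /* models the global HEAD */

/* Hypothetical stand-in for llist_add(): lock-free push via CAS. */
static void model_llist_add(struct item *item)
{
        struct item *first = atomic_load_explicit(&head, memory_order_relaxed);

        do {
                atomic_store_explicit(&item->next, first, memory_order_relaxed);
        } while (!atomic_compare_exchange_weak_explicit(&head, &first, item,
                                                        memory_order_release,
                                                        memory_order_relaxed));
}

/* Hypothetical stand-in for llist_del_all(): detach the whole list. */
static struct item *model_llist_del_all(void)
{
        return atomic_exchange_explicit(&head, NULL, memory_order_acquire);
}

static void queue(struct item *item)
{
        /* Queue the item only if bit 0 was clear, i.e. it was flushed. */
        if (!(atomic_fetch_or_explicit(&item->flags, 1UL,
                                       memory_order_seq_cst) & 1UL))
                model_llist_add(item);
}

/* Flush every queued item; returns how many were flushed. */
static unsigned int flush(void)
{
        struct item *item = model_llist_del_all();
        unsigned int n = 0;

        while (item) {          /* tests the next pointer loaded last time */
                /* Load of item->llist.next. Its value is not tested until
                 * the loop condition above, after the clear below. */
                struct item *next = atomic_load_explicit(&item->next,
                                                         memory_order_relaxed);

                /* The fix: release ordering, as clear_bit_unlock() provides.
                 * A queue() whose fetch_or reads this cleared bit then
                 * synchronizes with it, so the load above cannot observe
                 * that queue()'s new item->next. A relaxed clear (plain
                 * clear_bit() in this model) gives no such guarantee. */
                atomic_fetch_and_explicit(&item->flags, ~1UL,
                                          memory_order_release);
                item = next;
                n++;
        }
        return n;
}
```

Single-threaded, the model simply shows the protocol: an item is added at most once until a flush() clears its bit, after which it can be queued again.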

And I do not think we can rely on a control dependency, because I fail
to see any load-store control dependency in this code:
llist_for_each_entry_safe() loads item->llist.next but doesn't check the
result until the next iteration.

No?

Oleg.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
