Hi,

----- Original Message -----
> From: "Mark Kettenis" <mark.kette...@xs4all.nl>
> To: "Stefan Fritsch" <stefan_frit...@genua.de>
> Cc: tech@openbsd.org
> Sent: Monday, August 13, 2012 2:20:50 PM
> Subject: Re: [Patch] Virtio drivers for OpenBSD V5
>
> > From: Stefan Fritsch <stefan_frit...@genua.de>
> > Date: Mon, 13 Aug 2012 16:24:15 +0200
> >
> > Hi,
> >
> > here is the next iteration of my patch.
> >
> > Changes from V4 include:
> >
> > - virtio: support RING_EVENT_IDX
> > - virtio: use lfence/sfence because __sync_synchronize() is broken
> >   on gcc < 4.4
>
> Broken how?
>
> > +/*
> > + * The mutexes are not necessary with OpenBSD's big kernel lock.
> > + */
> > +#define mutex_enter(x)
> > +#define mutex_destroy(x)
> > +#define mutex_exit(x)
> > +#define mutex_init(x,y,z)
>
> Then remove them completely.  We don't need more simple_lock()-like
> pollution in the tree.
>
> > +/*
> > + * XXX: This is not optimal:
> > + * XXX: We need memory barriers also with UP kernels,
>
> What are you trying to say here?
I think what he's getting at is that even in a UP kernel, memory ordering
must be enforced for some operations.

> > + * XXX: because the host system may be SMP. However, OpenBSD does not
> > + * XXX: seem to provide suitable functions.
> > + * XXX: On x86/amd64, membar_producer() could probably be a no-op
> > + * XXX: because writes are always ordered.
> > + * XXX: Also, gcc does not have __sync_synchronize() for all
> > + * XXX: architectures.
> > + * XXX: Also, for gcc < 4.4, __sync_synchronize() is broken
> > + */
> > +
> > +#if defined(__i386__) || defined(__amd64__)
> > +#include <machine/cpufunc.h>
> > +#define membar_consumer() lfence()
> > +#define membar_producer() sfence()
> > +#else
> > +#define membar_consumer() __sync_synchronize()
> > +#define membar_producer() __sync_synchronize()
> > +#endif
>
> Introducing these macros here would be a mistake.  If we really need
> them, they should be added to appropriate MD header files and
> implemented for all supported architectures.
>
> That said, it is not obvious to me what problem is being solved by
> adding these memory barriers to the code.  The use of barriers
> suggests there is some lock-less algorithm in the code, but I failed
> to spot it.

The other side of the lockless queue is handled by the host, i.e. in
Qemu.  The virtqueue (aka VirtIO ring) is in shared memory between guest
and host.  A virtqueue descriptor (denoting, say, an ethernet frame for
transmit) must be fully set up before the corresponding counter is
incremented to inform the host of the newly available frame.