linux compat code vs netbsd kernel-internal negative error numbers

2018-12-26 Thread matthew green
hi folks.


my laptop has been crashing in i915 the last few days and we finally
tracked down the cause.  cv_wait_sig() can return ERESTART, which has
the value of -3.  when linux code returns errors it often encodes them
in a pointer, in the low negative range, so typically a return value
is 0 or a negative errno.  this means that -ERESTART turns into
positive 3, and then, eg, the assertion in our PTR_ERR() macro fires
because the errno is not less than zero.
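to make the failure mode concrete, here's a paraphrase of the linux
err.h idiom (the bodies below are illustrative, not our exact compat
code):

#include <sys/types.h>
#include <sys/systm.h>

#define MAX_ERRNO       4095

static inline void *
ERR_PTR(long error)             /* expects a negative linux errno */
{
        KASSERT(error < 0);     /* -ERESTART == -(-3) == +3 trips this */
        return (void *)(intptr_t)error;
}

static inline long
PTR_ERR(const void *ptr)
{
        return (long)(intptr_t)ptr;
}

static inline bool
IS_ERR(const void *ptr)
{
        /* error values occupy the top MAX_ERRNO addresses */
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

so when linux-style code does "return ERR_PTR(-error);" and error is
our kernel-internal ERESTART (-3), the value handed over is +3: not an
error as far as IS_ERR() is concerned, and fatal wherever an assertion
expects a negative value.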

so, the basic problem is that netbsd errnos are in the range -6 ..
96, while linux keeps them all positive and gives special meaning to
negative values in many places, and the two aren't easy to reconcile.

there are two ways i can see to fix this.

- all errno interactions with linux style code bases need to have
  their values mapped to/from linux in all cases, and the mapped
  values probably need to be unique.  it may be that only ERESTART
  needs handling (we could remap just that one, and assert the
  others aren't set.)  this means updating a lot of code, though
  the vast majority of the affected sites share a common comment.

- re-number these for netbsd.  i am running a kernel where the
  negative values have 255 added to them.  (current ELAST is 96,
  so there's quite a lot of space left; see the sketch below.)
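as a rough illustration of the second option, the kernel-internal
values in sys/errno.h would end up something like this (the exact
numbers are just what my test kernel happens to use, and the list is
from memory, so check errno.h):

/* sketch: kernel-internal errnos moved above ELAST (96) by adding 255 */
#define EJUSTRETURN     253     /* was -2: don't modify regs, just return */
#define ERESTART        252     /* was -3: restart syscall */
#define EPASSTHROUGH    251     /* was -4: ioctl not handled by this layer */
#define EDUPFD          250     /* was -5: dup given fd */
#define EMOVEFD         249     /* was -6: move given fd */

with everything positive, -ERESTART and friends stay negative in linux
style code, and the usual negative-errno conventions just work.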

the first one means not changing our old style for kernel-only
errors, but i'm not really a fan of that style anyway.  the second
does mean giving up a minor ability to check quickly (a user errno
should be a small positive number, vs. a range or similar check.
i could not find any such checks anyway, so maybe not a real issue.)

i think i prefer the second because it is both simpler to deal with
and means not having to constantly patch new code for the remap, but
it does seem disappointing to have to give up this "feature" so we
can have nice things.


any suggestions or preferences?  


.mrg.


threadpool(9) -- an API for scheduling jobs on shared pools of threads

2018-12-26 Thread Jason Thorpe
Hey folks ...

The other day I checked in a new kernel API - threadpool(9) - an API for 
scheduling jobs on shared pools of threads.  This new API was written by Taylor 
Campbell, but had been languishing as a WIP draft for some years now.  As it 
happens, I have a need for the (forthcoming, still being debugged) task(9) API, 
also written by Taylor, that's built on top of threadpool(9), so I decided to 
whip it into shape.

threadpool(9) basically makes it easy to create jobs that run in thread 
context, but that don't necessarily need to have a thread waiting around all 
the time to do work.  Kernel threads are created as-needed, and the threads 
will idle out and exit after a period of time if there is no work for them to 
do.  Thread pools are a shared system resource, and you can use unbound pools 
(that have no particular CPU affinity) or pools that are bound to a specific 
CPU (in the case of per-CPU pools, each CPU gets its own private pool for each 
priority).

The pools themselves are also created on-demand, only when requested by 
something else.  When requesting a reference to a pool, the caller specifies 
the priority that the threads should run at: PRI_NONE (the default timesharing 
priority) or up to MAXPRI_KERNEL_RT.
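For reference, here is a minimal sketch of acquiring pool references.  The
unbound call matches the example further down; the per-CPU names
(threadpool_percpu_t, threadpool_percpu_get(), threadpool_percpu_ref()) are
as I remember them, so double-check the man page before copying:

#include <sys/threadpool.h>

static threadpool_t *unbound;
static threadpool_percpu_t *percpu;

static int
my_pools_init(void)
{
        int error;

        /* Shared, unbound pool at the default timesharing priority. */
        error = threadpool_get(&unbound, PRI_NONE);
        if (error)
                return error;

        /* Per-CPU pools at a kernel real-time priority. */
        error = threadpool_percpu_get(&percpu, PRI_SOFTNET);
        if (error) {
                threadpool_put(unbound, PRI_NONE);
                return error;
        }
        return 0;
}

Later, e.g. from an interrupt handler, threadpool_percpu_ref(percpu) returns
the pool bound to the current CPU.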

The threadpool(9) work abstraction is built around the concept of a "job", 
threadpool_job_t.  This is an opaque structure that the caller needs to 
allocate storage for.  A job can be scheduled on a pool from any context, 
including hard interrupt context up to and including IPL_VM.  The job will run 
until completion, and can take an arbitrarily long time, and sleep an 
arbitrarily long time; additional threads will be created in the pool for other 
jobs on-demand.  Note: there is no hard limit on the number of threads a pool 
will create.  Once scheduled, a job cannot be scheduled again until it has 
completed, at which point it needs to notify the system of this fact by calling 
threadpool_job_done().  Job cancellation is possible if the job has not yet 
run, but once a job is running, cancellation must wait for it to complete.  
Among other things, this provides a deterministic way to ensure that a job is 
not running.  More information on job lifecycle and cancellation semantics can 
be found in the man page.

Job functions provided by the caller are passed a pointer to the 
threadpool_job_t corresponding to the work they're doing.  It is expected that 
this threadpool_job_t is embedded in the caller's state structure, and this 
state can be recovered by using the "container-of" access pattern, e.g.:

struct my_job_state {
        kmutex_t                mutex;
        int                     some_counter;
        threadpool_job_t        the_job;
};

threadpool_t *unbound_lopri_pool;

...

void
my_job_setup_routine(struct my_job_state *state)
{
        ...
        error = threadpool_get(&unbound_lopri_pool, PRI_NONE);
        ...
        threadpool_job_init(&state->the_job, my_job_function, &state->mutex);
        ...
}

void
my_interrupt_handler(struct my_job_state *state)
{
        ...
        mutex_enter(&state->mutex);
        ...
        threadpool_schedule_job(unbound_lopri_pool, &state->the_job);
        ...
        mutex_exit(&state->mutex);
        ...
}

void
my_job_function(threadpool_job_t *job)
{
        struct my_job_state *state =
            container_of(job, struct my_job_state, the_job);

        /* state->mutex is unlocked upon entering the job. */

        /* Do whatever needs to be done. */

        threadpool_job_done(job);

        /*
         * The thread we ran on will be idled and available for future
         * jobs until timing out and exiting.
         */
}
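For completeness, here is a sketch of the corresponding teardown path.  It
assumes threadpool_cancel_job() wants the job's interlock held and that
threadpool_job_destroy() / threadpool_put() are the right finalizers; see the
man page for the authoritative word:

void
my_job_teardown_routine(struct my_job_state *state)
{
        /* Cancel if still pending; waits if the job is already running. */
        mutex_enter(&state->mutex);
        threadpool_cancel_job(unbound_lopri_pool, &state->the_job);
        mutex_exit(&state->mutex);

        /* The job is now neither scheduled nor running. */
        threadpool_job_destroy(&state->the_job);

        /* Drop our reference to the shared pool. */
        threadpool_put(unbound_lopri_pool, PRI_NONE);
}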

In addition to the foundation for task(9) (and eventually workqueue(9) -- 
Taylor already has a prototype implementation of that API on top of 
threadpool(9)), this API could be very useful for some of the other uses of 
ephemeral or mostly-idle kernel threads... the two that jumped into my mind 
immediately were scsibus_discover_thread and scsipi_completion_thread ... each 
of those could be easily converted to jobs running on an unbound PRI_NONE 
thread pool.  atabusconfig_thread and atabus_thread are other obvious 
candidates.  I actually think those 4 are really good examples of where to
use threadpool(9) directly, because they are infrequent but potentially
arbitrarily long-running, and thus not suitable for task(9).  If
someone wants to tackle those, I'd be happy to answer questions about how to 
use threadpool(9) in those situations (or perhaps I'll just do it so that 
there's a concrete in-tree example of how to use the API that's not as complex 
as task(9) is).

Anyway, feel free to reach out if you have questions!  And kudos to Taylor for
the great work... it didn't require much effort to get the
never-even-compile-tested draft up and running.

-- thorpej



Re: RFC: vioif(4) multiqueue support

2018-12-26 Thread s ymgch
Thank you for your helpful comments!


On Wed, 26 Dec 2018 at 11:37, Taylor R Campbell wrote:
>
> > Date: Wed, 26 Dec 2018 10:03:15 +0900
> > From: Shoichi Yamaguchi 
> >
> > > I implemented a patch that make vioif(4) support multi-queue. And I have 
> > > put
> > > the patch on ftp.n.o. I used vioif(4) multiqueue on qemu-kvm on Linux 
> > > kernel
> > > 4.19.5. And It seems to be working fine.
> > >
> > > https://ftp.netbsd.org/pub/NetBSD/misc/yamaguchi/vioif_mutilq.patch
> >
> > Do you have any comments?
> > I would like to going to commit the patch if there is no comment until 
> > tomorrow.
>
> Hi, Yamaguchi-san!  A lot of Americans and Europeans are on vacation
> this time of year, so it might be better to hold off for another week
> or two.  Here's a quick review -- I don't know anything about virtio,
> so this is just about use of kernel APIs and abstractions.  Someone
> who knows something about virtio should take a look too.

Yes, indeed.  I'll wait for other comments for one or two more weeks.

>
> > diff --git a/sys/dev/pci/if_vioif.c b/sys/dev/pci/if_vioif.c
> > index 3bbd300e88e..769b108e793 100644
> > --- a/sys/dev/pci/if_vioif.c
> > +++ b/sys/dev/pci/if_vioif.c
> >
> >  /* Feature bits */
> > -#define VIRTIO_NET_F_CSUM      (1<<0)
> > [...]
> > +#define VIRTIO_NET_F_MQ        (1<<22)
>
> If you're going to modify all of the lines here, maybe take the
> opportunity to convert them to use __BIT?
>
> > @@ -171,73 +184,110 @@ struct virtio_net_ctrl_vlan {
> > [...]
> >  /*
> >   * if_vioifvar.h:
> >   */
> > +struct vioif_txqueue {
> > +        struct virtqueue        *txq_vq;
>
> Why not make this an embedded structure?

The reason I use "struct virtqueue *" rather than an embedded "struct
virtqueue" is that the virtqueues have to be registered with virtio(4)
as one array ("struct virtqueue []") passed to
virtio_child_attach_start() or virtio_child_attach_set_vqs().  The
virtqueues used by a child device of virtio(4) such as vioif(4) must
form a contiguous array; a rough sketch of the layout is below.
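(The "sc_vqs" field name here is just for illustration, not the actual
patch; virtio-net orders the queues rx0, tx0, rx1, tx1, ...)

int i;

for (i = 0; i < sc->sc_max_nvq_pairs; i++) {
        sc->sc_rxq[i].rxq_vq = &sc->sc_vqs[i * 2 + 0];
        sc->sc_txq[i].txq_vq = &sc->sc_vqs[i * 2 + 1];
}
/*
 * The whole contiguous sc->sc_vqs[] array is what gets handed to
 * virtio_child_attach_start() or virtio_child_attach_set_vqs().
 */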

>
> struct vioif_txqueue {
>         struct virtqueue        txq_vq;
>         ...
> };
>
> struct vioif_softc {
>         struct vioif_txqueue    *sc_txq;
>         struct vioif_rxqueue    *sc_rxq;
>         struct vioif_ctrlqueue  *sc_ctrlq;
>         ...
> };
>
> > +        kmutex_t                *txq_lock;
>
> Why is this a pointer to kmutex_t with mutex_obj_alloc/free and not
> just a kmutex_t with mutex_init/destroy?  Is it reused anywhere?  If
> it is reused, this needs explanation in the comments.  If it is not,
> just use kmutex_t.

It is for the error handling code.  Example:

for (...) {
        txq[i]->txq_lock = mutex_obj_alloc(...);
        s = softint_establish(...);
        if (s == NULL)
                goto err;
}

err:
        for (...)
                if (txq[i]->txq_lock != NULL)
                        mutex_obj_free(txq[i]->txq_lock);

I use the pointer value as a flag for whether the lock has already been
allocated, and I didn't want to add an extra field to vioif_txqueue just
to record that.

>
> Can you write a comment summarizing what locks cover what fields, and,
> if more than one lock can be held at once, what the lock order is?

I'll add comments for the locks.
Currently, the locks cover all fields in the structure, and two or more
than locks can't be held at once.
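Something like this (just a draft of the comment I plan to add):

/*
 * Locking notes:
 *  - txq_lock covers all fields of struct vioif_txqueue.
 *  - rxq_lock covers all fields of struct vioif_rxqueue.
 *  - At most one of these locks is held at a time, so there is no
 *    lock ordering to worry about.
 */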

>
> > +struct vioif_rxqueue {
> > +        struct virtqueue        *rxq_vq;
> > +        kmutex_t                *rxq_lock;
>
> Likewise.
>
> > -#define VIOIF_TX_LOCK(_sc)   mutex_enter(&(_sc)->sc_tx_lock)
> > -#define VIOIF_TX_UNLOCK(_sc) mutex_exit(&(_sc)->sc_tx_lock)
> > -#define VIOIF_TX_LOCKED(_sc) mutex_owned(&(_sc)->sc_tx_lock)
> > -#define VIOIF_RX_LOCK(_sc)   mutex_enter(&(_sc)->sc_rx_lock)
> > -#define VIOIF_RX_UNLOCK(_sc) mutex_exit(&(_sc)->sc_rx_lock)
> > -#define VIOIF_RX_LOCKED(_sc) mutex_owned(&(_sc)->sc_rx_lock)
> > +#define VIOIF_TXQ_LOCK(_q)   mutex_enter((_q)->txq_lock)
> > +#define VIOIF_TXQ_TRYLOCK(_q)    mutex_tryenter((_q)->txq_lock)
> > +#define VIOIF_TXQ_UNLOCK(_q) mutex_exit((_q)->txq_lock)
> > +#define VIOIF_TXQ_LOCKED(_q) mutex_owned((_q)->txq_lock)
> > +
> > +#define VIOIF_RXQ_LOCK(_q)   mutex_enter((_q)->rxq_lock)
> > +#define VIOIF_RXQ_UNLOCK(_q) mutex_exit((_q)->rxq_lock)
> > +#define VIOIF_RXQ_LOCKED(_q) mutex_owned((_q)->rxq_lock)
>
> Can we just use mutex_enter/exit/owned without the macros?  Sometimes we
> use macros where they are conditional, depending on NET_MPSAFE, but if
> there's no need for that, I would prefer to read direct calls to
> mutex_enter/exit/owned.
>
> > +static int
> > +vioif_alloc_queues(struct vioif_softc *sc)
> > +{
> > + int nvq_pairs = sc->sc_max_nvq_pairs;
> > + int nvqs = nvq_pairs * 2;
> > + int i;
> > +
> > + sc->sc_rxq = kmem_zalloc(sizeof(sc->sc_rxq[0]) * nvq_pairs,
> > + KM_NOSLEEP);
> > + if (sc->sc_rxq == NULL)
> > + return -1;
> > +
> > + sc->sc_txq = kmem_zalloc(sizeof(sc->sc_txq[0]) * nvq_pairs,
> > + KM_NOSLEEP);
> > + if (sc->sc_txq == NULL)
> > + return -1;
>
> Check to avoid arithmetic overflow here:
>

Re: RFC: vioif(4) multiqueue support

2018-12-26 Thread Shoichi Yamaguchi
Hi,

On Tue, 18 Dec 2018 at 9:35, Shoichi Yamaguchi wrote:
>
> Hi all,
>
> I implemented a patch that make vioif(4) support multi-queue. And I have put
> the patch on ftp.n.o. I used vioif(4) multiqueue on qemu-kvm on Linux kernel
> 4.19.5. And It seems to be working fine.
>
> https://ftp.netbsd.org/pub/NetBSD/misc/yamaguchi/vioif_mutilq.patch

Do you have any comments?
I would like to commit the patch if there are no comments by tomorrow.

Thanks,

Yamaguchi