Re: drm(4), multi-threaded task queues and barriers
On Fri, Jul 05, 2019 at 12:20:20PM +0200, Mark Kettenis wrote:
> > Date: Fri, 5 Jul 2019 14:20:40 +1000
> > From: Jonathan Gray
> >
> > On Wed, Jun 12, 2019 at 09:15:36AM +0200, Mark Kettenis wrote:
> > > > Date: Wed, 12 Jun 2019 17:04:10 +1000
> > > > From: Jonathan Gray
> > > >
> > > > On Tue, Jun 11, 2019 at 09:10:46PM +0200, Mark Kettenis wrote:
> > > > > The drm(4) codebase really needs multi-threaded task queues since the
> > > > > code has tasks that wait for the completion of other tasks that are
> > > > > submitted to the same task queue.  Thank you Linux...
> > > > >
> > > > > Unfortunately the code also needs to wait for the completion of
> > > > > submitted tasks from other threads.  This is implemented using
> > > > > task_barrier(9).  But unfortunately our current implementation only
> > > > > works for single-threaded task queues.
> > > > >
> > > > > The diff below fixes this by marking the barrier task with a flag and
> > > > > making sure that all threads of the task queue are synchronized.
> > > > > This is achieved through a TASK_BARRIER flag that simply blocks the
> > > > > threads until the last unblocked thread sees the flag and executes the
> > > > > task.
> > > > >
> > > > > The diff also starts 4 threads for each workqueue that gets created by
> > > > > the drm(4) layer.  The number 4 is a bit arbitrary but it is the
> > > > > number of threads that Linux creates per CPU for a so-called "unbound"
> > > > > workqueue, which hopefully is enough to always make progress.
> > > >
> > > > Ignoring taskletq and systq use in burner_task/switchtask across
> > > > drivers and the local-only backlight_schedule_update_status(), was it
> > > > intentional to exclude:
> > > >
> > > > workqueue.h: alloc_workqueue()
> > > >     struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0);
> > > > workqueue.h: alloc_ordered_workqueue()
> > > >     struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0);
> > > > workqueue.h:
> > > >     #define system_power_efficient_wq ((struct workqueue_struct *)systq)
> > >
> > > Not entirely.  But I don't want to spawn too many threads...  If this
> > > diff seems to work, we might want to tweak the numbers a bit to see
> > > what works best.  Whatever we do, the ordered workqueues need to stay
> > > single-threaded.
> >
> > While the non-drm parts of this went in, the drm_linux.c and intel_hotplug.c
> > parts have not.  Was there a particular reason these changes didn't go in?
>
> I wanted to check whether we could get away with using fewer threads
> for the queues but I didn't get around to doing it.  But I guess I
> could just commit this and do some additional tuning later.
>
> You're ok with the change going in?

Yes, sorry if that wasn't clear.

ok jsg@

> > > > > Please test.  If you experience a "hang" with this diff, please try to
> > > > > log in to the machine remotely over ssh and send me and jsg@ the
> > > > > output of "ps -AHwlk".
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mark
> > > > >
> > > > >
> > > > > Index: dev/pci/drm/drm_linux.c
> > > > > ===
> > > > > RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> > > > > retrieving revision 1.38
> > > > > diff -u -p -r1.38 drm_linux.c
> > > > > --- dev/pci/drm/drm_linux.c	9 Jun 2019 12:58:30 -	1.38
> > > > > +++ dev/pci/drm/drm_linux.c	11 Jun 2019 18:54:38 -
> > > > > @@ -1399,15 +1399,15 @@ drm_linux_init(void)
> > > > >  {
> > > > >     if (system_wq == NULL) {
> > > > >         system_wq = (struct workqueue_struct *)
> > > > > -           taskq_create("drmwq", 1, IPL_HIGH, 0);
> > > > > +           taskq_create("drmwq", 4, IPL_HIGH, 0);
> > > > >     }
> > > > >     if (system_unbound_wq == NULL) {
> > > > >         system_unbound_wq = (struct workqueue_struct *)
> > > > > -           taskq_create("drmubwq", 1, IPL_HIGH, 0);
> > > > > +           taskq_create("drmubwq", 4, IPL_HIGH, 0);
> > > > >     }
> > > > >     if (system_long_wq == NULL) {
> > > > >         system_long_wq = (struct workqueue_struct *)
> > > > > -           taskq_create("drmlwq", 1, IPL_HIGH, 0);
> > > > > +           taskq_create("drmlwq", 4, IPL_HIGH, 0);
> > > > >     }
> > > > >
> > > > >     if (taskletq == NULL)
> > > > > Index: dev/pci/drm/i915/intel_hotplug.c
> > > > > ===
> > > > > RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v
> > > > > retrieving revision 1.2
> > > > > diff -u -p -r1.2 intel_hotplug.c
> > > > > --- dev/pci/drm/i915/intel_hotplug.c	14 Apr 2019 10:14:52 -	1.2
> > > > > +++ dev/pci/drm/i915/intel_hotplug.c	11 Jun 2019 18:54:38 -
> > > > > @@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915
> > > > >     INIT_WORK(&dev_priv->hotplug.hotplug_work, i915_hotplug_work_func);
Re: drm(4), multi-threaded task queues and barriers
> Date: Fri, 5 Jul 2019 14:20:40 +1000
> From: Jonathan Gray
>
> On Wed, Jun 12, 2019 at 09:15:36AM +0200, Mark Kettenis wrote:
> > > Date: Wed, 12 Jun 2019 17:04:10 +1000
> > > From: Jonathan Gray
> > >
> > > On Tue, Jun 11, 2019 at 09:10:46PM +0200, Mark Kettenis wrote:
> > > > The drm(4) codebase really needs multi-threaded task queues since the
> > > > code has tasks that wait for the completion of other tasks that are
> > > > submitted to the same task queue.  Thank you Linux...
> > > >
> > > > Unfortunately the code also needs to wait for the completion of
> > > > submitted tasks from other threads.  This is implemented using
> > > > task_barrier(9).  But unfortunately our current implementation only
> > > > works for single-threaded task queues.
> > > >
> > > > The diff below fixes this by marking the barrier task with a flag and
> > > > making sure that all threads of the task queue are synchronized.
> > > > This is achieved through a TASK_BARRIER flag that simply blocks the
> > > > threads until the last unblocked thread sees the flag and executes the
> > > > task.
> > > >
> > > > The diff also starts 4 threads for each workqueue that gets created by
> > > > the drm(4) layer.  The number 4 is a bit arbitrary but it is the
> > > > number of threads that Linux creates per CPU for a so-called "unbound"
> > > > workqueue, which hopefully is enough to always make progress.
> > >
> > > Ignoring taskletq and systq use in burner_task/switchtask across
> > > drivers and the local-only backlight_schedule_update_status(), was it
> > > intentional to exclude:
> > >
> > > workqueue.h: alloc_workqueue()
> > >     struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0);
> > > workqueue.h: alloc_ordered_workqueue()
> > >     struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0);
> > > workqueue.h:
> > >     #define system_power_efficient_wq ((struct workqueue_struct *)systq)
> >
> > Not entirely.  But I don't want to spawn too many threads...  If this
> > diff seems to work, we might want to tweak the numbers a bit to see
> > what works best.  Whatever we do, the ordered workqueues need to stay
> > single-threaded.
>
> While the non-drm parts of this went in, the drm_linux.c and intel_hotplug.c
> parts have not.  Was there a particular reason these changes didn't go in?

I wanted to check whether we could get away with using fewer threads
for the queues but I didn't get around to doing it.  But I guess I
could just commit this and do some additional tuning later.

You're ok with the change going in?

> > > > Please test.  If you experience a "hang" with this diff, please try to
> > > > log in to the machine remotely over ssh and send me and jsg@ the
> > > > output of "ps -AHwlk".
> > > >
> > > > Thanks,
> > > >
> > > > Mark
> > > >
> > > >
> > > > Index: dev/pci/drm/drm_linux.c
> > > > ===
> > > > RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> > > > retrieving revision 1.38
> > > > diff -u -p -r1.38 drm_linux.c
> > > > --- dev/pci/drm/drm_linux.c	9 Jun 2019 12:58:30 -	1.38
> > > > +++ dev/pci/drm/drm_linux.c	11 Jun 2019 18:54:38 -
> > > > @@ -1399,15 +1399,15 @@ drm_linux_init(void)
> > > >  {
> > > >     if (system_wq == NULL) {
> > > >         system_wq = (struct workqueue_struct *)
> > > > -           taskq_create("drmwq", 1, IPL_HIGH, 0);
> > > > +           taskq_create("drmwq", 4, IPL_HIGH, 0);
> > > >     }
> > > >     if (system_unbound_wq == NULL) {
> > > >         system_unbound_wq = (struct workqueue_struct *)
> > > > -           taskq_create("drmubwq", 1, IPL_HIGH, 0);
> > > > +           taskq_create("drmubwq", 4, IPL_HIGH, 0);
> > > >     }
> > > >     if (system_long_wq == NULL) {
> > > >         system_long_wq = (struct workqueue_struct *)
> > > > -           taskq_create("drmlwq", 1, IPL_HIGH, 0);
> > > > +           taskq_create("drmlwq", 4, IPL_HIGH, 0);
> > > >     }
> > > >
> > > >     if (taskletq == NULL)
> > > > Index: dev/pci/drm/i915/intel_hotplug.c
> > > > ===
> > > > RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v
> > > > retrieving revision 1.2
> > > > diff -u -p -r1.2 intel_hotplug.c
> > > > --- dev/pci/drm/i915/intel_hotplug.c	14 Apr 2019 10:14:52 -	1.2
> > > > +++ dev/pci/drm/i915/intel_hotplug.c	11 Jun 2019 18:54:38 -
> > > > @@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915
> > > >     INIT_WORK(&dev_priv->hotplug.hotplug_work, i915_hotplug_work_func);
> > > >     INIT_WORK(&dev_priv->hotplug.dig_port_work, i915_digport_work_func);
> > > >     INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
> > > > -   dev_priv->hotplug.poll_init_work.tq = systq;
> > > >     INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
Re: drm(4), multi-threaded task queues and barriers
On Wed, Jun 12, 2019 at 09:15:36AM +0200, Mark Kettenis wrote:
> > Date: Wed, 12 Jun 2019 17:04:10 +1000
> > From: Jonathan Gray
> >
> > On Tue, Jun 11, 2019 at 09:10:46PM +0200, Mark Kettenis wrote:
> > > The drm(4) codebase really needs multi-threaded task queues since the
> > > code has tasks that wait for the completion of other tasks that are
> > > submitted to the same task queue.  Thank you Linux...
> > >
> > > Unfortunately the code also needs to wait for the completion of
> > > submitted tasks from other threads.  This is implemented using
> > > task_barrier(9).  But unfortunately our current implementation only
> > > works for single-threaded task queues.
> > >
> > > The diff below fixes this by marking the barrier task with a flag and
> > > making sure that all threads of the task queue are synchronized.
> > > This is achieved through a TASK_BARRIER flag that simply blocks the
> > > threads until the last unblocked thread sees the flag and executes the
> > > task.
> > >
> > > The diff also starts 4 threads for each workqueue that gets created by
> > > the drm(4) layer.  The number 4 is a bit arbitrary but it is the
> > > number of threads that Linux creates per CPU for a so-called "unbound"
> > > workqueue, which hopefully is enough to always make progress.
> >
> > Ignoring taskletq and systq use in burner_task/switchtask across
> > drivers and the local-only backlight_schedule_update_status(), was it
> > intentional to exclude:
> >
> > workqueue.h: alloc_workqueue()
> >     struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0);
> > workqueue.h: alloc_ordered_workqueue()
> >     struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0);
> > workqueue.h:
> >     #define system_power_efficient_wq ((struct workqueue_struct *)systq)
>
> Not entirely.  But I don't want to spawn too many threads...  If this
> diff seems to work, we might want to tweak the numbers a bit to see
> what works best.  Whatever we do, the ordered workqueues need to stay
> single-threaded.

While the non-drm parts of this went in, the drm_linux.c and intel_hotplug.c
parts have not.  Was there a particular reason these changes didn't go in?

> > > Please test.  If you experience a "hang" with this diff, please try to
> > > log in to the machine remotely over ssh and send me and jsg@ the
> > > output of "ps -AHwlk".
> > >
> > > Thanks,
> > >
> > > Mark
> > >
> > >
> > > Index: dev/pci/drm/drm_linux.c
> > > ===
> > > RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> > > retrieving revision 1.38
> > > diff -u -p -r1.38 drm_linux.c
> > > --- dev/pci/drm/drm_linux.c	9 Jun 2019 12:58:30 -	1.38
> > > +++ dev/pci/drm/drm_linux.c	11 Jun 2019 18:54:38 -
> > > @@ -1399,15 +1399,15 @@ drm_linux_init(void)
> > >  {
> > >     if (system_wq == NULL) {
> > >         system_wq = (struct workqueue_struct *)
> > > -           taskq_create("drmwq", 1, IPL_HIGH, 0);
> > > +           taskq_create("drmwq", 4, IPL_HIGH, 0);
> > >     }
> > >     if (system_unbound_wq == NULL) {
> > >         system_unbound_wq = (struct workqueue_struct *)
> > > -           taskq_create("drmubwq", 1, IPL_HIGH, 0);
> > > +           taskq_create("drmubwq", 4, IPL_HIGH, 0);
> > >     }
> > >     if (system_long_wq == NULL) {
> > >         system_long_wq = (struct workqueue_struct *)
> > > -           taskq_create("drmlwq", 1, IPL_HIGH, 0);
> > > +           taskq_create("drmlwq", 4, IPL_HIGH, 0);
> > >     }
> > >
> > >     if (taskletq == NULL)
> > > Index: dev/pci/drm/i915/intel_hotplug.c
> > > ===
> > > RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v
> > > retrieving revision 1.2
> > > diff -u -p -r1.2 intel_hotplug.c
> > > --- dev/pci/drm/i915/intel_hotplug.c	14 Apr 2019 10:14:52 -	1.2
> > > +++ dev/pci/drm/i915/intel_hotplug.c	11 Jun 2019 18:54:38 -
> > > @@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915
> > >     INIT_WORK(&dev_priv->hotplug.hotplug_work, i915_hotplug_work_func);
> > >     INIT_WORK(&dev_priv->hotplug.dig_port_work, i915_digport_work_func);
> > >     INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
> > > -   dev_priv->hotplug.poll_init_work.tq = systq;
> > >     INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
> > >         intel_hpd_irq_storm_reenable_work);
> > >  }
> > > Index: kern/kern_task.c
> > > ===
> > > RCS file: /cvs/src/sys/kern/kern_task.c,v
> > > retrieving revision 1.25
> > > diff -u -p -r1.25 kern_task.c
> > > --- kern/kern_task.c	28 Apr 2019 04:20:40 -	1.25
> > > +++ kern/kern_task.c	11 Jun 2019 18:54:39 -
> > > @@ -43,6 +43,7 @@ struct taskq {
> > >         TQ_S_DESTROYED
> > >     }               tq_state;
> > >     unsigned int    tq_running;
> > > +   unsigned int    tq_barrier;
> > >     unsigned
Re: drm(4), multi-threaded task queues and barriers
Mark Kettenis on Thu, Jun 13 2019:
> So here is a diff that fixes the problem as far as I can tell.
> Jonathan, Sven, can you give this a go?

Works for me.

> +   tq->tq_waiting++;
>     msleep(tq, &tq->tq_mtx, PWAIT, "bored", 0);
> +   tq->tq_waiting--;

Heh, didn't spot this either.  State machines are hard. :)

Regards.
Re: drm(4), multi-threaded task queues and barriers
On Thu, Jun 13, 2019 at 11:27:57PM +0200, Mark Kettenis wrote:
> > Date: Wed, 12 Jun 2019 23:57:27 +0200 (CEST)
> > From: Mark Kettenis
> >
> > > From: "Sven M. Hallberg"
> > > Date: Wed, 12 Jun 2019 23:18:24 +0200
> > >
> > > Mark Kettenis on Tue, Jun 11 2019:
> > > > The drm(4) codebase really needs multi-threaded task queues [...]
> > > >
> > > > The diff also starts 4 threads for each workqueue that gets created by
> > > > the drm(4) layer.  The number 4 is a bit arbitrary but it is the
> > > > number of threads that Linux creates per CPU for a so-called "unbound"
> > > > workqueue, which hopefully is enough to always make progress.
> > > >
> > > > Please test.
> > >
> > > Looks good and appears to work fine with two displays (one internal, one
> > > external).  Will test with three at work tomorrow.
> > >
> > > > -   dev_priv->hotplug.poll_init_work.tq = systq;
> > >
> > > Intentional?
> >
> > Yes.  It removes a local modification that should no longer be necessary.
> >
> > Unfortunately the diff doesn't work with amdgpu.  Some more thinking
> > needed...
>
> So here is a diff that fixes the problem as far as I can tell.
> Jonathan, Sven, can you give this a go?

This revised diff does not introduce any new problems with amdgpu as far
as I can tell.  Xorg starts/runs, piglit starts and later gets stuck in
dmafence/scheduler related wait channels (as it did before).

> Index: dev/pci/drm/drm_linux.c
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> retrieving revision 1.38
> diff -u -p -r1.38 drm_linux.c
> --- dev/pci/drm/drm_linux.c	9 Jun 2019 12:58:30 -	1.38
> +++ dev/pci/drm/drm_linux.c	13 Jun 2019 20:53:23 -
> @@ -1399,15 +1399,15 @@ drm_linux_init(void)
>  {
>     if (system_wq == NULL) {
>         system_wq = (struct workqueue_struct *)
> -           taskq_create("drmwq", 1, IPL_HIGH, 0);
> +           taskq_create("drmwq", 4, IPL_HIGH, 0);
>     }
>     if (system_unbound_wq == NULL) {
>         system_unbound_wq = (struct workqueue_struct *)
> -           taskq_create("drmubwq", 1, IPL_HIGH, 0);
> +           taskq_create("drmubwq", 4, IPL_HIGH, 0);
>     }
>     if (system_long_wq == NULL) {
>         system_long_wq = (struct workqueue_struct *)
> -           taskq_create("drmlwq", 1, IPL_HIGH, 0);
> +           taskq_create("drmlwq", 4, IPL_HIGH, 0);
>     }
>
>     if (taskletq == NULL)
> Index: dev/pci/drm/i915/intel_hotplug.c
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 intel_hotplug.c
> --- dev/pci/drm/i915/intel_hotplug.c	14 Apr 2019 10:14:52 -	1.2
> +++ dev/pci/drm/i915/intel_hotplug.c	13 Jun 2019 20:53:23 -
> @@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915
>     INIT_WORK(&dev_priv->hotplug.hotplug_work, i915_hotplug_work_func);
>     INIT_WORK(&dev_priv->hotplug.dig_port_work, i915_digport_work_func);
>     INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
> -   dev_priv->hotplug.poll_init_work.tq = systq;
>     INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
>         intel_hpd_irq_storm_reenable_work);
>  }
> Index: kern/kern_task.c
> ===
> RCS file: /cvs/src/sys/kern/kern_task.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 kern_task.c
> --- kern/kern_task.c	28 Apr 2019 04:20:40 -	1.25
> +++ kern/kern_task.c	13 Jun 2019 20:53:23 -
> @@ -43,6 +43,7 @@ struct taskq {
>         TQ_S_DESTROYED
>     }               tq_state;
>     unsigned int    tq_running;
> +   unsigned int    tq_waiting;
>     unsigned int    tq_nthreads;
>     unsigned int    tq_flags;
>     const char      *tq_name;
> @@ -59,6 +60,7 @@ static const char taskq_sys_name[] = "sy
>  struct taskq taskq_sys = {
>     TQ_S_CREATED,
>     0,
> +   0,
>     1,
>     0,
>     taskq_sys_name,
> @@ -77,6 +79,7 @@ static const char taskq_sys_mp_name[] =
>  struct taskq taskq_sys_mp = {
>     TQ_S_CREATED,
>     0,
> +   0,
>     1,
>     TASKQ_MPSAFE,
>     taskq_sys_mp_name,
> @@ -122,6 +125,7 @@ taskq_create(const char *name, unsigned
>
>     tq->tq_state = TQ_S_CREATED;
>     tq->tq_running = 0;
> +   tq->tq_waiting = 0;
>     tq->tq_nthreads = nthreads;
>     tq->tq_name = name;
>     tq->tq_flags = flags;
> @@ -223,6 +227,7 @@ taskq_barrier(struct taskq *tq)
>
>     WITNESS_CHECKORDER(&tq->tq_lock_object, LOP_NEWORDER, NULL);
>
> +   SET(t.t_flags, TASK_BARRIER);
>     task_add(tq, &t);
>     cond_wait(&c, "tqbar");
>  }
> @@ -238,6 +243,7 @@ taskq_del_barrier(struct taskq *tq, stru
>     if (task_del(tq, del))
>         return;
>
> +
Re: drm(4), multi-threaded task queues and barriers
> Date: Wed, 12 Jun 2019 23:57:27 +0200 (CEST)
> From: Mark Kettenis
>
> > From: "Sven M. Hallberg"
> > Date: Wed, 12 Jun 2019 23:18:24 +0200
> >
> > Mark Kettenis on Tue, Jun 11 2019:
> > > The drm(4) codebase really needs multi-threaded task queues [...]
> > >
> > > The diff also starts 4 threads for each workqueue that gets created by
> > > the drm(4) layer.  The number 4 is a bit arbitrary but it is the
> > > number of threads that Linux creates per CPU for a so-called "unbound"
> > > workqueue, which hopefully is enough to always make progress.
> > >
> > > Please test.
> >
> > Looks good and appears to work fine with two displays (one internal, one
> > external).  Will test with three at work tomorrow.
> >
> > > -   dev_priv->hotplug.poll_init_work.tq = systq;
> >
> > Intentional?
>
> Yes.  It removes a local modification that should no longer be necessary.
>
> Unfortunately the diff doesn't work with amdgpu.  Some more thinking
> needed...

So here is a diff that fixes the problem as far as I can tell.
Jonathan, Sven, can you give this a go?

Index: dev/pci/drm/drm_linux.c
===
RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
retrieving revision 1.38
diff -u -p -r1.38 drm_linux.c
--- dev/pci/drm/drm_linux.c	9 Jun 2019 12:58:30 -	1.38
+++ dev/pci/drm/drm_linux.c	13 Jun 2019 20:53:23 -
@@ -1399,15 +1399,15 @@ drm_linux_init(void)
 {
    if (system_wq == NULL) {
        system_wq = (struct workqueue_struct *)
-           taskq_create("drmwq", 1, IPL_HIGH, 0);
+           taskq_create("drmwq", 4, IPL_HIGH, 0);
    }
    if (system_unbound_wq == NULL) {
        system_unbound_wq = (struct workqueue_struct *)
-           taskq_create("drmubwq", 1, IPL_HIGH, 0);
+           taskq_create("drmubwq", 4, IPL_HIGH, 0);
    }
    if (system_long_wq == NULL) {
        system_long_wq = (struct workqueue_struct *)
-           taskq_create("drmlwq", 1, IPL_HIGH, 0);
+           taskq_create("drmlwq", 4, IPL_HIGH, 0);
    }

    if (taskletq == NULL)
Index: dev/pci/drm/i915/intel_hotplug.c
===
RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v
retrieving revision 1.2
diff -u -p -r1.2 intel_hotplug.c
--- dev/pci/drm/i915/intel_hotplug.c	14 Apr 2019 10:14:52 -	1.2
+++ dev/pci/drm/i915/intel_hotplug.c	13 Jun 2019 20:53:23 -
@@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915
    INIT_WORK(&dev_priv->hotplug.hotplug_work, i915_hotplug_work_func);
    INIT_WORK(&dev_priv->hotplug.dig_port_work, i915_digport_work_func);
    INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
-   dev_priv->hotplug.poll_init_work.tq = systq;
    INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
        intel_hpd_irq_storm_reenable_work);
 }
Index: kern/kern_task.c
===
RCS file: /cvs/src/sys/kern/kern_task.c,v
retrieving revision 1.25
diff -u -p -r1.25 kern_task.c
--- kern/kern_task.c	28 Apr 2019 04:20:40 -	1.25
+++ kern/kern_task.c	13 Jun 2019 20:53:23 -
@@ -43,6 +43,7 @@ struct taskq {
        TQ_S_DESTROYED
    }               tq_state;
    unsigned int    tq_running;
+   unsigned int    tq_waiting;
    unsigned int    tq_nthreads;
    unsigned int    tq_flags;
    const char      *tq_name;
@@ -59,6 +60,7 @@ static const char taskq_sys_name[] = "sy
 struct taskq taskq_sys = {
    TQ_S_CREATED,
    0,
+   0,
    1,
    0,
    taskq_sys_name,
@@ -77,6 +79,7 @@ static const char taskq_sys_mp_name[] =
 struct taskq taskq_sys_mp = {
    TQ_S_CREATED,
    0,
+   0,
    1,
    TASKQ_MPSAFE,
    taskq_sys_mp_name,
@@ -122,6 +125,7 @@ taskq_create(const char *name, unsigned
    tq->tq_state = TQ_S_CREATED;
    tq->tq_running = 0;
+   tq->tq_waiting = 0;
    tq->tq_nthreads = nthreads;
    tq->tq_name = name;
    tq->tq_flags = flags;
@@ -223,6 +227,7 @@ taskq_barrier(struct taskq *tq)

    WITNESS_CHECKORDER(&tq->tq_lock_object, LOP_NEWORDER, NULL);

+   SET(t.t_flags, TASK_BARRIER);
    task_add(tq, &t);
    cond_wait(&c, "tqbar");
 }
@@ -238,6 +243,7 @@ taskq_del_barrier(struct taskq *tq, stru
    if (task_del(tq, del))
        return;

+   SET(t.t_flags, TASK_BARRIER);
    task_add(tq, &t);
    cond_wait(&c, "tqbar");
 }
@@ -304,13 +310,26 @@ taskq_next_work(struct taskq *tq, struct
    struct task *next;

    mtx_enter(&tq->tq_mtx);
+retry:
    while ((next = TAILQ_FIRST(&tq->tq_worklist)) == NULL) {
        if (tq->tq_state != TQ_S_RUNNING) {
            mtx_leave(&tq->tq_mtx);
            return (0);
Re: drm(4), multi-threaded task queues and barriers
Mark Kettenis on Wed, Jun 12 2019:
>> Looks good and appears to work fine with two displays (one internal, one
>> external).  Will test with three at work tomorrow.

Your diff also works for me with three displays (inteldrm).

> Unfortunately the diff doesn't work with amdgpu.  Some more thinking
> needed...

I would be interested to hear what the issue is.

Regards,
pesco
Re: drm(4), multi-threaded task queues and barriers
> From: "Sven M. Hallberg"
> Date: Wed, 12 Jun 2019 23:18:24 +0200
>
> Mark Kettenis on Tue, Jun 11 2019:
> > The drm(4) codebase really needs multi-threaded task queues [...]
> >
> > The diff also starts 4 threads for each workqueue that gets created by
> > the drm(4) layer.  The number 4 is a bit arbitrary but it is the
> > number of threads that Linux creates per CPU for a so-called "unbound"
> > workqueue, which hopefully is enough to always make progress.
> >
> > Please test.
>
> Looks good and appears to work fine with two displays (one internal, one
> external).  Will test with three at work tomorrow.
>
> > -   dev_priv->hotplug.poll_init_work.tq = systq;
>
> Intentional?

Yes.  It removes a local modification that should no longer be necessary.

Unfortunately the diff doesn't work with amdgpu.  Some more thinking
needed...

> > Index: dev/pci/drm/drm_linux.c
> > ===
> > RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> > retrieving revision 1.38
> > diff -u -p -r1.38 drm_linux.c
> > --- dev/pci/drm/drm_linux.c	9 Jun 2019 12:58:30 -	1.38
> > +++ dev/pci/drm/drm_linux.c	11 Jun 2019 18:54:38 -
> > @@ -1399,15 +1399,15 @@ drm_linux_init(void)
> >  {
> >     if (system_wq == NULL) {
> >         system_wq = (struct workqueue_struct *)
> > -           taskq_create("drmwq", 1, IPL_HIGH, 0);
> > +           taskq_create("drmwq", 4, IPL_HIGH, 0);
> >     }
> >     if (system_unbound_wq == NULL) {
> >         system_unbound_wq = (struct workqueue_struct *)
> > -           taskq_create("drmubwq", 1, IPL_HIGH, 0);
> > +           taskq_create("drmubwq", 4, IPL_HIGH, 0);
> >     }
> >     if (system_long_wq == NULL) {
> >         system_long_wq = (struct workqueue_struct *)
> > -           taskq_create("drmlwq", 1, IPL_HIGH, 0);
> > +           taskq_create("drmlwq", 4, IPL_HIGH, 0);
> >     }
> >
> >     if (taskletq == NULL)
> > Index: dev/pci/drm/i915/intel_hotplug.c
> > ===
> > RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v
> > retrieving revision 1.2
> > diff -u -p -r1.2 intel_hotplug.c
> > --- dev/pci/drm/i915/intel_hotplug.c	14 Apr 2019 10:14:52 -	1.2
> > +++ dev/pci/drm/i915/intel_hotplug.c	11 Jun 2019 18:54:38 -
> > @@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915
> >     INIT_WORK(&dev_priv->hotplug.hotplug_work, i915_hotplug_work_func);
> >     INIT_WORK(&dev_priv->hotplug.dig_port_work, i915_digport_work_func);
> >     INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
> > -   dev_priv->hotplug.poll_init_work.tq = systq;
> >     INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
> >         intel_hpd_irq_storm_reenable_work);
> >  }
> > Index: kern/kern_task.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_task.c,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 kern_task.c
> > --- kern/kern_task.c	28 Apr 2019 04:20:40 -	1.25
> > +++ kern/kern_task.c	11 Jun 2019 18:54:39 -
> > @@ -43,6 +43,7 @@ struct taskq {
> >         TQ_S_DESTROYED
> >     }               tq_state;
> >     unsigned int    tq_running;
> > +   unsigned int    tq_barrier;
> >     unsigned int    tq_nthreads;
> >     unsigned int    tq_flags;
> >     const char      *tq_name;
> > @@ -59,6 +60,7 @@ static const char taskq_sys_name[] = "sy
> >  struct taskq taskq_sys = {
> >     TQ_S_CREATED,
> >     0,
> > +   0,
> >     1,
> >     0,
> >     taskq_sys_name,
> > @@ -77,6 +79,7 @@ static const char taskq_sys_mp_name[] =
> >  struct taskq taskq_sys_mp = {
> >     TQ_S_CREATED,
> >     0,
> > +   0,
> >     1,
> >     TASKQ_MPSAFE,
> >     taskq_sys_mp_name,
> > @@ -122,6 +125,7 @@ taskq_create(const char *name, unsigned
> >
> >     tq->tq_state = TQ_S_CREATED;
> >     tq->tq_running = 0;
> > +   tq->tq_barrier = 0;
> >     tq->tq_nthreads = nthreads;
> >     tq->tq_name = name;
> >     tq->tq_flags = flags;
> > @@ -161,6 +165,7 @@ taskq_destroy(struct taskq *tq)
> >         panic("unexpected %s tq state %u", tq->tq_name, tq->tq_state);
> >     }
> >
> > +   tq->tq_barrier = 0;
> >     while (tq->tq_running > 0) {
> >         wakeup(tq);
> >         msleep(&tq->tq_running, &tq->tq_mtx, PWAIT, "tqdestroy", 0);
> > @@ -223,6 +228,7 @@ taskq_barrier(struct taskq *tq)
> >
> >     WITNESS_CHECKORDER(&tq->tq_lock_object, LOP_NEWORDER, NULL);
> >
> > +   SET(t.t_flags, TASK_BARRIER);
> >     task_add(tq, &t);
> >     cond_wait(&c, "tqbar");
> >  }
> > @@ -238,6 +244,7 @@ taskq_del_barrier(struct taskq *tq, stru
> >     if (task_del(tq, del))
> >         return;
> >
> > +   SET(t.t_flags, TASK_BARRIER);
> >     task_add(tq, &t);
> >     cond_wait(&c, "tqbar");
> >  }
> > @@ -304,6 +311,7 @@ taskq_next_work(struct taskq *tq, struct
> >
Re: drm(4), multi-threaded task queues and barriers
Mark Kettenis on Tue, Jun 11 2019:
> The drm(4) codebase really needs multi-threaded task queues [...]
>
> The diff also starts 4 threads for each workqueue that gets created by
> the drm(4) layer.  The number 4 is a bit arbitrary but it is the
> number of threads that Linux creates per CPU for a so-called "unbound"
> workqueue, which hopefully is enough to always make progress.
>
> Please test.

Looks good and appears to work fine with two displays (one internal, one
external).  Will test with three at work tomorrow.

> -   dev_priv->hotplug.poll_init_work.tq = systq;

Intentional?

-p

> Index: dev/pci/drm/drm_linux.c
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> retrieving revision 1.38
> diff -u -p -r1.38 drm_linux.c
> --- dev/pci/drm/drm_linux.c	9 Jun 2019 12:58:30 -	1.38
> +++ dev/pci/drm/drm_linux.c	11 Jun 2019 18:54:38 -
> @@ -1399,15 +1399,15 @@ drm_linux_init(void)
>  {
>     if (system_wq == NULL) {
>         system_wq = (struct workqueue_struct *)
> -           taskq_create("drmwq", 1, IPL_HIGH, 0);
> +           taskq_create("drmwq", 4, IPL_HIGH, 0);
>     }
>     if (system_unbound_wq == NULL) {
>         system_unbound_wq = (struct workqueue_struct *)
> -           taskq_create("drmubwq", 1, IPL_HIGH, 0);
> +           taskq_create("drmubwq", 4, IPL_HIGH, 0);
>     }
>     if (system_long_wq == NULL) {
>         system_long_wq = (struct workqueue_struct *)
> -           taskq_create("drmlwq", 1, IPL_HIGH, 0);
> +           taskq_create("drmlwq", 4, IPL_HIGH, 0);
>     }
>
>     if (taskletq == NULL)
> Index: dev/pci/drm/i915/intel_hotplug.c
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 intel_hotplug.c
> --- dev/pci/drm/i915/intel_hotplug.c	14 Apr 2019 10:14:52 -	1.2
> +++ dev/pci/drm/i915/intel_hotplug.c	11 Jun 2019 18:54:38 -
> @@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915
>     INIT_WORK(&dev_priv->hotplug.hotplug_work, i915_hotplug_work_func);
>     INIT_WORK(&dev_priv->hotplug.dig_port_work, i915_digport_work_func);
>     INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
> -   dev_priv->hotplug.poll_init_work.tq = systq;
>     INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
>         intel_hpd_irq_storm_reenable_work);
>  }
> Index: kern/kern_task.c
> ===
> RCS file: /cvs/src/sys/kern/kern_task.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 kern_task.c
> --- kern/kern_task.c	28 Apr 2019 04:20:40 -	1.25
> +++ kern/kern_task.c	11 Jun 2019 18:54:39 -
> @@ -43,6 +43,7 @@ struct taskq {
>         TQ_S_DESTROYED
>     }               tq_state;
>     unsigned int    tq_running;
> +   unsigned int    tq_barrier;
>     unsigned int    tq_nthreads;
>     unsigned int    tq_flags;
>     const char      *tq_name;
> @@ -59,6 +60,7 @@ static const char taskq_sys_name[] = "sy
>  struct taskq taskq_sys = {
>     TQ_S_CREATED,
>     0,
> +   0,
>     1,
>     0,
>     taskq_sys_name,
> @@ -77,6 +79,7 @@ static const char taskq_sys_mp_name[] =
>  struct taskq taskq_sys_mp = {
>     TQ_S_CREATED,
>     0,
> +   0,
>     1,
>     TASKQ_MPSAFE,
>     taskq_sys_mp_name,
> @@ -122,6 +125,7 @@ taskq_create(const char *name, unsigned
>
>     tq->tq_state = TQ_S_CREATED;
>     tq->tq_running = 0;
> +   tq->tq_barrier = 0;
>     tq->tq_nthreads = nthreads;
>     tq->tq_name = name;
>     tq->tq_flags = flags;
> @@ -161,6 +165,7 @@ taskq_destroy(struct taskq *tq)
>         panic("unexpected %s tq state %u", tq->tq_name, tq->tq_state);
>     }
>
> +   tq->tq_barrier = 0;
>     while (tq->tq_running > 0) {
>         wakeup(tq);
>         msleep(&tq->tq_running, &tq->tq_mtx, PWAIT, "tqdestroy", 0);
> @@ -223,6 +228,7 @@ taskq_barrier(struct taskq *tq)
>
>     WITNESS_CHECKORDER(&tq->tq_lock_object, LOP_NEWORDER, NULL);
>
> +   SET(t.t_flags, TASK_BARRIER);
>     task_add(tq, &t);
>     cond_wait(&c, "tqbar");
>  }
> @@ -238,6 +244,7 @@ taskq_del_barrier(struct taskq *tq, stru
>     if (task_del(tq, del))
>         return;
>
> +   SET(t.t_flags, TASK_BARRIER);
>     task_add(tq, &t);
>     cond_wait(&c, "tqbar");
>  }
> @@ -304,6 +311,7 @@ taskq_next_work(struct taskq *tq, struct
>     struct task *next;
>
>     mtx_enter(&tq->tq_mtx);
> +retry:
>     while ((next = TAILQ_FIRST(&tq->tq_worklist)) == NULL) {
>         if (tq->tq_state != TQ_S_RUNNING) {
>             mtx_leave(&tq->tq_mtx);
> @@ -311,6 +319,17 @@ taskq_next_work(struct taskq *tq, struct
>         }
>
>         msleep(tq, &tq->tq_mtx, PWAIT, "bored",
Re: drm(4), multi-threaded task queues and barriers
> Date: Wed, 12 Jun 2019 17:04:10 +1000
> From: Jonathan Gray
>
> On Tue, Jun 11, 2019 at 09:10:46PM +0200, Mark Kettenis wrote:
> > The drm(4) codebase really needs multi-threaded task queues since the
> > code has tasks that wait for the completion of other tasks that are
> > submitted to the same task queue.  Thank you Linux...
> >
> > Unfortunately the code also needs to wait for the completion of
> > submitted tasks from other threads.  This is implemented using
> > task_barrier(9).  But unfortunately our current implementation only
> > works for single-threaded task queues.
> >
> > The diff below fixes this by marking the barrier task with a flag and
> > making sure that all threads of the task queue are synchronized.
> > This is achieved through a TASK_BARRIER flag that simply blocks the
> > threads until the last unblocked thread sees the flag and executes the
> > task.
> >
> > The diff also starts 4 threads for each workqueue that gets created by
> > the drm(4) layer.  The number 4 is a bit arbitrary but it is the
> > number of threads that Linux creates per CPU for a so-called "unbound"
> > workqueue, which hopefully is enough to always make progress.
>
> Ignoring taskletq and systq use in burner_task/switchtask across
> drivers and the local-only backlight_schedule_update_status(), was it
> intentional to exclude:
>
> workqueue.h: alloc_workqueue()
>     struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0);
> workqueue.h: alloc_ordered_workqueue()
>     struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0);
> workqueue.h:
>     #define system_power_efficient_wq ((struct workqueue_struct *)systq)

Not entirely.  But I don't want to spawn too many threads...  If this
diff seems to work, we might want to tweak the numbers a bit to see
what works best.  Whatever we do, the ordered workqueues need to stay
single-threaded.

> > Please test.  If you experience a "hang" with this diff, please try to
> > log in to the machine remotely over ssh and send me and jsg@ the
> > output of "ps -AHwlk".
> >
> > Thanks,
> >
> > Mark
> >
> >
> > Index: dev/pci/drm/drm_linux.c
> > ===
> > RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> > retrieving revision 1.38
> > diff -u -p -r1.38 drm_linux.c
> > --- dev/pci/drm/drm_linux.c	9 Jun 2019 12:58:30 -	1.38
> > +++ dev/pci/drm/drm_linux.c	11 Jun 2019 18:54:38 -
> > @@ -1399,15 +1399,15 @@ drm_linux_init(void)
> >  {
> >     if (system_wq == NULL) {
> >         system_wq = (struct workqueue_struct *)
> > -           taskq_create("drmwq", 1, IPL_HIGH, 0);
> > +           taskq_create("drmwq", 4, IPL_HIGH, 0);
> >     }
> >     if (system_unbound_wq == NULL) {
> >         system_unbound_wq = (struct workqueue_struct *)
> > -           taskq_create("drmubwq", 1, IPL_HIGH, 0);
> > +           taskq_create("drmubwq", 4, IPL_HIGH, 0);
> >     }
> >     if (system_long_wq == NULL) {
> >         system_long_wq = (struct workqueue_struct *)
> > -           taskq_create("drmlwq", 1, IPL_HIGH, 0);
> > +           taskq_create("drmlwq", 4, IPL_HIGH, 0);
> >     }
> >
> >     if (taskletq == NULL)
> > Index: dev/pci/drm/i915/intel_hotplug.c
> > ===
> > RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v
> > retrieving revision 1.2
> > diff -u -p -r1.2 intel_hotplug.c
> > --- dev/pci/drm/i915/intel_hotplug.c	14 Apr 2019 10:14:52 -	1.2
> > +++ dev/pci/drm/i915/intel_hotplug.c	11 Jun 2019 18:54:38 -
> > @@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915
> >     INIT_WORK(&dev_priv->hotplug.hotplug_work, i915_hotplug_work_func);
> >     INIT_WORK(&dev_priv->hotplug.dig_port_work, i915_digport_work_func);
> >     INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
> > -   dev_priv->hotplug.poll_init_work.tq = systq;
> >     INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
> >         intel_hpd_irq_storm_reenable_work);
> >  }
> > Index: kern/kern_task.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_task.c,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 kern_task.c
> > --- kern/kern_task.c	28 Apr 2019 04:20:40 -	1.25
> > +++ kern/kern_task.c	11 Jun 2019 18:54:39 -
> > @@ -43,6 +43,7 @@ struct taskq {
> >         TQ_S_DESTROYED
> >     }               tq_state;
> >     unsigned int    tq_running;
> > +   unsigned int    tq_barrier;
> >     unsigned int    tq_nthreads;
> >     unsigned int    tq_flags;
> >     const char      *tq_name;
> > @@ -59,6 +60,7 @@ static const char taskq_sys_name[] = "sy
> >  struct taskq taskq_sys = {
> >     TQ_S_CREATED,
> >     0,
> > +   0,
> >     1,
> >     0,
> >     taskq_sys_name,
> > @@ -77,6 +79,7 @@ static const char taskq_sys_mp_name[] =
> >
Re: drm(4), multi-threaded task queues and barriers
On Tue, Jun 11, 2019 at 09:10:46PM +0200, Mark Kettenis wrote: > The drm(4) codebase really needs multi-threaded task queues since the > code has taks that wait for the completion of other tasks that are > submitted to the same task queue. Thank you Linux... > > Unfortunately the code also needs to wait for the completion of > submitted tasks from other threads. This implemented using > task_barrier(9). But unfortunately our current implementation only > works for single-threaded task queues. > > The diff below fixes this by marking the barrier task with a flag and > making sure that all threads of the task queue are syncchronized. > This achived through a TASK_BARRIER flag that simply blocks the > threads until the last unblocked thread sees the flag and executes the > task. > > The diff also starts 4 threads for each workqueue that gets created by > the drm(4) layer. The number 4 is a bit arbitrary but it is the > number of threads that Linux creates per CPU for a so-called "unbound" > workqueue which hopefully is enough to always make progress. Ignoring taskletq and systq use in burner_task/switchtask across drivers and the local only backlight_schedule_update_status(), was it intentional to exclude: workqueue.h: alloc_workqueue() struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0); workqueue.h: alloc_ordered_workqueue() struct taskq *tq = taskq_create(name, 1, IPL_TTY, 0); workqueue.h: #define system_power_efficient_wq ((struct workqueue_struct *)systq) > > Please test. If you experience a "hang" with this diff, please try to > log in to the machine remotely over ssh and send me and jsg@ the > output of "ps -AHwlk". 
> > Thanks, > > Mark > > > Index: dev/pci/drm/drm_linux.c > === > RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v > retrieving revision 1.38 > diff -u -p -r1.38 drm_linux.c > --- dev/pci/drm/drm_linux.c 9 Jun 2019 12:58:30 - 1.38 > +++ dev/pci/drm/drm_linux.c 11 Jun 2019 18:54:38 - > @@ -1399,15 +1399,15 @@ drm_linux_init(void) > { > if (system_wq == NULL) { > system_wq = (struct workqueue_struct *) > - taskq_create("drmwq", 1, IPL_HIGH, 0); > + taskq_create("drmwq", 4, IPL_HIGH, 0); > } > if (system_unbound_wq == NULL) { > system_unbound_wq = (struct workqueue_struct *) > - taskq_create("drmubwq", 1, IPL_HIGH, 0); > + taskq_create("drmubwq", 4, IPL_HIGH, 0); > } > if (system_long_wq == NULL) { > system_long_wq = (struct workqueue_struct *) > - taskq_create("drmlwq", 1, IPL_HIGH, 0); > + taskq_create("drmlwq", 4, IPL_HIGH, 0); > } > > if (taskletq == NULL) > Index: dev/pci/drm/i915/intel_hotplug.c > === > RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_hotplug.c,v > retrieving revision 1.2 > diff -u -p -r1.2 intel_hotplug.c > --- dev/pci/drm/i915/intel_hotplug.c 14 Apr 2019 10:14:52 - 1.2 > +++ dev/pci/drm/i915/intel_hotplug.c 11 Jun 2019 18:54:38 - > @@ -619,7 +619,6 @@ void intel_hpd_init_work(struct drm_i915 > INIT_WORK(_priv->hotplug.hotplug_work, i915_hotplug_work_func); > INIT_WORK(_priv->hotplug.dig_port_work, i915_digport_work_func); > INIT_WORK(_priv->hotplug.poll_init_work, i915_hpd_poll_init_work); > - dev_priv->hotplug.poll_init_work.tq = systq; > INIT_DELAYED_WORK(_priv->hotplug.reenable_work, > intel_hpd_irq_storm_reenable_work); > } > Index: kern/kern_task.c > === > RCS file: /cvs/src/sys/kern/kern_task.c,v > retrieving revision 1.25 > diff -u -p -r1.25 kern_task.c > --- kern/kern_task.c 28 Apr 2019 04:20:40 - 1.25 > +++ kern/kern_task.c 11 Jun 2019 18:54:39 - > @@ -43,6 +43,7 @@ struct taskq { > TQ_S_DESTROYED > }tq_state; > unsigned int tq_running; > + unsigned int tq_barrier; > unsigned int tq_nthreads; > unsigned int tq_flags; > const char 
*tq_name; > @@ -59,6 +60,7 @@ static const char taskq_sys_name[] = "sy > struct taskq taskq_sys = { > TQ_S_CREATED, > 0, > + 0, > 1, > 0, > taskq_sys_name, > @@ -77,6 +79,7 @@ static const char taskq_sys_mp_name[] = > struct taskq taskq_sys_mp = { > TQ_S_CREATED, > 0, > + 0, > 1, > TASKQ_MPSAFE, > taskq_sys_mp_name, > @@ -122,6 +125,7 @@ taskq_create(const char *name, unsigned > > tq->tq_state = TQ_S_CREATED; > tq->tq_running = 0; > + tq->tq_barrier = 0; > tq->tq_nthreads = nthreads; > tq->tq_name = name; > tq->tq_flags = flags; > @@ -161,6 +165,7 @@ taskq_destroy(struct taskq *tq) >