[PATCH] nouveau: Add irq waiting as alternative to busywait
Hey,

On 14-07-12 00:56, Maarten Maathuis wrote:
> On Fri, Jul 13, 2012 at 11:35 PM, Maarten Lankhorst wrote:
>> A way to trigger an irq will be needed for optimus support since
>> cpu-waiting isn't always viable there. This could also be nice for
>> power saving on since cpu would no longer have to spin, and
>> performance might improve slightly on cpu-limited workloads.
>>
>> Some way to quantify these effects would be nice, even if the
>> end result would be 'no performance regression'. An earlier
>> version always emitted an interrupt, resulting in glxgears going
>> from 8k fps to 7k. However this is no longer the case, as I'm
>> using the kernel submission channel for generating irqs as
>> needed now.
>>
>> On nv84 I'm using NOTIFY_INTR, but that might have been
>> removed on fermi, so instead I'm using invalid command
>> 0x0058 now as a way to signal completion.
> Out of curiosity, isn't this like a handcoded version of software
> methods? If so, why handcoded? Or are software methods not supported
> on NVC0?
>
I don't think there is a software engine, and if you look at the code,
only the kernel hardware channel is allowed to raise a wake-up
interrupt. On normal channels you'll get an invalid command in dmesg.

On nv84 the interrupt will be eaten unless it originated from the
kernel hw channel, in which case things will be woken up, since it's a
valid fifo command there. Either nvc0 and later dropped the support, or
I wasn't able to activate it during testing.

~Maarten
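The gating described in this reply (method 0x0058 only wakes waiters when it arrives on the kernel channel, and is reported as invalid elsewhere) can be modelled in a few lines of userspace C. This is only a sketch of the policy, not the driver's interrupt handler: `model_channel`, `handle_method`, the `is_kernel` flag and the wake-up counter are hypothetical names introduced for illustration. Only the method value 0x0058 comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Method value taken from the patch (NVSW_SUBCHAN_FENCE_WAKE). */
#define FENCE_WAKE_METHOD 0x0058u

/* Hypothetical result codes for this model. */
enum method_result { METHOD_WAKE, METHOD_INVALID };

/* Hypothetical stand-in for a channel; only the kernel flag matters here. */
struct model_channel {
	int id;
	bool is_kernel;
};

/*
 * Model of the policy: the wake method counts as a wake-up only when it
 * comes from the kernel channel. Anything else is treated as an invalid
 * command (the real driver would log it to dmesg).
 */
static enum method_result handle_method(const struct model_channel *chan,
					uint32_t method, unsigned *wakeups)
{
	if (method == FENCE_WAKE_METHOD && chan->is_kernel) {
		(*wakeups)++;	/* stands in for waking the fence waiters */
		return METHOD_WAKE;
	}
	return METHOD_INVALID;
}
```

Calling `handle_method` with a channel whose `is_kernel` is false leaves the counter untouched, which mirrors the "invalid command in dmesg" behaviour mentioned above.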
[PATCH] nouveau: Add irq waiting as alternative to busywait
On Fri, Jul 13, 2012 at 11:35 PM, Maarten Lankhorst wrote:
> A way to trigger an irq will be needed for optimus support since
> cpu-waiting isn't always viable there. This could also be nice for
> power saving on since cpu would no longer have to spin, and
> performance might improve slightly on cpu-limited workloads.
>
> Some way to quantify these effects would be nice, even if the
> end result would be 'no performance regression'. An earlier
> version always emitted an interrupt, resulting in glxgears going
> from 8k fps to 7k. However this is no longer the case, as I'm
> using the kernel submission channel for generating irqs as
> needed now.
>
> On nv84 I'm using NOTIFY_INTR, but that might have been
> removed on fermi, so instead I'm using invalid command
> 0x0058 now as a way to signal completion.

Out of curiosity, isn't this like a handcoded version of software
methods? If so, why handcoded? Or are software methods not supported
on NVC0?

>
> Signed-off-by: Maarten Lankhorst
>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drv.h   |    2 +
>  drivers/gpu/drm/nouveau/nouveau_fence.c |   49 ---
>  drivers/gpu/drm/nouveau/nouveau_fifo.h  |    1 +
>  drivers/gpu/drm/nouveau/nouveau_state.c |    1 +
>  drivers/gpu/drm/nouveau/nv04_fifo.c     |   25
>  drivers/gpu/drm/nouveau/nv84_fence.c    |   18 +--
>  drivers/gpu/drm/nouveau/nvc0_fence.c    |   12 ++--
>  drivers/gpu/drm/nouveau/nvc0_fifo.c     |    3 +-
>  drivers/gpu/drm/nouveau/nve0_fifo.c     |   15 +++--
>  9 files changed, 110 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
> index f97a1a7..d9d274d 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
> @@ -707,6 +707,7 @@ struct drm_nouveau_private {
>  		struct drm_mm heap;
>  		struct nouveau_bo *bo;
>  	} fence;
> +	wait_queue_head_t fence_wq;
>
>  	struct {
>  		spinlock_t lock;
> @@ -1656,6 +1657,7 @@ nv44_graph_class(struct drm_device *dev)
>  #define NV84_SUBCHAN_WRCACHE_FLUSH       0x0024
>  #define NV10_SUBCHAN_REF_CNT             0x0050
>  #define NVSW_SUBCHAN_PAGE_FLIP           0x0054
> +#define NVSW_SUBCHAN_FENCE_WAKE          0x0058
>  #define NV11_SUBCHAN_DMA_SEMAPHORE       0x0060
>  #define NV11_SUBCHAN_SEMAPHORE_OFFSET    0x0064
>  #define NV11_SUBCHAN_SEMAPHORE_ACQUIRE   0x0068
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 3c18049..3ba8dee 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -68,7 +68,7 @@ nouveau_fence_update(struct nouveau_channel *chan)
>
>  	spin_lock(&fctx->lock);
>  	list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
> -		if (priv->read(chan) < fence->sequence)
> +		if (priv->read(chan) - fence->sequence >= 0x8000U)
>  			break;
>
>  		if (fence->work)
> @@ -111,11 +111,9 @@ nouveau_fence_done(struct nouveau_fence *fence)
>  	return !fence->channel;
>  }
>
> -int
> -nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
> +static int nouveau_fence_wait_busy(struct nouveau_fence *fence, bool lazy, bool intr)
>  {
>  	unsigned long sleep_time = NSEC_PER_MSEC / 1000;
> -	ktime_t t;
>  	int ret = 0;
>
>  	while (!nouveau_fence_done(fence)) {
> @@ -127,7 +125,7 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
>  		__set_current_state(intr ? TASK_INTERRUPTIBLE :
>  					   TASK_UNINTERRUPTIBLE);
>  		if (lazy) {
> -			t = ktime_set(0, sleep_time);
> +			ktime_t t = ktime_set(0, sleep_time);
>  			schedule_hrtimeout(&t, HRTIMER_MODE_REL);
>  			sleep_time *= 2;
>  			if (sleep_time > NSEC_PER_MSEC)
> @@ -144,6 +142,47 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
>  	return ret;
>  }
>
> +static int nouveau_fence_wait_event(struct nouveau_fence *fence, bool intr)
> +{
> +	struct drm_nouveau_private *dev_priv = fence->channel->dev->dev_private;
> +	unsigned long timeout = fence->timeout;
> +	int ret = 0;
> +	struct nouveau_channel *chan = dev_priv->channel;
> +	struct nouveau_channel *prev = fence->channel;
> +	struct nouveau_fence_priv *priv = nv_engine(chan->dev, NVOBJ_ENGINE_FENCE);
> +
> +	if (nouveau_fence_done(fence))
> +		return 0;
> +
> +	if
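For readers following the quoted `nouveau_fence_wait_busy` hunk: the lazy path sleeps for a doubling interval that starts at `NSEC_PER_MSEC / 1000` and is compared against `NSEC_PER_MSEC`. The sketch below reproduces that schedule in plain C so the progression is easy to see. The function name `backoff_schedule` is hypothetical, and the clamp to `NSEC_PER_MSEC` is an assumption, since the hunk ends before the body of the `if` is shown.

```c
#include <stddef.h>

/* Same value as the kernel constant: nanoseconds per millisecond. */
#define NSEC_PER_MSEC 1000000UL

/*
 * Fill out[0..n-1] with the successive sleep durations (in ns) that a
 * doubling backoff starting at 1 us would use. Clamping at 1 ms is an
 * assumption about the part of the loop the diff does not show.
 */
static void backoff_schedule(unsigned long *out, size_t n)
{
	unsigned long sleep_time = NSEC_PER_MSEC / 1000;	/* 1 us */
	size_t i;

	for (i = 0; i < n; i++) {
		out[i] = sleep_time;
		sleep_time *= 2;
		if (sleep_time > NSEC_PER_MSEC)
			sleep_time = NSEC_PER_MSEC;	/* assumed clamp */
	}
}
```

Under that assumption the interval reaches the 1 ms ceiling on the eleventh sleep, so a long wait polls roughly once per millisecond, which is the polling overhead the irq-based path is meant to avoid.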
[PATCH] nouveau: Add irq waiting as alternative to busywait
A way to trigger an irq will be needed for optimus support since
cpu-waiting isn't always viable there. This could also be nice for
power saving on since cpu would no longer have to spin, and
performance might improve slightly on cpu-limited workloads.

Some way to quantify these effects would be nice, even if the
end result would be 'no performance regression'. An earlier
version always emitted an interrupt, resulting in glxgears going
from 8k fps to 7k. However this is no longer the case, as I'm
using the kernel submission channel for generating irqs as
needed now.

On nv84 I'm using NOTIFY_INTR, but that might have been
removed on fermi, so instead I'm using invalid command
0x0058 now as a way to signal completion.

Signed-off-by: Maarten Lankhorst

---
 drivers/gpu/drm/nouveau/nouveau_drv.h   |    2 +
 drivers/gpu/drm/nouveau/nouveau_fence.c |   49 ---
 drivers/gpu/drm/nouveau/nouveau_fifo.h  |    1 +
 drivers/gpu/drm/nouveau/nouveau_state.c |    1 +
 drivers/gpu/drm/nouveau/nv04_fifo.c     |   25
 drivers/gpu/drm/nouveau/nv84_fence.c    |   18 +--
 drivers/gpu/drm/nouveau/nvc0_fence.c    |   12 ++--
 drivers/gpu/drm/nouveau/nvc0_fifo.c     |    3 +-
 drivers/gpu/drm/nouveau/nve0_fifo.c     |   15 +++--
 9 files changed, 110 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index f97a1a7..d9d274d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -707,6 +707,7 @@ struct drm_nouveau_private {
 		struct drm_mm heap;
 		struct nouveau_bo *bo;
 	} fence;
+	wait_queue_head_t fence_wq;
 
 	struct {
 		spinlock_t lock;
@@ -1656,6 +1657,7 @@ nv44_graph_class(struct drm_device *dev)
 #define NV84_SUBCHAN_WRCACHE_FLUSH       0x0024
 #define NV10_SUBCHAN_REF_CNT             0x0050
 #define NVSW_SUBCHAN_PAGE_FLIP           0x0054
+#define NVSW_SUBCHAN_FENCE_WAKE          0x0058
 #define NV11_SUBCHAN_DMA_SEMAPHORE       0x0060
 #define NV11_SUBCHAN_SEMAPHORE_OFFSET    0x0064
 #define NV11_SUBCHAN_SEMAPHORE_ACQUIRE   0x0068
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 3c18049..3ba8dee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -68,7 +68,7 @@ nouveau_fence_update(struct nouveau_channel *chan)
 
 	spin_lock(&fctx->lock);
 	list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
-		if (priv->read(chan) < fence->sequence)
+		if (priv->read(chan) - fence->sequence >= 0x8000U)
 			break;
 
 		if (fence->work)
@@ -111,11 +111,9 @@ nouveau_fence_done(struct nouveau_fence *fence)
 	return !fence->channel;
 }
 
-int
-nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
+static int nouveau_fence_wait_busy(struct nouveau_fence *fence, bool lazy, bool intr)
 {
 	unsigned long sleep_time = NSEC_PER_MSEC / 1000;
-	ktime_t t;
 	int ret = 0;
 
 	while (!nouveau_fence_done(fence)) {
@@ -127,7 +125,7 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
 		__set_current_state(intr ? TASK_INTERRUPTIBLE :
 					   TASK_UNINTERRUPTIBLE);
 		if (lazy) {
-			t = ktime_set(0, sleep_time);
+			ktime_t t = ktime_set(0, sleep_time);
 			schedule_hrtimeout(&t, HRTIMER_MODE_REL);
 			sleep_time *= 2;
 			if (sleep_time > NSEC_PER_MSEC)
@@ -144,6 +142,47 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
 	return ret;
 }
 
+static int nouveau_fence_wait_event(struct nouveau_fence *fence, bool intr)
+{
+	struct drm_nouveau_private *dev_priv = fence->channel->dev->dev_private;
+	unsigned long timeout = fence->timeout;
+	int ret = 0;
+	struct nouveau_channel *chan = dev_priv->channel;
+	struct nouveau_channel *prev = fence->channel;
+	struct nouveau_fence_priv *priv = nv_engine(chan->dev, NVOBJ_ENGINE_FENCE);
+
+	if (nouveau_fence_done(fence))
+		return 0;
+
+	if (!timeout)
+		timeout = jiffies + 3 * DRM_HZ;
+
+	if (prev != chan)
+		ret = priv->sync(fence, prev, chan);
+	if (ret)
+		goto busy;
+
+	if (intr)
+		ret = wait_event_interruptible_timeout(dev_priv->fence_wq, nouveau_fence_done(fence), timeout);
+	else
+		ret = wait_event_timeout(dev_priv->fence_wq, nouveau_fence_done(fence), timeout);
+
+
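The `nouveau_fence_update` hunk replaces a plain `<` comparison with an unsigned subtraction, which is the usual way to compare sequence counters that may wrap around. Here is a minimal standalone sketch of that idea. The helper name `seq_passed` is hypothetical, and the threshold of half the 32-bit range is an assumption about the counter width. The patch as shown here compares against `0x8000U`, and this sketch does not claim to reproduce that exact constant.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the counter value `current`
 * has reached or passed `target`, even if the counter wrapped in
 * between. Unsigned subtraction wraps modulo 2^32, so a difference in
 * the lower half of the range means `current` is at or ahead of
 * `target`; a difference in the upper half means it is still behind.
 */
static bool seq_passed(uint32_t current, uint32_t target)
{
	return (uint32_t)(current - target) < 0x80000000u;
}
```

A plain `current < target` test gives the wrong answer once the counter wraps: with `target` near `0xFFFFFFFF` and `current` a small value just after the wrap, the fence has signalled but the naive comparison still reports it as pending.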