Re: [Intel-gfx] [PATCH 3/4] drm/i915: peel dma-fence-chains wait fences
On 20/04/11 11:50, Lionel Landwerlin wrote:
> On 10/04/2020 19:51, Venkata Sandeep Dhanalakota wrote:
> > From: Lionel Landwerlin
> >
> > To allow faster engine to engine synchronization, peel the layer of
> > dma-fence-chain to expose potential i915 fences so that the
> > i915-request code can emit HW semaphore wait/signal operations in the
> > ring which is faster than waking up the host to submit unblocked
> > workloads after interrupt notification.
> >
> > Signed-off-by: Lionel Landwerlin
> > ---
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 39 +--
> >  1 file changed, 35 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index 8dd651cdca39..e43b76d7e9fd 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -2508,6 +2508,7 @@ await_fence_array(struct i915_execbuffer *eb,
> >  	for (n = 0; n < nfences; n++) {
> >  		struct drm_syncobj *syncobj;
> > +		struct dma_fence_chain *chain;
> >  		unsigned int flags;
> >
> >  		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
> > @@ -2515,10 +2516,40 @@ await_fence_array(struct i915_execbuffer *eb,
> >  		if (!fences[n].dma_fence)
> >  			continue;
> >
> > -		err = i915_request_await_dma_fence(eb->request,
> > -						   fences[n].dma_fence);
> > -		if (err < 0)
> > -			return err;
> > +		/*
> > +		 * If we're dealing with a dma-fence-chain, peel the chain by
> > +		 * adding all of the unsignaled fences
> > +		 * (dma_fence_chain_for_each does that for us) the chain
> > +		 * points to.
> > +		 *
> > +		 * This enables us to identify waits on i915 fences and allows
> > +		 * for faster engine-to-engine synchronization using HW
> > +		 * semaphores.
> > +		 */
> > +		chain = to_dma_fence_chain(fences[n].dma_fence);
> > +		if (chain) {
> > +			struct dma_fence *iter;
> > +
> > +			dma_fence_chain_for_each(iter, fences[n].dma_fence) {
>
> The kbuild bot made me think of an interesting case.
>
> It is possible to build a chain where the first element isn't a
> dma_fence_chain.

Yes, agreed: we could have a valid fence-chain whose first element is a
normal dma_fence, in which case iter_chain can be NULL. Will address this in
the next revision of the patch.

> We should handle this here like this :
>
>         if (iter_chain)
>                 err = i915_request_await_dma_fence(eb->request, iter_chain->fence);
>         else
>                 err = i915_request_await_dma_fence(eb->request, iter);
>         if (err < 0) {
>                 dma_fence_put(iter);
>                 return err;
>         }
>
> > +				struct dma_fence_chain *iter_chain =
> > +					to_dma_fence_chain(iter);
> > +
> > +				GEM_BUG_ON(!iter_chain);
> > +
> > +				err = i915_request_await_dma_fence(eb->request,
> > +								   iter_chain->fence);
> > +				if (err < 0) {
> > +					dma_fence_put(iter);
> > +					return err;
> > +				}
> > +			}
> > +
> > +		} else {
> > +			err = i915_request_await_dma_fence(eb->request,
> > +							   fences[n].dma_fence);
> > +			if (err < 0)
> > +				return err;
> > +		}
> >  	}
> >
> >  	return 0;

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/4] drm/i915: peel dma-fence-chains wait fences
On 10/04/2020 19:51, Venkata Sandeep Dhanalakota wrote:
> From: Lionel Landwerlin
>
> To allow faster engine to engine synchronization, peel the layer of
> dma-fence-chain to expose potential i915 fences so that the
> i915-request code can emit HW semaphore wait/signal operations in the
> ring which is faster than waking up the host to submit unblocked
> workloads after interrupt notification.
>
> Signed-off-by: Lionel Landwerlin
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 39 +--
>  1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 8dd651cdca39..e43b76d7e9fd 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2508,6 +2508,7 @@ await_fence_array(struct i915_execbuffer *eb,
>  	for (n = 0; n < nfences; n++) {
>  		struct drm_syncobj *syncobj;
> +		struct dma_fence_chain *chain;
>  		unsigned int flags;
>
>  		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
> @@ -2515,10 +2516,40 @@ await_fence_array(struct i915_execbuffer *eb,
>  		if (!fences[n].dma_fence)
>  			continue;
>
> -		err = i915_request_await_dma_fence(eb->request,
> -						   fences[n].dma_fence);
> -		if (err < 0)
> -			return err;
> +		/*
> +		 * If we're dealing with a dma-fence-chain, peel the chain by
> +		 * adding all of the unsignaled fences
> +		 * (dma_fence_chain_for_each does that for us) the chain
> +		 * points to.
> +		 *
> +		 * This enables us to identify waits on i915 fences and allows
> +		 * for faster engine-to-engine synchronization using HW
> +		 * semaphores.
> +		 */
> +		chain = to_dma_fence_chain(fences[n].dma_fence);
> +		if (chain) {
> +			struct dma_fence *iter;
> +
> +			dma_fence_chain_for_each(iter, fences[n].dma_fence) {

The kbuild bot made me think of an interesting case.

It is possible to build a chain where the first element isn't a
dma_fence_chain.

We should handle this here like this :

        if (iter_chain)
                err = i915_request_await_dma_fence(eb->request, iter_chain->fence);
        else
                err = i915_request_await_dma_fence(eb->request, iter);
        if (err < 0) {
                dma_fence_put(iter);
                return err;
        }

> +				struct dma_fence_chain *iter_chain =
> +					to_dma_fence_chain(iter);
> +
> +				GEM_BUG_ON(!iter_chain);
> +
> +				err = i915_request_await_dma_fence(eb->request,
> +								   iter_chain->fence);
> +				if (err < 0) {
> +					dma_fence_put(iter);
> +					return err;
> +				}
> +			}
> +
> +		} else {
> +			err = i915_request_await_dma_fence(eb->request,
> +							   fences[n].dma_fence);
> +			if (err < 0)
> +				return err;
> +		}
>  	}
>
>  	return 0;
Re: [Intel-gfx] [PATCH 3/4] drm/i915: peel dma-fence-chains wait fences
Hi Venkata,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-tip/drm-tip]
[cannot apply to drm-intel/for-linux-next linus/master v5.6 next-20200410]
[if your patch is applied to the wrong git tree, please drop us a note to
help improve the system. BTW, we also suggest to use '--base' option to
specify the base tree in git format-patch, please see
https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Venkata-Sandeep-Dhanalakota/drm-i915-introduce-a-mechanism-to-extend-execbuf2/20200411-031057
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot

cppcheck warnings: (new ones prefixed by >>)

>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2550:12: warning: Either the condition '!iter_chain' is redundant or there is possible null pointer dereference: iter_chain. [nullPointerRedundantCheck]
     iter_chain->fence);
     ^
   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2547:16: note: Assuming that condition '!iter_chain' is not redundant
     GEM_BUG_ON(!iter_chain);
     ^
   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2545:24: note: Assignment 'iter_chain=to_dma_fence_chain(iter)', assigned value is 0
     to_dma_fence_chain(iter);
     ^
   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2550:12: note: Null pointer dereference
     iter_chain->fence);
     ^
   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1423:6: warning: The scope of the variable 'err' can be reduced. [variableScope]
     int err;
     ^
   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3158:1: warning: Label 'end_user' is not used. [unusedLabel]
    end_user:
    ^
   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1346:21: warning: Clarify calculation precedence for '&' and '?'. [clarifyCalculation]
     len = offset & 7 ? 8 : 5;
     ^
   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1401:24: warning: 'vaddr' is of type 'void *'. When using void pointers in calculations, the behaviour is undefined. [arithOperationsOnVoidPointer]
     clflush_write32(vaddr + offset_in_page(offset),
     ^

vim +2550 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c

  2510
  2511  static int
  2512  await_fence_array(struct i915_execbuffer *eb,
  2513                    struct i915_eb_fences *fences,
  2514                    int nfences)
  2515  {
  2516          unsigned int n;
  2517          int err;
  2518
  2519          for (n = 0; n < nfences; n++) {
  2520                  struct drm_syncobj *syncobj;
  2521                  struct dma_fence_chain *chain;
  2522                  unsigned int flags;
  2523
  2524                  syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
  2525
  2526                  if (!fences[n].dma_fence)
  2527                          continue;
  2528
  2529                  /*
  2530                   * If we're dealing with a dma-fence-chain, peel the chain by
  2531                   * adding all of the unsignaled fences
  2532                   * (dma_fence_chain_for_each does that for us) the chain
  2533                   * points to.
  2534                   *
  2535                   * This enables us to identify waits on i915 fences and allows
  2536                   * for faster engine-to-engine synchronization using HW
  2537                   * semaphores.
  2538                   */
  2539                  chain = to_dma_fence_chain(fences[n].dma_fence);
  2540                  if (chain) {
  2541                          struct dma_fence *iter;
  2542
  2543                          dma_fence_chain_for_each(iter, fences[n].dma_fence) {
  2544                                  struct dma_fence_chain *iter_chain =
  2545                                          to_dma_fence_chain(iter);
  2546
  2547                                  GEM_BUG_ON(!iter_chain);
  2548
  2549                                  err = i915_request_await_dma_fence(eb->request,
> 2550                                                                     iter_chain->fence);
  2551                                  if (err < 0) {
  2552                                          dma_fence_put(iter);
  2553                                          return err;
  2554                                  }
  2555                          }
  2556
  2557                  } else {
  2558                          err = i915_request_await_dma_fence(eb->request,
  2559                                                             fences[n].dma_fence);
  2560                          if (err < 0)
  2561                                  return err;
  2562                  }
  2563          }
  2564
  2565          return 0;
  2566  }
  2567

---
0-DAY CI Kernel Test Service, Intel Corporation
[Intel-gfx] [PATCH 3/4] drm/i915: peel dma-fence-chains wait fences
From: Lionel Landwerlin

To allow faster engine to engine synchronization, peel the layer of
dma-fence-chain to expose potential i915 fences so that the
i915-request code can emit HW semaphore wait/signal operations in the
ring which is faster than waking up the host to submit unblocked
workloads after interrupt notification.

Signed-off-by: Lionel Landwerlin
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 39 +--
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 8dd651cdca39..e43b76d7e9fd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2508,6 +2508,7 @@ await_fence_array(struct i915_execbuffer *eb,
 	for (n = 0; n < nfences; n++) {
 		struct drm_syncobj *syncobj;
+		struct dma_fence_chain *chain;
 		unsigned int flags;

 		syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2);
@@ -2515,10 +2516,40 @@ await_fence_array(struct i915_execbuffer *eb,
 		if (!fences[n].dma_fence)
 			continue;

-		err = i915_request_await_dma_fence(eb->request,
-						   fences[n].dma_fence);
-		if (err < 0)
-			return err;
+		/*
+		 * If we're dealing with a dma-fence-chain, peel the chain by
+		 * adding all of the unsignaled fences
+		 * (dma_fence_chain_for_each does that for us) the chain
+		 * points to.
+		 *
+		 * This enables us to identify waits on i915 fences and allows
+		 * for faster engine-to-engine synchronization using HW
+		 * semaphores.
+		 */
+		chain = to_dma_fence_chain(fences[n].dma_fence);
+		if (chain) {
+			struct dma_fence *iter;
+
+			dma_fence_chain_for_each(iter, fences[n].dma_fence) {
+				struct dma_fence_chain *iter_chain =
+					to_dma_fence_chain(iter);
+
+				GEM_BUG_ON(!iter_chain);
+
+				err = i915_request_await_dma_fence(eb->request,
+								   iter_chain->fence);
+				if (err < 0) {
+					dma_fence_put(iter);
+					return err;
+				}
+			}
+
+		} else {
+			err = i915_request_await_dma_fence(eb->request,
+							   fences[n].dma_fence);
+			if (err < 0)
+				return err;
+		}
 	}

 	return 0;
--
2.21.0.5.gaeb582a983