Re: [libav-devel] [PATCH] pthread_frame: use better memory orders for frame progress
Quoting Wan-Teh Chang (2016-12-09 18:54:47) > This improves commit 59c70227405c214b29971e6272f3a3ff6fcce3d0. > > In ff_thread_report_progress(), the fast code path can load > progress[field] with the relaxed memory order, and the slow code path > can store progress[field] with the release memory order. These changes > are mainly intended to avoid confusion when one inspects the source code. > They are unlikely to have measurable performance improvement. > > ff_thread_report_progress() and ff_thread_await_progress() form a pair. > ff_thread_await_progress() reads progress[field] with the acquire memory > order (in the fast code path). Therefore, one expects to see > ff_thread_report_progress() write progress[field] with the matching > release memory order. > > In the fast code path in ff_thread_report_progress(), the atomic load of > progress[field] doesn't need the acquire memory order because the > calling thread is trying to make the data it just decoded visible to the > other threads, rather than trying to read the data decoded by other > threads. > > In ff_thread_get_buffer(), initialize progress[0] and progress[1] using > atomic_init(). > > Signed-off-by: Wan-Teh Chang> --- > libavcodec/pthread_frame.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) Pushed, thanks. -- Anton Khirnov ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] pthread_frame: use better memory orders for frame progress
On Fri, Dec 9, 2016 at 10:22 AM, Sean McGovernwrote: > Hi Wan-Teh, > [...] > Does this by any chance fix https://bugzilla.libav.org/show_bug.cgi?id=930 ? Hi Sean, No. My patch is completely unrelated to that bug. That bug is a crash caused by a null pointer dereference. The crash occurs when preparing the first argument for ff_thread_report_progress(). I believe the process didn't enter the ff_thread_report_progress() function, even though ff_thread_report_progress() is in the callback. Wan-Teh Chang ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] pthread_frame: use better memory orders for frame progress
Hi Wan-Teh, On Dec 9, 2016 12:55, "Wan-Teh Chang"wrote: This improves commit 59c70227405c214b29971e6272f3a3ff6fcce3d0. In ff_thread_report_progress(), the fast code path can load progress[field] with the relaxed memory order, and the slow code path can store progress[field] with the release memory order. These changes are mainly intended to avoid confusion when one inspects the source code. They are unlikely to have measurable performance improvement. ff_thread_report_progress() and ff_thread_await_progress() form a pair. ff_thread_await_progress() reads progress[field] with the acquire memory order (in the fast code path). Therefore, one expects to see ff_thread_report_progress() write progress[field] with the matching release memory order. In the fast code path in ff_thread_report_progress(), the atomic load of progress[field] doesn't need the acquire memory order because the calling thread is trying to make the data it just decoded visible to the other threads, rather than trying to read the data decoded by other threads. In ff_thread_get_buffer(), initialize progress[0] and progress[1] using atomic_init(). Signed-off-by: Wan-Teh Chang --- libavcodec/pthread_frame.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c index 2736a81..142eaa5 100644 --- a/libavcodec/pthread_frame.c +++ b/libavcodec/pthread_frame.c @@ -458,7 +458,7 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int field) atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : NULL; if (!progress || -atomic_load_explicit([field], memory_order_acquire) >= n) +atomic_load_explicit([field], memory_order_relaxed) >= n) return; p = f->owner->internal->thread_ctx; @@ -468,7 +468,7 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int field) pthread_mutex_lock(>progress_mutex); -atomic_store([field], n); +atomic_store_explicit([field], n, memory_order_release); pthread_cond_broadcast(>progress_cond); pthread_mutex_unlock(>progress_mutex); @@ -745,8 +745,8 @@ int ff_thread_get_buffer(AVCodecContext *avctx, ThreadFrame *f, int flags) } progress = (atomic_int*)f->progress->data; -atomic_store([0], -1); -atomic_store([1], -1); +atomic_init([0], -1); +atomic_init([1], -1); } pthread_mutex_lock(>parent->buffer_mutex); -- 2.8.0.rc3.226.g39d4020 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel Does this by any chance fix https://bugzilla.libav.org/show_bug.cgi?id=930 ? Sean McGovern - +1 613 862 6507 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] pthread_frame: use better memory orders for frame progress
Hi Luca, I will update the patch with the longer description. On Wed, Dec 7, 2016 at 5:42 PM, Luca Barbatowrote: > > Assuming it is not causing noticeable slowdowns/speedups (a benchmark > would be welcome), I'm all for consistency. I assume that initializing progress[0] and progress[1] with atomic_init() instead of atomic_store() is not controversial. The changes to ff_thread_report_progress() are unlikely to cause noticeable slowdowns/speedups. In this kind of situation I often inspect the code generated by the compiler to evaluate the changes. In this analysis I use clang 3.8.0, targeting x86_64, at optimization level -O3. (I admit a compiler targeting ARM would be better.) The first change in ff_thread_report_progress(), to the atomic_load_explicit() call, results in no change in the generated code. The second change in ff_thread_report_progress(), to the atomic_store(), changes an xchgl instruction to a movl instruction. Here is the diff between the generated code (original vs. modified) of the end of ff_thread_report_progress(). The original C source code is: 469 pthread_mutex_lock(>progress_mutex); 470 471 atomic_store([field], n); 472 473 pthread_cond_broadcast(>progress_cond); 474 pthread_mutex_unlock(>progress_mutex); 475 } I change line 471 to: 471 atomic_store_explicit([field], n, memory_order_release); The diff between generated code is: == .Ltmp154: .LBB2_5: #DEBUG_VALUE: ff_thread_report_progress:field <- %R9D #DEBUG_VALUE: ff_thread_report_progress:n <- %EBP #DEBUG_VALUE: ff_thread_report_progress:progress <- %RBX #DEBUG_VALUE: ff_thread_report_progress:p <- %R14 .loc6 469 28# libavcodec/pthread_frame.c:469:28 leaq208(%r14), %r15 .loc6 469 5 is_stmt 0 # libavcodec/pthread_frame.c:469:5 movq%r15, %rdi callq pthread_mutex_lock .loc6 471 5 is_stmt 1 # libavcodec/pthread_frame.c:471:5 - xchgl %ebp, (%rbx,%r12,4) -.Ltmp155: + movl%ebp, (%rbx,%r12,4) .loc6 473 32# libavcodec/pthread_frame.c:473:32 addq$72, %r14 -.Ltmp156: +.Ltmp155: .loc6 473 5 is_stmt 0 # libavcodec/pthread_frame.c:473:5 movq%r14, %rdi callq pthread_cond_broadcast .loc6 474 5 is_stmt 1 # libavcodec/pthread_frame.c:474:5 movq%r15, %rdi popq%rbx -.Ltmp157: +.Ltmp156: popq%r12 popq%r14 popq%r15 popq%rbp +.Ltmp157: jmp pthread_mutex_unlock# TAILCALL .Ltmp158: .LBB2_6:# %.thread #DEBUG_VALUE: ff_thread_report_progress:n <- %EBP #DEBUG_VALUE: ff_thread_report_progress:field <- %R9D #DEBUG_VALUE: ff_thread_report_progress:f <- %RDI .loc6 475 1 discriminator 2 # libavcodec/pthread_frame.c:475:1 popq%rbx popq%r12 popq%r14 popq%r15 popq%rbp .Ltmp159: retq .Ltmp160: .Lfunc_end2: .size ff_thread_report_progress, .Lfunc_end2-ff_thread_report_progress .cfi_endproc == Wan-Teh Chang ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] pthread_frame: use better memory orders for frame progress
On 30/11/2016 23:13, Wan-Teh Chang wrote: > This patch is intended to avoid confusion when one inspects the source code. > > ff_thread_report_progress() and ff_thread_await_progress() form a > pair. ff_thread_await_progress() reads progress[field] with the > "acquire" memory order (in the fast code path). Therefore, one expects > to see ff_thread_report_progress() write progress[field] with the > matching "release" memory order. > > In the fast code path in ff_thread_report_progress(), the atomic load > of progress[field] doesn't need the "acquire" memory order because the > calling thread is trying to make the data it just decoded visible to > the other threads, rather than trying to read the data decoded by > other threads. > > Please let me know if you have other questions. Assuming it is not causing noticeable slowdowns/speedups (a benchmark would be welcome), I'm all for consistency. Would be nice if you update the patch with this longer description though. Anton, do you agree? lu ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] pthread_frame: use better memory orders for frame progress
Hi Anton, On Wed, Nov 30, 2016 at 1:03 PM, Anton Khirnovwrote: > > Did you see any measurable change in performance after this patch? This patch is intended to avoid confusion when one inspects the source code. ff_thread_report_progress() and ff_thread_await_progress() form a pair. ff_thread_await_progress() reads progress[field] with the "acquire" memory order (in the fast code path). Therefore, one expects to see ff_thread_report_progress() write progress[field] with the matching "release" memory order. In the fast code path in ff_thread_report_progress(), the atomic load of progress[field] doesn't need the "acquire" memory order because the calling thread is trying to make the data it just decoded visible to the other threads, rather than trying to read the data decoded by other threads. Please let me know if you have other questions. Thanks, Wan-Teh Chang ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] pthread_frame: use better memory orders for frame progress
Quoting Wan-Teh Chang (2016-11-30 21:33:37) > This improves commit 59c70227405c214b29971e6272f3a3ff6fcce3d0. > > In ff_thread_report_progress(), the fast code path can load > progress[field] with the relaxed memory order, and the slow code path > can store progress[field] with the release memory order. > > In ff_thread_get_buffer(), initialize progress[0] and progress[1] using > atomic_init(). > > Signed-off-by: Wan-Teh Chang> --- > libavcodec/pthread_frame.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c > index 2736a81..142eaa5 100644 > --- a/libavcodec/pthread_frame.c > +++ b/libavcodec/pthread_frame.c > @@ -458,7 +458,7 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int > field) > atomic_int *progress = f->progress ? (atomic_int*)f->progress->data : > NULL; > > if (!progress || > -atomic_load_explicit([field], memory_order_acquire) >= n) > +atomic_load_explicit([field], memory_order_relaxed) >= n) > return; > > p = f->owner->internal->thread_ctx; > @@ -468,7 +468,7 @@ void ff_thread_report_progress(ThreadFrame *f, int n, int > field) > > pthread_mutex_lock(>progress_mutex); > > -atomic_store([field], n); > +atomic_store_explicit([field], n, memory_order_release); > > pthread_cond_broadcast(>progress_cond); > pthread_mutex_unlock(>progress_mutex); > @@ -745,8 +745,8 @@ int ff_thread_get_buffer(AVCodecContext *avctx, > ThreadFrame *f, int flags) > } > progress = (atomic_int*)f->progress->data; > > -atomic_store([0], -1); > -atomic_store([1], -1); > +atomic_init([0], -1); > +atomic_init([1], -1); > } > > pthread_mutex_lock(>parent->buffer_mutex); > -- > 2.8.0.rc3.226.g39d4020 Did you see any measurable change in performance after this patch? I used to have lot more explicit memory orders in my original version of the patch, but then I dropped most of them, since I thought they were unlikely to have any significant effect and just added a chance of hard to detect racy bugs. -- Anton Khirnov ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel