[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

Marek Olšák  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #50 from Marek Olšák  ---
Fixed by:

commit b687f766fddb7b39479cd9ee0427984029ea3559
Author: Marek Olšák 
Date:   Tue Oct 25 13:10:49 2016 +0200

st/mesa: allow multiple concurrent waiters in ClientWaitSync


Feel free to re-open the bug if you encounter these issues again.
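For reference, a minimal sketch of the approach the thread below converged
on: take a private reference under the shared mutex, wait without holding
it, and let the serialized clearing of so->fence turn later waiters into
no-ops. This is reconstructed from the snippets quoted in the comments
below, not from the literal commit (the fence_finish signature also differs
slightly between branches):

static void st_client_wait_sync(struct gl_context *ctx,
                                struct gl_sync_object *obj,
                                GLbitfield flags, GLuint64 timeout)
{
   struct pipe_screen *screen = st_context(ctx)->pipe->screen;
   struct st_sync_object *so = (struct st_sync_object *)obj;
   struct pipe_fence_handle *fence = NULL;

   /* Take a private reference so the fence stays alive for this waiter
    * even if another thread clears so->fence concurrently. */
   mtx_lock(&ctx->Shared->Mutex);
   screen->fence_reference(screen, &fence, so->fence);
   mtx_unlock(&ctx->Shared->Mutex);

   if (fence) {
      /* Wait without holding the mutex, so waiters with different
       * timeouts don't block each other. */
      if (screen->fence_finish(screen, fence, timeout)) {
         mtx_lock(&ctx->Shared->Mutex);
         /* Serialized: the first successful waiter drops so->fence;
          * later ones find it NULL and this becomes a no-op. */
         screen->fence_reference(screen, &so->fence, NULL);
         so->b.StatusFlag = GL_TRUE;
         mtx_unlock(&ctx->Shared->Mutex);
      }
      screen->fence_reference(screen, &fence, NULL); /* drop private ref */
   } else {
      /* so->fence == NULL means the fence has already signalled. */
      so->b.StatusFlag = GL_TRUE;
   }
}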



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #49 from Suzuki, Shinji  ---
(In reply to Marek Olšák from comment #47)
> What was wrong with the initial mutex idea? I think the solution in comment
> 22 is sufficient to close this bug.
I did not like the added storage overhead of embedding a mutex, or the CPU
overhead of initializing and destroying the mutex even for single-threaded
execution. Also, I wonder if the lock needs to be held while so->fence is
inspected and duplicated. (A per-sync-object mutex poses less contention than
using ctx->Shared->Mutex, so this point is not worth bothering about.)
However, I can't see anything wrong with the fix.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #48 from Marek Olšák  ---
I've sent my patches that fix this to mesa-dev.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #47 from Marek Olšák  ---
What was wrong with the initial mutex idea? I think the solution in comment 22
is sufficient to close this bug.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #46 from Michel Dänzer  ---
(In reply to Suzuki, Shinji from comment #44)
> >=> r600_fence_reference and st_client_wait_sync continue working
> >   with the fence object memory, which was already freed by thread
> >   1.
> 
> I see that, since so->fence is duplicated at the beginning of the function,
> destruction of the fence object would never happen in the block covered by
> the cmpxchg. If it ever happens, it would be via the 'unref' before the
> function exits.

Right, but there's nothing preventing that from happening before
r600_fence_reference in thread 0 calls pipe_reference. Mixing pipe_reference
and p_atomic_cmpxchg like this isn't safe.
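A deliberately stripped-down, hypothetical illustration of that window
(plain C with Mesa's u_atomic helpers; struct obj, shared, reader and
writer are invented for the example and are not Mesa code):

#include <stdint.h>
#include <stdlib.h>
#include "util/u_atomic.h" /* p_atomic_inc/dec_return/cmpxchg */

struct obj { int32_t refcount; };
static struct obj *shared;          /* plays the role of so->fence */

static void reader(void)            /* like duplicating so->fence */
{
   struct obj *o = shared;          /* step 1: load the shared pointer */
   /* the writer below can run to completion right here */
   p_atomic_inc(&o->refcount);      /* step 2: touches freed memory if
                                     * the writer already ran */
}

static void writer(struct obj *o)   /* like the cmpxchg-then-unref path */
{
   if (p_atomic_cmpxchg(&shared, o, NULL) == o &&
       p_atomic_dec_return(&o->refcount) == 0)
      free(o);                      /* frees while the reader sits
                                     * between steps 1 and 2 */
}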



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #45 from Suzuki, Shinji  ---
> feeling that a solution via serialization of unref ops blurs the nature of the
> problem that is inherent in the stock code.
This concern can perhaps be obviated by inserting an explicit check.

   if (screen->fence_finish(screen, pipe, fence, timeout)) {
      mtx_lock(&ctx->Shared->Mutex);
      if (so->fence != NULL)
         screen->fence_reference(screen, &so->fence, NULL);
      mtx_unlock(&ctx->Shared->Mutex);
      so->b.StatusFlag = GL_TRUE;
   }



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #44 from Suzuki, Shinji  ---
> screen->fence_reference(screen, &fence, NULL);
>
> is always thread-safe with a local variable fence,
That case is thread-safe not because the variable is local but because
concurrent execution is not possible because the variable is local. :)

>Since the fence signalled before the first waiter returns, I'd say that's at 
>least sort of correct. :)
I agree that it is 'sort of' correct. But a program that is very particular
about whether the fence was reached within the set time-out period will
misbehave.

>=> r600_fence_reference and st_client_wait_sync continue working
>   with the fence object memory, which was already freed by thread
>   1.

I see that, since so->fence is duplicated at the beginning of the function,
destruction of the fence object would never happen in the block covered by
the cmpxchg. If it ever happens, it would be via the 'unref' before the
function exits.

> Did you run into any particular problem with my latest patch?
I ran my app for a while with your patch minus the locking during reference
duplication and observed no crashes. I'm sufficiently confident that yours is
correct, and my remaining concern is mostly about lock contention, plus the
feeling that a solution via serialization of unref ops blurs the nature of the
problem that is inherent in the stock code. There is also a concern that if
you acquire the lock during reference duplication here, someone who studies
this part of the code later may think that locking is a necessary operation
and do the same, even though reference duplication is, as far as I can see, a
lock-free operation. That would further increase contention.
With all that said, I don't mind you applying your patch to master. I believe
it is correct. Your decision should take priority, because I think you are in
a position to devote much more time and energy to improving Mesa than I can
contribute. In addition, I would not be confident enough in my cmpxchg
solution until I had written tens of unit tests to be run continuously for
days and weeks in various environments, which I will probably not have time
for.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #43 from Michel Dänzer  ---
(In reply to Suzuki, Shinji from comment #41)
> "screen->fence_reference(screen, , NULL);
> is thread-safe if calls are serialized otherwise not thread safe".

Not quite.

 screen->fence_reference(screen, &fence, NULL);

is always thread-safe with a local variable fence, because each thread has a
separate instance of the local variable, so only one of them will attempt to
destroy the fence object referenced by the pointer stored in the local
variables.


(In reply to Suzuki, Shinji from comment #42)
> (The use of the shared completion flag seems to have potential
> synchronization problem because a waiter can see a successful wait as a
> result of unsuccessful call to fence_finish() if another thread comes and
> completes a successful fence_finish() call after the first thread completes
> unsuccessful fence_finish() but before returns.)

Since the fence signalled before the first waiter returns, I'd say that's at
least sort of correct. :)


>screen->fence_reference(screen, &fence, so->fence);

[...]

>   if (fence) {
>      if (screen->fence_finish(screen, fence, timeout)) {
>         so->b.StatusFlag = GL_TRUE;
>         if (p_atomic_cmpxchg(&so->fence, fence, NULL) == fence) {

I'm afraid there's still a race here:

Thread 0                                  Thread 1

screen->fence_reference(screen, &fence, so->fence);
-> struct r600_multi_fence *rsrc =
       (struct r600_multi_fence *)src;
   -> rsrc != NULL

                                          if (p_atomic_cmpxchg(&so->fence,
                                              fence, NULL) == fence) {
                                          -> fence != NULL, so->fence == NULL
                                             -> continues to destroy the
                                                fence object

=> r600_fence_reference and st_client_wait_sync continue working
   with the fence object memory, which was already freed by thread 1.


Did you run into any particular problem with my latest patch? Assuming the
shared mutex contention is your only issue with it, have you actually measured
any significant contention with it? If not, I'd like to submit it as the fix
for this bug report. It can always be changed to use a per-fence-object mutex
later if somebody measures significant contention with the shared mutex.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #42 from Suzuki, Shinji  ---
Sorry for the clutter. Timeout is not handled in the previous attempt.
(The use of the shared completion flag seems to have a potential
synchronization problem: a waiter can see a successful wait as the result of
an unsuccessful call to fence_finish() if another thread comes in and
completes a successful fence_finish() call after the first thread's
fence_finish() fails but before it returns.)

static void st_client_wait_sync(struct gl_context *ctx,
                                struct gl_sync_object *obj,
                                GLbitfield flags, GLuint64 timeout)
{
   struct pipe_screen *screen = st_context(ctx)->pipe->screen;
   struct st_sync_object *so = (struct st_sync_object *)obj;
   struct pipe_fence_handle *fence = NULL;

   /* We don't care about GL_SYNC_FLUSH_COMMANDS_BIT, because flush is
    * already called when creating a fence.
    */

   /* Duplicate the reference so that the fence object is guaranteed to
    * be alive at least until the associated 'unref' below is executed.
    * This is important because multiple threads have to execute
    * fence_finish() concurrently even if they target the same fence
    * object, in order to deal with potentially different time-out
    * settings.
    */
   screen->fence_reference(screen, &fence, so->fence);

   if (fence) {
      if (screen->fence_finish(screen, fence, timeout)) {
         so->b.StatusFlag = GL_TRUE;
         if (p_atomic_cmpxchg(&so->fence, fence, NULL) == fence) {
            /* Get done with 'so->fence'. This is an 'unref' op.
             * Borrow the value in 'fence' since so->fence is already
             * set to NULL by the cmpxchg above.
             */
            struct pipe_fence_handle *fence_copy = fence;
            screen->fence_reference(screen, &fence_copy, NULL);
         }
      }
   } // fence == NULL means the fence has already been reached
   screen->fence_reference(screen, &fence, NULL);
}



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #41 from Suzuki, Shinji  ---
I think I now have a better understanding of the problem we are dealing with
here.

>Not thread safe (race condition on so->fence):
>  screen->fence_reference(screen, &so->fence, NULL);
>
>Always thread safe (if fence is a local variable):
>  screen->fence_reference(screen, &fence, NULL);

I think the above can be stated more concisely as:
"screen->fence_reference(screen, &fence, NULL);
is thread-safe if the calls are serialized, otherwise not thread-safe".

What's fundamentally wrong with the untouched Mesa code is that
screen->fence_reference(screen, &so->fence, NULL) is potentially called more
than once. If the calls are serialized, no crash occurs, because the second
and later calls behave as no-ops. Protecting each call with a mutex is one way
to assure that serial execution, but that is an indirect resolution of the
problem. A direct resolution is to have screen->fence_reference() not be
called more than once, because the shared reference contributes only one
increment to the reference count. Below is my latest attempt.

static void st_client_wait_sync(struct gl_context *ctx,
                                struct gl_sync_object *obj,
                                GLbitfield flags, GLuint64 timeout)
{
   struct pipe_screen *screen = st_context(ctx)->pipe->screen;
   struct st_sync_object *so = (struct st_sync_object *)obj;
   struct pipe_fence_handle *fence = NULL;

   /* Duplicate the reference so that the fence object is guaranteed to
    * be alive at least until the associated 'unref' below is executed.
    * This is important because multiple threads have to execute
    * fence_finish() concurrently even if they target the same fence
    * object, to deal with potentially different time-out settings.
    */
   screen->fence_reference(screen, &fence, so->fence);

   if (fence && screen->fence_finish(screen, fence, timeout)) {
      if (p_atomic_cmpxchg(&so->fence, fence, NULL) == fence) {
         /* Get done with 'so->fence'. This is an 'unref' op.
          * Borrow the value in 'fence' since so->fence is already
          * set to NULL by the cmpxchg above.
          */
         struct pipe_fence_handle *fence_copy = fence;
         screen->fence_reference(screen, &fence_copy, NULL);
      }
   }
   so->b.StatusFlag = GL_TRUE;
   screen->fence_reference(screen, &fence, NULL);
}



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #40 from Suzuki, Shinji  ---
Sorry, I misunderstood your question and thought you were asking whether my
modification works. Your patch should work if mine works, because yours has
tighter serialization. (My modification does not acquire the mutex while
duplicating so->fence.)

Yours is:
+   mtx_lock(&ctx->Shared->Mutex);
+   screen->fence_reference(screen, &fence, so->fence);
+   mtx_unlock(&ctx->Shared->Mutex);

Mine is:
+   screen->fence_reference(screen, &fence, so->fence);


(In reply to Michel Dänzer from comment #37)
> Created attachment 127413 [details] [review]
> Only protect concurrent access to so->fence
> 
> Based on the discussion so far, this patch only protects concurrent access
> to so->fence. Shinji-san, does this work reliably for your test case?



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #39 from Michel Dänzer  ---
Well, it seems to be the best test case we have so far, and it's the most
relevant one for this bug report. :)



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #38 from Suzuki, Shinji  ---
As you saw, I'm now running my app without locking on the first block of code
but with locking on the second. So far so good. But my application does
streaming video processing; I assume the threads run in lock-step fashion
after a little perturbation at startup (which comes from initializations here
and there?). Therefore I'm afraid that whether my app runs without crashing is
not a good assessment of correctness.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

Michel Dänzer  changed:

   What|Removed |Added

 Attachment #127267 is obsolete|0   |1

--- Comment #37 from Michel Dänzer  ---
Created attachment 127413
  --> https://bugs.freedesktop.org/attachment.cgi?id=127413&action=edit
Only protect concurrent access to so->fence

Based on the discussion so far, this patch only protects concurrent access to
so->fence. Shinji-san, does this work reliably for your test case?



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #36 from Suzuki, Shinji  ---
Michel-san, thank you for the elaboration. Maybe this discussion continues
because I have failed to express my question clearly. What I'm wondering is
whether the following section needs to be protected by ctx->Shared->Mutex or
not.

   screen->fence_reference(screen, &fence, so->fence);
   if (!fence) {
      /* If so->fence has been reset to NULL, the fence has been reached,
       * but so->b.StatusFlag may not be set to GL_TRUE yet. Since the
       * caller may check the value of the flag as soon as control
       * returns, set it here too, although redundantly.
       */
      so->b.StatusFlag = GL_TRUE;
      goto quit;
   }

I completely agree that locking is needed in the following section, marked
with !. (so->b.StatusFlag = GL_TRUE can be moved out of the block, though.)

if (screen->fence_finish(screen, fence, 0)) {
!  mtx_lock(&ctx->Shared->Mutex);
!  screen->fence_reference(screen, &so->fence, NULL);
!  so->b.StatusFlag = GL_TRUE;
!  mtx_unlock(&ctx->Shared->Mutex);
}



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #35 from Marek Olšák  ---
(In reply to Michel Dänzer from comment #34)
> (In reply to Suzuki, Shinji from comment #31)
> > Why is this patch rejected?
> 
> Because it'll have to be rebased on top of the updated patch 1.
> 
> 
> > Now I'm revisiting your patch. Do we need to have mutual exclusion on
>screen->fence_reference(screen, &fence, so->fence);
> > and
>   screen->fence_reference(screen, &so->fence, NULL);
> > ?
> 
> I'm not sure r600_fence_reference() is thread safe.

OK, again:

fence_reference is thread-safe with regard to the reference counter and fence
destruction. It doesn't, however, protect the "dst" pointer itself. Therefore:

Not thread safe (race condition on so->fence):
  screen->fence_reference(screen, &so->fence, NULL);

Always thread safe (if fence is a local variable):
  screen->fence_reference(screen, &fence, NULL);
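Spelled out as a hypothetical sketch (struct fence, its count field, and the
simplified body below are invented for illustration; Mesa's real path goes
through pb_reference/pipe_reference), here is what two threads running
fence_reference(screen, &so->fence, NULL) against a refcount of 1 can do:

#include <stdint.h>
#include <stdlib.h>
#include "util/u_atomic.h"   /* p_atomic_dec_return */

struct fence { int32_t count; };

static void fence_reference(struct fence **dst, struct fence *src)
{
   struct fence *old = *dst;   /* both threads can load the same non-NULL
                                * 'old' before either stores NULL below  */

   if (old && p_atomic_dec_return(&old->count) == 0)
      free(old);               /* thread A's decrement hits 0 and frees;
                                * thread B then decrements freed memory  */

   *dst = src;                 /* plain, unsynchronized store            */
}

Serializing the calls, as in the attached patches, ties the load of 'old',
the decrement, and the store to *dst into one indivisible step per caller.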



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #34 from Michel Dänzer  ---
(In reply to Suzuki, Shinji from comment #31)
> Why is this patch rejected?

Because it'll have to be rebased on top of the updated patch 1.


> Now I'm revisiting your patch. Do we need to have mutual exclusion on
>screen->fence_reference(screen, &fence, so->fence);
> and
>   screen->fence_reference(screen, &so->fence, NULL);
> ?

I'm not sure r600_fence_reference() is thread safe.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #33 from Suzuki, Shinji  ---
(In reply to Marek Olšák from comment #32)
> Reference counting is thread safe, but the assignment in the sync obj
> structure itself isn't. So you need a mutex.

I see no assignment to the sync object structure occurring in evaluating
'screen->fence_reference(screen, &fence, so->fence);'.

# I noticed that the discussion here is being fed to the mesa-dev mailing
list; should we move to an offline channel? Please feel free to e-mail me.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #32 from Marek Olšák  ---
Reference counting is thread safe, but the assignment in the sync obj structure
itself isn't. So you need a mutex.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #31 from Suzuki, Shinji  ---
(In reply to Michel Dänzer from comment #30)
> (In reply to Suzuki, Shinji from comment #29)
> Yes, that'll be patch 2, à la
> https://patchwork.freedesktop.org/patch/115220/ . :)
Why is this patch rejected?

Now I'm revisiting your patch. Do we need to have mutual exclusion on
   screen->fence_reference(screen, &fence, so->fence);
and
  screen->fence_reference(screen, &so->fence, NULL);
?
Concurrent calls to screen->fence_reference(screen, &so->fence, NULL) must be
arbitrated, as they race on dereferencing/resetting the lhs value. But I
believe, though not too confidently, that the reference counting itself is
implemented in a thread-safe manner, so no arbitration on
screen->fence_reference(screen, &fence, so->fence); is needed.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #30 from Michel Dänzer  ---
(In reply to Suzuki, Shinji from comment #29)
> BTW, st_check_sync and st_client_wait_sync() seem identical except for the
> existence of flags and timeout. Should they be merged?

Yes, that'll be patch 2, à la https://patchwork.freedesktop.org/patch/115220/ .
:)



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #29 from Suzuki, Shinji  ---
Sorry. I looked only at your latest comment when I replied previously,
because I read it in email.

You are absolutely right: fence_finish() MUST be allowed to run concurrently
even for the same sync object because, as you say, the timeout settings of
the waits may differ. I was thinking only in terms of infinite waits. I'll
reapply your patch and continue watching how my app behaves.
Thank you for noticing this before I posted my buggy patch to the mailing
list.

BTW, st_check_sync and st_client_wait_sync() seem identical except for the
existence of flags and timeout. Should they be merged?

(In reply to Michel Dänzer from comment #26)
> (In reply to Marek Olšák from comment #24)
> > Hm. Probably none.
> 
> Actually, I think there are: E.g. consider one thread calling
> glClientWaitSync with a non-0 timeout, blocking for some time with the mutex
> locked. If another thread calls glClientWaitSync with a 0 timeout (or
> whichever API call ends up in st_check_sync) during that time, it'll block
> until the first thread unlocks the mutex.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #28 from Suzuki, Shinji  ---
Yes, I agree with you that we can do without a per-sync-object mutex if we
allow all waiters to enter fence_finish() freely.
With that said, a per-sync-object mutex has the additional benefit of
potentially reducing lock contention among waiters on differing sync
objects, and with other Mesa components that deal with shared resources. To
be fair, I also have to mention that ctx->Shared->Mutex is touched
everywhere, so trying to optimize in this particular context alone may not
make much sense, and adding a mutex certainly has associated overhead.
Overall, I vote +1 on your strategy if free execution of fence_finish() is
to be allowed.


On Wed, Oct 19, 2016 at 10:21 AM,   wrote:
> Comment # 27 on bug 98172 from Michel Dänzer
>
> Note that if we allow concurrent fence_finish calls, I don't think we need a
> per-sync-object mutex.
>



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #27 from Michel Dänzer  ---
Note that if we allow concurrent fence_finish calls, I don't think we need a
per-sync-object mutex.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #26 from Michel Dänzer  ---
(In reply to Marek Olšák from comment #24)
> Hm. Probably none.

Actually, I think there are: E.g. consider one thread calling glClientWaitSync
with a non-0 timeout, blocking for some time with the mutex locked. If another
thread calls glClientWaitSync with a 0 timeout (or whichever API call ends up
in st_check_sync) during that time, it'll block until the first thread unlocks
the mutex.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #25 from Suzuki, Shinji  ---
Thank you for your comments.
The patches attached here by me are against 11.2. I'll base the diff on
master when posting it to the mailing list.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #24 from Marek Olšák  ---
Hm. Probably none. The patch looks good.

BTW, the patch won't apply on master, because the fence_finish function is
slightly different there.
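(For the record, the snippets in this bug show both shapes; the difference
is presumably the extra pipe_context parameter that master gained:)

/* 11.2-era call, as in most snippets here: */
if (screen->fence_finish(screen, fence, timeout)) { /* ... */ }

/* master-era call, as quoted in comments #45 and #8/#11: */
if (screen->fence_finish(screen, pipe, fence, timeout)) { /* ... */ }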



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #23 from Suzuki, Shinji  ---
(In reply to Marek Olšák from comment #22)
> It would be better to call fence_finish while not holding the mutex. For
> example:
>
>    mtx_unlock(&so->mtx);
>
>    if (screen->fence_finish(screen, fence, timeout)) {
>       mtx_lock(&so->mtx);

What are the benefits of allowing multiple outstanding calls to fence_finish()
per sync-object?



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #22 from Marek Olšák  ---
It would be better to call fence_finish while not holding the mutex. For
example:

mtx_lock(&so->mtx);
/* If the fence doesn't exist, assume it's signalled. */
if (so->fence) {
   struct pipe_fence_handle *fence = NULL;
   screen->fence_reference(screen, &fence, so->fence);
   mtx_unlock(&so->mtx);

   if (screen->fence_finish(screen, fence, timeout)) {
      mtx_lock(&so->mtx);
      screen->fence_reference(screen, &so->fence, NULL);
      so->b.StatusFlag = GL_TRUE;
      mtx_unlock(&so->mtx);
   }
   screen->fence_reference(screen, &fence, NULL);
} else {
   mtx_unlock(&so->mtx);
}



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #21 from Suzuki, Shinji  ---
Created attachment 127366
  --> https://bugs.freedesktop.org/attachment.cgi?id=127366&action=edit
Per-sync-object mutex. The redundant 'else' and the setting of StatusFlag are
eliminated.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #20 from Suzuki, Shinji  ---
Thank you. That's certainly easier to read.
Also, now I see that so->b.StatusFlag needs to be set to GL_TRUE only when
so->fence is set to NULL in either st_check_sync() or st_client_wait_sync(),
because so->fence gets set to NULL nowhere else between fence object creation
and destruction.
I'll continue to work with the patched lib for another while before
submitting a patch to the mailing list.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #19 from Michel Dänzer  ---
(In reply to shinji.suzuki from comment #16)
> Created attachment 127317 [details] [review]
> Arbitration on so->fence through per sync-object mutex.

One minor comment below, otherwise looks good to me. Feel free to submit the
patch generated by git format-patch to the mesa-dev mailing list for review.


+   if (type == GL_SYNC_FENCE) {
+  struct st_sync_object *so =
+ (struct st_sync_object *)CALLOC_STRUCT(st_sync_object);
+  mtx_init(&so->mtx, mtx_plain);
+  return (struct gl_sync_object *)so;
+   } else
   return NULL;

This can be written as:

   if (type == GL_SYNC_FENCE) {
  [...]
   }

   return NULL;



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #18 from shinji.suz...@gmail.com ---
> Looks like glDeleteSync directly calls (or is a macro of?)
I thought that was the case because I got this stack trace:
(gdb) where
#0  st_delete_sync_object (ctx=0x863890, obj=0x7fffe8001320)
at state_tracker/st_cb_syncobj.c:62
#1  0x00466137 in Fence::~Fence (this=0x7fffe80012a0, 
__in_chrg=<optimized out>)
   at /home/suzuki/src/photron/ndro2016/gl_fence.cpp:31

gl_fence.cpp:31 is the call site of glDeleteSync().
But if I set a break point at _mesa_DeleteSync, then I get this:

#0  _mesa_DeleteSync (sync=0x7fffe80017e0) at main/syncobj.c:241
#1  0x00466137 in Fence::~Fence (this=0x7fffe8001760,
__in_chrg=<optimized out>)
at /home/suzuki/src/photron/ndro2016/gl_fence.cpp:31

If I step into st_delete_sync_object(), I get the same trace as the first
one. It looks as if gdb is not showing the correct call trace while the
program is executing st_delete_sync_object().

The bottom line is: glDeleteSync() likely calls st_delete_sync_object not
directly but via _mesa_DeleteSync, which does perform reference counting on
the sync object.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #17 from shinji.suz...@gmail.com ---
I need to take my words back.

> dtor should be called only by the thread that eliminated the last reference.

It looks like glDeleteSync directly calls (or is a macro of?)
st_delete_sync_object. If that is true, the reference count on the sync
object, if there is one, is not respected. Therefore I think
mtx_lock(&ctx->Shared->Mutex) needs to be present in the dtor function as
well. I apologize for my oversight.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #16 from shinji.suz...@gmail.com ---
Created attachment 127317
  --> https://bugs.freedesktop.org/attachment.cgi?id=127317&action=edit
Arbitration on so->fence through per sync-object mutex.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #15 from shinji.suz...@gmail.com ---
Created attachment 127316
  --> https://bugs.freedesktop.org/attachment.cgi?id=127316&action=edit
Arbitration on so->fence based on Michel-san's patch



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #14 from shinji.suz...@gmail.com ---
I've been running tests with changes derived from your patch, with and
without random sleeps to shake up the execution order, and have seen no
crashes so far. The differences from yours are:
* No mutual exclusion in the ctor/dtor functions. (I see no chance of a race
in the ctor, and the dtor should be called only by the thread that eliminated
the last reference.)
* The locking duration is made slightly shorter.

Now I feel sufficiently convinced about the correctness of the fix, and at
the same time I feel that an alternative implementation deserves
consideration. As I wrote previously, locking on ctx->Shared->Mutex
introduces contention that is not essential. Allowing calls to fence_finish()
that are concurrent relative to the sync object at hand will require a more
careful implementation from the underlying GPU-specific drivers, though the
r600 implementation seems good, and running the same check in multiple
threads seems inefficient. I'll attach a patch for an alternative
implementation as well.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #13 from shinji.suz...@gmail.com ---
> So the second fence_reference call will find so->fence == NULL and do nothing.

Thank you. I failed to see that fence_reference(screen, &so->fence, 0) is a
no-op in that case, thanks to the "if (ptr != reference)" in
pipe_reference_described().
I'll get back to you when I finish running some tests.
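For reference, a condensed version of the helper behind these calls
(paraphrased from u_inlines.h of that era; check the tree for the exact
form). The ptr != reference test is what turns the second
fence_reference(screen, &so->fence, NULL) into a no-op once so->fence is
already NULL:

static inline boolean
pipe_reference_described(struct pipe_reference *ptr,
                         struct pipe_reference *reference,
                         debug_reference_descriptor get_desc)
{
   boolean destroy = FALSE;

   if (ptr != reference) {             /* NULL vs. NULL: do nothing */
      if (reference)
         p_atomic_inc(&reference->count);
      if (ptr && p_atomic_dec_zero(&ptr->count))
         destroy = TRUE;               /* caller destroys the object */
   }

   return destroy;
}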



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #12 from Michel Dänzer  ---
It should be fine. You're right that fence_finish can run concurrently in
multiple threads, but even if it returns true in multiple threads, the
fence_reference calls are serialized by the mutex. So the second
fence_reference call will find so->fence == NULL and do nothing.

Have you run into any problems with this patch in your testing?



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #11 from shinji.suz...@gmail.com ---
Sorry, I've made too many mistakes in writing. (Should have gone to bed
before writing.) I'll rewrite the whole post below.

I'm afraid the execution of st_fence_sync() can still race.
Thread A can run up to
  success = screen->fence_finish(screen, pipe, fence, 0);
and then get preempted, and thread B can run up to the same location.
And then
  screen->fence_reference(screen, &so->fence, NULL);
will be executed serially by both threads. If screen->fence_finish()
were thread-safe and returned true only for the first invocation, all would
be fine, but that is likely not the case, as otherwise we would not be
struggling with this issue.
I think the gist of it is that the checking of so->fence and the nullifying
of it should be executed atomically. If "if (success)" is replaced with "if
(success && so->fence)" then the program may behave correctly, but I'm still
not comfortable about screen->fence_finish() being called concurrently.
I'm also concerned that mutual exclusion on ctx->Shared->Mutex may introduce
unnecessarily strict serialization.
Can't we introduce a per-sync-object mutex so that the checking of so->fence
and the nullifying of it happen atomically?
Is that modification too intrusive? (At least it is unnecessary overhead when
st_fence_sync() is not executed concurrently on the same sync object.)
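A sketch of what that proposal could look like (the mtx field is the
proposed addition; the struct layout otherwise follows st_cb_syncobj.c, and
the wait path is simplified):

struct st_sync_object {
   struct gl_sync_object b;           /* base Mesa sync object */
   struct pipe_fence_handle *fence;   /* touched only under 'mtx' */
   mtx_t mtx;                         /* proposed: per-sync-object mutex */
};

/* wait path: the check and the nullify become one atomic step */
mtx_lock(&so->mtx);
if (so->fence && screen->fence_finish(screen, so->fence, timeout)) {
   screen->fence_reference(screen, &so->fence, NULL);
   so->b.StatusFlag = GL_TRUE;
}
mtx_unlock(&so->mtx);

As comments #22 and #26 above point out, holding any mutex across
fence_finish() makes waiters with different timeouts block each other, which
is why the later iterations wait outside the lock.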



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #10 from shinji.suz...@gmail.com ---
>I'm not confortable about sreen->fence_ference() being called concurrently.

I'm not comfortable about screen->fence_finish() being called concurrently.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #9 from shinji.suz...@gmail.com ---
Oops, I'm getting confused. Concurrent programming is hard.
> screen->fence_reference(screen, &so->fence, NULL);
will not be executed in arbitrary order but serially, due to the mutex
locking. Still, the outcome should be a segfault in the thread that comes
late.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #8 from shinji.suz...@gmail.com ---
Comment on attachment 127267
  --> https://bugs.freedesktop.org/attachment.cgi?id=127267
Lock the shared state mutex and work with a local reference of so->fence

Review of attachment 127267:
-

I'm afraid execution of st_fence_sync() can still race.
Thread-A can run up to
  success = screen->fence_finish(screen, pipe, fence, 0);
and then get preempted and Thread-B can run up to the same location.
And then
  screen->fence_reference(screen, &so->fence, NULL);
can still be executed in arbitrary order. If screen->fence_reference()
is thread-safe and returns true only for the first invocation then all is
fine but likely it is not true as otherwise we will not be struggling with
this issue.
I think the gist of it is that checking of so->fence and nullifying of it
should be executed atomically. If "if (success)" is replaced with "if (success
&& so->fence)" then the program may behave correctly but I'm not confortable
about sreen->fence_ference() being called concurrently.
I'm also concerned that mutual exclusion on ctx->Shared->Mutex may introduce
unnecessarily strict serialization. 
Can't we introduce a per-sync-object mutex so that the execution of checking
so->fence and nullifying it happens atomically?
Is that modification too intrusive? (At least it is unnecessary overhead when
st_fence_sync() is not executed concurrently on the same sync object.)



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

Michel Dänzer  changed:

   What|Removed |Added

 Attachment #127204 is obsolete|0   |1



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

Michel Dänzer  changed:

   What|Removed |Added

 Attachment #127237 is obsolete|0   |1

--- Comment #7 from Michel Dänzer  ---
Created attachment 127267
  --> https://bugs.freedesktop.org/attachment.cgi?id=127267&action=edit
Lock the shared state mutex and work with a local reference of so->fence

Does this patch fix the problem completely?



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #6 from shinji.suz...@gmail.com ---
Created attachment 127237
  --> https://bugs.freedesktop.org/attachment.cgi?id=127237&action=edit
Diff that went into my copy of mesa-source



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #5 from shinji.suz...@gmail.com ---
Hello Michel-san.
I'm afraid my testing was not enough and I spoke too soon. Today I ran my app
remotely through an ssh session by pointing DISPLAY to :0, as I'm away from
my home machine. Now the app crashes, though much less often, even with the
modified library containing your patch. Remoting and the screen being blanked
seem to affect some timing, as I get only around 25 fps vs. around 35 fps
when I work on the console.
Here is the call stack. (The assert() is my lame attempt at trying to
understand which code path is being taken; it is apparently compiled to a
no-op.)

state_tracker/st_cb_syncobj.c
 118       if (screen->fence_finish(screen, fence, timeout)) {
 119          assert(0);
>120          screen->fence_reference(screen, &so->fence, NULL);
 121          so->b.StatusFlag = GL_TRUE;
 122       }

r600_pipe_common.c
 766       if (pipe_reference(&(*rdst)->reference, &rsrc->reference))
>767          ws->fence_reference(&(*rdst)->gfx, NULL);
 768          ws->fence_reference(&(*rdst)->sdma, NULL);
 769          FREE(*rdst);

radeon_drm_cs.c
 667 static void radeon_fence_reference(struct pipe_fence_handle **dst,
 668                                    struct pipe_fence_handle *src)
 669 {
>670    pb_reference((struct pb_buffer**)dst, (struct pb_buffer*)src);
 671 }

src/gallium/auxiliary/pipebuffer/pb_buffer.h

 235 static inline void
 236 pb_reference(struct pb_buffer **dst,
 237              struct pb_buffer *src)
 238 {
>239    struct pb_buffer *old = *dst;
 240
 241    if (pipe_reference(&(*dst)->reference, &src->reference))
 242       pb_destroy( old );
 243    *dst = src;
 244 }

(gdb) p old
$1 = <optimized out>
(gdb) p dst
$2 = (struct pb_buffer **) 0x8
(gdb)

I'll attach the diff that corresponds to your patch, to make sure I'm not
making a mistake in applying it by hand, as some hunks were rejected by
'patch'.



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #4 from Michel Dänzer  ---
Shinji-san,

(In reply to shinji.suzuki from comment #3)
> Yes, it does! Thank you for your interest and effort in solving the issue.

Great, thank you for testing the patch.


> My itch is that I can't see why your patch eliminates the race.

Does the commit log of https://patchwork.freedesktop.org/patch/115219/ clarify
it?



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #3 from shinji.suz...@gmail.com ---
(In reply to Michel Dänzer from comment #2)
> Created attachment 127204 [details] [review]
> Work with a local reference of so->fence
> 
> Does this patch help?

Yes, it does! Thank you for your interest and effort in solving the issue.
I've rebuilt Mesa from the Ubuntu source and run my program with it. Without
your patch, the app segfaults like it did with the OS-supplied library. With
your patch, the app has not crashed even once so far. When the app crashes,
it does so mainly upon start-up, about 4 times out of 5. I have run the app a
few dozen times successfully with the patched library.

My itch is that I can't see why your patch eliminates the race. My
understanding is that the segfault happens because the second thread tries to
dereference &so->fence, which had been nullified by the first thread, and I
can't figure out how that is avoided by the patch.
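For what it's worth, the core of the local-reference idea presumably looks
like this (reconstructed from the later discussion, not copied from the
attachment); each waiter holds its own +1 reference, so the fence object
outlives the wait even if another thread clears so->fence. As comments #5
and #7 above show, this alone turned out not to be enough:

struct pipe_fence_handle *fence = NULL;

/* take a private reference: 'fence' stays valid for this thread even if
 * so->fence is nullified concurrently */
screen->fence_reference(screen, &fence, so->fence);

if (fence && screen->fence_finish(screen, fence, timeout)) {
   /* still races if two threads get here at once; see comment #7 */
   screen->fence_reference(screen, &so->fence, NULL);
   so->b.StatusFlag = GL_TRUE;
}

screen->fence_reference(screen, &fence, NULL);  /* drop the private ref */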

FYI, here is the call stack when the app crashes without the patch.

(gdb) where
#0  pb_reference (src=0x0, dst=0x10)
at ../../../../../../src/gallium/auxiliary/pipebuffer/pb_buffer.h:239
#1  radeon_fence_reference (dst=0x10, src=0x0)
at ../../../../../../src/gallium/winsys/radeon/drm/radeon_drm_cs.c:670
#2  0x7f9b52bff441 in r600_fence_reference (screen=<optimized out>, 
dst=0x7f9b3003c218, src=0x0)
at ../../../../../src/gallium/drivers/radeon/r600_pipe_common.c:768
#3  0x7f9b5272ea7f in st_client_wait_sync (ctx=<optimized out>, 
obj=0x7f9b3003c1f0, flags=<optimized out>, timeout=<optimized out>)
at ../../../src/mesa/state_tracker/st_cb_syncobj.c:114
#4  0x7f9b526b0217 in _mesa_ClientWaitSync (sync=<optimized out>, flags=1, 
timeout=18446744073709551615) at ../../../src/mesa/main/syncobj.c:341
#5  0x00466106 in Fence::clientWaitSync (this=0x7f9b3001c0c0)
at /home/suzuki/src/xxx/ndro2016/gl_fence.cpp:25
#6  0x00479e09 in Fenced::clientWaitSync (
this=0x7f9b4cc26e70) at /home/suzuki/src/xxx/ndro2016/gl_fence.h:59
#7  0x7f9b4de64f02 in FileReaderThread::work (this=0x1a7b540)
at /home/suzuki/src/xxx/ndro2016/thrd_filereader.cpp:63
#8  0x00465586 in WorkerThread::run (this=0x1a7b540)
at /home/suzuki/src/xxx/ndro2016/thrd.cpp:173
#9  0x0046529e in WorkerThread::entry_func (worker=0x1a7b540)
at /home/suzuki/src/xxx/ndro2016/thrd.cpp:144
#10 0x7f9b567e36fa in start_thread (arg=0x7f9b4cc27700)
at pthread_create.c:333
#11 0x7f9b55a74b5d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #2 from Michel Dänzer  ---
Created attachment 127204
  --> https://bugs.freedesktop.org/attachment.cgi?id=127204&action=edit
Work with a local reference of so->fence

Does this patch help?



[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

shinji.suz...@gmail.com changed:

   What|Removed |Added

   Assignee|dri-devel@lists.freedesktop.org|mesa-dev@lists.freedesktop.org
 QA Contact|dri-devel@lists.freedesktop.org|mesa-dev@lists.freedesktop.org
  Component|Drivers/Gallium/r600|Mesa core
 CC||shinji.suz...@gmail.com
