Richard,

I spent some more time on this today. I am still under the impression
that there is a race between F_SETOWN/F_SETSIG and a counter overflow
with notification.

Is there a way you could modify your program so that no thread
calls pfm_start() before every thread has finished its setup? If so,
I suspect the problem will disappear.
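
To make the ordering concrete, here is a rough single-threaded sketch of
what I mean. It is not taken from your program: a pipe stands in for the
perfmon fd, a write() stands in for pfm_start() plus the first overflow
notification, and the names on_sigio() and setup_then_start() are made
up for illustration. In the threaded case I would expect something like
a pthread_barrier_wait() between steps 3 and 4, so that step 4 cannot
run in any thread until all threads are done with step 3.

```c
#define _GNU_SOURCE             /* needed for F_SETSIG on Linux */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* fd reported by the last SIGIO; -1 until a signal arrives */
static volatile sig_atomic_t last_fd = -1;

/* hypothetical handler: record which fd the notification is for */
static void on_sigio(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_fd = info->si_fd;
}

/*
 * Hypothetical helper. Returns 0 if the handler saw the expected fd,
 * 1 if it saw a different fd (or none), -1 if a system call failed.
 */
static int setup_then_start(void)
{
    int p[2], flags, ok;
    sigset_t io;
    struct sigaction sa;

    if (pipe(p) < 0)
        return -1;
    assert(p[0] >= 0 && p[1] >= 0);

    /* 1. keep SIGIO blocked while the setup is incomplete */
    sigemptyset(&io);
    sigaddset(&io, SIGIO);
    if (sigprocmask(SIG_BLOCK, &io, NULL) < 0)
        return -1;

    /* 2. install the handler; SA_SIGINFO is required to get si_fd */
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigio;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    /* 3. ownership and signal selection, all before any event */
    if (fcntl(p[0], F_SETOWN, getpid()) < 0)
        return -1;
    if (fcntl(p[0], F_SETSIG, SIGIO) < 0)
        return -1;
    flags = fcntl(p[0], F_GETFL);
    if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_ASYNC) < 0)
        return -1;

    /* 4. only now "start": this write stands in for pfm_start() */
    if (write(p[1], "x", 1) != 1)
        return -1;

    /* 5. unblock; the pending SIGIO is delivered to the handler here */
    if (sigprocmask(SIG_UNBLOCK, &io, NULL) < 0)
        return -1;

    ok = (last_fd == p[0]);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : 1;
}
```

Calling setup_then_start() from main() and checking for a return value
of 0 is enough to try it. The point is only the order of the steps:
nothing can generate a notification until the handler, F_SETOWN and
F_SETSIG are all in place.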

On Mon, Apr 30, 2007 at 05:21:47PM -0400, Richard C Bilson wrote:
> Stephane,
> 
> > I think the core issue is that you have a race condition in your
> > program between the various worker threads. The race has to do with
> > dosignal(). I noticed that you get the wrong fd almost instantly when
> > you hit the problem. I think this may be due to the fact that you have
> > a race between one thread starting monitoring and generating samples
> > and thus notification vs.  another thread coming online, i.e., starting
> > to execute dosignal(). I am not 100% sure this is the problem because the
> > monitoring thread has set its F_SETSIG, so it should be the only one
> > receiving the signal, yet I have not verified the logic in the kernel.
> > It may be that if a thread has not yet set its F_SETSIG, then it may be
> > chosen first.
> 
> If this is correct, then the meaning of this combination of F_SETSIG
> with F_SETOWN is evidently "send SIGIO to this thread, or to any other
> thread that hasn't called F_SETSIG on a different fd." Describing the
> problem as a "race" in my program assumes this behavior. However, this
> doesn't seem to me to be obvious or particularly useful behavior. I'd
> go so far as to call it a bug.
> 
> It's easy enough to test this hypothesis by setting up the handler and
> unmasking SIGIO after the calls to fcntl, but before the call to
> pfm_self_start. This ought to remove the "race" that concerns you. I
> just tried this, and it still fails.
> 
> > I simply added a big sleep between dosignal() and the
> > beginning of active monitoring.
> 
> I'm not sure that a "big" sleep makes a difference, since the threads
> are created in quick succession and likely sleep concurrently.
> 
> What would you expect the results of such an experiment to be? I see it
> fail less often, but it still fails.
> 
> > The other thing I did to the program is that I explicitly blocked SIGIO
> > in the master thread.
> 
> This doesn't make an obvious difference on my machine, but it's a good
> idea.
> 
> Thanks for your continuing investigation.
> _______________________________________________
> perfmon mailing list
> [email protected]
> http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/

-- 

-Stephane
