Richard,

On Fri, May 04, 2007 at 01:21:17PM -0400, Richard C Bilson wrote:
> > From [EMAIL PROTECTED] Wed May  2 12:33:44 2007
> > 
> > I spent some more time on this today. I am still under the impression
> > that there is a race between F_SETOWN/F_SETSIG and a counter overflow
> > with notification.
> > 
> > Is there a way you could modify your program such that no thread
> > calls pfm_start() before all setups are done? Then I suspect the
> > problem will disappear.
> 
> I hate to be the bearer of bad news, but I have done as you suggest and
> the problem remains. If you'd care to see my current test code it is at
> http://plg.uwaterloo.ca/~rcbilson/sigio.cc
> 
You've already ruined my election week-end ;-<

Anyway, I have modified your program to instrument the bad condition.
In particular I wanted to know what is wrong: the si_fd or the
thread receiving the SIGIO.

# sigio 3 (3 threads created)
created th=1082132832 i=0
created th=1090525536 i=1
created th=1098918240 i=2
[FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0] 
UNHALTED_CORE_CYCLES
[FIXED_CTR1(pmd1)]
[GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
th=1082132832 id=0 fd=3
th=1082132832 fd=3 start tid=7968 pid=7967
[FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0] 
UNHALTED_CORE_CYCLES
[FIXED_CTR1(pmd1)]
[GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
th=1098918240 id=2 fd=4
th=1098918240 fd=4 start tid=7970 pid=7967
[FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0] 
UNHALTED_CORE_CYCLES
[FIXED_CTR1(pmd1)]
[GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
th=1090525536 id=1 fd=5
th=1090525536 fd=5 start tid=7969 pid=7967
Runtime error (UNIX pid:7967) si_fd=4 fd=3 th=1082132832

si_fd=4 is the fd of a thread that has started monitoring. Yet the
thread that owns fd=3 has also started, so it is not obvious which
is wrong: the descriptor or the thread that received the signal.
I would tend to think it's the thread. What do you see in your setup?
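
For reference, the check behind the "Runtime error" line boils down to
comparing si_fd against the descriptor that was armed. Below is a minimal
sketch of that comparison, not the code from sigio.cc. It makes several
assumptions: a socketpair stands in for the perfmon descriptor, there is
only one thread, the signal is collected with sigtimedwait() instead of a
handler, and sigio_fd_check() is a made-up name.

```c
#define _GNU_SOURCE             /* exposes F_SETSIG in <fcntl.h> */
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#ifndef F_SETSIG
#define F_SETSIG 10             /* Linux value, used only if the headers hid it */
#endif

/*
 * Arms one end of a socketpair for signal-driven I/O, makes it readable,
 * and checks which descriptor the kernel reports in si_fd.
 * Returns 0 if si_fd names the armed descriptor, 1 on a mismatch,
 * and -1 on a setup failure or timeout.
 */
int sigio_fd_check(void)
{
    int sv[2];
    int flags;
    int rc = -1;
    int signo = SIGRTMIN;       /* a real-time signal, so siginfo is queued */
    sigset_t set;
    siginfo_t info;
    struct timespec timeout = { 2, 0 };   /* do not wait forever */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Block the signal so it stays pending until sigtimedwait() takes it. */
    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
        goto out;

    /* Same three steps as in the test program: owner, signal, O_ASYNC. */
    if (fcntl(sv[0], F_SETOWN, getpid()) < 0)
        goto out;
    if (fcntl(sv[0], F_SETSIG, signo) < 0)
        goto out;
    flags = fcntl(sv[0], F_GETFL);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_ASYNC) < 0)
        goto out;

    /* Writing to the peer makes sv[0] readable, which raises the signal. */
    if (write(sv[1], "x", 1) != 1)
        goto out;

    if (sigtimedwait(&set, &info, &timeout) != signo)
        goto out;

    /* The bad condition in the log is exactly this comparison failing. */
    rc = (info.si_fd == sv[0]) ? 0 : 1;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With a single thread and a single descriptor there is nothing to confuse,
so this only shows the mechanics. The failing case needs several threads,
each with its own descriptor, as in the log above.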

I looked at the kernel code (see fs/fcntl.c) and it is not clear
what is wrong. Somehow, it seems like the kernel picks the wrong
thread.
What worries me is the following loop in send_sigio():
        do_each_pid_task(pid, type, p) {
                send_sigio_to_task(p, fown, fd, band);
        } while_each_pid_task(pid, type, p);

I don't quite understand what is going on with struct pid *.
But this could potentially send to multiple threads or the
wrong thread.

More investigation needed. It may be that there is no way to 
target the SIGIO to a particular thread for each descriptor.

-- 
-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
