> Anyway, I have modified your program to instrument the bad condition.
> In particular I wanted to know what is wrong: the si_fd or the 
> thread receiving the SIGIO.

If I understand the question correctly, I would answer: the thread. It
always seems to be possible to read from si_fd without blocking, which
indicates that si_fd did deserve a notification. However, the
notification is not always handled by the thread that was expecting a
notification on that fd, nor can the notified thread always read from
the fd it was expecting a notification on.

It's easy enough to change the signal handler to read from sfp->si_fd
rather than uPerfmon_fd. If I do this, the program doesn't fail.
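
For reference, here is a minimal sketch of that handler change. It is not
the test program itself: a pipe stands in for the perfmon fd, and arm_fd,
demo_sigio_on_pipe, last_si_fd and last_nread are names I made up for
illustration. It assumes Linux, where si_fd is only filled in if the fd
was armed with F_SETSIG and the handler was installed with SA_SIGINFO.

```c
#define _GNU_SOURCE           /* for F_SETSIG */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#ifndef F_SETSIG
#define F_SETSIG 10           /* Linux value; fallback if _GNU_SOURCE came too late */
#endif

static volatile sig_atomic_t last_si_fd = -1;  /* fd reported by the kernel */
static volatile sig_atomic_t last_nread = -1;  /* bytes the handler read */

/* Read from the fd named in the siginfo rather than from a per-thread
 * variable, so it does not matter which thread receives the SIGIO. */
static void sigio_handler(int sig, siginfo_t *sfp, void *ctx)
{
    char buf[64];
    int saved_errno = errno;

    (void)sig;
    (void)ctx;
    last_si_fd = sfp->si_fd;
    last_nread = (sig_atomic_t)read(sfp->si_fd, buf, sizeof buf);
    errno = saved_errno;
}

/* Ask for SIGIO (with siginfo) on fd, delivered to this process. */
static int arm_fd(int fd)
{
    int flags;

    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    if (fcntl(fd, F_SETSIG, SIGIO) < 0)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

/* Returns 0 if the handler was notified for the pipe's read end and
 * could read the pending byte from si_fd, -1 otherwise. */
int demo_sigio_on_pipe(void)
{
    struct sigaction sa;
    int p[2];
    int rc, i, ok;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigio_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGIO, &sa, NULL);
    assert(rc == 0);

    rc = pipe(p);
    assert(rc == 0);
    rc = arm_fd(p[0]);
    assert(rc == 0);

    if (write(p[1], "x", 1) != 1)
        return -1;

    /* Give the signal a moment to be delivered, in case it was not
     * handled on return from write(). */
    for (i = 0; i < 100 && last_si_fd < 0; i++)
        usleep(1000);

    ok = (last_si_fd == p[0] && last_nread == 1);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

Calling demo_sigio_on_pipe() from main is enough to try it. The handler
never consults a per-thread fd, which is the property that matters here.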

> # sigio 3 (3 threads created)
> created th=1082132832 i=0
> created th=1090525536 i=1
> created th=1098918240 i=2
> [FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0] 
> UNHALTED_CORE_CYCLES
> [FIXED_CTR1(pmd1)]
> [GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
> th=1082132832 id=0 fd=3
> th=1082132832 fd=3 start tid=7968 pid=7967
> [FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0] 
> UNHALTED_CORE_CYCLES
> [FIXED_CTR1(pmd1)]
> [GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
> th=1098918240 id=2 fd=4
> th=1098918240 fd=4 start tid=7970 pid=7967
> [FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0] 
> UNHALTED_CORE_CYCLES
> [FIXED_CTR1(pmd1)]
> [GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
> th=1090525536 id=1 fd=5
> th=1090525536 fd=5 start tid=7969 pid=7967
> Runtime error (UNIX pid:7967) si_fd=4 fd=3 th=1082132832
> 
> si_fd=4 which is a fd for a thread that has started monitoring. Yet
> the thread owner of fd=3 has also started. So not obvious which is
> wrong. Yet I would tend to think it's the thread. What do you see in
> your setup?

If I understand your experiment correctly, a failure would only point
clearly one way or the other if it occurred in the (now very small)
interval after signals are configured but before all threads have
called pfm_self_start. While I can't say that the program will never
fail in that interval, experience suggests that it's unlikely.

Before I go chasing that particular type of failure, I'd like to hear
your thoughts on my argument above.
_______________________________________________
perfmon mailing list
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/