> Anyway, I have modified your program to instrument the bad condition.
> In particular I wanted to know what is wrong: the si_fd or the
> thread receiving the SIGIO.

If I understand the question correctly, I would answer: the thread. It always seems possible to read from si_fd without blocking, which indicates that si_fd deserved a notification. However, the notification is not always handled by the thread expecting a notification on that fd, nor can the notified thread always read from the fd on which it was expecting a notification.

It's easy enough to change the signal handler to read from sfp->si_fd rather than uPerfmon_fd. If I do this, the program doesn't fail.

> # sigio 3 (3 threads created)
> created th=1082132832 i=0
> created th=1090525536 i=1
> created th=1098918240 i=2
> [FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0]
> UNHALTED_CORE_CYCLES
> [FIXED_CTR1(pmd1)]
> [GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
> th=1082132832 id=0 fd=3
> th=1082132832 fd=3 start tid=7968 pid=7967
> [FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0]
> UNHALTED_CORE_CYCLES
> [FIXED_CTR1(pmd1)]
> [GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
> th=1098918240 id=2 fd=4
> th=1098918240 fd=4 start tid=7970 pid=7967
> [FIXED_CTRL(pmc2)=0xa0 pmi0=1 en0=0x0 pmi1=1 en1=0x2 pmi2=1 en2=0x0]
> UNHALTED_CORE_CYCLES
> [FIXED_CTR1(pmd1)]
> [GLOBAL_CTRL(pmc0)=0x200000000 en0=0 en1=0 fen0=0 fen1=1 fen2=0]
> th=1090525536 id=1 fd=5
> th=1090525536 fd=5 start tid=7969 pid=7967
> Runtime error (UNIX pid:7967) si_fd=4 fd=3 th=1082132832
>
> si_fd=4 which is a fd for a thread that has started monitoring. Yet
> the thread owner of fd=3 has also started. So not obvious which is
> wrong. Yet I would tend to think it's the thread. What do you see in
> your setup?

If I understand your experiment correctly, the program would have to fail in the (now very small) interval after configuring signals but before all threads have called pfm_self_start in order to suggest a positive conclusion one way or the other.
While I can't say that it will never fail in that interval, experience suggests that it's unlikely. Before I go chasing that particular type of failure, I'd like to hear your thoughts on my argument above.
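
For reference, here is a minimal sketch of the handler change described above: read from the descriptor the kernel reports in si_fd rather than from a descriptor the thread believes is its own. It is not the test program itself. It assumes Linux, and it uses F_SETSIG so that si_fd is filled in for a handler installed with SA_SIGINFO. A pipe stands in for the perfmon descriptor, and the names sigio_handler, sigio_si_fd_demo, last_si_fd and last_nread are hypothetical.

```c
#define _GNU_SOURCE            /* exposes F_SETSIG on Linux */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t last_si_fd = -1;   /* fd named by the signal */
static volatile sig_atomic_t last_nread = -1;   /* bytes read from that fd */

/* Read from the fd the notification is about (sfp->si_fd), not from a
 * per-thread descriptor that may belong to some other thread. */
static void sigio_handler(int sig, siginfo_t *sfp, void *ctx)
{
    char buf[64];
    ssize_t n;

    (void)sig;
    (void)ctx;
    n = read(sfp->si_fd, buf, sizeof(buf));
    last_nread = (n < 0) ? -1 : (int)n;
    last_si_fd = sfp->si_fd;
}

/* Hypothetical demo: returns 0 if the handler read one byte from the
 * descriptor that was reported in si_fd, nonzero otherwise. */
int sigio_si_fd_demo(void)
{
    int p[2], flags, ok;
    struct sigaction sa;
    sigset_t block, old, wait_mask;

    if (pipe(p) < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigio_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    /* Naming the signal explicitly with F_SETSIG (even SIGIO itself)
     * is what makes the kernel supply si_fd to the handler. */
    flags = fcntl(p[0], F_GETFL);
    if (flags < 0 ||
        fcntl(p[0], F_SETOWN, getpid()) < 0 ||
        fcntl(p[0], F_SETSIG, SIGIO) < 0 ||
        fcntl(p[0], F_SETFL, flags | O_ASYNC | O_NONBLOCK) < 0)
        return -1;

    /* Block SIGIO while writing so it cannot arrive before we wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGIO);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGIO);

    alarm(5);                  /* guard: terminate rather than hang */
    if (write(p[1], "x", 1) != 1)
        return -1;
    while (last_si_fd == -1)
        sigsuspend(&wait_mask);
    alarm(0);
    sigprocmask(SIG_SETMASK, &old, NULL);

    ok = (last_si_fd == p[0] && last_nread == 1);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : 1;
}
```

Calling sigio_si_fd_demo() from main() in a single-threaded process should return 0. The sketch only shows that the descriptor named in si_fd is readable from inside the handler. It says nothing about which thread receives the signal, which is the open question in this thread.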
