Hello,
I did more tests and I think I found the reason why it is not working.
I am attaching a log with the debug information. The log shows
information between a call to sys_pfm_start() and sys_pfm_stop(), plus
a following call to sys_pfm_read_pmds().
In file perfmon/perfmon_ctxsw.c:242, pfm_save_pmds() is called
depending on the value returned by pfm_arch_ctxswout_thread(). This
second function returns the value returned by pfm_arch_is_active().
pfm_arch_is_active() returns 1 when the process is being monitor
(after calling pfm_start) and 0 if the process is not (after calling
pfm_stop).
The problem is that using only pfm_arch_is_active() to determine
whether we should save the PMDs or not is not enough. When pfm_stop()
is called the PMDs are not being saved in the powerpc implementation
of perfmon2 (they are saved in the x86 implementation). Therefore, if
after calling pfm_stop() a the process is switched out
pfm_arch_is_active() will return 0 (because the process is not being
monitored; we already call pfm_stop()). This will cause that the PMDs
are not saved at the context switch time and, thus the wrong ones will
be restored later.
All this can be seen in the attached log. At line 103 sys_pfm_stop()
is called. At line 112 sys_pfm_read_pmds() is called. For instance,
the value for counter 5 (#instructions) is 1218670811. Then a context
switch comes, but at line 128 we can see how need_save_pmds is 0 so
the PMDs are not saved. Next time they are restored, at line 135 (for
counter 5) the value is 1207034044. This value is the value which was
saved before at line 83.
I hope this can help to find an solution.
Thank you!
Victor
On Wed, May 20, 2009 at 10:27 AM, victor jimenez <betaband...@gmail.com> wrote:
> Hello,
>
> yeah, my mistake. It is POWER5 indeed.
>
> I was looking a bit more into it and I found a way which seems to
> solve the problem (although I am not sure it is the proper way to fix
> it). In file perfmon/perfmon_ctxsw.c, lines 241-242 there is:
>
> if (need_save_pmds)
> pfm_save_pmds(ctx, set);
>
> If I remove the check for need_save_pmds, and unconditionally save the
> PMDS I cannot reproduce the problem anymore.
>
> I hope this can help to find the problem.
>
> Thank you!
> Victor
>
> On Wed, May 20, 2009 at 1:35 AM, Corey J Ashford <cjash...@us.ibm.com> wrote:
>> victor jimenez <betaband...@gmail.com> wrote on 05/19/2009 09:25:38 AM:
>>
>>> Hello,
>>>
>>> I realized that the log may not look nicely, so I am attaching it as a
>>> text file.
>>>
>>> Victor
>>>
>>> On Tue, May 19, 2009 at 6:17 PM, victor jimenez <betaband...@gmail.com>
>> wrote:
>>> > Hello,
>>> >
>>> > I am using perfmon2 on IBM POWER5.
>>> > The version I am using is libpfm 3.8 with kernel 2.6.28 (February
>>> > 19th, 2009 release).
>>> >
>>> > My problem is that between two consecutives readings of the counters I
>>> > get decreasing values. For instance, I want to read the number of
>>> > misses in L2, so I read the counter PM_DATA_FROM_L2MISS. In reading
>>> > 'i' I get 59695 and in reading 'i+1' I get 59637.
>>> >
>>> > I am wondering if this is a bug or it is some sort of limitation.
>>> > However, in my opinion it looks like a bug.
>>> >
>>> > In order to understand what was going on, I modified the kernel to
>>> > print some messages. In the log at the end of the mail the meaning of
>>> > the fields in every line is:
>>> >
>>> > pfm_xxx : the function from where the message is printed
>>> > pid : pid of the process which is printing the message
>>> > task : parameter task which is present in some perfmon2 functions
>>> > ctx : parameter/variable task which is present in some perfmon2
>> functions
>>> > cnum : counter number
>>> > value : value which is being written into the PMDS
>>> > result : value readed from the PMDS
>>> >
>>> > About the pid field, in my experiment I am running two processes
>>> > (2488, 2489) only 2489 is using perfmon2 (i.e., there is only one
>>> > pfm_context).
>>> >
>>> > In my opinion, the problem comes because perfmon2 sometimes does not
>>> > save PMDS right before a context switch.
>>> >
>>> > For instance, in the next log, at line 1 process 2489 is switched out
>>> > and in line 6 and 8 pfm_save_pmds is called. The value which is being
>>> > saved for counter 3 is 58339. At line 20 the user reads the counter
>>> > with the call to sys_pfm_read_pmds and the value returned is 59695.
>>> >
>>> > The problem is that at line 26 when process 2489 is switched out
>>> > again, the PMDS are not saved. And they are being restored at line 30
>>> > with the old value which was saved before (58339).
>>> >
>>> > After that, the counters keep increasing until we read them again at
>>> > line 103. The value read there is 59637, which is smaller than the
>>> > previous reading (59695).
>>> >
>>
>> [snip]
>> Hello Victor,
>>
>> Thank you very much for reporting this. You are right that it does appear
>> to be a bug. I will look into
>> this tomorrow. Thank you for providing such detailed information.
>>
>> Just to be clear, we are talking about Power5, right? The subject line
>> says Power6, but that appears to be a typo,
>> given the log you have provided.
>>
>> Regards,
>>
>> - Corey
>>
>> Corey Ashford
>> Software Engineer
>> IBM Linux Technology Center, Linux Toolchain
>> Beaverton, OR
>> 503-578-3507
>> cjash...@us.ibm.com
>>
>>
>
1 # pfm_start
2
3 May 20 13:11:35 op710n3 kernel: [ 1664.086266] sys_pfm_start pid=2556
ctx=c0000000e6cb6000
4 May 20 13:11:35 op710n3 kernel: [ 1664.086269] __pfm_start pid=2556
task=c0000000e6948240 ctx=c0000000e6cb6000
5 May 20 13:11:35 op710n3 kernel: [ 1664.086272] pfm_power5_enable_counters
pid=2556 ctx=c0000000e6cb6000
6 May 20 13:11:35 op710n3 kernel: [ 1664.086275] pfm_power5_disable_counters
pid=2556 ctx=c0000000e6cb6000
7
8 # ctxswout
9
10 May 20 13:11:35 op710n3 kernel: [ 1664.100351] pfm_arch_ctxswout_thread
pid=2556 task=c0000000e6948240 ctx=c0000000e6cb6000
11 May 20 13:11:35 op710n3 kernel: [ 1664.100355] pfm_power5_disable_counters
pid=2556 ctx=c0000000e6cb6000
12 May 20 13:11:35 op710n3 kernel: [ 1664.100358] pfm_power5_read_pmd pid=2556
cnum=2 result=3
13 May 20 13:11:35 op710n3 kernel: [ 1664.100361] pfm_power5_read_pmd pid=2556
cnum=3 result=4384
14 May 20 13:11:35 op710n3 kernel: [ 1664.100364] pfm_power5_read_pmd pid=2556
cnum=5 result=1098515584
15 May 20 13:11:35 op710n3 kernel: [ 1664.100368] == __pfm_ctxswout_thread
pid=2556 task=c0000000e6948240 ctx=c0000000e6cb6000 need_save_pmds=1
16 May 20 13:11:35 op710n3 kernel: [ 1664.100372] pfm_power5_read_pmd pid=2556
cnum=2 result=3
17 May 20 13:11:35 op710n3 kernel: [ 1664.100375] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=2 value=3
18 May 20 13:11:35 op710n3 kernel: [ 1664.100379] pfm_power5_read_pmd pid=2556
cnum=3 result=4384
19 May 20 13:11:35 op710n3 kernel: [ 1664.100382] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=3 value=4384
20 May 20 13:11:35 op710n3 kernel: [ 1664.100385] pfm_power5_read_pmd pid=2556
cnum=5 result=1098515584
21 May 20 13:11:35 op710n3 kernel: [ 1664.100389] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=5 value=1098515584
22
23 # pfm_arch_clear_pmd_ovfl_cond
24
25 May 20 13:11:35 op710n3 kernel: [ 1664.100392] pfm_power5_write_pmd
pid=2556 cnum=2 value=0
26 May 20 13:11:35 op710n3 kernel: [ 1664.100395] pfm_power5_write_pmd
pid=2556 cnum=3 value=0
27 May 20 13:11:35 op710n3 kernel: [ 1664.100398] pfm_power5_write_pmd
pid=2556 cnum=5 value=0
28
29 # ctxswin
30
31 May 20 13:11:35 op710n3 kernel: [ 1664.100566] pfm_arch_restore_pmds
pid=2554 ctx=c0000000e6cb6000
32 May 20 13:11:35 op710n3 kernel: [ 1664.100570] pfm_power5_write_pmd
pid=2554 cnum=2 value=3
33 May 20 13:11:35 op710n3 kernel: [ 1664.100573] pfm_power5_write_pmd
pid=2554 cnum=3 value=4384
34 May 20 13:11:35 op710n3 kernel: [ 1664.100576] pfm_power5_write_pmd
pid=2554 cnum=5 value=1098515584
35 May 20 13:11:35 op710n3 kernel: [ 1664.100580] pfm_arch_ctxswin_thread
pid=2554 task=c0000000e6948240 ctx=c0000000e6cb6000
36 May 20 13:11:35 op710n3 kernel: [ 1664.100583] pfm_power5_enable_counters
pid=2554 ctx=c0000000e6cb6000
37 May 20 13:11:35 op710n3 kernel: [ 1664.100586] pfm_power5_disable_counters
pid=2554 ctx=c0000000e6cb6000
38
39 # ctxswout
40
41 May 20 13:11:35 op710n3 kernel: [ 1664.970350] pfm_arch_ctxswout_thread
pid=2556 task=c0000000e6948240 ctx=c0000000e6cb6000
42 May 20 13:11:35 op710n3 kernel: [ 1664.970354] pfm_power5_disable_counters
pid=2556 ctx=c0000000e6cb6000
43 May 20 13:11:35 op710n3 kernel: [ 1664.970358] pfm_power5_read_pmd pid=2556
cnum=2 result=3
44 May 20 13:11:35 op710n3 kernel: [ 1664.970361] pfm_power5_read_pmd pid=2556
cnum=3 result=4626
45 May 20 13:11:35 op710n3 kernel: [ 1664.970364] pfm_power5_read_pmd pid=2556
cnum=5 result=1204609130
46 May 20 13:11:35 op710n3 kernel: [ 1664.970368] == __pfm_ctxswout_thread
pid=2556 task=c0000000e6948240 ctx=c0000000e6cb6000 need_save_pmds=1
47 May 20 13:11:35 op710n3 kernel: [ 1664.970372] pfm_power5_read_pmd pid=2556
cnum=2 result=3
48 May 20 13:11:35 op710n3 kernel: [ 1664.970375] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=2 value=3
49 May 20 13:11:35 op710n3 kernel: [ 1664.970378] pfm_power5_read_pmd pid=2556
cnum=3 result=4626
50 May 20 13:11:35 op710n3 kernel: [ 1664.970382] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=3 value=4626
51 May 20 13:11:35 op710n3 kernel: [ 1664.970385] pfm_power5_read_pmd pid=2556
cnum=5 result=1204609130
52 May 20 13:11:35 op710n3 kernel: [ 1664.970389] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=5 value=1204609130
53
54 # pfm_arch_clear_pmd_ovfl_cond
55
56 May 20 13:11:35 op710n3 kernel: [ 1664.970392] pfm_power5_write_pmd
pid=2556 cnum=2 value=0
57 May 20 13:11:35 op710n3 kernel: [ 1664.970395] pfm_power5_write_pmd
pid=2556 cnum=3 value=0
58 May 20 13:11:35 op710n3 kernel: [ 1664.970398] pfm_power5_write_pmd
pid=2556 cnum=5 value=0
59
60 # ctxswin
61
62 May 20 13:11:35 op710n3 kernel: [ 1664.970404] pfm_arch_restore_pmds pid=15
ctx=c0000000e6cb6000
63 May 20 13:11:35 op710n3 kernel: [ 1664.970407] pfm_power5_write_pmd pid=15
cnum=2 value=3
64 May 20 13:11:35 op710n3 kernel: [ 1664.970410] pfm_power5_write_pmd pid=15
cnum=3 value=4626
65 May 20 13:11:35 op710n3 kernel: [ 1664.970413] pfm_power5_write_pmd pid=15
cnum=5 value=1204609130
66 May 20 13:11:35 op710n3 kernel: [ 1664.970417] pfm_arch_ctxswin_thread
pid=15 task=c0000000e6948240 ctx=c0000000e6cb6000
67 May 20 13:11:35 op710n3 kernel: [ 1664.970420] pfm_power5_enable_counters
pid=15 ctx=c0000000e6cb6000
68 May 20 13:11:35 op710n3 kernel: [ 1664.970423] pfm_power5_disable_counters
pid=15 ctx=c0000000e6cb6000
69
70 # ctxswout
71
72 May 20 13:11:36 op710n3 kernel: [ 1664.990349] pfm_arch_ctxswout_thread
pid=2556 task=c0000000e6948240 ctx=c0000000e6cb6000
73 May 20 13:11:36 op710n3 kernel: [ 1664.990353] pfm_power5_disable_counters
pid=2556 ctx=c0000000e6cb6000
74 May 20 13:11:36 op710n3 kernel: [ 1664.990356] pfm_power5_read_pmd pid=2556
cnum=2 result=3
75 May 20 13:11:36 op710n3 kernel: [ 1664.990359] pfm_power5_read_pmd pid=2556
cnum=3 result=4642
76 May 20 13:11:36 op710n3 kernel: [ 1664.990363] pfm_power5_read_pmd pid=2556
cnum=5 result=1207034044
77 May 20 13:11:36 op710n3 kernel: [ 1664.990366] == __pfm_ctxswout_thread
pid=2556 task=c0000000e6948240 ctx=c0000000e6cb6000 need_save_pmds=1
78 May 20 13:11:36 op710n3 kernel: [ 1664.990370] pfm_power5_read_pmd pid=2556
cnum=2 result=3
79 May 20 13:11:36 op710n3 kernel: [ 1664.990373] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=2 value=3
80 May 20 13:11:36 op710n3 kernel: [ 1664.990377] pfm_power5_read_pmd pid=2556
cnum=3 result=4642
81 May 20 13:11:36 op710n3 kernel: [ 1664.990380] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=3 value=4642
82 May 20 13:11:36 op710n3 kernel: [ 1664.990383] pfm_power5_read_pmd pid=2556
cnum=5 result=1207034044
83 May 20 13:11:36 op710n3 kernel: [ 1664.990387] pfm_save_pmds pid=2556
ctx=c0000000e6cb6000 cnum=5 value=1207034044
84
85 # pfm_arch_clear_pmd_ovfl_cond
86
87 May 20 13:11:36 op710n3 kernel: [ 1664.990390] pfm_power5_write_pmd
pid=2556 cnum=2 value=0
88 May 20 13:11:36 op710n3 kernel: [ 1664.990393] pfm_power5_write_pmd
pid=2556 cnum=3 value=0
89 May 20 13:11:36 op710n3 kernel: [ 1664.990396] pfm_power5_write_pmd
pid=2556 cnum=5 value=0
90
91 # ctxswin
92
93 May 20 13:11:36 op710n3 kernel: [ 1664.990412] pfm_arch_restore_pmds pid=3
ctx=c0000000e6cb6000
94 May 20 13:11:36 op710n3 kernel: [ 1664.990417] pfm_power5_write_pmd pid=3
cnum=2 value=3
95 May 20 13:11:36 op710n3 kernel: [ 1664.990422] pfm_power5_write_pmd pid=3
cnum=3 value=4642
96 May 20 13:11:36 op710n3 kernel: [ 1664.990427] pfm_power5_write_pmd pid=3
cnum=5 value=1207034044
97 May 20 13:11:36 op710n3 kernel: [ 1664.990432] pfm_arch_ctxswin_thread
pid=3 task=c0000000e6948240 ctx=c0000000e6cb6000
98 May 20 13:11:36 op710n3 kernel: [ 1664.990438] pfm_power5_enable_counters
pid=3 ctx=c0000000e6cb6000
99 May 20 13:11:36 op710n3 kernel: [ 1664.990443] pfm_power5_disable_counters
pid=3 ctx=c0000000e6cb6000
100
101 # pfm_stop
102
103 May 20 13:11:36 op710n3 kernel: [ 1665.085868] sys_pfm_stop pid=2556
ctx=c0000000e6cb6000
104 May 20 13:11:36 op710n3 kernel: [ 1665.085871] __pfm_stop pid=2556
task=c0000000e6948240 ctx=c0000000e6cb6000
105 May 20 13:11:36 op710n3 kernel: [ 1665.085875] pfm_power5_disable_counters
pid=2556 ctx=c0000000e6cb6000
106 May 20 13:11:36 op710n3 kernel: [ 1665.085878] pfm_power5_read_pmd pid=2556
cnum=2 result=3
107 May 20 13:11:36 op710n3 kernel: [ 1665.085881] pfm_power5_read_pmd pid=2556
cnum=3 result=4679
108 May 20 13:11:36 op710n3 kernel: [ 1665.085884] pfm_power5_read_pmd pid=2556
cnum=5 result=1218670811
109
110 # read counters (value for cnum=5 : 1218670811)
111
112 May 20 13:11:36 op710n3 kernel: [ 1665.085888] sys_pfm_read_pmds pid=2556
ctx=c0000000e6cb6000
113 May 20 13:11:36 op710n3 kernel: [ 1665.085892] __pfm_read_pmds pid=2556
ctx=c0000000e6cb6000
114 May 20 13:11:36 op710n3 kernel: [ 1665.085895] pfm_power5_read_pmd pid=2556
cnum=3 result=4679
115 May 20 13:11:36 op710n3 kernel: [ 1665.085898] __pfm_read_pmds pid=2556
cnum=3 val=4679 last_reset_val=0
116 May 20 13:11:36 op710n3 kernel: [ 1665.085901] pfm_power5_read_pmd pid=2556
cnum=2 result=3
117 May 20 13:11:36 op710n3 kernel: [ 1665.085905] __pfm_read_pmds pid=2556
cnum=2 val=3 last_reset_val=0
118 May 20 13:11:36 op710n3 kernel: [ 1665.085908] pfm_power5_read_pmd pid=2556
cnum=5 result=1218670811
119 May 20 13:11:36 op710n3 kernel: [ 1665.085911] __pfm_read_pmds pid=2556
cnum=5 val=1218670811 last_reset_val=0
120
121 # ctxswout
122
123 May 20 13:11:36 op710n3 kernel: [ 1665.085941] pfm_arch_ctxswout_thread
pid=2556 task=c0000000e6948240 ctx=c0000000e6cb6000
124 May 20 13:11:36 op710n3 kernel: [ 1665.085944] pfm_power5_disable_counters
pid=2556 ctx=c0000000e6cb6000
125 May 20 13:11:36 op710n3 kernel: [ 1665.085948] pfm_power5_read_pmd pid=2556
cnum=2 result=3
126 May 20 13:11:36 op710n3 kernel: [ 1665.085951] pfm_power5_read_pmd pid=2556
cnum=3 result=4679
127 May 20 13:11:36 op710n3 kernel: [ 1665.085954] pfm_power5_read_pmd pid=2556
cnum=5 result=1218670811
128 May 20 13:11:36 op710n3 kernel: [ 1665.085958] == __pfm_ctxswout_thread
pid=2556 task=c0000000e6948240 ctx=c0000000e6cb6000 need_save_pmds=0
129
130 # ctxswin (value for cnum=5 is restored to 1207034044 !!!)
131
132 May 20 13:11:36 op710n3 kernel: [ 1665.085995] pfm_arch_restore_pmds
pid=2557 ctx=c0000000e6cb6000
133 May 20 13:11:36 op710n3 kernel: [ 1665.085998] pfm_power5_write_pmd
pid=2557 cnum=2 value=3
134 May 20 13:11:36 op710n3 kernel: [ 1665.086001] pfm_power5_write_pmd
pid=2557 cnum=3 value=4642
135 May 20 13:11:36 op710n3 kernel: [ 1665.086005] pfm_power5_write_pmd
pid=2557 cnum=5 value=1207034044
136 May 20 13:11:36 op710n3 kernel: [ 1665.086008] pfm_arch_ctxswin_thread
pid=2557 task=c0000000e6948240 ctx=c0000000e6cb6000
------------------------------------------------------------------------------
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables
unlimited royalty-free distribution of the report engine
for externally facing server and web deployment.
http://p.sf.net/sfu/businessobjects
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel