Re: [perfmon2] Regression in perfmon2 in 2.6.30 for Power (possibly others)

stephane eranian Fri, 28 Aug 2009 07:02:48 -0700

Corey,

On Fri, Aug 28, 2009 at 3:05 AM, Corey Ashford<[email protected]> wrote:
> Hi Stephane,
>
> I have made some progress in tracking this problem down.  The big picture is
> that pfm_arch_ctxswin_thread is never getting called, so when the thread is
> switched out, and then back in again at some point, the PMU context is not
> getting restored onto the PMU registers, causing the counters to stop till
> the end of the run.
>
> pfm_arch_ctxswin_thread is not getting called because of the following code
> in perfmon_ctxsw.c:
>        /*
>         * TIF flag was removed since switch_to
>         * context is detaching, skip everything,
>         * keep oncpu=-1
>         */
>        if (!test_thread_flag(TIF_PERFMON_CTXSW))
>                goto skip_all;
>
> Apparently the TIF_PERFMON_CTXSW flag is always cleared.  I haven't tracked
> any farther back than this yet, but was hoping this might trigger a thought
> or two in your mind as to what might be going on.
>


TIF_PERFMON_CTXSW is only set in pfm_preload_context(). If you are testing
with self.c, I don't see how the flag could be getting cleared at this point.
I think you have to instrument the places where the flag gets cleared.



> I also noticed that this code appears to have changed from 2.6.29 to 2.6.30.
>
> Anyway, I'd appreciate any thoughts you might have on this.  I may not get
> back to looking at this till Monday afternoon, so no huge rush.
>
> Thanks for your consideration,
>
> - Corey
>
> stephane eranian wrote:
>>
>> Corey,
>>
>> On Wed, Aug 26, 2009 at 1:55 AM, Corey Ashford<[email protected]> wrote:
>>>
>>> Corey Ashford wrote:
>>>>
>>>> stephane eranian wrote:
>>>>>
>>>>> On Mon, Aug 24, 2009 at 8:48 PM, Corey Ashford<[email protected]> wrote:
>>>>>>
>>>>>> stephane eranian wrote:
>>>>>>>
>>>>>>> Corey,
>>>>>>>
>>>>>> [snip]
>>>>>>>
>>>>>>> Here are a couple of tests you could try and run to narrow it down:
>>>>>>>  - taskset -c 0 self
>>>>>>>  - syst
>>>>>>>
>>>>>> "taskset -c 0 self" doesn't improve the behavior.  The results are still
>>>>>> all over the place.
>>>>>>
>>>>> That's strange, it must be something really central.
>>>>> You need to enable debugging. Careful, as this has changed again in 2.6.30
>>>>> because of the dynamic_printk stuff. The good thing is that now you can
>>>>> turn individual printk calls on and off.
>>>>
>>>> I'm not familiar with dynamic_printk, so that will take some research.
>>>>
>>>>>> "syst" is giving me an error, which may be something completely
>>>>>> unrelated:
>>>>>>
>>>>>> [r...@elm3c4 examples_v2.x]# ./syst
>>>>>> cannot set affinity to CPU0: Invalid argument
>>>>>>
>>>>> Weird. You have a CPU0, don't you?
>>>>
>>>> Yes :)  I'm still debugging this to figure out what's going on.  No results
>>>> yet (took me a while to get systemtap running due to many pilot errors)
>>>
>>> Ok, I tracked the syst problem down.  There is an error in syst.c which
>>> manifests itself on big-endian machines when syst.c is compiled in 32-bit
>>> mode.
>>>
>>> The bit vector which is used to describe the cpus that you want to set the
>>> affinity for is an array of 32-bit words (when using the
>>> compat_sys_sched_setaffinity system call in 32-bit mode).  syst programs a
>>> vector of 64-bit words.  On a little-endian machine, this wouldn't matter,
>>> because the least significant byte of the 32-bit or 64-bit word is always at
>>> offset 0.  But on a big-endian machine, the least significant byte is at
>>> offset 0x3 or 0x7 depending on the word size.  So the result is that the bit
>>> vector is interpreted as setting the affinity for a cpu which does not exist.
>>>
>> I think nowadays, we should simply use the libc cpu_set and call the
>> regular sched_setaffinity() instead of having a custom version. That
>> was from a long time ago. Hopefully, the official API will work on 32-bit
>> big-endian systems.
>>
>>> There are a couple of ways to fix this, and I will post a patch which
>>> contains both versions.
>>>
>>> So, after fixing this problem, syst does produce reliable results on 2.6.30.
>>> So I am now assuming that the problem with the self test (and others) is
>>> that something is messed up with the per-thread context code.
>>>
>> Yes, most likely. That is why I asked you to try taskset -c 0 self to avoid
>> switching from one CPU to another. But obviously you can be switched in
>> and out.
>>
>>
>>> I will start working on this.
>>>
>>> - Corey
>>>
>>>
>
> --
> Regards,
>
> - Corey
>
> Corey Ashford
> Software Engineer
> IBM Linux Technology Center, Linux Toolchain
> Beaverton, OR
> 503-578-3507
> [email protected]
>
