Hi Stephane,

I have made some progress in tracking this problem down.  The big picture is 
that pfm_arch_ctxswin_thread is never getting called.  So when the thread is 
switched out and later switched back in, the PMU context is not restored onto 
the PMU registers, and the counters stay stopped until the end of the run.

pfm_arch_ctxswin_thread is not getting called because of the following code in 
perfmon_ctxsw.c:
         /*
          * TIF flag was removed since switch_to
          * context is detaching, skip everything,
          * keep oncpu=-1
          */
         if (!test_thread_flag(TIF_PERFMON_CTXSW))
                 goto skip_all;

Apparently the TIF_PERFMON_CTXSW flag is always cleared.  I haven't tracked it 
any further back than this yet, but was hoping this might trigger a thought or 
two in your mind as to what might be going on.

I also noticed that this code appears to have changed from 2.6.29 to 2.6.30.

Anyway, I'd appreciate any thoughts you might have on this.  I may not get back 
to looking at this till Monday afternoon, so no huge rush.

Thanks for your consideration,

- Corey

stephane eranian wrote:
> Corey,
> 
> On Wed, Aug 26, 2009 at 1:55 AM, Corey
> Ashford<cjash...@linux.vnet.ibm.com> wrote:
>> Corey Ashford wrote:
>>> stephane eranian wrote:
>>>> On Mon, Aug 24, 2009 at 8:48 PM, Corey
>>>> Ashford<cjash...@linux.vnet.ibm.com> wrote:
>>>>> stephane eranian wrote:
>>>>>> Corey,
>>>>>>
>>>>> [snip]
>>>>>> Here are a couple of tests you could try and run to narrow it down:
>>>>>>   - taskset -c 0 self
>>>>>>   - syst
>>>>>>
>>>>> "taskset -c 0 self" doesn't improve the behavior.  The results are still
>>>>> all
>>>>> over the place.
>>>>>
>>>> That's strange, must be something really central.
>>>> You need to enable debugging. Careful as this has changed again in 2.6.30
>>>> because of the dynamic_printk stuff. The good thing is that now you can
>>>> turn on/off individual printk.
>>> I'm not familiar with dynamic_printk, so that will take some research.
>>>
>>>>> "syst" is giving me an error, which may be something completely
>>>>> unrelated:
>>>>>
>>>>> [r...@elm3c4 examples_v2.x]# ./syst
>>>>> cannot set affinity to CPU0: Invalid argument
>>>>>
>>>> Weird. You have a CPU0, don't you?
>>> Yes :)  I'm still debugging this to figure out what's going on.  No
>>> results yet
>>> (took me awhile to get systemtap running due to many pilot errors)
>> Ok, I tracked the syst problem down.  There is an error in syst.c which
>> manifests itself on big-endian machines when syst.c is compiled in 32-bit
>> mode.
>>
>> The bit vector which is used to describe the cpus that you want to set the
>> affinity for is an array of 32-bit words (when using the
>> compat_sys_sched_setaffinity system call in 32-bit mode).  syst programs a
>> vector of 64-bit words.  On a little endian machine, this wouldn't matter,
>> because the least significant byte of the 32-bit or 64-bit word is always at
>> offset 0.  But on a big-endian machine, the least significant byte is at
>> offset 0x3 or 0x7 depending on the word size.  So the result is that the bit
>> vector is interpreted as setting the affinity for a cpu which does not
>> exist.
>>
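A minimal sketch of the mismatch described above, written so that it runs the same way on any host: it lays out a 64-bit mask word in big-endian byte order by hand and then reads the same bytes back as two 32-bit big-endian words. The function names are hypothetical and are not taken from syst.c, and reading the mask as native 32-bit words is only an assumption about how the compat path interprets it.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 64-bit value into buf in big-endian byte order. */
static void store_be64(uint8_t *buf, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load one 32-bit word from buf, interpreting the bytes as big-endian. */
static uint32_t load_be32(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/*
 * Hypothetical helper: user space sets bit `requested_cpu` in a single
 * 64-bit word; return the CPU number that results when the same eight
 * bytes are read as an array of two 32-bit words (assumed compat view).
 */
int cpu_seen_by_compat(int requested_cpu)
{
    uint8_t mask[8];

    assert(requested_cpu >= 0 && requested_cpu < 64);
    store_be64(mask, UINT64_C(1) << requested_cpu);

    for (int w = 0; w < 2; w++) {
        uint32_t word = load_be32(mask + 4 * w);
        for (int b = 0; b < 32; b++)
            if (word & (UINT32_C(1) << b))
                return 32 * w + b;
    }
    return -1; /* not reached: exactly one bit is always set */
}
```

Under these assumptions, a request for CPU 0 comes out as CPU 32, which does not exist on a machine with 32 or fewer CPUs and would explain the "Invalid argument" error.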
> I think nowadays, we should simply use the libc cpu_set and call the
> regular sched_setaffinity() instead of having a custom version. That
> was from a long time ago. Hopefully, the official API will work on 32-bit
> big-endian systems.
> 
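For reference, a minimal sketch of what the libc-based approach could look like, assuming glibc's cpu_set_t macros and the sched_setaffinity() wrapper on Linux. The helper names pin_to_cpu and first_allowed_cpu are hypothetical and this is not the actual syst.c change.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to a single CPU.
 * Returns 0 on success, -1 with errno set on failure.
 */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    CPU_ZERO(&set);      /* start with an empty mask */
    CPU_SET(cpu, &set);  /* libc picks the right word and bit */

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Hypothetical helper: return the lowest CPU the calling thread is
 * currently allowed to run on, or -1 on error.
 */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Because the mask is only ever touched through CPU_ZERO and CPU_SET, the caller never deals with the word size or byte order of the underlying bit vector.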
>> There are a couple of ways to fix this, and I will post a patch which
>> contains both versions.
>>
>> So, after fixing this problem, syst does produce reliable results on 2.6.30.
>> So I am now assuming that the problem with the self test (and others)
>> is that something is messed up with the per-thread context code.
>>
> Yes, most likely. That is why I asked you to try taskset -c 0 self to avoid
> switching from one CPU to another. But obviously you can be switched in
> and out.
> 
> 
>> I will start working on this.
>>
>> - Corey
>>
>>

-- 
Regards,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
503-578-3507
cjash...@us.ibm.com

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
