Corey,

On Fri, Aug 28, 2009 at 3:05 AM, Corey Ashford <cjash...@linux.vnet.ibm.com> wrote:
> Hi Stephane,
>
> I have made some progress in tracking this problem down. The big picture is
> that pfm_arch_ctxswin_thread is never getting called, so when the thread is
> switched out, and then back in again at some point, the PMU context is not
> getting restored onto the PMU registers, causing the counters to stop till
> the end of the run.
>
> pfm_arch_ctxswin_thread is not getting called because of the following code
> in perfmon_ctxsw.c:
>
>     /*
>      * TIF flag was removed since switch_to
>      * context is detaching, skip everything,
>      * keep oncpu=-1
>      */
>     if (!test_thread_flag(TIF_PERFMON_CTXSW))
>             goto skip_all;
>
> Apparently the TIF_PERFMON_CTXSW flag is always cleared. I haven't tracked
> any farther back than this yet, but was hoping this might trigger a thought
> or two in your mind as to what might be going on.
>

TIF_PERFMON_CTXSW is only set in pfm_preload_context(). If you are testing
with self.c, I don't see how this can be happening at this point. I think
you have to instrument the places where the flag gets cleared.

> I also noticed that this code appears to have changed from 2.6.29 to 2.6.30.
>
> Anyway, I'd appreciate any thoughts you might have on this. I may not get
> back to looking at this till Monday afternoon, so no huge rush.
>
> Thanks for your consideration,
>
> - Corey
>
> stephane eranian wrote:
>>
>> Corey,
>>
>> On Wed, Aug 26, 2009 at 1:55 AM, Corey
>> Ashford <cjash...@linux.vnet.ibm.com> wrote:
>>>
>>> Corey Ashford wrote:
>>>>
>>>> stephane eranian wrote:
>>>>>
>>>>> On Mon, Aug 24, 2009 at 8:48 PM, Corey
>>>>> Ashford <cjash...@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>> stephane eranian wrote:
>>>>>>>
>>>>>>> Corey,
>>>>>>>
>>>>>> [snip]
>>>>>>>
>>>>>>> Here are a couple of tests you could try and run to narrow it down:
>>>>>>> - taskset -c 0 self
>>>>>>> - syst
>>>>>>>
>>>>>> "taskset -c 0 self" doesn't improve the behavior. The results are
>>>>>> still all over the place.
>>>>>>
>>>>> That's strange, must be something really central.
>>>>> You need to enable debugging. Careful, as this has changed again in
>>>>> 2.6.30 because of the dynamic_printk stuff. The good thing is that now
>>>>> you can turn individual printks on and off.
>>>>
>>>> I'm not familiar with dynamic_printk, so that will take some research.
>>>>
>>>>>> "syst" is giving me an error, which may be something completely
>>>>>> unrelated:
>>>>>>
>>>>>> [r...@elm3c4 examples_v2.x]# ./syst
>>>>>> cannot set affinity to CPU0: Invalid argument
>>>>>>
>>>>> Weird. You have a CPU0, don't you?
>>>>
>>>> Yes :) I'm still debugging this to figure out what's going on. No
>>>> results yet (it took me a while to get systemtap running due to many
>>>> pilot errors).
>>>
>>> Ok, I tracked the syst problem down.
>>> There is an error in syst.c which manifests itself on big-endian machines
>>> when syst.c is compiled in 32-bit mode.
>>>
>>> The bit vector which is used to describe the cpus that you want to set
>>> the affinity for is an array of 32-bit words (when using the
>>> compat_sys_sched_setaffinity system call in 32-bit mode). syst programs a
>>> vector of 64-bit words. On a little-endian machine, this wouldn't matter,
>>> because the least significant byte of the 32-bit or 64-bit word is always
>>> at offset 0. But on a big-endian machine, the least significant byte is
>>> at offset 0x3 or 0x7 depending on the word size. So the result is that
>>> the bit vector is interpreted as setting the affinity for a cpu which does
>>> not exist.
>>>
>> I think nowadays we should simply use the libc cpu_set and call the
>> regular sched_setaffinity() instead of having a custom version. That
>> was from a long time ago. Hopefully, the official API will work on 32-bit
>> big-endian systems.
>>
>>> There are a couple of ways to fix this, and I will post a patch which
>>> contains both versions.
>>>
>>> So, after fixing this problem, syst does produce reliable results on
>>> 2.6.30. So I am now assuming that the problem with the self test (and
>>> others) is that something is messed up with the per-thread context code.
>>>
>> Yes, most likely. That is why I asked you to try taskset -c 0 self to
>> avoid switching from one CPU to another. But obviously you can still be
>> switched in and out.
>>
>>
>>> I will start working on this.
>>>
>>> - Corey
>>>
>>>
>
> --
> Regards,
>
> - Corey
>
> Corey Ashford
> Software Engineer
> IBM Linux Technology Center, Linux Toolchain
> Beaverton, OR
> 503-578-3507
> cjash...@us.ibm.com
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel