Folks,

hpcrun does its sampling inside the target process using first-person
access, not third-person ptrace() access like pfmon, so the process is
implicitly blocked while samples are processed, i.e., there are no dropped
samples unless something else has gone wrong.
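[To illustrate the first-person model: a conceptual sketch only, not hpcrun's actual implementation. The profiled process fields its own profiling signal, so its normal execution is implicitly paused while each sample is recorded. This POSIX-only Python sketch uses ITIMER_PROF as a stand-in for a hardware-counter overflow signal.]

```python
# First-person sampling sketch (POSIX only; NOT how hpcrun is implemented).
# The process handles its own profiling signal, so it is implicitly
# blocked while each sample is recorded -- nothing is dropped for lack
# of an external reader.
import collections
import signal

hits = collections.Counter()

def on_sample(signum, frame):
    # Record where the program was interrupted; the program's normal
    # flow is paused for the duration of this handler.
    name = frame.f_code.co_name if frame else "?"
    hits[name] += 1

signal.signal(signal.SIGPROF, on_sample)
# Fire roughly every 5 ms of consumed CPU time, standing in for a
# hardware event overflow.
signal.setitimer(signal.ITIMER_PROF, 0.005, 0.005)

def busy():
    x = 0
    for i in range(5_000_000):
        x += i * i
    return x

busy()
signal.setitimer(signal.ITIMER_PROF, 0)  # stop sampling
print(f"{sum(hits.values())} samples taken: {dict(hits)}")
```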

Another thing: you cannot rely on the sample count of hpcrun to compute
cycles. Why? Because those are only the samples that have not been
dropped. If samples occur outside of the sampled address space (as can
happen with floating-point exceptions), the address will be in kernel
space and the sample will be dropped. pfmon has no concept of filtering
out addresses, so even if you ask for user-space samples, you will still
get samples in the output with kernel addresses. I am not sure what the
default is for your version of pfmon.
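[For reference, the arithmetic behind the two figures quoted later in this thread, roughly 68000 samples at a period of 32767 versus pfmon's 154 billion cycles, can be checked directly:]

```python
# Rough check of the numbers quoted in this thread: hpcprof reported
# about 68000 samples taken at a CPU_CYCLES sampling period of 32767,
# while pfmon counted roughly 154 billion CPU_CYCLES.
samples = 68_000          # approximate sample count from hpcprof
period = 32_767           # sampling period used by hpcrun
pfmon_cycles = 154e9      # approximate count reported by pfmon

estimated_cycles = samples * period
coverage = estimated_cycles / pfmon_cycles

print(f"estimated cycles from samples: {estimated_cycles:.3e}")
print(f"fraction of pfmon's count:     {coverage:.1%}")
```

This lands at about 2.2 billion cycles, i.e. a little over 1% of pfmon's count, matching the discrepancy reported in the thread.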

Which value is correct according to /bin/time? 2 billion or 154 billion?

Phil

On Apr 15, 2008, at 10:33 PM, [EMAIL PROTECTED] wrote:

> Stephane
>
> Thanks very much for the explanation.
>
> Just the other day I was wondering how the code in perfmon suspended the
> monitored process while the monitoring was suspended (masked).  Now that I
> find out it allows the monitored process to continue executing, what is
> happening becomes much clearer.
>
> In the first workaround you suggested, when you say "make the sampling
> buffer much bigger" I assume you are referring to the sampling period,
> which is the counter that controls how often the overflows occur.  If this
> is the case, I have experimented with this value (I have tried 1 million
> and 1 billion) but it does not seem to change the results very much.
>
> I will use pfmon with sampling, and with sampling plus process blocking,
> to verify the results.
>
> I have looked at the hpcrun help and it does not seem to provide the
> ability to block the process when an overflow occurs.  I may need to
> implement an option like this in hpcrun to resolve this problem for our
> customer.
>
> If pfmon with sampling behaves as you suggest (and my bet is it will),
> then I think we have two choices.  We can either wait for your
> double-buffered version of perfmon or enhance hpcrun to support the
> ability to block the process while monitoring is suspended.
>
> I know that I hate to make estimates as to when I will have something
> done, so I will understand if you do not want to go there, but any idea
> when the double-buffered perfmon may be available?
>
> Gary
>
>
> "stephane eranian" <[EMAIL PROTECTED]> wrote on 04/15/2008 09:59:10 AM:
>
>> Gary,
>>
>> On Tue, Apr 15, 2008 at 4:26 PM,  <[EMAIL PROTECTED]> wrote:
>>> Stephane
>>>
>>> Well you guessed right on both counts.  The /proc/perfmon shows the
>>> version is 2.0.
>>>
>>> The pfmon command looks like this:
>>> pfmon --debug -v -e CPU_CYCLES ./code.exe >pfmon 2>pfmon.debug
>>>
>>> The hpcrun command looks like this:
>>> hpcrun -e CPU_CYCLES:32767 -o hpcrun.data -- ./code.exe >hpcrun
>>> 2>hpcrun.debug
>>>
>>> So in both cases I run it as a tool that monitors another process
>>> (it does not use self-monitoring).
>>>
>>> I fail to see how this can account for the differences but am very
>>> interested in the explanation.
>>>
>> I suspect that if you use pfmon in sampling mode as well, you will
>> see the same discrepancy:
>>
>>    pfmon -ecpu_cycles --long-smpl-periods=32767 --smpl-outfile=pfmon.data ./code.exe
>>
>>
>> The reason for the big difference is that there exists a blind spot
>> with sampling.  When the sampling buffer fills up, monitoring is stopped
>> BUT the monitored process keeps on running by default.  So you are
>> actually missing parts of the execution.  This is a well-known issue
>> with sampling buffers.  The current default sampling buffer format used
>> by perfmon is very simple, too simple actually.  What you need is a
>> format that implements a double buffer.  I have released a simple
>> implementation of this as a proof of concept.  It is not in the main
>> GIT tree yet.
>>
>> In the meantime you have 2 possible workarounds:
>>    - Make the sampling buffer much bigger.  You'd have to look at the
>>      hpcrun options; maybe they offer a way for you to grow the buffer.
>>      In pfmon you can experiment with this using the --smpl-entries
>>      option.  If you get an error message, check your resource limits
>>      with ulimit and try increasing the locked memory (ulimit -l
>>      unlimited).
>>
>>    - Have perfmon block the monitored process when the sampling buffer
>>      fills up.  This can be accomplished with pfmon using the
>>      --overflow-block option.  I don't know if hpcrun has an option
>>      for this.  Be careful, though, as this option is known to have
>>      issues with processes that use signals internally.
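[A conceptual illustration of the blind spot and the two remedies above; this models the policies abstractly and is not the perfmon API.]

```python
# Conceptual sketch of the sampling-buffer blind spot (NOT the perfmon API).
# Three policies are modeled:
#  - "single": one buffer, the process keeps running while the full buffer
#    is drained, so events in that window are lost (the blind spot);
#  - "block":  the process is blocked until the drain finishes (no loss,
#    but the run is perturbed);
#  - "double": a spare buffer absorbs events while the full one drains.
def run(policy, total_events=1000, buf_size=64, drain_ticks=16):
    buf, kept, lost = 0, 0, 0
    draining = 0                     # ticks left before the full buffer is free
    for _ in range(total_events):
        if draining:
            if policy == "block":
                draining = 0         # process was stopped; drain finished first
            elif policy == "double":
                draining -= 1        # spare buffer absorbs this event
            else:                    # "single": event falls in the blind spot
                draining -= 1
                lost += 1
                continue
        buf += 1
        kept += 1
        if buf == buf_size:          # buffer full: hand it off for draining
            buf, draining = 0, drain_ticks
    return kept, lost

for policy in ("single", "block", "double"):
    kept, lost = run(policy)
    print(f"{policy:>6}: kept={kept} lost={lost}")
```

With these toy parameters the single-buffer run silently loses a fixed fraction of events, which is exactly the kind of undercount being discussed.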
>>
>> Hope this clarifies the issue you are seeing.
>>
>>
>>
>>> As far as I know, this particular customer application is the only one
>>> we have found that produces inconsistent results.  All other
>>> executables that I have run these tools against seem to produce counts
>>> for CPU_CYCLES that are very close.
>>>
>>> Please tell me more.
>>>
>>>
>>> [EMAIL PROTECTED] wrote on 04/12/2008 12:05:36 AM:
>>>
>>>
>>>
>>>> Gary,
>>>>
>>>> I suspect you are running the stock perfmon as shipped with 2.6.18,
>>>> i.e., v2.0.
>>>> You can find out in /proc/perfmon.
>>>>
>>>> I would need the cmdline options used for pfmon.
>>>>
>>>> As for HPCRUN, I would need to know how this is run, in particular
>>>> whether this is a self-monitoring run or, just like pfmon, a tool
>>>> monitoring another thread.  I suspect the latter, which could explain
>>>> the differences you are seeing.
>>>>
>>>> On Fri, Apr 11, 2008 at 8:04 PM,  <[EMAIL PROTECTED]> wrote:
>>>>> Stephane
>>>>>
>>>>> Our system is running:
>>>>>
>>>>> MODEL ia64   [type=ia64]
>>>>> CPU   8 x Itanium 2, 64 bits  1600.000442 Mhz
>>>>> MEM   8219456 kB  real memory
>>>>> OS    Bull Linux Advanced Server release 4 (V5) - kernel 2.6.18-B64k.1.7
>>>>>
>>>>> This kernel is based on the 2.6.18 kernel but has Bull-specific
>>>>> patches included in it.
>>>>>
>>>>> Since perfmon is included in the kernel I do not know how to find
>>>>> its version.  I would expect that we are running the one that comes
>>>>> with the 2.6.18 kernel.  If you can tell me how to find a version
>>>>> for perfmon I will get it for you.  In addition, if you can provide
>>>>> me with a list of the modules that make up perfmon, I can check to
>>>>> see if Bull has made any patches to those modules.  I know that we
>>>>> have not yet installed the perfmon2 kernel patches.  This is on our
>>>>> list to try but has not been done yet.
>>>>>
>>>>> The value of 154 billion CPU_CYCLES is the approximate value
>>>>> reported by PFMON on its stdout.
>>>>>
>>>>> The value of 2 billion is the approximate result when I multiply the
>>>>> total number of samples reported by HPCPROF (about 68000) times the
>>>>> sampling period used in the HPCRUN (32767).  As a point of interest,
>>>>> the contents of /proc/interrupts also show that about 68000 perfmon
>>>>> interrupts occur during the HPCRUN.
>>>>>
>>>>> I will send the kernel debug data for both the PFMON and HPCRUN
>>>>> tests to your googlemail account in a separate email.
>>>>>
>>>>> At this point, if you can just point me in the right direction and
>>>>> suggest some things to look for, I will be a happy camper.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Gary
>>>>>
>>>>>
>>>>> "stephane eranian" <[EMAIL PROTECTED]> wrote on 04/10/2008 12:23:22 PM:
>>>>>
>>>>>
>>>>>
>>>>>> Gary,
>>>>>>
>>>>>> On Wed, Apr 9, 2008 at 1:18 AM,  <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>> I have a customer who has an application that, when run under
>>>>>>> pfmon, reports 154 billion CPU_CYCLES used (which appears to be a
>>>>>>> reasonable value).  When this same application is run under Hpcrun
>>>>>>> (from HPCToolkit, using PAPI) it only reports about 2 billion
>>>>>>> CPU_CYCLES used.  These tests are run on an Intel IA64 platform.
>>>>>>>
>>>>>> You need to tell me which kernel version and which perfmon version.
>>>>>>
>>>>>> Also, how did you calculate those 2 numbers?  Was this simple
>>>>>> counting, or derived from the samples you are getting?
>>>>>>
>>>>>> The 'losing interrupts' issue should not affect you because it is
>>>>>> related to the handling of signals in multi-threaded programs.
>>>>>>
>>>>>>
>>>>>> As for the logs, mail them to me directly.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>> This application runs as a single thread and does not set a signal
>>>>>>> handler or mask the SIGIO signal.  Hpcrun produces 8 data output
>>>>>>> files when run on this application: one for the application
>>>>>>> itself, 4 for bash scripts the application runs, 2 for 'rm'
>>>>>>> commands the application executes, and 1 for a gzip command it
>>>>>>> runs.
>>>>>>>
>>>>>>> The customer wants to know why Hpcrun only reports a little over
>>>>>>> 1% of the CPU cycles used.  I have been trying to compare what
>>>>>>> pfmon does to what hpcrun does, and it seems that the only debug
>>>>>>> data available for both runs is the kernel debug data written by
>>>>>>> perfmon.  This data clearly shows that Hpcrun/PAPI is using the
>>>>>>> perfmon services differently than pfmon does.  I tried to attach
>>>>>>> the debug output for these two runs to this mail, but that
>>>>>>> exceeded the allowed message size for the list.
>>>>>>>
>>>>>>> I tried adding code (as a test case) to the PAPI signal handler to
>>>>>>> count and print the number of signals received during the run.
>>>>>>> The values printed seemed to pretty much match the values reported
>>>>>>> as the number of samples when hpcprof is run on the hpcrun data
>>>>>>> files.  This was an attempt to detect whether my problem was in
>>>>>>> handling signals or in getting them, and I think this test showed
>>>>>>> the problem is in getting them.
>>>>>>>
>>>>>>> I have also browsed this mailing list and found a thread called
>>>>>>> "papi on compute node linux", which was last updated 2008-03-10.
>>>>>>> The discussion in this thread sounds to me like it could easily
>>>>>>> explain what I am seeing.
>>>>>>>
>>>>>>> Is there a way I can determine if this discussion (i.e., losing
>>>>>>> interrupts) is what I am seeing?
>>>>>>>
>>>>>>> Thanks for any help you can provide.
>>>>>>>
>>>>>>> Gary
>>>>>>>
>>>>>>>
>>>>>>>


-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
perfmon2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
