On Tue, May 11, 2010 at 4:48 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, 2010-05-11 at 16:04 +0200, Stephane Eranian wrote:
>> Hi,
>>
>>
>> I am confused by the inheritance cmd line option of perf record:
>>
>> $ perf record -h
>>  usage: perf record [<options>] [<command>]
>>     or: perf record [<options>] -- <command> [<options>]
>>
>>     -e, --event <event>   event selector. use 'perf list' to list available events
>>         --filter <filter>
>>                           event filter
>>     -p, --pid <n>         record events on existing process id
>>     -t, --tid <n>         record events on existing thread id
>>     -r, --realtime <n>    collect data with this RT SCHED_FIFO priority
>>     -R, --raw-samples     collect raw sample records from all opened counters
>>     -a, --all-cpus        system-wide collection from all CPUs
>>     -A, --append          append to the output file to do incremental profiling
>>     -C, --profile_cpu <n>
>>                           CPU to profile on
>>     -f, --force           overwrite existing data file (deprecated)
>>     -c, --count           event period to sample
>>     -o, --output <file>   output file name
>>     -i, --inherit         child tasks inherit counters
>>
>> This leads one to believe that inheritance in child tasks is off by default.
>>
>> However, builtin-record.c says:
>>
>> static bool                     inherit                         =   true;
>>
>> If that's the case, what's the point of the -i option?
>
> Right, I think we should invert that, does --no-inherit work?
>
>> Another side effect of inheritance is that in per-thread mode,
>> perf creates as many "sessions" as you have CPUs. So
>> on a 16-way processor, sampling on cycles, perf creates
>> 16 events and 16 x 2-page sampling buffers. That's a lot of
>> resources consumed if I am just interested in monitoring
>> a single-threaded workload.
>
> Right, but I think the default of inherit is right, and once you do that
> you basically have to do the per-task-per-cpu thing, otherwise your
> fancy 16-way will start spending most of its time in cacheline bounces.
>
In that case, don't you think you should also ensure that each buffer is
allocated on the NUMA node of the CPU its per-thread-per-cpu event is bound to?
I don't think that is the case today.

------------------------------------------------------------------------------

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
