Re: [Python-Dev] Support for Linux perf

2014-11-23 Thread Francis Giraldeau
2014-11-22 7:44 GMT-05:00 Julian Taylor <jtaylor.deb...@googlemail.com>:

> On 17.11.2014 23:09, Francis Giraldeau wrote:
> > Hi,
> > ...
> > The PEP-418 is about performance counters, but there is no mention o
> > Anyway, I think we must change CPython to support tools such as perf.
> > Any thoughts?
>
> there are some patches available adding systemtap and dtrace probes,
> which should at least help getting function level profiles:
>
> http://bugs.python.org/issue21590


Thanks for these links, the patches look interesting.

As Jonas mentioned, Perf should be able to unwind a Python stack. It does
so at the interpreter level, and the frame info is scattered in virtual
memory. It needs to be accessible offline.

I think it could be possible to use the function entry and exit hooks in
the interpreter to save important frame info, such as function name, file
and line number, in a memory map known to perf. Then, we can tell Perf to
record this compact zone of data in the sample as an extra field for
offline use. Then, at analysis time, each ELF interpreter frame could be
matched with the corresponding Python frame info. I think the perf handler
can't sleep, so it cannot wait for a page fault; touching the zone on each
function entry/exit should also ensure the page is present in main memory
when the sample is recorded.
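
A minimal sketch of the "compact zone" idea, written at the Python level
for illustration only. It assumes sys.setprofile() as a stand-in for the
interpreter's C-level entry/exit hooks, an anonymous mmap as a stand-in for
the memory map known to perf, and a hypothetical fixed-size record layout
(the ShadowStack class, RECORD and HEADER are made up for this example).
Telling perf to copy the zone into each sample is not shown.

```python
import mmap
import struct
import sys

# Hypothetical record layout: line number, function name, file name.
RECORD = struct.Struct("<I60s192s")
HEADER = struct.Struct("<I")  # number of valid records in the zone


class ShadowStack:
    """Mirror the Python call stack into one compact, contiguous buffer."""

    def __init__(self, max_depth=64):
        self.max_depth = max_depth
        # Anonymous mapping standing in for the zone that perf would copy.
        self.buf = mmap.mmap(-1, HEADER.size + RECORD.size * max_depth)
        self.depth = 0

    def hook(self, frame, event, arg):
        """Profile hook: push on a Python call, pop on a Python return."""
        if event == "call":
            if self.depth < self.max_depth:
                code = frame.f_code
                RECORD.pack_into(
                    self.buf, HEADER.size + self.depth * RECORD.size,
                    frame.f_lineno,
                    code.co_name.encode()[:60],
                    code.co_filename.encode()[:192],
                )
            self.depth += 1
        elif event == "return":
            self.depth = max(self.depth - 1, 0)
        else:
            return  # C-level call events are not mirrored in this sketch
        HEADER.pack_into(self.buf, 0, min(self.depth, self.max_depth))

    def snapshot(self):
        """Decode the zone the way an offline analysis step might."""
        (count,) = HEADER.unpack_from(self.buf, 0)
        frames = []
        for i in range(count):
            line, name, filename = RECORD.unpack_from(
                self.buf, HEADER.size + i * RECORD.size)
            frames.append((
                name.rstrip(b"\0").decode(errors="replace"),
                filename.rstrip(b"\0").decode(errors="replace"),
                line,
            ))
        return frames


def inner(stack):
    return stack.snapshot()


def outer(stack):
    return inner(stack)


stack = ShadowStack()
sys.setprofile(stack.hook)
try:
    frames = outer(stack)
finally:
    sys.setprofile(None)

for name, filename, line in frames:
    print(f"{name} ({filename}:{line})")
```

Because every call and return writes to the buffer, its pages are touched
constantly, which is the property the paragraph above relies on. A real
implementation would live in C inside the interpreter; the per-call cost of
a Python-level hook is far too high for production profiling.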

Thanks again for your input, I'll post any further developments.

Cheers,

Francis
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Support for Linux perf

2014-11-22 Thread Julian Taylor
On 17.11.2014 23:09, Francis Giraldeau wrote:
> Hi,
> ...
> The PEP-418 is about performance counters, but there is no mention o
> Anyway, I think we must change CPython to support tools such as perf.
> Any thoughts?


There are some patches available adding SystemTap and DTrace probes,
which should at least help getting function-level profiles:

http://bugs.python.org/issue21590



Re: [Python-Dev] Support for Linux perf

2014-11-21 Thread Jonas Wagner
Hi,

> Anyway, I think we must change CPython to support tools such as perf. Any
> thoughts?


Not many thoughts, other than that it would be nice to be able to use a
sampling profiler on Python code. I think this would especially benefit
applications that use libraries written in C, or applications that call
external commands. It would also be useful if you're interested in
metrics other than time (e.g., page faults).

Python does have support for profiling, though, in the cProfile module.
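
For reference, a minimal cProfile session looks roughly like the sketch
below. The crunch() workload is a made-up stand-in (the crunch.py used
elsewhere in this thread is not shown). Note that cProfile is a
deterministic profiler that instruments every call and measures time only,
so it does not cover the sampling or page-fault use cases mentioned above.

```python
import cProfile
import io
import math
import pstats


def crunch(n):
    """Hypothetical workload standing in for the thread's crunch.py."""
    return sum(math.sin(i) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = crunch(100_000)
profiler.disable()

# Render the five most expensive entries by cumulative time into a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```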

The cleanest solution for this might be to add some sort of plugin support
to Perf. Each plugin could teach Perf how to unwind a certain stack. I'm
thinking this because the problem is not specific to Python. Any
higher-level language would benefit from mapping the low-level instruction
pointer and C stack back to higher-level function calls. Databases might
use something like this to attribute samples to individual transactions or
compiled SQL queries...

These plugins would be quite closely tied to a particular interpreter
implementation. You only want them in certain circumstances, and it should
be possible to turn them off, e.g., when you want to profile the
interpreter itself.

On the other hand, this strays somewhat far from what Perf was designed
for. Maybe a custom stack walker could be more easily implemented in
something like SystemTap.

Best,
Jonas


[Python-Dev] Support for Linux perf

2014-11-17 Thread Francis Giraldeau
Hi,

PEP 418 is about performance counters, but there is no mention of
performance monitoring unit (PMU) counters, such as cache misses and
instruction counts.

The Linux perf tool records samples of these counters at the system level.
I ran Linux perf on CPython for profiling. The resulting callstack is inside
libpython.so, mostly recursive calls to PyEval_EvalFrameEx(), because the
tool works at the ELF level. Here is an example with a dummy program
(linux-tools on Ubuntu 14.04):

$ perf record python crunch.py
$ perf report --stdio
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......................
#
    32.37%  python   python2.7      [.] PyEval_EvalFrameEx
    13.70%  python   libm-2.19.so   [.] __sin_avx
     5.25%  python   python2.7      [.] binary_op1.5010
     4.82%  python   python2.7      [.] PyObject_GetAttr

While this may be insightful for the interpreter developers, it is not so
for the average Python developer. The report should display Python code
instead. It seems obvious, yet I haven't found a feature for that.

When a performance counter reaches a given value, a sample is recorded. The
most basic sample only records a timestamp, the thread ID and the program
counter (%rip). In addition, all executable memory maps of libraries are
recorded. For the callstack, frame pointers are traversed, but most of the
time they are optimized away on x86, so there is a fallback to unwinding,
which requires saving register values and a chunk of the stack. The memory
space of the process is reconstructed offline.

CPython seems to allocate code and frames on mmap() pages. If the data lies
more than about 1k from the top of the stack, it is not available offline in
the trace. We need some way to reconstitute this memory space of the
interpreter to resolve the symbols, probably by dumping the data to disk.

In Java, there is a small HotSpot agent that spits out the symbols of JIT
code:

https://github.com/jrudolph/perf-map-agent

The problem is that CPython does not JIT-compile code; the executed code is
the ELF library itself. The executed Python frames are parameters of
functions of the interpreter. I don't think the same approach can be used
(maybe this can be applied to PyPy?).
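
For context, here is a sketch of the kind of symbol map such an agent
produces. It assumes the perf map convention as I understand it from the
perf JIT interface notes: a text file at /tmp/perf-<pid>.map with one line
per symbol, holding the start address and size in hex followed by the
symbol name. The helper names and the sample addresses are made up for
illustration.

```python
import bisect


def perf_map_path(pid):
    """Path where perf is assumed to look for a process's symbol map."""
    return f"/tmp/perf-{pid}.map"


def format_perf_map(symbols):
    """Render (start, size, name) tuples as 'hexstart hexsize name' lines."""
    return "".join(
        f"{start:x} {size:x} {name}\n" for start, size, name in sorted(symbols)
    )


def resolve(symbols, address):
    """Map a sampled address to a symbol name, as done at analysis time."""
    ordered = sorted(symbols)
    starts = [start for start, _, _ in ordered]
    i = bisect.bisect_right(starts, address) - 1
    if i >= 0:
        start, size, name = ordered[i]
        if address < start + size:
            return name
    return None


# Made-up JIT-compiled regions, for illustration only.
symbols = [
    (0x7F0000002000, 0x80, "Lexample/B;::run"),
    (0x7F0000001000, 0x40, "Lexample/A;::main"),
]
print(format_perf_map(symbols), end="")
print(resolve(symbols, 0x7F0000001010))
```

This works when each code address belongs to exactly one high-level
function. In CPython every Python function executes at the same addresses
inside PyEval_EvalFrameEx(), which is why an address-to-name map alone does
not help here.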

I looked at how Python frames are handled in GDB
(file cpython/Tools/gdb/libpython.py). A Python frame is detected in
Frame(gdbframe).is_evalframeex() by a C call to PyEval_EvalFrameEx().
However, the traceback accesses PyFrameObject on the heap (at least for
f->f_back = 0xa57460), which is possible in GDB when the program is paused
and the whole memory space is available, but this memory is not recorded
for offline use in perf. Here is an example of a callstack from GDB:

#0  PyEval_EvalFrameEx (f=Frame 0x77f1b060, for file crunch.py, line 7,
    in bar (num=466829), throwflag=0) at ../Python/ceval.c:1039
#1  0x00527877 in fast_function (func=<function at remote 0x76ec45a0>,
    pp_stack=0x7fffd280, n=1, na=1, nk=0) at ../Python/ceval.c:4106
#2  0x00527582 in call_function (pp_stack=0x7fffd280, oparg=1)
    at ../Python/ceval.c:4041
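
The same traversal can be sketched from inside the process, where the heap
is directly reachable. The example below follows f_back links and reads the
function name, file and line from each frame's f_code, which is the
information a perf-based tool would need offline. The functions foo() and
bar() are made up, and sys._getframe() is a CPython-specific helper.

```python
import sys


def python_stack(frame=None):
    """Walk f_back links outward from `frame`, innermost frame first."""
    if frame is None:
        frame = sys._getframe(1)  # start at the caller's frame
    stack = []
    while frame is not None:
        code = frame.f_code
        stack.append((code.co_name, code.co_filename, frame.f_lineno))
        frame = frame.f_back
    return stack


def bar(num):
    return python_stack()


def foo():
    return bar(466829)


frames = foo()
for name, filename, line in frames:
    print(f"{name} at {filename}:{line}")
```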


We could add a kernel module that knows how to make samples of CPython,
but it means Python structures become a sort of ABI, and kernel devs won't
allow a Python interpreter in kernel mode ;-).

What we really want is f_code data and related objects:

(gdb) print (void *)(f->f_code)
$8 = (void *) 0x77e370f0

Maybe we could save these pages every time some code is loaded from the
interpreter? (the memory range is about 1.7MB, but )

Anyway, I think we must change CPython to support tools such as perf. Any
thoughts?

Cheers,

Francis