Re: [Python-Dev] Support for Linux perf
2014-11-22 7:44 GMT-05:00 Julian Taylor jtaylor.deb...@googlemail.com:

> On 17.11.2014 23:09, Francis Giraldeau wrote:
> > Hi,
> > ...
> > The PEP-418 is about performance counters, but there is no mention o
> > Anyway, I think we must change CPython to support tools such as perf.
> > Any thoughts?
>
> There are some patches available adding systemtap and dtrace probes,
> which should at least help getting function level profiles:
> http://bugs.python.org/issue21590

Thanks for these links, the patches look interesting.

As Jonas mentioned, Perf should be able to unwind a Python stack. It currently does so only at the interpreter level, and the Python frame info is scattered in virtual memory. That info needs to be accessible offline.

I think it could be possible to use the function entry and exit hooks in the interpreter to save important frame info, such as function name, file and line number, in a memory map known to perf. Then, we can tell Perf to record this compact zone of data in the sample as an extra field for offline use. Then, at analysis time, each ELF interpreter frame could be matched with the corresponding Python frame info. I think the perf handler can't sleep, and accessing the region on each function entry/exit will also ensure the page is present in main memory when the sample is recorded.

Thanks again for your input, I'll post any further developments.

Cheers,
Francis

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
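Francis's idea of saving frame info on function entry and exit can be tried out from pure Python before touching the interpreter. The following is only a sketch under stated assumptions, not a proposed patch: it uses sys.setprofile as a stand-in for the interpreter's entry/exit hooks, and an anonymous mmap as a stand-in for the "memory map known to perf". The record layout, MAX_DEPTH, hook() and read_stack() are all hypothetical names chosen for illustration, and nothing here actually registers the region with perf.

```python
import mmap
import struct
import sys

# Hypothetical layout: an 8-byte depth header followed by fixed-size records.
RECORD = struct.Struct("<I64s64s")  # line number, function name, file name
MAX_DEPTH = 64
HEADER = 8
region = mmap.mmap(-1, HEADER + RECORD.size * MAX_DEPTH)  # anonymous mapping
depth = 0


def _store_depth():
    # Publish the current Python stack depth at the start of the region.
    struct.pack_into("<Q", region, 0, depth)


def hook(frame, event, arg):
    """Profile hook: push frame info on 'call', pop it on 'return'."""
    global depth
    if event == "call":
        if depth < MAX_DEPTH:
            code = frame.f_code
            RECORD.pack_into(
                region, HEADER + RECORD.size * depth,
                frame.f_lineno,
                code.co_name.encode()[:64],
                code.co_filename.encode()[-64:],  # keep the tail of long paths
            )
        depth += 1
        _store_depth()
    elif event == "return":
        # The function that installed the hook returns without a matching call.
        depth = max(depth - 1, 0)
        _store_depth()
    # 'c_call' / 'c_return' events (C functions) are ignored in this sketch.


def read_stack():
    """Decode the region the way an offline analysis tool might."""
    (n,) = struct.unpack_from("<Q", region, 0)
    stack = []
    for i in range(min(n, MAX_DEPTH)):
        line, name, path = RECORD.unpack_from(region, HEADER + RECORD.size * i)
        stack.append((
            name.rstrip(b"\0").decode(errors="replace"),
            path.rstrip(b"\0").decode(errors="replace"),
            line,
        ))
    return stack


def inner():
    return read_stack()


def outer():
    return inner()


sys.setprofile(hook)
try:
    snapshot = outer()
finally:
    sys.setprofile(None)

for name, path, line in snapshot:
    print(name, path, line)
```

The region stays compact and is touched on every call, which is the property the message relies on to keep the page resident. A real implementation would live in C inside the interpreter, since a Python-level hook adds considerable overhead per call.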
Re: [Python-Dev] Support for Linux perf
On 17.11.2014 23:09, Francis Giraldeau wrote:
> Hi,
> ...
> The PEP-418 is about performance counters, but there is no mention o
> Anyway, I think we must change CPython to support tools such as perf.
> Any thoughts?

There are some patches available adding systemtap and dtrace probes, which should at least help getting function level profiles:
http://bugs.python.org/issue21590
Re: [Python-Dev] Support for Linux perf
Hi,

> Anyway, I think we must change CPython to support tools such as perf.
> Any thoughts?

Not many thoughts, other than that it would be nice to be able to use a sampling profiler on Python code. I think this would especially benefit applications that use libraries written in C, or applications that call external commands. It would also be useful if you're interested in metrics other than time (e.g., page faults). Python does have support for profiling, though, in the cProfile module.

The cleanest solution for this might be to add some sort of plugin support to Perf. Each plugin could teach Perf how to unwind a certain stack. I'm thinking this because the problem is not specific to Python. Any higher-level language would benefit from mapping the low-level instruction pointer and C stack back to higher-level function calls. Databases might use something like this to attribute samples to individual transactions or compiled SQL queries...

These plugins would be quite closely tied to a particular interpreter implementation. You only want them in certain circumstances, and it should be possible to turn them off, e.g., when you want to profile the interpreter itself.

On the other hand, this strays somewhat far from what Perf was designed for. Maybe a custom stack walker could be more easily implemented in something like SystemTap.

Best,
Jonas
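For reference, this is roughly what the existing cProfile support mentioned by Jonas looks like in use. It is a minimal sketch: crunch() is a hypothetical stand-in workload, since the crunch.py from the original message is not shown. Note that cProfile instruments every call rather than sampling, and it reports time only, which is why it does not cover the metrics discussed in this thread.

```python
import cProfile
import io
import math
import pstats


def crunch(n):
    """Stand-in workload (hypothetical; the original crunch.py is not shown)."""
    return sum(math.sin(i) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
crunch(100000)
profiler.disable()

# Render the five most expensive entries by cumulative time into a string.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The report names Python functions such as crunch directly, which is the view the perf output lacks, but the time spent inside C library code is only attributed to the built-in call that entered it.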
[Python-Dev] Support for Linux perf
Hi,

PEP 418 is about performance counters, but there is no mention of performance monitoring unit (PMU) counters, such as cache misses and instruction counts. The Linux perf tool aims at recording these samples at the system level.

I ran Linux perf on CPython for profiling. The resulting callstack is inside libpython.so, mostly recursive calls to PyEval_EvalFrameEx(), because the tool works at the ELF level. Here is an example with a dummy program (linux-tools on Ubuntu 14.04):

    $ perf record python crunch.py
    $ perf report --stdio
    # Overhead  Command  Shared Object  Symbol
    # ........  .......  .............  ......
    #
        32.37%  python   python2.7      [.] PyEval_EvalFrameEx
        13.70%  python   libm-2.19.so   [.] __sin_avx
         5.25%  python   python2.7      [.] binary_op1.5010
         4.82%  python   python2.7      [.] PyObject_GetAttr

While this may be insightful for the interpreter developers, it is not so for the average Python developer. The report should display Python code instead. It seems obvious, still I haven't found the feature for that.

When a performance counter reaches a given value, a sample is recorded. The most basic sample only records a timestamp, the thread ID and the program counter (%rip). In addition, all executable memory maps of libraries are recorded. For the callstack, frame pointers are traversed, but most of the time they are optimized out on x86, so there is a fallback to unwinding, which requires saving register values and a chunk of the stack. The memory space of the process is reconstructed offline.

CPython seems to allocate code and frames on mmap() pages. If the data lies more than about 1k from the top of the stack, it is not available offline in the trace. We need some way to reconstitute this memory space of the interpreter to resolve the symbols, probably by dumping the data on disk.

In Java, there is a small HotSpot agent that spits out the symbols of JIT code:

https://github.com/jrudolph/perf-map-agent

The problem is that CPython does not JIT code, and the executed code is the ELF library itself. The executed frames are parameters of functions of the interpreter. I don't think the same approach can be used (maybe this can be applied to PyPy?).

I looked at how Python frames are handled in GDB (file cpython/Tools/gdb/libpython.py). A Python frame is detected in Frame(gdbframe).is_evalframeex() by a C call to PyEval_EvalFrameEx(). However, the traceback accesses PyFrameObject on the heap (at least for f->f_back = 0xa57460), which is possible in GDB when the program is paused and the whole memory space is available, but is not recorded for offline use in perf. Here is an example of a callstack from GDB:

    #0  PyEval_EvalFrameEx (f=Frame 0x77f1b060, for file crunch.py, line 7, in bar (num=466829), throwflag=0) at ../Python/ceval.c:1039
    #1  0x00527877 in fast_function (func=function at remote 0x76ec45a0, pp_stack=0x7fffd280, n=1, na=1, nk=0) at ../Python/ceval.c:4106
    #2  0x00527582 in call_function (pp_stack=0x7fffd280, oparg=1) at ../Python/ceval.c:4041

We could add a kernel module that knows how to make samples of CPython, but it means Python structures become a sort of ABI, and kernel devs won't allow a Python interpreter in kernel mode ;-).

What we really want is the f_code data and related objects:

    (gdb) print (void *)(f->f_code)
    $8 = (void *) 0x77e370f0

Maybe we could save these pages every time some code is loaded from the interpreter? (the memory range is about 1.7MB, but )

Anyway, I think we must change CPython to support tools such as perf. Any thoughts?

Cheers,
Francis
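To make concrete which data a sample would need from each PyFrameObject, here is a small sketch that walks the same f_back chain from inside the process and pulls the function name, file and line number out of f_code. It only illustrates the fields involved: python_stack() is a hypothetical helper, bar() and foo() are made-up functions echoing the GDB example, and sys._getframe is a CPython-specific call. An external profiler cannot run this code at sample time, which is exactly the problem described above.

```python
import sys


def python_stack(frame=None):
    """Return (function, file, line) for each Python frame, innermost first."""
    if frame is None:
        frame = sys._getframe(1)  # the caller's frame (CPython-specific)
    stack = []
    while frame is not None:
        code = frame.f_code  # the code object GDB reads through f->f_code
        stack.append((code.co_name, code.co_filename, frame.f_lineno))
        frame = frame.f_back  # follow the chain toward the outermost frame
    return stack


def bar(num):
    return python_stack()


def foo():
    return bar(466829)


frames = foo()
for name, filename, lineno in frames:
    print("%s (%s:%d)" % (name, filename, lineno))
```

Each tuple is small and self-contained, so a compact per-thread copy of this information is all an offline tool would need to label the matching PyEval_EvalFrameEx() frames.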