Author: Antonio Cuni <anto.c...@gmail.com> Branch: gc-hooks Changeset: r94358:3fe7e9dc4d49 Date: 2018-04-17 11:22 +0200 http://bitbucket.org/pypy/pypy/changeset/3fe7e9dc4d49/
Log: add docs about GC hooks diff --git a/pypy/doc/gc_info.rst b/pypy/doc/gc_info.rst --- a/pypy/doc/gc_info.rst +++ b/pypy/doc/gc_info.rst @@ -121,6 +121,160 @@ alive by GC objects, but not accounted in the GC +GC Hooks +-------- + +GC hooks are user-defined functions which are called whenever a specific GC +event occur, and can be used to monitor GC activity and pauses. You can +install the hooks by setting the following attributes: + +``gc.hook.on_gc_minor`` + Called whenever a minor collection occurs. It corresponds to + ``gc-minor`` sections inside ``PYPYLOG``. + +``gc.hook.on_gc_collect_step`` + Called whenever an incremental step of a major collection occurs. It + corresponds to ``gc-collect-step`` sections inside ``PYPYLOG``. + +``gc.hook.on_gc_collect`` + Called after the last incremental step, when a major collection is fully + done. It corresponds to ``gc-collect-done`` sections inside ``PYPYLOG``. + +To uninstall a hook, simply set the corresponding attribute to ``None``. To +install all hooks at once, you can call ``gc.hooks.set(obj)``, which will look +for methods ``on_gc_*`` on ``obj``. To uninstall all the hooks at once, you +can call ``gc.hooks.reset()``. + +The functions called by the hooks receive a single ``stats`` argument, which +contains various statistics about the event. + +Note that PyPy cannot call the hooks immediately after a GC event, but it has +to wait until it reaches a point in which the interpreter is in a known state +and calling user-defined code is harmless. It might happen that multiple +events occur before the hook is invoked: in this case, you can inspect the +value ``stats.count`` to know how many times the event occured since the last +time the hook was called. Similarly, ``stats.duration`` contains the +**total** time spent by the GC for this specific event since the last time the +hook was called. + +On the other hand, all the other fields of the ``stats`` object are relative +only to the **last** event of the series. + +The attributes for ``GcMinorStats`` are: + +``count`` + The number of minor collections occured since the last hook call. + +``duration`` + The total time spent inside minor collections since the last hook + call. See below for more information on the unit. + + ``total_memory_used`` + The amount of memory used at the end of the minor collection, in + bytes. This include the memory used in arenas (for GC-managed memory) and + raw-malloced memory (e.g., the content of numpy arrays). + +``pinned_objects`` + the number of pinned objects. + + +The attributes for ``GcCollectStepStats`` are: + +``count``, ``duration`` + See above. + +``oldstate``, ``newstate`` + Integers which indicate the state of the GC before and after the step. + +The value of ``oldstate`` and ``newstate`` is one of these constants, defined +inside ``gc.GcCollectStepStats``: ``STATE_SCANNING``, ``STATE_MARKING``, +``STATE_SWEEPING``, ``STATE_FINALIZING``. It is possible to get a string +representation of it by indexing the ``GC_STATS`` tuple. + + +The attributes for ``GcCollectStats`` are: + +``count`` + See above. + +``num_major_collects`` + The total number of major collections which have been done since the + start. Contrarily to ``count``, this is an always-growing counter and it's + not reset between invocations. + +``arenas_count_before``, ``arenas_count_after`` + Number of arenas used before and after the major collection. + +``arenas_bytes`` + Total number of bytes used by GC-managed objects. + +``rawmalloc_bytes_before``, ``rawmalloc_bytes_after`` + Total number of bytes used by raw-malloced objects, before and after the + major collection. + +Note that ``GcCollectStats`` has **not** got a ``duration`` field. This is +because all the GC work is done inside ``gc-collect-step``: +``gc-collect-done`` is used only to give additional stats, but doesn't do any +actual work. + +A note about the ``duration`` field: depending on the architecture and +operating system, PyPy uses different ways to read timestamps, so ``duration`` +is expressed in varying units. It is possible to know which by calling +``__pypy__.debug_get_timestamp_unit()``, which can be one of the following +values: + +``tsc`` + The default on ``x86`` machines: timestamps are expressed in CPU ticks, as + read by the `Time Stamp Counter`_. + +``ns`` + Timestamps are expressed in nanoseconds. + +``QueryPerformanceCounter`` + On Windows, in case for some reason ``tsc`` is not available: timestamps + are read using the win API ``QueryPerformanceCounter()``. + + +Unfortunately, there does not seem to be a reliable standard way for +converting ``tsc`` ticks into nanoseconds, although in practice on modern CPUs +it is enough to divide the ticks by the maximum nominal frequency of the CPU. +For this reason, PyPy gives the raw value, and leaves the job of doing the +conversion to external libraries. + +Here is an example of GC hooks in use:: + + import sys + import gc + + class MyHooks(object): + done = False + + def on_gc_minor(self, stats): + print 'gc-minor: count = %02d, duration = %d' % (stats.count, + stats.duration) + + def on_gc_collect_step(self, stats): + old = gc.GcCollectStepStats.GC_STATES[stats.oldstate] + new = gc.GcCollectStepStats.GC_STATES[stats.newstate] + print 'gc-collect-step: %s --> %s' % (old, new) + print ' count = %02d, duration = %d' % (stats.count, + stats.duration) + + def on_gc_collect(self, stats): + print 'gc-collect-done: count = %02d' % stats.count + self.done = True + + hooks = MyHooks() + gc.hooks.set(hooks) + + # simulate some GC activity + lst = [] + while not hooks.done: + lst = [lst, 1, 2, 3] + + +.. _`Time Stamp Counter`: https://en.wikipedia.org/wiki/Time_Stamp_Counter + .. _minimark-environment-variables: Environment variables _______________________________________________ pypy-commit mailing list pypy-commit@python.org https://mail.python.org/mailman/listinfo/pypy-commit