Author: Antonio Cuni <anto.c...@gmail.com>
Branch: extradoc
Changeset: r5923:3c9e148d058f
Date: 2018-12-23 11:29 +0100
http://bitbucket.org/pypy/extradoc/changeset/3c9e148d058f/
Log: add a blog post about the gc-disable branch

New file: blog/draft/2018-12-gc-disable.rst

PyPy for low-latency systems
=============================

Recently I merged the gc-disable branch, which introduces a couple of features
that are useful when you need to respond to certain events with the lowest
possible latency. This work has been kindly sponsored by `Gambit Research`_
(which, by the way, is a very cool and geeky place to work, in case you are
interested_).

The PyPy VM manages memory using a generational, moving garbage collector:
periodically, the GC scans the whole heap to find unreachable objects and
frees the corresponding memory. Although at first glance this strategy might
sound expensive, in practice the total cost of memory management is far lower
than on, e.g., CPython, which is based on reference counting. There are
various reasons for this, the most important being that allocation is very
fast (especially compared to malloc-based allocators) and that deallocating
objects which die young is basically free. More information about the PyPy GC
is available here_.

As we said, the total cost of memory management is lower on PyPy than on
CPython, and this is one of the reasons why PyPy is so fast. However, there is
one big disadvantage: on CPython the cost of memory management is spread over
the whole execution of the program, while on PyPy it is concentrated in the
moments when the GC runs, causing observable pauses which interrupt the
execution of the user program.

To avoid excessively long pauses, the PyPy GC has been using an `incremental
strategy since 2013`_: the GC runs as a series of "steps", letting the user
program progress between steps.

The following chart shows the behavior of a real-world, long-running process:

.. image:: 2018-12-gc-timing.png

The orange line shows the amount of memory used by the program, which
increases linearly while the program progresses. Every ~5 minutes, the GC
kicks in and the memory usage drops from ~5.2GB to ~2.8GB (this ratio is not
accidental: it is controlled by the PYPY_GC_MAJOR_COLLECT_ env variable).

The purple line shows aggregated data about the GC timing: the whole
collection takes ~1400 individual steps over the course of ~1 minute. Each
point represents the **maximum** time a single step took during the past 10
seconds. Most steps take ~10-20 ms, although we see a horrible peak of ~100 ms
towards the end. We have not yet investigated its cause, but we suspect it is
related to the deallocation of raw objects.

This is clearly a problem for systems where it is important to respond to
certain events with a latency which is both low and consistent: if the GC
kicks in at the wrong time, it might cause unacceptable pauses during the
response.

Let's look again at our real-world example: this is a system which
continuously monitors an external stream; when a certain event occurs, we want
to take an action. The following chart shows the maximum time it takes to
complete one such action, aggregated every minute:

.. image:: 2018-12-normal-max.png

You can clearly see that the baseline response time is around 20-30 ms.
However, we can also see periodic spikes of around 50-100 ms, with peaks of up
to 350-450 ms! After a bit of investigation, we concluded that most (although
not all) of the spikes were caused by the GC kicking in at the wrong time.

The work I did in the ``gc-disable`` branch aims to fix this problem by
introducing `two new features`_ to the ``gc`` module:

- ``gc.disable()``, which previously only inhibited the execution of
  finalizers, now really disables major GC collections. After calling it, you
  will see the memory usage grow indefinitely.

- ``gc.collect_step()`` is a new function which you can use to manually
  execute a single incremental step.

Combining these two functions, it is possible to take control of the GC and
make sure it runs only when it is acceptable to do so. For an example of
usage, you can look at the implementation of a `custom GC`_ inside
pypytools_. Notably, it also defines a ``with nogc():`` context manager which
you can use to mark performance-critical sections where the GC is not allowed
to run.

The following chart compares the behavior of the default PyPy GC and the new
custom GC, after carefully placing ``nogc()`` sections:

.. image:: 2018-12-nogc-max.png

The yellow line is the same as before, while the purple line shows the new
system: almost all spikes are gone, and the baseline performance is about 10%
better. There is still one spike towards the end, but after some investigation
we concluded that it was **not** caused by the GC.

All in all, a pretty big success, I think. These features are already
available in the nightly builds of PyPy, and will be included in the next
release: take this as a Christmas present :)


.. _`Gambit Research`: https://www.gambitresearch.com/
.. _interested: https://www.gambitresearch.com/jobs.html
.. _here: https://pypy.readthedocs.io/en/latest/gc_info.html#incminimark
.. _`incremental strategy since 2013`: https://morepypy.blogspot.com/2013/10/incremental-garbage-collector-in-pypy.html
.. _PYPY_GC_MAJOR_COLLECT: https://pypy.readthedocs.io/en/latest/gc_info.html#environment-variables
.. _`two new features`: https://pypy.readthedocs.io/en/latest/gc_info.html#semi-manual-gc-management
.. _`Custom GC`: https://bitbucket.org/antocuni/pypytools/src/0273afc3e8bedf0eb1ef630c3bc69e8d9dd661fe/pypytools/gc/custom.py?at=default&fileviewer=file-view-default
.. _pypytools: https://pypi.org/project/pypytools/
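To make the idea of combining ``gc.disable()`` and ``gc.collect_step()`` with a ``nogc()`` context manager more concrete, here is a minimal sketch. It is **not** the pypytools implementation: the ``ManualGC`` class, its ``maintenance()`` method and the ``step_budget``/``max_steps`` parameters are hypothetical names, and the policy of running steps until a small time budget is spent is an assumption chosen for illustration. Since ``gc.collect_step()`` only exists on PyPy, the sketch falls back to a full ``gc.collect()`` elsewhere (e.g. on CPython) so that it stays runnable.

```python
import gc
import time
from contextlib import contextmanager

# gc.collect_step() is PyPy-specific. On other interpreters, e.g. CPython,
# this sketch falls back to a full gc.collect(), which is much coarser but
# keeps the example runnable everywhere (assumption for illustration).
_collect_step = getattr(gc, "collect_step", gc.collect)


class ManualGC:
    """Hypothetical helper: major collections run only when we allow them."""

    def __init__(self, step_budget=0.005, max_steps=10):
        self.step_budget = step_budget  # seconds of GC work per maintenance() call
        self.max_steps = max_steps      # hard cap on steps per maintenance() call
        self._nogc_depth = 0            # > 0 while inside a nogc() section
        self.steps_run = 0              # total steps executed so far

    def start(self):
        # On PyPy this stops automatic major collections, so memory grows
        # until maintenance() runs; on CPython it disables the cycle collector.
        gc.disable()

    def stop(self):
        # Hand control back to the automatic GC.
        gc.enable()

    @contextmanager
    def nogc(self):
        """Mark a latency-critical section; nesting is allowed."""
        self._nogc_depth += 1
        try:
            yield
        finally:
            self._nogc_depth -= 1

    def maintenance(self):
        """Run GC steps at a point where a short pause is acceptable.

        Does nothing inside a nogc() section. Otherwise runs at least one step
        and stops when the time budget is spent or max_steps is reached.
        Returns the number of steps executed.
        """
        if self._nogc_depth > 0:
            return 0
        deadline = time.perf_counter() + self.step_budget
        steps = 0
        while steps < self.max_steps:
            _collect_step()
            steps += 1
            if time.perf_counter() >= deadline:
                break
        self.steps_run += steps
        return steps


def handle_event(event):
    # Hypothetical stand-in for the latency-critical work.
    return [event] * 100


if __name__ == "__main__":
    mgc = ManualGC()
    mgc.start()
    try:
        for event in range(1000):
            with mgc.nogc():
                handle_event(event)  # the GC must not interrupt this
            if event % 100 == 0:
                mgc.maintenance()    # a short pause is acceptable here
    finally:
        mgc.stop()
    print("GC steps executed:", mgc.steps_run)
```

The design choice is to refuse GC work inside ``nogc()`` and to bound each ``maintenance()`` call, so that the pause it introduces is limited; a real implementation would also need a policy for *when* to start a major collection (for example based on memory usage), which this sketch leaves out.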