Author: Antonio Cuni <anto.c...@gmail.com>
Branch: extradoc
Changeset: r5923:3c9e148d058f
Date: 2018-12-23 11:29 +0100
http://bitbucket.org/pypy/extradoc/changeset/3c9e148d058f/

Log:    add a blog post about the gc-disable branch

diff --git a/blog/draft/2018-12-gc-disable.rst b/blog/draft/2018-12-gc-disable.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/2018-12-gc-disable.rst
@@ -0,0 +1,102 @@
+PyPy for low-latency systems
+=============================
+
+I recently merged the gc-disable branch, which introduces a couple of features
+that are useful when you need to respond to certain events with the lowest
+possible latency.  This work has been kindly sponsored by `Gambit Research`_
+(which, by the way, is a very cool and geeky place to work, in case you
+are interested_).
+
+The PyPy VM manages memory using a generational, moving garbage collector:
+periodically, the GC scans the whole heap to find unreachable objects and
+frees the corresponding memory.  Although at first glance this strategy might
+sound expensive, in practice the total cost of memory management is far lower
+than on, e.g., CPython, which is based on reference counting.  There are
+various reasons for this, the most important being that allocation is very fast
+(especially compared to malloc-based allocators), and deallocation of objects
+that die young is basically free. More information about the PyPy GC is
+available here_.
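
To make the last two claims concrete, here is a toy model of a bump-pointer nursery, the young generation of a generational GC. It is a simplified sketch, not PyPy's actual code: ``ToyNursery``, ``allocate`` and ``minor_collect`` are hypothetical names, and the caller tells the nursery which objects are still alive, whereas a real GC finds them by tracing from the roots. The point is that allocation is just a pointer bump, and a minor collection only does work proportional to the survivors; the dead objects are reclaimed all at once by resetting the pointer.

```python
from typing import List, Optional, Tuple

Handle = Tuple[int, int]  # (offset in the nursery, size in bytes)


class ToyNursery:
    """Toy bump-pointer nursery; NOT PyPy's actual implementation."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.top = 0                     # next free offset
        self.old_space: List[int] = []   # sizes of objects promoted so far

    def allocate(self, nbytes: int) -> Optional[Handle]:
        """Allocation is a bounds check plus a pointer bump."""
        if self.top + nbytes > self.capacity:
            return None  # nursery full: the caller must run minor_collect()
        handle = (self.top, nbytes)
        self.top += nbytes
        return handle

    def minor_collect(self, live: List[Handle]) -> int:
        """Copy the live objects out and return how many bytes were copied.

        Dead objects are never visited: resetting `top` reclaims all of
        them at once, so the cost depends only on the survivors.
        """
        copied = 0
        for _offset, nbytes in live:
            self.old_space.append(nbytes)
            copied += nbytes
        self.top = 0
        return copied


nursery = ToyNursery(capacity=4096)
handles = [nursery.allocate(16) for _ in range(200)]
copied = nursery.minor_collect(live=handles[:3])  # only 3 of 200 survive
print(copied)  # 48
```

In this toy, 197 of the 200 objects die young and cost nothing at collection time; only the 48 bytes of survivors are copied.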
+
+As we said, the total cost of memory management is lower on PyPy than on
+CPython, and it's one of the reasons why PyPy is so fast.  However, one big
+disadvantage is that while on CPython the cost of memory management is spread
+over the whole execution of the program, on PyPy it is concentrated in the
+moments when the GC runs, causing observable pauses which interrupt the
+execution of the user program.
+
+To avoid excessively long pauses, the PyPy GC has been using an `incremental
+strategy since 2013`_: the GC runs as a series of "steps", letting the user
+program progress between each step.
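
The idea of running the GC in "steps" can be sketched with a Python generator: each ``yield`` ends one step and hands control back to the program. This is a toy mark phase with hypothetical names (``incremental_mark``, ``budget``), not PyPy's incremental GC; in particular it omits the write barriers a real incremental collector needs to stay correct while the program mutates the object graph between steps.

```python
from typing import Dict, Iterable, Iterator, List, Set


def incremental_mark(
    roots: Iterable[str],
    children: Dict[str, List[str]],
    budget: int,
) -> Iterator[Set[str]]:
    """Toy incremental mark phase: marks at most `budget` objects per step.

    Each `yield` ends one step; the caller (the "user program") runs between
    steps. Write barriers are deliberately left out of this sketch.
    """
    marked: Set[str] = set()
    pending = list(roots)
    while pending:
        work = 0
        while pending and work < budget:
            obj = pending.pop()
            if obj not in marked:
                marked.add(obj)
                pending.extend(children.get(obj, []))
                work += 1
        yield marked  # end of step: control goes back to the program


graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": []}
steps = 0
for marked in incremental_mark(["a"], graph, budget=2):
    steps += 1  # the user program would make progress here
print(steps, sorted(marked))  # 2 ['a', 'b', 'c', 'd']
```

Object ``"e"`` is never reached, so it is garbage; the total work is the same as a single-shot mark, but it is split into bounded pauses whose length is controlled by ``budget``.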
+
+The following chart shows the behavior of a real-world, long-running process:
+
+.. image:: 2018-12-gc-timing.png
+
+The orange line shows the amount of memory used by the program, which
+increases linearly while the program progresses. Every ~5 minutes, the GC
+kicks in and the memory usage drops from ~5.2GB to ~2.8GB (this ratio is not
+random: it is controlled by the PYPY_GC_MAJOR_COLLECT_ environment variable).
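
As a worked example: according to the PyPy GC documentation, a major collection starts when the memory consumed reaches PYPY_GC_MAJOR_COLLECT times the memory still alive at the end of the previous major collection, with a documented default of 1.82. The sketch below assumes that default (the process in the chart may have used another value) and a simplified model that ignores everything else the GC takes into account; the function names are hypothetical.

```python
import os

# Documented default for PyPy's incminimark GC; the process in the chart
# may well have been run with a different value.
DEFAULT_MAJOR_COLLECT = 1.82


def major_collect_factor(environ=os.environ) -> float:
    """Read the factor from PYPY_GC_MAJOR_COLLECT, falling back to the default."""
    return float(environ.get("PYPY_GC_MAJOR_COLLECT", DEFAULT_MAJOR_COLLECT))


def next_major_threshold(surviving_gb: float, factor: float) -> float:
    """Simplified model: the next major GC starts at factor * surviving memory."""
    return surviving_gb * factor


factor = major_collect_factor({})  # variable unset: use the default
print(round(next_major_threshold(2.8, factor), 2))  # 5.1
print(round(5.2 / 2.8, 2))  # ratio observed in the chart: 1.86
```

With ~2.8GB surviving, the model predicts the next major collection at ~5.1GB, close to the ~5.2GB peaks in the chart. Lowering the variable (e.g. ``PYPY_GC_MAJOR_COLLECT=1.5``) should trade lower peaks for more frequent major collections.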
+
+The purple line shows aggregated data about the GC timing: the whole
+collection takes ~1400 individual steps over the course of ~1 minute, and
+each point represents the **maximum** time a single step took during the past
+10 seconds. Most steps take ~10-20 ms, although we see a horrible peak of
+~100 ms towards the end. We have not yet investigated its cause, but we
+suspect it is related to the deallocation of raw objects.
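
The aggregation behind such a chart (one point per window, showing the worst value in it) can be sketched as follows. How the individual step timings are collected is left open here: the ``(timestamp, duration)`` samples are made up for illustration, and ``max_per_window`` is a hypothetical helper. Plotting the maximum rather than the mean is what makes rare but long pauses visible.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def max_per_window(
    samples: Iterable[Tuple[float, float]], window: float = 10.0
) -> List[Tuple[float, float]]:
    """Aggregate (timestamp_s, duration_s) samples into fixed-size windows.

    Returns (window_start, max_duration) pairs sorted by time, i.e. one
    point per window holding the worst duration seen in it.
    """
    worst: Dict[int, float] = defaultdict(float)
    for ts, duration in samples:
        bucket = int(ts // window)
        worst[bucket] = max(worst[bucket], duration)
    return [(bucket * window, worst[bucket]) for bucket in sorted(worst)]


# Made-up step timings (seconds): three ordinary steps and one slow outlier.
steps = [(0.5, 0.012), (3.2, 0.018), (11.0, 0.015), (19.9, 0.100)]
print(max_per_window(steps))  # [(0.0, 0.018), (10.0, 0.1)]
```

The same helper with ``window=60.0`` gives a per-minute aggregation.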
+
+This is clearly a problem for systems where it is important to respond to
+certain events with a latency which is both low and consistent: if the GC kicks
+in at the wrong time, it might cause unacceptable pauses during the response.
+
+Let's look again at our real-world example: this is a system which
+continuously monitors an external stream; when a certain event occurs, we want
+to take an action. The following chart shows the maximum time it takes to
+complete one such action, aggregated every minute:
+
+.. image:: 2018-12-normal-max.png
+
+You can clearly see that the baseline response time is around 20-30
+ms. However, we can also see periodic spikes around 50-100 ms, with peaks up
+to 350-450 ms! After a bit of investigation, we concluded that most (although
+not all) of the spikes were caused by the GC kicking in at the wrong time.
+
+The work I did in the ``gc-disable`` branch aims to fix this problem by
+introducing `two new features`_ to the ``gc`` module:
+
+  - ``gc.disable()``, which previously only inhibited the execution of
+    finalizers, now really disables major GC collections. After calling
+    it, you will see memory usage grow indefinitely.
+
+  - ``gc.collect_step()`` is a new function which you can use to manually
+    execute a single incremental step.
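
A minimal sketch of how the two functions might be combined in an event loop, assuming that GC work should happen right after each event is handled rather than whenever the GC decides. ``serve``, ``gc_step`` and ``steps_per_event`` are hypothetical names. ``gc.collect_step()`` only exists on PyPy versions that include this work, so the sketch falls back to ``gc.collect(0)`` elsewhere, purely so that it also runs on CPython; that fallback is not equivalent.

```python
import gc


def gc_step() -> None:
    """Do one bounded chunk of GC work.

    Uses gc.collect_step() where it exists (PyPy with the gc-disable work);
    elsewhere it falls back to a youngest-generation collection, which is
    only a rough stand-in so that the sketch also runs on CPython.
    """
    collect_step = getattr(gc, "collect_step", None)
    if collect_step is not None:
        collect_step()
    else:
        gc.collect(0)


def serve(events, handle, steps_per_event=1):
    """Handle events with automatic major collections disabled.

    GC work is done explicitly after each event, i.e. outside the
    latency-critical part, instead of whenever the GC decides to kick in.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        results = []
        for event in events:
            results.append(handle(event))  # latency-critical: no automatic major GC
            for _ in range(steps_per_event):
                gc_step()  # pay back some GC work while we are not busy
        return results
    finally:
        if was_enabled:
            gc.enable()  # restore the previous state, even on errors


print(serve([1, 2, 3], lambda x: x * 2))  # [2, 4, 6]
```

One step per event is an arbitrary choice: if the program allocates faster than the explicit steps can keep up with, memory keeps growing, exactly as described for ``gc.disable()``.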
+
+Combining these two functions, it is possible to take control of the GC to
+make sure it runs only when it is acceptable to do so.  For an example of
+usage, you can look at the implementation of a `custom GC`_ inside pypytools_.
+Notably, it also defines a ``with nogc():`` context manager
+which you can use to mark performance-critical sections where the GC is not
+allowed to run.
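
The following is not the pypytools code, but a small sketch of how a ``nogc()`` section could work on top of the two functions, under the assumption that automatic major collections have already been turned off with ``gc.disable()``. ``ManualGc`` and ``maybe_step`` are hypothetical names; a nesting counter lets critical sections call other critical sections safely.

```python
import gc
from contextlib import contextmanager


class ManualGc:
    """Hypothetical helper: GC work is only allowed outside nogc() sections.

    Assumes automatic major collections are disabled (gc.disable()), so
    GC work only happens through maybe_step().
    """

    def __init__(self) -> None:
        self.depth = 0  # how many nogc() blocks we are currently inside

    @contextmanager
    def nogc(self):
        """Mark a performance-critical section; nesting is allowed."""
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1

    def maybe_step(self) -> bool:
        """Run one chunk of GC work unless inside nogc(); return whether it ran."""
        if self.depth > 0:
            return False
        collect_step = getattr(gc, "collect_step", None)  # PyPy only
        if collect_step is not None:
            collect_step()
        else:
            gc.collect(0)  # rough stand-in so the sketch runs on CPython
        return True


mgc = ManualGc()
with mgc.nogc():
    print(mgc.maybe_step())  # False: critical section, no GC work
print(mgc.maybe_step())  # True
```

A real implementation would also decide *when* to call ``maybe_step()``, for example based on how much memory has been allocated since the last major collection; that policy is outside this sketch.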
+
+The following chart compares the behavior of the default PyPy GC and the new
+custom GC, after careful placement of ``nogc()`` sections:
+
+.. image:: 2018-12-nogc-max.png
+
+The yellow line is the same as before, while the purple line shows the new
+system: almost all spikes are gone, and the baseline performance is about 10%
+better. There is still one spike towards the end, but after some investigation
+we concluded that it was **not** caused by the GC.
+
+All in all, a pretty big success, I think.  These features are already
+available in the nightly builds of PyPy, and will be included in the next
+release: take this as a Christmas present :)
+
+
+.. _`Gambit Research`: https://www.gambitresearch.com/
+.. _interested: https://www.gambitresearch.com/jobs.html
+.. _here: https://pypy.readthedocs.io/en/latest/gc_info.html#incminimark
+.. _`incremental strategy since 2013`: https://morepypy.blogspot.com/2013/10/incremental-garbage-collector-in-pypy.html
+.. _PYPY_GC_MAJOR_COLLECT: https://pypy.readthedocs.io/en/latest/gc_info.html#environment-variables
+.. _`two new features`: https://pypy.readthedocs.io/en/latest/gc_info.html#semi-manual-gc-management
+.. _`Custom GC`: https://bitbucket.org/antocuni/pypytools/src/0273afc3e8bedf0eb1ef630c3bc69e8d9dd661fe/pypytools/gc/custom.py?at=default&fileviewer=file-view-default
+.. _pypytools: https://pypi.org/project/pypytools/
diff --git a/blog/draft/2018-12-gc-timing.png b/blog/draft/2018-12-gc-timing.png
new file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..cfa8335b6d21b94e36a44f930c3bd01cb2affa3e
GIT binary patch

[cut]

diff --git a/blog/draft/2018-12-nogc-max.png b/blog/draft/2018-12-nogc-max.png
new file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..ae9d5ad296f82a8d56408c2818691dbefa0fd301
GIT binary patch

[cut]

diff --git a/blog/draft/2018-12-normal-max.png b/blog/draft/2018-12-normal-max.png
new file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..29b3381a8162378ae89e1eeeaae9ae8b5cf3d838
GIT binary patch

[cut]

_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
https://mail.python.org/mailman/listinfo/pypy-commit