Author: Hakan Ardo <ha...@debian.org> Branch: extradoc Changeset: r5465:206c058dcb38 Date: 2014-11-21 09:50 +0100 http://bitbucket.org/pypy/extradoc/changeset/206c058dcb38/
Log: merge

diff --git a/blog/draft/io-improvements.rst b/blog/draft/io-improvements.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/io-improvements.rst
@@ -0,0 +1,88 @@
+
+Hello everyone!
+
+We've wrapped up the Warsaw sprint, so I would like to describe some
+branches which have been recently merged and which improved the I/O and the
+GC: `gc_no_cleanup_nursery`_ and `gc-incminimark-pinning`_.
+
+.. _`gc_no_cleanup_nursery`: https://bitbucket.org/pypy/pypy/commits/9e2f7a37c1e2
+.. _`gc-incminimark-pinning`: https://bitbucket.org/pypy/pypy/commits/64017d818038
+
+The first branch was started by Wenzhu Man for her Google Summer of Code
+and finished by Maciej Fijałkowski and Armin Rigo.
+The PyPy GC works by allocating new objects in the young object
+area (the nursery), simply by incrementing a pointer. After each minor
+collection, the nursery has to be cleaned up. For simplicity, the GC used
+to do it by zeroing the whole nursery.
+
+This approach has bad effects on the cache, since you zero a large piece of
+memory at once and do unnecessary work for things that don't require zeroing,
+like large strings. We mitigated the first problem somewhat with incremental
+nursery zeroing, but this branch removes the zeroing completely, thus
+improving the string handling and recursive code (since jitframes don't
+require zeroed memory either). I measured the effect on two examples:
+a recursive implementation of `fibonacci`_, and `gcbench`_, which measures
+GC performance.
+
+.. _`fibonacci`: https://bitbucket.org/pypy/benchmarks/src/69152c2aee7766051aab15735b0b82a46b82b802/own/fib.py?at=default
+.. _`gcbench`: https://bitbucket.org/pypy/benchmarks/src/69152c2aee7766051aab15735b0b82a46b82b802/own/gcbench.py?at=default
+
+The results for fibonacci and gcbench are below (normalized to CPython
+2.7). Benchmarks were run 50 times each (note that the big standard
+deviation comes mostly from the warmup at the beginning; the true figures
+are smaller):
+
++----------------+------------------+-------------------------+--------------------+
+| benchmark      | CPython          | PyPy 2.4                | PyPy non-zero      |
++----------------+------------------+-------------------------+--------------------+
+| fibonacci      | 4.8+-0.15 (1.0x) | 0.59+-0.07 (8.1x)       | 0.45+-0.07 (10.6x) |
++----------------+------------------+-------------------------+--------------------+
+| gcbench        | 22+-0.36 (1.0x)  | 1.34+-0.28 (16.4x)      | 1.02+-0.15 (21.6x) |
++----------------+------------------+-------------------------+--------------------+
+
+The second branch was done by Gregor Wegberg for his master thesis and finished
+by Maciej Fijałkowski and Armin Rigo. Because of the way it works, the PyPy GC
+from time to time moves objects in memory, meaning that their address can
+change. Therefore, if you want to pass pointers to some external C function
+(for example, write(2) or read(2)), you need to ensure that the objects they
+are pointing to will not be moved by the GC (e.g. while a different thread is
+running). PyPy up to 2.4 solves the problem by copying the data into or from a
+non-movable buffer, which is obviously inefficient.
+The branch introduces the concept of "pinning", which allows us to inform the
+GC that it is not allowed to move a certain object for a short period of time.
+This introduces a bit of extra complexity
+in the garbage collector, but improves the I/O performance quite drastically,
+because we no longer need the extra copy to and from the non-movable buffers.
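+
+To give a rough idea of what this looks like, here is a minimal sketch at
+the RPython level. This is illustrative only: ``c_write``,
+``get_raw_address`` and the fallback helper are made-up stand-ins for the
+real rffi glue; only the ``rgc.pin``/``rgc.unpin`` pair reflects the API
+added by the branch::
+
+    from rpython.rlib import rgc
+
+    def write_str(fd, s):
+        if rgc.pin(s):
+            # the GC has promised not to move ``s`` until unpin()
+            try:
+                c_write(fd, get_raw_address(s), len(s))   # no copy
+            finally:
+                rgc.unpin(s)
+        else:
+            # pinning may be refused; fall back to the old behaviour of
+            # copying into a non-movable buffer first
+            write_via_nonmovable_copy(fd, s)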
+
+In `this benchmark`_, which does I/O in a loop,
+we either write a number of bytes from a freshly allocated string into
+/dev/null or read a number of bytes from /dev/full. I'm showing the results
+for PyPy 2.4, PyPy with non-zero-nursery, and PyPy with non-zero-nursery and
+object pinning. Those are wall times for cases using ``os.read/os.write``
+and ``file.read/file.write``, normalized against CPython 2.7.
+
+Benchmarks were done using PyPy 2.4 and revisions ``85646d1d07fb`` for
+non-zero-nursery and ``3d8fe96dc4d9`` for non-zero-nursery and pinning.
+The benchmarks were run once, since the standard deviation was small.
+
+XXXX
+
+What we can see is that ``os.read`` and ``os.write`` both improved greatly
+and now outperform CPython for each combination. ``file`` operations are
+a little more tricky, and while those branches improved the situation a bit,
+the improvement is not as drastic as in the ``os`` versions. This really
+should not be the case, and it showcases how our ``file`` buffering is
+inferior to CPython's. We plan on removing our own buffering and using
+``FILE*`` in C in the near future, so we should outperform CPython on those
+too (since our allocations are cheaper). If you look carefully at the
+benchmark, the write function is copied three times. This hack is intended
+to avoid the JIT overspecializing the assembler code, which happens because
+the buffering code was written way before the JIT was done. In fact, our
+buffering is hilariously bad, but if the stars align correctly it can be
+JIT-compiled to something that's not half bad. Try removing the hack and see
+how the performance of the last benchmark drops :-) Again, this hack should
+be absolutely unnecessary once we remove our own buffering; stay tuned for
+more.
+
+Cheers,
+fijal
+
+.. _`this benchmark`: https://bitbucket.org/pypy/benchmarks/src/69152c2aee7766051aab15735b0b82a46b82b802/io/iobasic.py?at=default
diff --git a/blog/draft/iobase.png b/blog/draft/iobase.png
new file mode 100644
index 0000000000000000000000000000000000000000..0b8bedc12421ce39c24931d68f8993da74bb2808
GIT binary patch
[cut]
diff --git a/blog/draft/tornado-stm.rst b/blog/draft/tornado-stm.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/tornado-stm.rst
@@ -0,0 +1,252 @@
+Tornado without a GIL on PyPy STM
+=================================
+
+Python has a GIL, right? Not quite - PyPy STM is a Python implementation
+without a GIL, so it can scale CPU-bound work to several cores.
+PyPy STM is developed by Armin Rigo and Remi Meier,
+and supported by community `donations <http://pypy.org/tmdonate2.html>`_.
+You can read more about it in the
+`docs <http://pypy.readthedocs.org/en/latest/stm.html>`_.
+
+Although PyPy STM is still a work in progress, in many cases it can already
+run CPU-bound code faster than regular PyPy, when using multiple cores.
+Here we will see how to slightly modify the Tornado IO loop to use the
+`transaction <https://bitbucket.org/pypy/pypy/raw/stmgc-c7/lib_pypy/transaction.py>`_
+module.
+This module is `described <http://pypy.readthedocs.org/en/latest/stm.html#atomic-sections-transactions-etc-a-better-way-to-write-parallel-programs>`_
+in the docs and is really simple to use - please see an example there.
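+
+As a self-contained teaser, a minimal use of the module might look like
+this sketch, which assumes only the documented ``transaction.add`` /
+``transaction.run`` API (``work`` is a placeholder function)::
+
+    import transaction
+
+    def work(i):
+        pass   # any CPU-bound piece of work
+
+    for i in range(10):
+        transaction.add(work, i)   # queue a call
+    transaction.run()   # run all queued calls, possibly in parallel,
+                        # but with effects as in some serial order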
+
+An event loop of Tornado, or any other asynchronous
+web server, looks like this (with some simplifications)::
+
+    while True:
+        for callback in list(self._callbacks):
+            self._run_callback(callback)
+        event_pairs = self._impl.poll()
+        self._events.update(event_pairs)
+        while self._events:
+            fd, events = self._events.popitem()
+            handler = self._handlers[fd]
+            self._handle_event(fd, handler, events)
+
+We get IO events and run handlers for all of them; these handlers can
+also register new callbacks, which we run too. When using such a framework,
+it is very nice to have a guarantee that all handlers run serially,
+so you do not have to use any locks. This is an ideal case for the
+transaction module - it gives us the guarantee that things appear
+to run serially, so in user code we do not need any locks. We just
+need to change the code above to something like::
+
+    while True:
+        for callback in list(self._callbacks):
+            transaction.add(
+                self._run_callback, callback)             # added
+        transaction.run()                                 # added
+        event_pairs = self._impl.poll()
+        self._events.update(event_pairs)
+        while self._events:
+            fd, events = self._events.popitem()
+            handler = self._handlers[fd]
+            transaction.add(                              # added
+                self._handle_event, fd, handler, events)
+        transaction.run()                                 # added
+
+The actual commit is
+`here <https://github.com/lopuhin/tornado/commit/246c5e71ce8792b20c56049cf2e3eff192a01b20>`_;
+we had to extract a little function to run the callback.
+
+Part 1: a simple benchmark: primes
+----------------------------------
+
+Now we need a simple benchmark; let's start with
+`this <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/primes.py?at=default>`_:
+just calculate a list of primes up to the given number, and return it
+as JSON::
+
+    def is_prime(n):
+        for i in xrange(2, n):
+            if n % i == 0:
+                return False
+        return True
+
+    class MainHandler(tornado.web.RequestHandler):
+        def get(self, num):
+            num = int(num)
+            primes = [n for n in xrange(2, num + 1) if is_prime(n)]
+            self.write({'primes': primes})
+
+
+We can benchmark it with ``siege``::
+
+    siege -c 50 -t 20s http://localhost:8888/10000
+
+But this does not scale: the CPU load stays at 101-104%, and we handle 30%
+fewer requests per second. The reason for the slowdown is the STM overhead,
+which needs to keep track of all writes and reads in order to detect
+conflicts. And the reason for using only one core is, obviously, conflicts!
+Fortunately, we can see what these conflicts are if we run the code like
+this (here 4 is the number of cores to use)::
+
+    PYPYSTM=stm.log ./primes.py 4
+
+Then we can use `print_stm_log.py <https://bitbucket.org/pypy/pypy/raw/stmgc-c7/pypy/stm/print_stm_log.py>`_
+to analyse this log. It lists the most expensive conflicts::
+
+    14.793s lost in aborts, 0.000s paused (1258x STM_CONTENTION_INEVITABLE)
+    File "/home/ubuntu/tornado-stm/tornado/tornado/httpserver.py", line 455, in __init__
+        self._start_time = time.time()
+    File "/home/ubuntu/tornado-stm/tornado/tornado/httpserver.py", line 455, in __init__
+        self._start_time = time.time()
+    ...
+
+There are only three kinds of conflicts; they are described in the
+`stm source <https://bitbucket.org/pypy/pypy/src/6355617bf9a2a0fa8b74ae17906e4a591b38e2b5/rpython/translator/stm/src_stm/stm/contention.c?at=stmgc-c7>`_.
+Here we see that two threads call into an external function to get the
+current time, and we cannot roll back either of them, so one of them must
+wait till the other transaction finishes.
+
+For now we can hack around this by disabling the timing - it is only
+needed for internal profiling in tornado.
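+
+A sketch of what such a hack might look like (hypothetical - the real
+change is in the repository linked above; the idea is simply to stop
+calling an external C function inside the transaction)::
+
+    # tornado/httpserver.py, in HTTPConnection.__init__ (sketch)
+    self._start_time = 0.0   # was: time.time(); only used for profiling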
+
+If we do it, we get the following results (but see caveats below):
+
+============ =========
+Impl.        req/s
+============ =========
+PyPy 2.4     14.4
+------------ ---------
+CPython 2.7  3.2
+------------ ---------
+PyPy-STM 1   9.3
+------------ ---------
+PyPy-STM 2   16.4
+------------ ---------
+PyPy-STM 3   20.4
+------------ ---------
+PyPy-STM 4   24.2
+============ =========
+
+.. image:: results-1.png
+
+As we can see, in this benchmark PyPy STM using just two cores
+can beat regular PyPy!
+This is not linear scaling, there are still conflicts left, and this
+is a very simple example, but still, it works!
+
+But it's not that simple yet :)
+
+First, these are best-case numbers after a long warmup (much longer than
+for regular PyPy). Second, it can sometimes crash (although removing old
+pyc files fixes it). Third, the benchmark meta-parameters are also tuned.
+
+We get relatively good results only when there are a lot of concurrent
+clients: as a result, a lot of requests pile up, the server cannot keep up
+with the load, and the transaction module is busy running these piled-up
+requests. If we decrease the number of concurrent clients, results get
+slightly worse. Another thing we can tune is how heavy each request is:
+again, if we ask for primes up to a lower number, less time is spent doing
+calculations, more time is spent in tornado, and results get much worse.
+
+Besides the ``time.time()`` conflict described above, there are a lot of
+others. The bulk of the time is lost in these two conflicts::
+
+    14.153s lost in aborts, 0.000s paused (270x STM_CONTENTION_INEVITABLE)
+    File "/home/ubuntu/tornado-stm/tornado/tornado/web.py", line 1082, in compute_etag
+        hasher = hashlib.sha1()
+    File "/home/ubuntu/tornado-stm/tornado/tornado/web.py", line 1082, in compute_etag
+        hasher = hashlib.sha1()
+
+    13.484s lost in aborts, 0.000s paused (130x STM_CONTENTION_WRITE_READ)
+    File "/home/ubuntu/pypy/lib_pypy/transaction.py", line 164, in _run_thread
+        got_exception)
+
+The first one presumably comes from calling into a C function from the
+stdlib; we get the same kind of conflict as for ``time.time()`` above, but
+it can be fixed on the PyPy side, as we can be sure that computing sha1 is
+pure.
+
+It is easy to hack around this one too by just removing etag support, but
+if we do, performance is much worse (only slightly faster than regular
+PyPy), with the top conflict being::
+
+    83.066s lost in aborts, 0.000s paused (459x STM_CONTENTION_WRITE_WRITE)
+    File "/home/arigo/hg/pypy/stmgc-c7/lib-python/2.7/_weakrefset.py", line 70, in __contains__
+    File "/home/arigo/hg/pypy/stmgc-c7/lib-python/2.7/_weakrefset.py", line 70, in __contains__
+
+**FIXME** why does it happen?
+
+The second conflict (without the etag tweaks) originates
+in the transaction module, from this piece of code::
+
+    while True:
+        self._do_it(self._grab_next_thing_to_do(tloc_pending),
+                    got_exception)
+        counter[0] += 1
+
+**FIXME** why does it happen?
+
+The Tornado modification used in this blog post is based on 3.2.dev2.
+As of now, the latest version is 4.0.2, and if we
+`apply <https://github.com/lopuhin/tornado/commit/04cd7407f8690fd1dc55b686eb78e3795f4363e6>`_
+the same changes to this version, then we no longer get any scaling on this
+benchmark, and there are no conflicts that take any substantial time.
+
+
+Part 2: a more interesting benchmark: A-star
+--------------------------------------------
+
+Although we have seen that PyPy STM is not all moonlight and roses,
+it is interesting to see how it works on a more realistic application.
+
+`astar.py <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/astar.py>`_
+is a simple game where several players move on a map
+(represented as a list of lists of integers),
+build and destroy walls, and ask the server to give them the
+shortest path between two points
+using A-star search, adapted from an `ActiveState recipe <http://code.activestate.com/recipes/577519-a-star-shortest-path-algorithm/>`_.
+
+The benchmark `bench_astar.py <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/bench_astar.py>`_
+simulates players and tries to put the main load on the A-star search,
+but it also does some wall building and destruction. There are no locks
+around map modifications, as normal tornado executes all callbacks
+serially, and we can keep this guarantee with the atomic blocks of PyPy STM.
+This is also an example of a program that is not trivial
+to scale to multiple cores with separate processes (assuming
+more interesting shared state and logic).
+
+This benchmark is very noisy due to the randomness of client interactions
+(it could also be non-linear), so only lower and upper bounds for the
+number of requests are reported:
+
+============ ==========
+Impl.        req/s
+============ ==========
+PyPy 2.4     5 .. 7
+------------ ----------
+CPython 2.7  0.5 .. 0.9
+------------ ----------
+PyPy-STM 1   2 .. 4
+------------ ----------
+PyPy-STM 4   2 .. 6
+============ ==========
+
+Clearly this is a very noisy benchmark, but we can still see that scaling
+is worse and the STM overhead is sometimes higher.
+The bulk of the conflicts come from the transaction module (we have seen
+it above)::
+
+    91.655s lost in aborts, 0.000s paused (249x STM_CONTENTION_WRITE_READ)
+    File "/home/ubuntu/pypy/lib_pypy/transaction.py", line 164, in _run_thread
+        got_exception)
+
+
+Although it is definitely not ready for production use, you can already try
+to run things, report bugs, and see what is missing in user-facing tools
+and libraries.
+
+
+Benchmark setup:
+
+* Amazon c3.xlarge (4 cores) running Ubuntu 14.04
+* pypy-c-r74011-stm-jit for the primes benchmark (but it has more bugs
+  than more recent versions), and
+  `pypy-c-r74378-74379-stm-jit <http://cobra.cs.uni-duesseldorf.de/~buildmaster/misc/pypy-c-r74378-74379-stm-jit.xz>`_
+  for the astar benchmark (put it inside a pypy source checkout at 38c9afbd253c)
+* http://bitbucket.org/kostialopuhin/tornado-stm-bench at 65144cda7a1f
diff --git a/sprintinfo/warsaw-2014/announcement.txt b/sprintinfo/warsaw-2014/announcement.txt
--- a/sprintinfo/warsaw-2014/announcement.txt
+++ b/sprintinfo/warsaw-2014/announcement.txt
@@ -38,8 +38,8 @@
 ------------
 
 The sprint will happen within a room of Warsaw University. The
-address is Pasteura 5 (which is a form of "Pasteur street"), room 550.
-The person of contact is Maciej Fijalkowski.
+address is Pasteura 5 (which is a form of "Pasteur street"), dept. of
+Physics, room 450. The person of contact is Maciej Fijalkowski.
 --------------
 
diff --git a/sprintinfo/warsaw-2014/people.txt b/sprintinfo/warsaw-2014/people.txt
--- a/sprintinfo/warsaw-2014/people.txt
+++ b/sprintinfo/warsaw-2014/people.txt
@@ -9,5 +9,12 @@
 ==================== ============== =======================
 Name                 Arrive/Depart  Accommodation
 ==================== ============== =======================
-Armin Rigo           20/10-2X/10    with fijal?
+Armin Rigo           20/10-28/10    with fijal
+Maciej Fijalkowski   20/10-30/10    private
+Romain Guillebert    19/10-26/10    ibis Reduta with mjacob
+Manuel Jacob         20/10-26/10    ibis Reduta with rguillebert
+Kostia Lopuhin
+Antonio Cuni         20/10-26/10    ibis Reduta http://www.ibis.com/gb/hotel-7125-ibis-warszawa-reduta/index.shtml
+Matti Picus          20/10-20/10    just a long layover between flights
+Ronan Lamy           19/10-26/10    ibis Reduta
 ==================== ============== =======================
diff --git a/sprintinfo/warsaw-2014/planning.txt b/sprintinfo/warsaw-2014/planning.txt
new file mode 100644
--- /dev/null
+++ b/sprintinfo/warsaw-2014/planning.txt
@@ -0,0 +1,41 @@
+Topics
+======
+
+* cffi.verify dlopen flag - TO BE MERGED
+
+* PyPy/CPython Bridge (Romain, kostia) - MORE PROGRESS
+
+* Profiler (Antonio, Armin) - IN PROGRESS
+
+* Merge improve-docs (Manuel, Ronan) - IN PROGRESS
+
+* Merge kill-multimethod remove-remaining-smm (Manuel, Antonio, fijal) -
+  MERGED remove-remaining-smm, kill-multimethod WAITING FOR REVIEW
+
+* STM presentation (Everybody) - DONE
+
+* Refactor annotator/rtyper (Ronan?) - LOOKING FOR PAIRING
+
+* Python 3.3 - IN PROGRESS
+
+* look into merging gc pinning (fijal, arigo) - ALMOST READY, more debugging needed
+
+* investigate -fPIC slowdown (fijal, arigo) - IN PROGRESS, complete mess
+
+* NumPyPy discussion (everybody) - DONE
+
+* Trying stuff on PyPy-STM (Antonio, Kostia)
+
+* convincing anto why resume refactoring is a good idea
+
+* switchify chains of guard_value (Armin, Romain...)
+
+People
+======
+
+Antonio
+Armin
+Kostia
+Ronan
+Romain
+Manuel
+Maciej
diff --git a/talk/img/baroquesoftware.png b/talk/img/baroquesoftware.png
new file mode 100644
index 0000000000000000000000000000000000000000..038e80b25722d7917a1fbecb581ce25b54707ab2
GIT binary patch
[cut]
diff --git a/talk/pyconie2014/Makefile b/talk/pyconie2014/Makefile
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/Makefile
@@ -0,0 +1,18 @@
+# you can find rst2beamer.py here:
+# https://bitbucket.org/antocuni/env/raw/default/bin/rst2beamer.py
+
+# WARNING: to work, it needs this patch for docutils
+# https://sourceforge.net/tracker/?func=detail&atid=422032&aid=1459707&group_id=38414
+
+talk.pdf: talk.rst author.latex stylesheet.latex
+	python `which rst2beamer.py` --stylesheet=stylesheet.latex --documentoptions=14pt talk.rst talk.latex || exit
+	#/home/antocuni/.virtualenvs/rst2beamer/bin/python `which rst2beamer.py` --stylesheet=stylesheet.latex --documentoptions=14pt talk.rst talk.latex || exit
+	sed 's/\\date{}/\\input{author.latex}/' -i talk.latex || exit
+	#sed 's/\\maketitle/\\input{title.latex}/' -i talk.latex || exit
+	pdflatex talk.latex || exit
+
+view: talk.pdf
+	evince talk.pdf &
+
+xpdf: talk.pdf
+	xpdf talk.pdf &
diff --git a/talk/pyconie2014/author.latex b/talk/pyconie2014/author.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/author.latex
@@ -0,0 +1,9 @@
+\definecolor{rrblitbackground}{rgb}{0.0, 0.0, 0.0}
+
+\title[PyPy: A fast Python Virtual Machine]{PyPy: A fast Python Virtual Machine}
+\author[rguillebert]
+{Romain Guillebert\\
+\includegraphics[width=80px]{../img/py-web-new.png}}
+
+\institute{PyCon IE}
+\date{October 12th, 2014}
diff --git a/talk/pyconie2014/beamerdefs.txt b/talk/pyconie2014/beamerdefs.txt
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/beamerdefs.txt
@@ -0,0 +1,108 @@
+.. colors
+.. ===========================
+
+.. role:: green
+.. role:: red
+
+
+.. general useful commands
+.. ===========================
+
+.. |pause| raw:: latex
+
+  \pause
+
+.. |small| raw:: latex
+
+  {\small
+
+.. |end_small| raw:: latex
+
+  }
+
+.. |scriptsize| raw:: latex
+
+  {\scriptsize
+
+.. |end_scriptsize| raw:: latex
+
+  }
+
+.. |strike<| raw:: latex
+
+  \sout{
+
+.. closed bracket
+.. ===========================
+
+.. |>| raw:: latex
+
+  }
+
+
+.. example block
+.. ===========================
+
+.. |example<| raw:: latex
+
+  \begin{exampleblock}{
+
+
+.. |end_example| raw:: latex
+
+  \end{exampleblock}
+
+
+
+.. alert block
+.. ===========================
+
+.. |alert<| raw:: latex
+
+  \begin{alertblock}{
+
+
+.. |end_alert| raw:: latex
+
+  \end{alertblock}
+
+
+
+.. columns
+.. ===========================
+
+.. |column1| raw:: latex
+
+  \begin{columns}
+  \begin{column}{0.45\textwidth}
+
+.. |column2| raw:: latex
+
+  \end{column}
+  \begin{column}{0.45\textwidth}
+
+
+.. |end_columns| raw:: latex
+
+  \end{column}
+  \end{columns}
+
+
+
+.. |snake| image:: ../../img/py-web-new.png
+  :scale: 15%
+
+
+
+.. nested blocks
+.. ===========================
+
+.. |nested| raw:: latex
+
+  \begin{columns}
+  \begin{column}{0.85\textwidth}
+
+.. |end_nested| raw:: latex
+
+  \end{column}
+  \end{columns}
diff --git a/talk/pyconie2014/speed.png b/talk/pyconie2014/speed.png
new file mode 100644
index 0000000000000000000000000000000000000000..4640c76f8a665af1c414dc4c4ca22be3bd8ff360
GIT binary patch
[cut]
diff --git a/talk/pyconie2014/stylesheet.latex b/talk/pyconie2014/stylesheet.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/stylesheet.latex
@@ -0,0 +1,9 @@
+\setbeamercovered{transparent}
+\setbeamertemplate{navigation symbols}{}
+
+\definecolor{darkgreen}{rgb}{0, 0.5, 0.0}
+\newcommand{\docutilsrolegreen}[1]{\color{darkgreen}#1\normalcolor}
+\newcommand{\docutilsrolered}[1]{\color{red}#1\normalcolor}
+
+\newcommand{\green}[1]{\color{darkgreen}#1\normalcolor}
+\newcommand{\red}[1]{\color{red}#1\normalcolor}
diff --git a/talk/pyconie2014/talk.pdf b/talk/pyconie2014/talk.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..748494720b5c056a3734955b2bbb4ad2f93692e4
GIT binary patch
[cut]
diff --git a/talk/pyconie2014/talk.rst b/talk/pyconie2014/talk.rst
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/talk.rst
@@ -0,0 +1,170 @@
+.. include:: beamerdefs.txt
+
+PyPy: A fast Python Virtual Machine
+====================================
+
+Me
+--
+
+- rguillebert on twitter and irc
+
+- PyPy contributor since 2011
+
+- NumPyPy contributor
+
+- Software consultant (hire me!)
+
+Introduction
+------------
+
+- "PyPy is a fast, compliant alternative implementation of the Python language"
+
+- Aims to reach the best performance possible without changing the syntax or semantics
+
+- Supports x86, x86_64, ARM
+
+- Production ready
+
+- MIT Licensed
+
+Speed
+-----
+
+.. image:: speed.png
+  :scale: 37%
+
+Speed
+-----
+
+- Automatically generated tracing just-in-time compiler
+
+- Generates linear traces from loops
+
+- Generates efficient machine code based on runtime observations
+
+- Removes overhead when unnecessary
+
+- But Python features which require overhead remain available (frame introspection, pdb)
+
+Performance?
+-------------
+
+- Things get done faster
+
+- Serve more requests per second
+
+- Lower latency
+
+- Fewer servers for the same performance
+
+Demo
+----
+
+- Real-time edge detection
+
+
+Compatibility
+-------------
+
+- Fully compatible with CPython 2.7 & 3.2 (minus implementation details)
+
+- Partial and slow support of the C API
+
+- Alternatives might exist
+
+Ecosystem
+---------
+
+- We should (slowly, incrementally) move away from the C extension API
+
+  * Makes assumptions on refcounting, object layout, the GIL
+
+  * The future of Python is bound to the future of CPython (a more than 20-year-old interpreter)
+
+  * It's hard for a new Python VM without C extension support to get traction (not only PyPy)
+
+- This doesn't mean we should lose Python's ability to interface with C easily
+
+- CFFI is the PyPy team's attempt at solving this
+
+CFFI (1/2)
+----------
+
+- Where do we go from here?
+
+- CFFI is a fairly new, implementation-independent way of interacting with C
+
+- Very fast on PyPy
+
+- Decently fast on CPython
+
+- The Jython project is working on support
+
+CFFI (2/2)
+----------
+
+- More convenient, safer, faster than ctypes
+
+- Can call C functions easily, in API and ABI mode
+
+- Python functions can be exposed to C
+
+- Already used by pyopenssl, psycopg2cffi, pygame_cffi, lxml_cffi
+
+- Other tools could be built on top of it (a Cython cffi backend?)
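+
+CFFI example
+------------
+
+A minimal ABI-mode sketch (not from the original deck; assumes a Unix libm)::
+
+    from cffi import FFI
+
+    ffi = FFI()
+    ffi.cdef("double sqrt(double x);")   # declare the C function we need
+    lib = ffi.dlopen("m")                # load libm at runtime (ABI mode)
+    assert lib.sqrt(4.0) == 2.0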
+
+Success stories
+---------------
+
+  Magnetic is the leader in online search retargeting, with a large, high volume, performance-critical platform written in Python. [...]
+
+  The Magnetic bidders were ported from CPython to PyPy, yielding an overall 30% performance gain.
+
+- Julian Berman
+
+  magnetic.com
+
+Success stories
+---------------
+
+  Currently we have improvements in raw performance (read: response times) that span from 8% to a pretty interesting 40%, but we have a peak of an astonishing 100-120% and even more.
+
+  Take into the account that most of our apps are simple "blocking-on-db" ones, so a 2x increase is literally money.
+
+- Roberto De Ioris
+
+  Unbit
+
+Success stories
+---------------
+
+  In addition to this our main (almost secret) objective was reducing resource usage of the application servers, which directly translates to being able to host more customers on the same server.
+
+- Roberto De Ioris
+
+  Unbit
+
+Success stories
+---------------
+
+  PyPy is an excellent choice for every pure Python project that depends on speed of execution of readable and maintainable large source code.
+  [...]
+  We had roughly a 2x speedup with PyPy over CPython.
+
+- Marko Tasic (Web and Data processing)
+
+Future
+------
+
+- Python 3.3
+
+- NumPyPy
+
+- STM
+
+- You can donate to help the progress of these features: pypy.org
+
+Questions
+---------
+
+- Questions?
diff --git a/talk/pyconpl-2014/Makefile b/talk/pyconpl-2014/Makefile
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/Makefile
@@ -0,0 +1,18 @@
+# you can find rst2beamer.py here:
+# https://bitbucket.org/antocuni/env/raw/default/bin/rst2beamer.py
+
+# WARNING: to work, it needs this patch for docutils
+# https://sourceforge.net/tracker/?func=detail&atid=422032&aid=1459707&group_id=38414
+
+talk.pdf: talk.rst author.latex stylesheet.latex
+	rst2beamer --stylesheet=stylesheet.latex --documentoptions=14pt --output-encoding=utf8 --overlaybullets=False talk.rst talk.latex || exit
+	#/home/antocuni/.virtualenvs/rst2beamer/bin/python `which rst2beamer.py` --stylesheet=stylesheet.latex --documentoptions=14pt talk.rst talk.latex || exit
+	sed 's/\\date{}/\\input{author.latex}/' -i talk.latex || exit
+	#sed 's/\\maketitle/\\input{title.latex}/' -i talk.latex || exit
+	pdflatex talk.latex || exit
+
+view: talk.pdf
+	evince talk.pdf &
+
+xpdf: talk.pdf
+	xpdf talk.pdf &
diff --git a/talk/pyconpl-2014/author.latex b/talk/pyconpl-2014/author.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/author.latex
@@ -0,0 +1,10 @@
+\definecolor{rrblitbackground}{rgb}{0.0, 0.0, 0.0}
+
+\title[PyPy]{PyPy}
+\author[arigo, fijal]
+{Armin Rigo, Maciej Fijałkowski\\
+\includegraphics[width=80px]{../img/py-web-new.png}
+\hspace{1em}
+\includegraphics[width=80px]{../img/baroquesoftware.png}}
+\institute{PyCon PL}
+\date{October 2014}
diff --git a/talk/pyconpl-2014/beamerdefs.txt b/talk/pyconpl-2014/beamerdefs.txt
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/beamerdefs.txt
@@ -0,0 +1,108 @@
+.. colors
+.. ===========================
+
+.. role:: green
+.. role:: red
+
+
+.. general useful commands
+.. ===========================
+
+.. |pause| raw:: latex
+
+  \pause
+
+.. |small| raw:: latex
+
+  {\small
+
+.. |end_small| raw:: latex
+
+  }
+
+.. |scriptsize| raw:: latex
+
+  {\scriptsize
+
+.. |end_scriptsize| raw:: latex
+
+  }
+
+.. |strike<| raw:: latex
+
+  \sout{
+
+.. closed bracket
+.. ===========================
+
+.. |>| raw:: latex
+
+  }
+
+
+.. example block
+.. ===========================
+
+.. |example<| raw:: latex
+
+  \begin{exampleblock}{
+
+
+.. |end_example| raw:: latex
+
+  \end{exampleblock}
+
+
+
+.. alert block
+.. ===========================
+
+.. |alert<| raw:: latex
+
+  \begin{alertblock}{
+
+
+.. |end_alert| raw:: latex
+
+  \end{alertblock}
+
+
+
+.. columns
+.. ===========================
+
+.. |column1| raw:: latex
+
+  \begin{columns}
+  \begin{column}{0.45\textwidth}
+
+.. |column2| raw:: latex
+
+  \end{column}
+  \begin{column}{0.45\textwidth}
+
+
+.. |end_columns| raw:: latex
+
+  \end{column}
+  \end{columns}
+
+
+
+.. |snake| image:: ../../img/py-web-new.png
+  :scale: 15%
+
+
+
+.. nested blocks
+.. ===========================
+
+.. |nested| raw:: latex
+
+  \begin{columns}
+  \begin{column}{0.85\textwidth}
+
+.. |end_nested| raw:: latex
+
+  \end{column}
+  \end{columns}
diff --git a/talk/pyconpl-2014/benchmarks/abstract.rst b/talk/pyconpl-2014/benchmarks/abstract.rst
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/abstract.rst
@@ -0,0 +1,7 @@
+How to benchmark code
+---------------------
+
+In this talk, we would like to present the basics of how Python virtual
+machines like CPython or PyPy work, and how to use that knowledge to write
+meaningful benchmarks for your programs. We'll show what's wrong with
+microbenchmarks and how to improve the situation.
diff --git a/talk/pyconpl-2014/benchmarks/f1.py b/talk/pyconpl-2014/benchmarks/f1.py
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/f1.py
@@ -0,0 +1,8 @@
+
+def f():
+    i = 0
+    while i < 100000000:
+        i += 1
+    return i
+
+f()
diff --git a/talk/pyconpl-2014/benchmarks/f2.py b/talk/pyconpl-2014/benchmarks/f2.py
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/f2.py
@@ -0,0 +1,10 @@
+
+def f():
+    i = 0
+    s = 0
+    while i < 100000000:
+        s += len(str(i))
+        i += 1
+    return s
+
+print f()
diff --git a/talk/pyconpl-2014/benchmarks/fib.py b/talk/pyconpl-2014/benchmarks/fib.py
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/fib.py
@@ -0,0 +1,31 @@
+
+import time
+import numpy
+try:
+    from matplotlib import pylab
+except ImportError:
+    from embed.emb import import_mod
+    pylab = import_mod('matplotlib.pylab')
+
+def fib(n):
+    if n == 0 or n == 1:
+        return 1
+    return fib(n - 1) + fib(n - 2)
+
+def f():
+    for i in range(10000):
+        "".join(list(str(i)))
+
+times = []
+for i in xrange(1000):
+    t0 = time.time()
+    #f()
+    fib(17)
+    times.append(time.time() - t0)
+
+hist, bins = numpy.histogram(times, 20)
+#pylab.plot(bins[:-1], hist)
+pylab.ylim(0, max(times) * 1.2)
+pylab.plot(numpy.array(times))
+#pylab.hist(hist, bins, histtype='bar')
+pylab.show()
diff --git a/talk/pyconpl-2014/benchmarks/talk.rst b/talk/pyconpl-2014/benchmarks/talk.rst
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/talk.rst
@@ -0,0 +1,163 @@
+.. include:: ../beamerdefs.txt
+
+---------------------
+How to benchmark code
+---------------------
+
+Who are we?
+------------
+
+* Maciej Fijalkowski, Armin Rigo
+
+* working on PyPy
+
+* interested in performance
+
+What is this talk about?
+---------------------------
+
+* basics of how CPython and PyPy run programs
+
+* a bit of theory about measuring performance
+
+* microbenchmarks
+
+* the complicated picture of the "real world"
+
+CPython
+-------
+
+* a "simple" virtual machine
+
+* compiles Python code to bytecode
+
+* runs the bytecode
+
+* usually invokes tons of runtime functions written in C
+
+CPython (demo)
+--------------
+
+PyPy
+----
+
+* a not-so-simple virtual machine
+
+* all of the above
+
+* ...
+  and then, if the loop/function gets called often enough,
+  it's compiled down to optimized assembler by the JIT
+
+PyPy (demo)
+-----------
+
+Measurements 101
+----------------
+
+* run your benchmark multiple times
+
+* the distribution should be Gaussian
+
+* take the average and the variation
+
+* if the variation is too large, increase the number of iterations
+  (a minimal harness sketch is at the end of this talk)
+
+Let's do it (demo)
+------------------
+
+Problems
+--------
+
+* the whole previous slide is a bunch of nonsense
+
+* ...
+
+"Solution"
+----------
+
+* you try your best and do the average anyway
+
+* presumably cutting off the warmup time
+
+|pause|
+
+* not ideal at all
+
+Writing benchmarks - typical approach
+-------------------------------------
+
+* write a set of small programs that each exercise one particular thing
+
+  * recursive fibonacci
+
+  * pybench
+
+PyBench
+-------
+
+* used to be a tool to compare Python implementations
+
+* only uses microbenchmarks
+
+* assumes operation times are concatenative
+
+Problems
+--------
+
+* a lot of effects are not concatenative
+
+* optimizations often collapse consecutive operations
+
+* large-scale effects only show up on large programs
+
+An example
+----------
+
+* python 2.6 vs python 2.7 had minimal performance changes
+
+* somewhere in the changelog, there is a gc change mentioned
+
+* it made the pypy translation toolchain jump from 3h to 1h
+
+* it's "impossible" to write a microbenchmark for this
+
+More problems
+-------------
+
+* half of the blog posts comparing VM performance use recursive fibonacci
+
+* most of the others use the Computer Language Shootout
+
+PyPy benchmark suite
+--------------------
+
+* programs from small to medium and large
+
+* 50 LOC to 100k LOC
+
+* try to exercise various parts of the language (but they e.g. lack IO)
+
+Solutions
+---------
+
+* measure what you are really interested in
+
+* derive microbenchmarks from your bottlenecks
+
+* be skeptical
+
+* understand what you're measuring
+
+Q&A
+---
+
+- http://pypy.org/
+
+- http://morepypy.blogspot.com/
+
+- http://baroquesoftware.com/
+
+- ``#pypy`` at freenode.net
+
+- Any question?
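+
+Appendix: a minimal harness (sketch)
+------------------------------------
+
+Illustrative only (not part of the original deck): the "run it many times,
+look at the average and the variation" recipe from earlier::
+
+    import time
+
+    def bench(f, n=50):
+        times = []
+        for _ in range(n):
+            t0 = time.time()
+            f()
+            times.append(time.time() - t0)
+        mean = sum(times) / len(times)
+        std = (sum((t - mean) ** 2 for t in times) / len(times)) ** 0.5
+        return mean, std   # with a JIT, expect warmup outliers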
diff --git a/talk/pyconpl-2014/speed.png b/talk/pyconpl-2014/speed.png
new file mode 100644
index 0000000000000000000000000000000000000000..33fe20ac9d81ddbd3ced48f52f9717693dc15518
GIT binary patch
[cut]
diff --git a/talk/pyconpl-2014/standards.png b/talk/pyconpl-2014/standards.png
new file mode 100644
index 0000000000000000000000000000000000000000..5d38303773dd4f1b798a91bec62d05e0423a6a0d
GIT binary patch
[cut]
diff --git a/talk/pyconpl-2014/stylesheet.latex b/talk/pyconpl-2014/stylesheet.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/stylesheet.latex
@@ -0,0 +1,9 @@
+\setbeamercovered{transparent}
+\setbeamertemplate{navigation symbols}{}
+
+\definecolor{darkgreen}{rgb}{0, 0.5, 0.0}
+\newcommand{\docutilsrolegreen}[1]{\color{darkgreen}#1\normalcolor}
+\newcommand{\docutilsrolered}[1]{\color{red}#1\normalcolor}
+
+\newcommand{\green}[1]{\color{darkgreen}#1\normalcolor}
+\newcommand{\red}[1]{\color{red}#1\normalcolor}
diff --git a/talk/pyconpl-2014/talk.pdf b/talk/pyconpl-2014/talk.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..96cfd649da81c326dec0dbe7d92087b9864229d7
GIT binary patch
[cut]
diff --git a/talk/pyconpl-2014/talk.rst b/talk/pyconpl-2014/talk.rst
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/talk.rst
@@ -0,0 +1,297 @@
+.. include:: beamerdefs.txt
+
+================================
+PyPy
+================================
+
+Who We Are
+----------
+
+* Maciej Fijałkowski
+
+* Armin Rigo
+
+* PyPy developers for a long time
+
+* baroquesoftware
+
+What is PyPy?
+--------------
+
+* A Python interpreter, an alternative to CPython
+
+* Supports Python 2.7 and (beta) Python 3.2/3.3
+
+* Compatible and generally much faster (JIT)
+
+Benchmarks
+--------------------------------
+
+.. image:: speed.png
+  :scale: 44%
+  :align: center
+
+Demo
+--------------------------------
+
+
+Recent developments
+--------------------------------
+
+Between PyPy 2.0 (May 2013) and PyPy 2.4 (now):
+
+.
+
+* All kinds of speed improvements for all kinds of programs
+
+  - JIT improvements, incremental GC (garbage collector),
+    specific Python corners improved, ...
+
+* Support for ARM in addition to x86
+
+  - Thanks to the Raspberry Pi Foundation
+
+* Python 3 support
+
+  - py3k, in addition to Python 2.7
+
+* Numpy more complete (but still not done)
+
+Status
+-----------------------------
+
+- Python code "just works"
+
+  * generally much faster than with CPython
+
+- C code: improving support
+
+  * cpyext: tries to load CPython C extension modules, slowly
+
+  * CFFI: the future
+
+  * cppyy for C++
+
+  * A very small native PyPy C API for embedding, WIP
+
+- Lots of CFFI modules around:
+
+  * pyopenssl, pygame_cffi, psycopg2cffi, lxml...
+
+Fundraising Campaign
+---------------------
+
+- py3k: 55'000 $ of 105'000 $ (52%)
+
+- numpy: 48'000 $ of 60'000 $ (80%)
+
+- STM, 1st call: 38'000 $
+
+- STM, 2nd call: 17'000 $ of 80'000 $ (22%)
+
+- Thanks to all donors!
+
+Commercial support
+------------------
+
+- We offer commercial support for PyPy
+
+- Consultancy and training
+
+- Performance issues for open- or closed-source programs, porting,
+  improving support in parts of the Python or non-Python interpreters,
+  etc.
+
+- http://baroquesoftware.com
+
+Recent developments (2)
+--------------------------------
+
+* CFFI
+
+  - C Foreign Function Interface
+
+* STM
+
+  - Software Transactional Memory
+
+CFFI
+-----
+
+- Python <-> C interfacing done right
+
+  * existing shared libraries
+
+  * custom C code
+
+- An alternative to the CPython Extension API, ctypes, Cython, etc.
+
+- Fast-ish on CPython, super-fast on PyPy, Jython support in the future
+
+- Simple, does not try to be magic
+
+CFFI
+----
+
+.. image:: standards.png
+  :scale: 50%
+  :align: center
+
+CFFI demo
+---------
+
+CFFI idea
+---------
+
+* C and Python are enough, we don't need an extra language
+
+* C is well defined, let's avoid magic
+
+* all the logic (and magic!) can be done in Python
+
+* API vs ABI
+
+* Inspired by LuaJIT's FFI
+
+Work in Progress: STM
+---------------------
+
+- Software Transactional Memory
+
+- Solving the GIL problem
+
+  * GIL = Global Interpreter Lock
+
+- Without bringing in the threads-and-locks mess
+
+- Preliminary versions of pypy-jit-stm available
+
+STM (2)
+-------
+
+- STM = Free Threading done right
+
+  * with some overhead: 30-40% so far
+
+- Done at the level of RPython
+
+- The interpreter author doesn't have to worry
+  about adding tons of locks
+
+  - that's us
+
+- The user *can* if he likes, but doesn't have to either
+
+  - that's you ``:-)``
+
+STM (3)
+-------
+
+- Works "like a GIL" but runs optimistically in parallel
+
+- A few bytecodes from thread A run on core 1
+
+- A few bytecodes from thread B run on core 2
+
+- If there is no conflict, we're happy
+
+- If there is a conflict, one of the two aborts and retries
+
+- Same effect as transactions in databases
+
+STM (4)
+-------
+
+- Threading made simpler for the user
+
+- It is generally efficient with *very coarse locks*
+
+  * no fine-grained locking needed
+
+- Easy to convert a number of existing single-threaded programs
+  (see the sketch at the end of the talk):
+
+  * start multiple threads, run blocks of code in each
+
+  * use a single lock around everything
+
+  * normally, you win absolutely nothing
+
+  * but STM can (try to) *execute the blocks in parallel* anyway
+
+STM (Demo)
+----------
+
+PyPy and RPython
+---------------------------
+
+* PyPy is an interpreter/JIT compiler for Python
+
+* PyPy is written in RPython
+
+* RPython is a language for writing interpreters:
+  it provides GC-for-free, JIT-for-free, etc.
+
+* Ideal for writing VMs for dynamic languages
+
+More PyPy-Powered Languages
+----------------------------
+
+- Topaz: implementing Ruby
+
+  * most of the language implemented
+
+  * "definitely faster than MRI"
+
+  * https://github.com/topazproject/topaz
+
+- HippyVM: implementing PHP
+
+  * ~7x faster than standard PHP
+
+  * comparable speed to HHVM
+
+  * http://hippyvm.com/
+
+- And more
+
+Future
+------
+
+* the future is hard to predict
+
+* continue working on general improvements
+
+* improved IO performance in the pipeline
+
+* warmup improvements
+
+* numpy
+
+Warmup improvements
+-------------------
+
+* biggest complaint: slow to warm up, memory hog
+
+* we have ideas on how to improve the situation
+
+* still looking for funding
+
+Numpy
+-----
+
+* numpy is mostly complete
+
+* performance can be improved, especially for the vectorized versions
+
+* scipy, matplotlib, the entire ecosystem: we have a hackish idea
+
+Contacts, Q&A
+--------------
+
+- http://pypy.org
+
+- http://morepypy.blogspot.com/
+
+- ``#pypy`` at freenode.net
+
+- Any question?
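+
+Appendix: atomic blocks (sketch)
+--------------------------------
+
+Illustrative only (not from the original deck), assuming pypy-stm's
+documented ``__pypy__.thread.atomic`` context manager::
+
+    import threading
+    from __pypy__.thread import atomic   # pypy-stm only
+
+    counter = [0]
+
+    def worker():
+        for _ in xrange(100000):
+            with atomic:            # acts like one global lock, but STM
+                counter[0] += 1     # may still run blocks in parallel
+
+    threads = [threading.Thread(target=worker) for _ in range(4)]
+    for t in threads:
+        t.start()
+    for t in threads:
+        t.join()
+    assert counter[0] == 400000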