Lee Skillen created PYLUCENE-31:
-----------------------------------

             Summary: JCC Parallel/Multiprocess Compilation + Caching
                 Key: PYLUCENE-31
                 URL: https://issues.apache.org/jira/browse/PYLUCENE-31
             Project: PyLucene
          Issue Type: Improvement
         Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
            Reporter: Lee Skillen
            Priority: Minor


JCC utilises distutils.Extension() to build both JCC itself and the packages 
that it generates for Java wrapping. Unfortunately, distutils performs its build 
sequentially and doesn't take advantage of any additional free cores for 
parallel building. As discussed on the list, this is likely a design decision 
due to potential issues that may arise when building projects with awkward, 
cyclic or recursive dependencies.

These issues shouldn't appear within JCC-based projects because of the 
generative nature of the build: all dependencies are resolved and the wrapper 
code is generated before building starts, so the build itself consists only of 
compiling and constructing the wrapper, whose files are split into a flat 
sequence of independent compilation units.

Enabling this requires monkey patching distutils, which was also discussed on 
the list as a potential source of issues, although we feel that the risk is 
likely lower than that of the setuptools patching currently used. This would be 
optional functionality, and it is only enabled if the monkey patching succeeds. 
Distutils is also part of the standard library and might be less susceptible to 
change than setuptools, and the monkey-patched area of code has barely changed 
since 2002 (see: 
http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
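For illustration, one way to parallelise that area is to replace 
CCompiler.compile, whose stock body loops over the objects and calls 
self._compile() one at a time, with a version that hands those per-object calls 
to a pool. This is a minimal sketch, not the patch attached to this issue: 
make_parallel_compile and try_enable_parallel_compile are hypothetical names, 
the sketch assumes the private _setup_compile/_get_cc_args/_compile helpers keep 
their long-standing signatures, and it uses a thread pool because each 
_compile() call mostly waits on a compiler subprocess.

```python
import multiprocessing.pool
import os


def make_parallel_compile(jobs=None):
    """Return a replacement for CCompiler.compile that runs the per-object
    _compile() calls concurrently (hypothetical helper, illustrative only)."""
    jobs = jobs or os.cpu_count() or 1

    def parallel_compile(self, sources, output_dir=None, macros=None,
                         include_dirs=None, debug=0, extra_preargs=None,
                         extra_postargs=None, depends=None):
        # Same preamble as the stock sequential implementation.
        macros, objects, extra_postargs, pp_opts, build = self._setup_compile(
            output_dir, macros, include_dirs, sources, depends, extra_postargs)
        cc_args = self._get_cc_args(pp_opts, debug, extra_preargs)

        def compile_one(obj):
            try:
                src, ext = build[obj]
            except KeyError:
                return  # object is up to date / not ours to build
            self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)

        # Threads suffice: each _compile() blocks on an external compiler.
        with multiprocessing.pool.ThreadPool(jobs) as pool:
            pool.map(compile_one, objects)  # re-raises a worker's exception
        return objects

    return parallel_compile


def try_enable_parallel_compile(compiler_cls, jobs=None):
    """Patch compiler_cls.compile; return False and change nothing if the
    class doesn't look like a distutils CCompiler or patching fails."""
    required = ("_setup_compile", "_get_cc_args", "_compile")
    if not all(hasattr(compiler_cls, name) for name in required):
        return False
    try:
        compiler_cls.compile = make_parallel_compile(jobs)
    except Exception:
        return False
    return True
```

On a Python that still ships distutils, calling 
try_enable_parallel_compile(CCompiler, jobs=8) (with CCompiler imported from 
distutils.ccompiler) before setup() would switch it on; a False return leaves 
the sequential build untouched. The platform check mentioned below is left out 
for brevity.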

In addition to the distutils changes, this patch also changes the wrapper class 
generation to make it more cache friendly, the goal being that if the wrapped 
code doesn't change, the wrapper code doesn't change either. So when a change 
touches only a small part of the wrapped code, a tool such as ccache can reduce 
the rebuild time significantly (to roughly 1/n of a full build, where n is the 
number of files and only one of them has changed).
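The cache-friendliness argument can be illustrated with a toy generator. 
Everything here is hypothetical (the generate_units helper, the 
wrap_<class>_<method>() output, and the round-robin split of classes across 
n_files units, which may not match how JCC actually splits its output). Sorting 
class and method names makes the output byte-identical across runs, and since 
ccache keys its cache on the source being compiled, changing one class 
invalidates only the one unit that contains it.

```python
import hashlib


def generate_units(classes, n_files):
    """Render toy wrapper code for {class_name: method_names} and split it
    into n_files compilation units, deterministically (hypothetical)."""
    units = [[] for _ in range(n_files)]
    # Sorting class names fixes the class -> unit assignment regardless of
    # dict/set iteration order.
    for index, name in enumerate(sorted(classes)):
        methods = sorted(classes[name])  # deterministic method order
        body = "\n".join(f"  wrap_{name}_{m}();" for m in methods)
        units[index % n_files].append(f"// {name}\n{body}")
    return ["\n".join(parts) for parts in units]


def unit_digests(units):
    # An unchanged digest means unchanged source, i.e. a likely ccache hit.
    return [hashlib.sha256(u.encode()).hexdigest() for u in units]


def units_to_rebuild(old_units, new_units):
    """Count units whose contents differ between two generations."""
    return sum(a != b for a, b in zip(unit_digests(old_units),
                                      unit_digests(new_units)))


classes = {"A": {"f", "g"}, "B": {"h"}, "C": {"i"}, "D": {"j"}}
old = generate_units(classes, 4)
new = generate_units(dict(classes, B={"h", "k"}), 4)  # add one method to B
print(units_to_rebuild(old, new), "of", len(old), "units need recompiling")
```

With this naive split, adding or removing a class shifts later classes into 
different units, so a real generator would want a more stable assignment if 
that case matters.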

Obviously the maintainers would have to assess this risk and decide whether 
they would like to accept the patch. The code has only been tested on Linux 
with Python 2.7.5, but it should fail gracefully and fall back to a serial build 
if any of the requirements isn't met (not on Linux, no multiprocessing support, 
or the monkey patching somehow fails). The caching changes should still benefit 
everyone regardless.

Please note that an additional dependency on orderedset has been added to 
achieve the more deterministic ordering. This may not be desirable (e.g. 
another package such as ordered-set might be preferred, or the code might be 
inlined into the package instead), depending on maintainer comments.
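As a sketch of the inlining option, and assuming Python 3.7+ (where dict 
preserves insertion order), a minimal insertion-ordered set fits in a few 
lines. OrderedSet here is a hypothetical class, not the API of either the 
orderedset or the ordered-set package.

```python
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """Set that iterates in first-insertion order, backed by a dict whose
    values are unused (hypothetical inlined replacement)."""

    def __init__(self, iterable=()):
        self._items = dict.fromkeys(iterable)  # duplicates keep first position

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        self._items[item] = None  # no-op position-wise if already present

    def discard(self, item):
        self._items.pop(item, None)

    def __repr__(self):
        return f"{type(self).__name__}({list(self._items)!r})"


s = OrderedSet(["b", "a", "b", "c"])
s.add("d")
s.discard("b")
print(s)
```

MutableSet supplies the remaining set operations (union, difference, and so 
on) from these five methods, and they also yield OrderedSet instances whose 
order follows the operands.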

--- [following repeated from mailing list] ---

Performance Statistics :-

The following are some quick and dirty statistics for building PyLucene itself 
with JCC (including Java Lucene, which accounts for roughly 30 seconds 
upfront). The JCC files are split using --files 8, and each build is preceded 
by a make clean:

Serial (unpatched):

real    5m1.502s
user    5m22.887s
sys     0m7.749s

Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):

real    1m37.382s
user    7m16.658s
sys     0m8.697s

Furthermore, some additional changes were made to the wrapper file generation 
to make the generated code more ccache friendly (additional deterministic 
sorting of methods and some use of an ordered set). With these in place, ccache 
installed, and the CC and CCACHE_COMPILERCHECK environment variables set to 
"ccache gcc" and "content" respectively, subsequent compilation time drops 
further:
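The ccache setup just described can be scripted. This sketch assumes the build 
is driven by make from the PyLucene source tree; ccache_env and 
build_with_ccache are hypothetical helper names. CCACHE_COMPILERCHECK=content 
makes ccache identify the compiler by the hash of its binary rather than its 
modification time.

```python
import os
import shutil
import subprocess


def ccache_env(base=None):
    """Copy of the environment with the two variables described above."""
    env = dict(os.environ if base is None else base)
    env["CC"] = "ccache gcc"                 # route compiler calls through ccache
    env["CCACHE_COMPILERCHECK"] = "content"  # hash the compiler binary, not its mtime
    return env


def build_with_ccache(cmd=("make",), cwd=None):
    """Run the build command (assumed to be `make`) with ccache enabled."""
    if shutil.which("ccache") is None:
        raise RuntimeError("ccache is not installed or not on PATH")
    subprocess.run(list(cmd), cwd=cwd, env=ccache_env(), check=True)
```

The first build after a make clean populates the cache; later builds reuse any 
compilation unit whose source is unchanged.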

Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache 
enabled):

real    0m43.051s
user    1m10.392s
sys     0m4.547s
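The headline ratios can be recomputed from the time output above. This is plain 
arithmetic on the reported figures; the seconds helper and the user/real 
"average cores busy" ratio are added here for illustration. Relative to the 
serial build, the parallel build is roughly 3.1x faster and the warm-ccache 
parallel build about 7x faster.

```python
import re


def seconds(value):
    """Convert a bash `time` value such as '5m1.502s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (real, user) figures copied from the three runs above.
runs = {
    "serial": ("5m1.502s", "5m22.887s"),
    "parallel": ("1m37.382s", "7m16.658s"),
    "parallel+ccache": ("0m43.051s", "1m10.392s"),
}

baseline = seconds(runs["serial"][0])
for name, (real, user) in runs.items():
    r, u = seconds(real), seconds(user)
    # Speedup against serial wall-clock time; user/real approximates how
    # many cores were kept busy on average.
    print(f"{name:16} speedup {baseline / r:4.2f}x  avg cores {u / r:4.2f}")
```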

Nothing changed between these runs, so in a realistic run where changes do 
occur the time will fall somewhere between 0m43.051s and 1m37.382s, depending 
on how drastic the change is. If many changes are expected and you want to keep 
the build more cache friendly, a higher --files value would probably help (to 
an extent), or ideally --files separate, although that doesn't currently work 
for me (needs investigation).

We're mostly utilising the PyLucene build as a test bed since it is repeatable 
for others, rather than just showing numbers for our own application's 
compilation; we also use it to run the unit test suite after changes to JCC 
itself to ensure it still works as intended for PyLucene. For illustration, 
though, our application takes 1m53s to compile with JCC from scratch serially, 
0m31s in parallel (8 jobs), 0m14s in parallel with ccache enabled and minimal 
changes, and 0m8s with ccache and no changes. A very agreeable result!



--
This message was sent by Atlassian JIRA
(v6.2#6252)