[ https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080873#comment-14080873 ]
Lee Skillen commented on PYLUCENE-31:
-------------------------------------

Andi - Did you (or anyone else) get a chance to review/try this? Maybe it's a little too experimental, but thoughts appreciated. :-)

> JCC Parallel/Multiprocess Compilation + Caching
> -----------------------------------------------
>
>                 Key: PYLUCENE-31
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-31
>             Project: PyLucene
>          Issue Type: Improvement
>         Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
>            Reporter: Lee Skillen
>            Priority: Minor
>              Labels: build, cache, ccache, distutils, jcc, parallel
>         Attachments: feature-parallel-build.patch
>
>
> JCC utilises distutils.Extension() in order to build JCC itself and the
> packages that it generates for Java wrapping - Unfortunately distutils
> performs its build sequentially and doesn't take advantage of any additional
> free cores for parallel building. As discussed on the list this is likely a
> design decision due to potential issues that may arise when building projects
> with awkward, cyclic or recursive dependencies.
> These issues shouldn't appear within JCC-based projects because of the
> generative nature of the build; i.e. all dependencies are resolved and
> generated prior to building, and the build process itself is about
> compilation and construction of the wrapper alone, of which the wrapper files
> are contained to a sequence of flattened compilation units.
> Enabling this requires monkey patching of distutils, which was also discussed
> on the list as being a potential source of issues, although we feel that the
> risk is likely lower than the current setuptools patching utilised. This
> would be optional functionality that is also only enabled if the
> monkey-patching succeeds.
> Distutils itself is also part of the standard
> library and might be less susceptible to change than setuptools, and the
> monkey-patched area of code has barely changed since 2002 (see:
> http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
> In addition to the distutils changes this patch also includes changes to the
> wrapper class generation to make it more cache friendly, with the target
> being that no changes in the wrapped code means no changes in the wrapper
> code. So when a change touches the wrapped code only minimally, a tool such
> as ccache can reduce the rebuild time significantly (to almost an nth, where
> n is the number of files and only one has changed).
> Obviously the maintainers would have to assess this risk and decide whether
> they would like to accept the patch or not. Code has only been tested on
> Linux with Python 2.7.5 but should fail gracefully and prevent
> parallelisation if one of the requirements hasn't been met (not on Linux, no
> multiprocessing support, or monkey patching somehow fails). The change to
> caching should still benefit everyone regardless.
> Please note that an additional dependency on orderedset has been added to
> achieve the more deterministic ordering. This may not be desirable (i.e.
> another package might be preferred, such as ordered-set, or the code might be
> inlined into the package instead), depending on maintainer comments.
> --- [following repeated from mailing list] ---
> Performance Statistics :-
> The following are some quick and dirty statistics for building the jcc
> pylucene itself (incl.
> java lucene which accounts for about 30-ish seconds
> upfront) - The JCC files are split using --files 8, and each build is
> preceded by a make clean:
> Serial (unpatched):
> real 5m1.502s
> user 5m22.887s
> sys 0m7.749s
> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):
> real 1m37.382s
> user 7m16.658s
> sys 0m8.697s
> Furthermore, some additional changes were made to the wrapped file generation
> to make the generated code more ccache friendly (additional deterministic
> sorting for methods and some usage of an ordered set). With these in place,
> the CC and CCACHE_COMPILERCHECK environment variables set to "ccache gcc"
> and "content" respectively, and ccache installed, subsequent
> compilation time is reduced again as follows:
> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache
> enabled):
> real 0m43.051s
> user 1m10.392s
> sys 0m4.547s
> This was a run in which nothing changed between runs, so for a realistic run
> in which changes occur the figure will fall between 0m43.051s and 1m37.382s,
> depending on how drastic the change was. If many changes are expected and you
> want to keep it more cache friendly then using a higher --files would
> probably work (to an extent), or ideally use --files separate, although it
> doesn't currently work for me (need to investigate).
> We're mostly utilising the PyLucene build as a test bed since it is
> repeatable for others, rather than just showing numbers for our own
> application compilations; we also use it to run the unit test suite after
> changes to JCC itself to ensure it still works as intended for PyLucene. For
> illustrative purposes though our application takes 1m53s to compile with JCC
> from scratch serially, 0m31s in parallel (8 jobs), 0m14s in parallel with
> ccache enabled and minimal changes, and 0m8s with ccache and no changes. A
> very agreeable result!

--
This message was sent by Atlassian JIRA
(v6.2#6252)
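
For readers who have not seen the attached patch, the optional monkey patching described in the issue can be pictured roughly as follows. This is only a sketch of the general pattern, not the code in feature-parallel-build.patch: `StandInCompiler`, `_compile_one`, `make_parallel_compile` and `enable_parallel_build` are hypothetical names invented for illustration, the stand-in class deliberately does not call distutils or a real compiler, and a thread pool is assumed here where the patch itself may use multiprocessing.

```python
from concurrent.futures import ThreadPoolExecutor


class StandInCompiler:
    """Hypothetical stand-in for a distutils-style compiler (not the real class)."""

    def __init__(self):
        self.compiled = []

    def _compile_one(self, source):
        # A real compiler would spawn e.g. gcc here; this only records the call.
        self.compiled.append(source)
        return source.rsplit(".", 1)[0] + ".o"

    def compile(self, sources):
        # Original behaviour: one compilation unit after another.
        return [self._compile_one(src) for src in sources]


def make_parallel_compile(jobs):
    """Build a replacement for compile() that runs up to `jobs` units at once."""

    def parallel_compile(self, sources):
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            # map() preserves input order, so the object list matches the
            # one the sequential version would have returned.
            return list(pool.map(self._compile_one, sources))

    return parallel_compile


def enable_parallel_build(compiler_cls, jobs=8):
    """Patch compiler_cls.compile; return False and change nothing on failure."""
    if jobs < 2 or not hasattr(compiler_cls, "_compile_one"):
        return False  # requirements not met: keep the sequential build
    compiler_cls.compile = make_parallel_compile(jobs)
    return True
```

The point of the `enable_parallel_build` guard is the "only enabled if the monkey-patching succeeds" behaviour: when the expected hook is missing, the class is left untouched and the build stays serial.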