Re: [jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching
No, no review yet. My thoughts are the same as last time - maintaining a
monkeypatch of distutils is a bit scary. But I need to take a closer look
first.

Andi..

> On Jul 31, 2014, at 15:39, "Lee Skillen (JIRA)" wrote:
>
> [ https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080873#comment-14080873 ]
>
> Lee Skillen commented on PYLUCENE-31:
> -------------------------------------
>
> Andi - Did you (or anyone else) get a chance to review/try this? Maybe it's
> a little too experimental, but thoughts appreciated. :-)
>
>> JCC Parallel/Multiprocess Compilation + Caching
>> -----------------------------------------------
>>
>>         Key: PYLUCENE-31
>>         URL: https://issues.apache.org/jira/browse/PYLUCENE-31
>>     Project: PyLucene
>>  Issue Type: Improvement
>> Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
>>    Reporter: Lee Skillen
>>    Priority: Minor
>>      Labels: build, cache, ccache, distutils, jcc, parallel
>> Attachments: feature-parallel-build.patch
>>
>>
>> JCC utilises distutils.Extension() in order to build JCC itself and the
>> packages that it generates for Java wrapping - Unfortunately distutils
>> performs its build sequentially and doesn't take advantage of any additional
>> free cores for parallel building. As discussed on the list this is likely a
>> design decision due to potential issues that may arise when building
>> projects with awkward, cyclic or recursive dependencies.
>>
>> These issues shouldn't appear within JCC-based projects because of the
>> generative nature of the build; i.e. all dependencies are resolved and
>> generated prior to building, and the build process itself is about
>> compilation and construction of the wrapper alone, of which the wrapper
>> files are contained to a sequence of flattened compilation units.
>>
>> Enabling this requires monkey patching of distutils, which was also
>> discussed on the list as being a potential source of issues, although we
>> feel that the risk is likely lower than the current setuptools patching
>> utilised. This would be optional functionality that is also only enabled if
>> the monkey-patching succeeds. Distutils itself is also part of the standard
>> library and might be less susceptible to change than setuptools, and the
>> area of code monkey patched almost hasn't changed since 2002 (see:
>> http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
>>
>> In addition to the distutils changes this patch also includes changes to the
>> wrapper class generation to make it more cache friendly, with the target
>> being that no changes in the wrapped code equals no changes in the wrapper
>> code. So any changes that minimally change the wrapped code mean that with
>> a tool such as ccache the rebuild time would be significantly reduced
>> (almost to a nth, where n is the number of files and only one has changed).
>>
>> Obviously the maintainers would have to assess this risk and decide whether
>> they would like to accept the patch or not. Code has only been tested on
>> Linux with Python 2.7.5 but should gracefully fail and prevent
>> parallelisation if one of the requirements hasn't been met (not on linux, no
>> multiprocessing support, or monkey patching somehow fails). The change to
>> caching should still benefit everyone regardless.
>>
>> Please note that an additional dependency on orderedset has been added to
>> achieve the more deterministic ordering - This may not be desirable (i.e.
>> another package might be desired, such as ordered-set, or the code might be
>> inlined into the package instead), as per maintainer comments.
>>
>> --- [following repeated from mailing list] ---
>>
>> Performance Statistics :-
>>
>> The following are some quick and dirty statistics for building the jcc
>> pylucene itself (incl. java lucene which accounts for about 30-ish seconds
>> upfront) - The JCC files are split using --files 8, and each build is
>> preceded with a make clean:
>>
>> Serial (unpatched):
>> real    5m1.502s
>> user    5m22.887s
>> sys     0m7.749s
>>
>> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):
>> real    1m37.382s
>> user    7m16.658s
>> sys     0m8.697s
>>
>> Furthermore, some additional changes were made to the wrapped file
>> generation to make the generated code more ccache friendly (additional
>> deterministic sorting for methods and some usage of an ordered set). With
>> these in place and the CC and CCACHE_COMPILERCHECK environment variables set
>> to "ccache gcc" and "content" respectively, and ensuring ccache is
>> installed, subsequent compilation time is reduced again as follows:
>>
>> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache
>> enabled):
>> real    0m43.051s
>> user    1m10.392s
>> sys     0m4.547s
>>
>> This was a run in
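
For readers unfamiliar with the kind of monkeypatch being discussed in this message, the following is a minimal sketch of the general technique. It is not the code from feature-parallel-build.patch. It assumes the private helpers CCompiler._setup_compile, CCompiler._get_cc_args and CCompiler._compile are present, and it uses a thread pool rather than separate processes for brevity (each compile still runs the compiler as its own subprocess). The names enable_parallel_compile and parallel_compile are hypothetical. It returns False instead of raising when distutils cannot be imported, mirroring the "only enabled if the monkey-patching succeeds" behaviour described in the issue.

```python
import multiprocessing.pool


def enable_parallel_compile(jobs=4):
    """Try to swap CCompiler.compile for a parallel variant.

    Returns True if the patch was installed and False if it could not be
    (for example when distutils is not importable on this interpreter).
    """
    try:
        from distutils.ccompiler import CCompiler
    except ImportError:
        return False

    # The sketch relies on these private helpers; give up quietly if absent.
    needed = ("_setup_compile", "_get_cc_args", "_compile")
    if not all(hasattr(CCompiler, name) for name in needed):
        return False

    def parallel_compile(self, sources, output_dir=None, macros=None,
                         include_dirs=None, debug=0, extra_preargs=None,
                         extra_postargs=None, depends=None):
        # Same preparation as the stock compile(): work out object names
        # and the common compiler arguments once, up front.
        macros, objects, extra_postargs, pp_opts, build = self._setup_compile(
            output_dir, macros, include_dirs, sources, depends, extra_postargs)
        cc_args = self._get_cc_args(pp_opts, debug, extra_preargs)

        def compile_one(obj):
            # Objects missing from 'build' are already up to date.
            if obj not in build:
                return
            src, ext = build[obj]
            self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)

        pool = multiprocessing.pool.ThreadPool(jobs)
        try:
            # Each source file is an independent compilation unit.
            pool.map(compile_one, objects)
        finally:
            pool.close()
            pool.join()
        return objects

    CCompiler.compile = parallel_compile
    return True
```

A setup script would call enable_parallel_compile() once before invoking setup(), and carry on with the ordinary serial build if it returns False. Compiler classes that override compile() themselves would not be affected by a patch of this shape.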
[jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching
[ https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080873#comment-14080873 ]

Lee Skillen commented on PYLUCENE-31:
-------------------------------------

Andi - Did you (or anyone else) get a chance to review/try this? Maybe it's a
little too experimental, but thoughts appreciated. :-)

> JCC Parallel/Multiprocess Compilation + Caching
> -----------------------------------------------
>
>         Key: PYLUCENE-31
>         URL: https://issues.apache.org/jira/browse/PYLUCENE-31
>     Project: PyLucene
>  Issue Type: Improvement
> Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
>    Reporter: Lee Skillen
>    Priority: Minor
>      Labels: build, cache, ccache, distutils, jcc, parallel
> Attachments: feature-parallel-build.patch
>
>
> JCC utilises distutils.Extension() in order to build JCC itself and the
> packages that it generates for Java wrapping - Unfortunately distutils
> performs its build sequentially and doesn't take advantage of any additional
> free cores for parallel building. As discussed on the list this is likely a
> design decision due to potential issues that may arise when building projects
> with awkward, cyclic or recursive dependencies.
>
> These issues shouldn't appear within JCC-based projects because of the
> generative nature of the build; i.e. all dependencies are resolved and
> generated prior to building, and the build process itself is about
> compilation and construction of the wrapper alone, of which the wrapper files
> are contained to a sequence of flattened compilation units.
>
> Enabling this requires monkey patching of distutils, which was also discussed
> on the list as being a potential source of issues, although we feel that the
> risk is likely lower than the current setuptools patching utilised. This
> would be optional functionality that is also only enabled if the
> monkey-patching succeeds. Distutils itself is also part of the standard
> library and might be less susceptible to change than setuptools, and the area
> of code monkey patched almost hasn't changed since 2002 (see:
> http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
>
> In addition to the distutils changes this patch also includes changes to the
> wrapper class generation to make it more cache friendly, with the target
> being that no changes in the wrapped code equals no changes in the wrapper
> code. So any changes that minimally change the wrapped code mean that with a
> tool such as ccache the rebuild time would be significantly reduced (almost
> to a nth, where n is the number of files and only one has changed).
>
> Obviously the maintainers would have to assess this risk and decide whether
> they would like to accept the patch or not. Code has only been tested on
> Linux with Python 2.7.5 but should gracefully fail and prevent
> parallelisation if one of the requirements hasn't been met (not on linux, no
> multiprocessing support, or monkey patching somehow fails). The change to
> caching should still benefit everyone regardless.
>
> Please note that an additional dependency on orderedset has been added to
> achieve the more deterministic ordering - This may not be desirable (i.e.
> another package might be desired, such as ordered-set, or the code might be
> inlined into the package instead), as per maintainer comments.
>
> --- [following repeated from mailing list] ---
>
> Performance Statistics :-
>
> The following are some quick and dirty statistics for building the jcc
> pylucene itself (incl. java lucene which accounts for about 30-ish seconds
> upfront) - The JCC files are split using --files 8, and each build is
> preceded with a make clean:
>
> Serial (unpatched):
> real    5m1.502s
> user    5m22.887s
> sys     0m7.749s
>
> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):
> real    1m37.382s
> user    7m16.658s
> sys     0m8.697s
>
> Furthermore, some additional changes were made to the wrapped file generation
> to make the generated code more ccache friendly (additional deterministic
> sorting for methods and some usage of an ordered set). With these in place
> and the CC and CCACHE_COMPILERCHECK environment variables set to "ccache gcc"
> and "content" respectively, and ensuring ccache is installed, subsequent
> compilation time is reduced again as follows:
>
> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache
> enabled):
> real    0m43.051s
> user    1m10.392s
> sys     0m4.547s
>
> This was a run in which nothing changed between runs, so a realistic run in
> which changes occur it'll be a figure between 0m43.051s and 1m37.382s,
> depending on how drastic the change was. If many changes are expected and you
> want to keep it more cache friendly then using a higher --files would
> probably work (to an exten
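
The cache-friendliness goal described above ("no changes in the wrapped code equals no changes in the wrapper code") can be illustrated with a small sketch. This is not JCC's generator: render_wrapper and content_digest are hypothetical names, and the emitted text is a toy stand-in for a generated wrapper file. The idea shown is only that emitting members in a sorted order makes the output independent of set iteration order, so a content-based cache such as ccache sees identical input on a rebuild.

```python
import hashlib


def render_wrapper(class_name, methods):
    """Render a toy wrapper declaration with methods in sorted order."""
    lines = ["class %s {" % class_name]
    # Sorting removes any dependence on the iteration order of 'methods'.
    for name in sorted(methods):
        lines.append("    jobject %s();" % name)
    lines.append("};")
    return "\n".join(lines) + "\n"


def content_digest(text):
    """Hash the generated text, standing in for a content-based cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The same members supplied in different orders yield the same output,
# and therefore the same digest.
first = render_wrapper("Foo", ["read", "open", "close"])
second = render_wrapper("Foo", {"close", "read", "open"})
print(content_digest(first) == content_digest(second))
```

An ordered set, as mentioned in the issue, is the other way to reach the same property when the original insertion order is itself stable and meaningful.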
[jira] [Commented] (PYLUCENE-30) JCC: Through-Layer Python Exception Support
[ https://issues.apache.org/jira/browse/PYLUCENE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080866#comment-14080866 ]

Lee Skillen commented on PYLUCENE-30:
-------------------------------------

That's great, thank you very much for your help as well Andi.

> JCC: Through-Layer Python Exception Support
> -------------------------------------------
>
>         Key: PYLUCENE-30
>         URL: https://issues.apache.org/jira/browse/PYLUCENE-30
>     Project: PyLucene
>  Issue Type: Improvement
> Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
>              JCC version 2.20 (svn trunk)
>    Reporter: Lee Skillen
>      Labels: exception, jcc, python
> Attachments: feature-thru-exception-3.patch, jccthrutest.tgz
>
>
> Add the capability to throw and re-capture the original Python exception when
> thrown from the PythonVM layer (e.g. in an extension), passed through the
> JavaVM, and re-caught within the host PythonVM. Informally entitled as
> through-layer python exception support.
>
> Work between myself and Andi Vajda has been conducted to add support for
> this, with the original patch being submitted on the mailing list on Friday,
> 4th July 2014 - The latest patch which incorporates suggested code by Andi
> was posted to the list on Thursday, 10th July (this patch will also be
> attached to this issue).
>
> See: JCC Project Extensions email thread on the mailing list for more details.

--
This message was sent by Atlassian JIRA (v6.2#6252)
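
The through-layer idea in the issue above can be pictured with a pure-Python analogy. This sketch does not use JCC or a JVM and is not taken from feature-thru-exception-3.patch. CarrierError, java_layer and host_call are hypothetical names: java_layer stands in for the intermediate Java layer, and CarrierError for whatever Java-side exception carries the original Python exception across it.

```python
class CarrierError(Exception):
    """Stand-in for the intermediate-layer exception carrying the original."""

    def __init__(self, original):
        super().__init__(repr(original))
        self.original = original


def extension_callback():
    # The Python-side extension raises an ordinary Python exception.
    raise KeyError("missing-field")


def java_layer(callback):
    """Simulated middle layer: anything raised crosses it as CarrierError."""
    try:
        return callback()
    except Exception as exc:
        raise CarrierError(exc)


def host_call(callback):
    """Host side: unwrap the carrier and re-raise the original exception."""
    try:
        return java_layer(callback)
    except CarrierError as carrier:
        raise carrier.original from None


try:
    host_call(extension_callback)
except KeyError as exc:
    # The caller sees the original exception type and arguments.
    print(type(exc).__name__, exc.args)
```

Without the unwrapping step in host_call, the caller would only ever see the carrier type and would have to inspect it by hand to find out what actually went wrong.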