Re: [jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-07-31 Thread Andi Vajda
No, no review yet. My thoughts are the same as last time - maintaining a 
monkeypatch of distutils is a bit scary. But I need to take a closer look first.

Andi..

> On Jul 31, 2014, at 15:39, "Lee Skillen (JIRA)"  wrote:
> 
> 
>    [ https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080873#comment-14080873 ]
> 
> Lee Skillen commented on PYLUCENE-31:
> -------------------------------------
> 
> Andi - Did you (or anyone else) get a chance to review/try this?  Maybe it's 
> a little too experimental, but thoughts appreciated. :-)

[jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-07-31 Thread Lee Skillen (JIRA)

    [ https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080873#comment-14080873 ]

Lee Skillen commented on PYLUCENE-31:
-------------------------------------

Andi - Did you (or anyone else) get a chance to review/try this?  Maybe it's a 
little too experimental, but thoughts appreciated. :-)

> JCC Parallel/Multiprocess Compilation + Caching
> -----------------------------------------------
>
>                 Key: PYLUCENE-31
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-31
>             Project: PyLucene
>          Issue Type: Improvement
>         Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
>            Reporter: Lee Skillen
>            Priority: Minor
>              Labels: build, cache, ccache, distutils, jcc, parallel
>         Attachments: feature-parallel-build.patch
>
>
> JCC uses distutils.Extension() to build both JCC itself and the packages 
> that it generates for Java wrapping. Unfortunately, distutils compiles 
> sequentially and takes no advantage of spare cores. As discussed on the 
> list, this is likely a design decision, made to avoid problems when building 
> projects with awkward, cyclic or recursive dependencies.
>
> These issues shouldn't appear in JCC-based projects because the build is 
> generative: all dependencies are resolved and generated before compilation 
> starts, and the build step only compiles and links the wrapper, whose 
> sources form a flat sequence of compilation units.
>
> Enabling this requires monkey patching distutils, which was also discussed 
> on the list as a potential source of issues, although we feel that the risk 
> is likely lower than that of the setuptools patching currently in use. The 
> functionality would be optional and is only enabled if the monkey patching 
> succeeds. distutils is also part of the standard library and might be less 
> susceptible to change than setuptools, and the code being patched has 
> barely changed since 2002 (see: 
> http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
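For readers who have not opened the attachment, the general shape of such a monkeypatch can be sketched as below. This is not the code from feature-parallel-build.patch. It swaps in a thread pool instead of multiple processes (each compile already runs as an external compiler process), the names make_parallel_compile and enable_parallel_build are made up for illustration, and it leans on the private CCompiler helpers _setup_compile, _get_cc_args and _compile, which is exactly the coupling that makes the approach fragile. Where distutils cannot be imported (it left the standard library in Python 3.12), enable_parallel_build just returns False.

```python
from multiprocessing.pool import ThreadPool


def make_parallel_compile(jobs):
    """Build a replacement for CCompiler.compile that runs `jobs` compiles at once."""

    def parallel_compile(self, sources, output_dir=None, macros=None,
                         include_dirs=None, debug=0, extra_preargs=None,
                         extra_postargs=None, depends=None):
        # Same preparation steps as the stock, sequential compile().
        macros, objects, extra_postargs, pp_opts, build = self._setup_compile(
            output_dir, macros, include_dirs, sources, depends, extra_postargs)
        cc_args = self._get_cc_args(pp_opts, debug, extra_preargs)

        def compile_one(obj):
            # Objects missing from `build` are already up to date.
            if obj not in build:
                return
            src, ext = build[obj]
            self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)

        # The stock implementation loops over `objects` one by one here.
        with ThreadPool(jobs) as pool:
            pool.map(compile_one, objects)
        return objects

    return parallel_compile


def enable_parallel_build(jobs=8):
    """Patch distutils if it is importable; report whether the patch was applied."""
    try:
        from distutils.ccompiler import CCompiler
    except ImportError:
        return False
    CCompiler.compile = make_parallel_compile(jobs)
    return True
```

A setup script would call enable_parallel_build() before setup() and carry on with the sequential build if it returns False.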
>
> In addition to the distutils changes, this patch changes the wrapper class 
> generation to make it more cache friendly, the goal being that no change in 
> the wrapped code means no change in the wrapper code. A small change to the 
> wrapped code then only alters the affected wrapper files, so with a tool such 
> as ccache the rebuild time drops significantly (to roughly 1/n of a full 
> compile when only one of n files has changed).
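To make the "no change in, no change out" goal concrete, here is a toy sketch (not JCC's generator; generate_wrapper, digest and the emitted text are invented for illustration). In essence a compiler cache can only reuse an object file when the source it is handed matches what it saw before, so the generator has to emit members in an order that does not depend on set or hash iteration order.

```python
import hashlib


def generate_wrapper(class_name, methods):
    """Emit a toy wrapper declaration with methods in sorted order."""
    lines = ["class %s {" % class_name]
    for name in sorted(methods):  # sorting removes any dependence on input order
        lines.append("    PyObject *%s();" % name)
    lines.append("};")
    return "\n".join(lines) + "\n"


def digest(text):
    """Content hash of the generated source, standing in for a cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The same methods arriving in a different order yield identical output.
first = generate_wrapper("IndexWriter", {"commit", "close", "addDocument"})
second = generate_wrapper("IndexWriter", ["addDocument", "close", "commit"])
print(digest(first) == digest(second))
```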
>
> Obviously the maintainers will have to assess this risk and decide whether 
> to accept the patch. The code has only been tested on Linux with Python 
> 2.7.5, but it should fail gracefully and disable parallelisation if any 
> requirement is not met (not on Linux, no multiprocessing support, or the 
> monkey patching somehow fails). The caching change should benefit everyone 
> regardless.
>
> Please note that an additional dependency on orderedset has been added to 
> achieve the more deterministic ordering. This may not be desirable (e.g. 
> another package such as ordered-set might be preferred, or the code might be 
> inlined into the package instead), subject to maintainer comments.
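If inlining is preferred, something along these lines might be enough. It is a minimal sketch built on collections.OrderedDict (available since Python 2.7), not the orderedset package's API and not what the patch contains; it covers only the operations shown.

```python
from collections import OrderedDict


class OrderedSet(object):
    """A set that iterates in first-insertion order."""

    def __init__(self, items=()):
        self._items = OrderedDict()
        for item in items:
            self.add(item)

    def add(self, item):
        # Re-adding an existing item keeps its original position.
        self._items.setdefault(item, None)

    def discard(self, item):
        self._items.pop(item, None)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


names = OrderedSet(["java.lang.String", "java.lang.Object", "java.lang.String"])
names.add("java.util.List")
print(list(names))
```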
> --- [following repeated from mailing list] ---
> Performance Statistics :-
> The following are some quick and dirty statistics for building PyLucene 
> itself with JCC (including Java Lucene, which accounts for about 30 seconds 
> up front). The JCC files are split using --files 8, and each build is 
> preceded by a make clean:
>
> Serial (unpatched):
>   real    5m1.502s
>   user    5m22.887s
>   sys     0m7.749s
>
> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):
>   real    1m37.382s
>   user    7m16.658s
>   sys     0m8.697s
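For a quick read of those two runs, the small script below converts the quoted times to seconds and works out the ratios. It uses only the figures quoted above; to_seconds is just a helper written for this note. The wall-clock speedup comes out at roughly 3.1x, paid for with about a third more total CPU time.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as '5m1.502s' to seconds."""
    minutes, seconds = re.match(r"(\d+)m([\d.]+)s$", stamp).groups()
    return int(minutes) * 60 + float(seconds)


serial_real = to_seconds("5m1.502s")
parallel_real = to_seconds("1m37.382s")
speedup = serial_real / parallel_real

# Ratio of user CPU time: how much more work the parallel build burned in total.
cpu_ratio = to_seconds("7m16.658s") / to_seconds("5m22.887s")

print("wall-clock speedup: %.2fx" % speedup)
print("user CPU time ratio: %.2fx" % cpu_ratio)
```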
>
> Furthermore, some additional changes were made to the wrapper file generation 
> to make the generated code more ccache friendly (deterministic sorting of 
> methods and some use of an ordered set). With these in place, ccache 
> installed, and the CC and CCACHE_COMPILERCHECK environment variables set to 
> "ccache gcc" and "content" respectively, subsequent compilation time is 
> reduced again as follows:
>
> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache 
> enabled):
>   real    0m43.051s
>   user    1m10.392s
>   sys     0m4.547s
>
> This was a run in which nothing changed between builds; a realistic run in 
> which changes occur will land somewhere between 0m43.051s and 1m37.382s, 
> depending on how drastic the change was. If many changes are expected and you 
> want to keep it more cache friendly then using a higher --files would 
> probably work (to an exten
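
The two environment variables named in the description (CC and CCACHE_COMPILERCHECK) could be set from a Python build driver roughly as follows. This is a sketch under assumptions: ccache_build_env and its parameters are hypothetical names, gcc is assumed as the underlying compiler, and the environment is left untouched when ccache is not on the PATH.

```python
import os
import shutil


def ccache_build_env(base_env=None, compiler="gcc", which=shutil.which):
    """Return a copy of the environment with ccache enabled, if ccache exists."""
    env = dict(os.environ if base_env is None else base_env)
    if which("ccache") is None:
        return env  # ccache not installed: build as usual
    env["CC"] = "ccache " + compiler
    # Identify the compiler by the content of its binary, not its mtime.
    env["CCACHE_COMPILERCHECK"] = "content"
    return env


# Typical use (not executed here):
#   import subprocess
#   subprocess.check_call(["make"], env=ccache_build_env())
```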

[jira] [Commented] (PYLUCENE-30) JCC: Through-Layer Python Exception Support

2014-07-31 Thread Lee Skillen (JIRA)

    [ https://issues.apache.org/jira/browse/PYLUCENE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080866#comment-14080866 ]

Lee Skillen commented on PYLUCENE-30:
-------------------------------------

That's great, thank you very much for your help as well Andi.

> JCC: Through-Layer Python Exception Support
> -------------------------------------------
>
>                 Key: PYLUCENE-30
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-30
>             Project: PyLucene
>          Issue Type: Improvement
>         Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
>                      JCC version 2.20 (svn trunk)
>            Reporter: Lee Skillen
>              Labels: exception, jcc, python
>         Attachments: feature-thru-exception-3.patch, jccthrutest.tgz
>
>
> Add the capability to throw and re-capture the original Python exception when 
> it is raised in the PythonVM layer (e.g. in an extension), passed through the 
> JavaVM, and caught again within the host PythonVM. Informally, this is 
> through-layer Python exception support.
>
> Andi Vajda and I have worked together to add support for this. The original 
> patch was submitted to the mailing list on Friday, 4th July 2014; the latest 
> patch, which incorporates code suggested by Andi, was posted to the list on 
> Thursday, 10th July (it is also attached to this issue).
>
> See the "JCC Project Extensions" email thread on the mailing list for more 
> details.
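
As a rough illustration of what "through-layer" means, here is a pure-Python analogy. It involves no JCC and no JVM, it is not how the attached patch works internally, and every name in it is made up. The middle layer wraps whatever the callback raises in its own exception type, and the host unwraps it so the caller sees the original exception object again.

```python
class ForeignLayerError(Exception):
    """Stand-in for the Java-side exception that carries the Python one."""

    def __init__(self, original):
        Exception.__init__(self, "python error: %r" % (original,))
        self.original = original


def call_through_foreign_layer(callback):
    """Pretend to be Java code calling back into a Python extension."""
    try:
        return callback()
    except Exception as exc:
        # The middle layer only knows its own exception type.
        raise ForeignLayerError(exc)


def host_call(callback):
    """Host-side entry point: unwrap and re-raise the original exception."""
    try:
        return call_through_foreign_layer(callback)
    except ForeignLayerError as wrapper:
        raise wrapper.original


class SegmentError(ValueError):
    """Hypothetical application-level exception."""


def failing_extension():
    raise SegmentError("segment 3")


try:
    host_call(failing_extension)
except SegmentError as exc:
    print("caught original:", type(exc).__name__, exc.args)
```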



--
This message was sent by Atlassian JIRA
(v6.2#6252)