[jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-09-10 Thread Lee Skillen (JIRA)

[ 
https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128277#comment-14128277
 ] 

Lee Skillen commented on PYLUCENE-31:
-

Not a problem Andi - Please let me know if you have any questions about the 
code when you have a look.  I'm happy to help!

 JCC Parallel/Multiprocess Compilation + Caching
 ---

 Key: PYLUCENE-31
 URL: https://issues.apache.org/jira/browse/PYLUCENE-31
 Project: PyLucene
  Issue Type: Improvement
 Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
Reporter: Lee Skillen
Priority: Minor
  Labels: build, cache, ccache, distutils, jcc, parallel
 Attachments: feature-parallel-build.patch


 JCC utilises distutils.Extension() in order to build JCC itself and the 
 packages that it generates for Java wrapping - Unfortunately distutils 
 performs its build sequentially and doesn't take advantage of any additional 
 free cores for parallel building.  As discussed on the list this is likely a 
 design decision due to potential issues that may arise when building projects 
 with awkward, cyclic or recursive dependencies.
 These issues shouldn't appear within JCC-based projects because of the 
 generative nature of the build; i.e. all dependencies are resolved and 
 generated prior to building, and the build process itself is about 
 compilation and construction of the wrapper alone, with the wrapper files 
 confined to a sequence of flattened compilation units.
 Enabling this requires monkey patching distutils, which was also discussed 
 on the list as a potential source of issues, although we feel that the 
 risk is likely lower than that of the setuptools patching currently in use.  
 This would be optional functionality that is only enabled if the 
 monkey patching succeeds.  Distutils is also part of the standard 
 library and might be less susceptible to change than setuptools, and the 
 area of code being patched has barely changed since 2002 (see: 
 http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
 In addition to the distutils changes this patch also includes changes to the 
 wrapper class generation to make it more cache friendly, with the target 
 being that no changes in the wrapped code equals no changes in the wrapper 
 code.  A small change to the wrapped code should therefore leave most wrapper 
 files untouched, so with a tool such as ccache the rebuild time would be 
 significantly reduced (to almost 1/n, where n is the number of files and only 
 one has changed).
 Obviously the maintainers would have to assess this risk and decide whether 
 they would like to accept the patch or not.  Code has only been tested on 
 Linux with Python 2.7.5 but should fail gracefully and disable 
 parallelisation if one of the requirements hasn't been met (not on Linux, no 
 multiprocessing support, or monkey patching somehow fails).  The change to 
 caching should still benefit everyone regardless.
 Please note that an additional dependency on orderedset has been added to 
 achieve the more deterministic ordering - This may not be desirable (i.e. 
 another package might be desired, such as ordered-set, or the code might be 
 inlined into the package instead), as per maintainer comments.
 --- [following repeated from mailing list] ---
 Performance Statistics :-
 The following are some quick and dirty statistics for building the jcc 
 pylucene itself (incl. java lucene which accounts for about 30-ish seconds 
 upfront) - The JCC files are split using --files 8, and each build is 
 preceded with a make clean:
 Serial (unpatched):
 real    5m1.502s
 user    5m22.887s
 sys     0m7.749s
 Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):
 real    1m37.382s
 user    7m16.658s
 sys     0m8.697s
 Furthermore, some additional changes were made to the wrapped file generation 
 to make the generated code more ccache friendly (additional deterministic 
 sorting for methods and some usage of an ordered set).  With these in place 
 and the CC and CCACHE_COMPILERCHECK environment variables set to ccache gcc 
 and content respectively, and ensuring ccache is installed, subsequent 
 compilation time is reduced again as follows:
 Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache 
 enabled):
 real    0m43.051s
 user    1m10.392s
 sys     0m4.547s
 Nothing changed between these runs, so for a realistic run in which changes 
 occur the figure will be somewhere between 0m43.051s and 1m37.382s, 
 depending on how drastic the change was. If many changes are expected and you 
 want to keep it more cache friendly then using a higher --files would 
 probably work (to an extent), or ideally use --files separate, although it 
 doesn't currently work for me (need to 

[jira] [Commented] (PYLUCENE-30) JCC: Through-Layer Python Exception Support

2014-07-31 Thread Lee Skillen (JIRA)

[ 
https://issues.apache.org/jira/browse/PYLUCENE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080866#comment-14080866
 ] 

Lee Skillen commented on PYLUCENE-30:
-

That's great, thank you very much for your help as well Andi.

 JCC: Through-Layer Python Exception Support
 ---

 Key: PYLUCENE-30
 URL: https://issues.apache.org/jira/browse/PYLUCENE-30
 Project: PyLucene
  Issue Type: Improvement
 Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
 JCC version 2.20 (svn trunk)
Reporter: Lee Skillen
  Labels: exception, jcc, python
 Attachments: feature-thru-exception-3.patch, jccthrutest.tgz


 Add the capability to throw and re-capture the original Python exception when 
 it is thrown from the PythonVM layer (e.g. in an extension), passed through 
 the JavaVM, and re-caught within the host PythonVM.  Informally titled 
 through-layer Python exception support.
 Andi Vajda and I have worked together to add support for this, with the 
 original patch being submitted on the mailing list on Friday, 4th July 2014. 
 The latest patch, which incorporates code suggested by Andi, was posted to 
 the list on Thursday, 10th July (this patch will also be attached to this 
 issue).
 See: JCC Project Extensions email thread on the mailing list for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-07-31 Thread Lee Skillen (JIRA)

[ 
https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080873#comment-14080873
 ] 

Lee Skillen commented on PYLUCENE-31:
-

Andi - Did you (or anyone else) get a chance to review/try this?  Maybe it's a 
little too experimental, but thoughts appreciated. :-)


[jira] [Created] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-07-21 Thread Lee Skillen (JIRA)
Lee Skillen created PYLUCENE-31:
---

 Summary: JCC Parallel/Multiprocess Compilation + Caching
 Key: PYLUCENE-31
 URL: https://issues.apache.org/jira/browse/PYLUCENE-31
 Project: PyLucene
  Issue Type: Improvement
 Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
Reporter: Lee Skillen
Priority: Minor


JCC utilises distutils.Extension() in order to build JCC itself and the 
packages that it generates for Java wrapping - Unfortunately distutils performs 
its build sequentially and doesn't take advantage of any additional free cores 
for parallel building.  As discussed on the list this is likely a design 
decision due to potential issues that may arise when building projects with 
awkward, cyclic or recursive dependencies.

These issues shouldn't appear within JCC-based projects because of the 
generative nature of the build; i.e. all dependencies are resolved and 
generated prior to building, and the build process itself is about compilation 
and construction of the wrapper alone, with the wrapper files confined to a 
sequence of flattened compilation units.

Enabling this requires monkey patching distutils, which was also discussed 
on the list as a potential source of issues, although we feel that the risk is 
likely lower than that of the setuptools patching currently in use.  This would 
be optional functionality that is only enabled if the monkey patching 
succeeds.  Distutils is also part of the standard library and might be less 
susceptible to change than setuptools, and the area of code being patched has 
barely changed since 2002 (see: 
http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
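
As a rough illustration of the approach described above (this is not the code 
in the attached patch), the sketch below replaces CCompiler.compile with a 
version that hands each object file to a worker.  The function name 
enable_parallel_compile is hypothetical, and a thread pool is assumed to be 
sufficient because each job only launches an external compiler process; the 
patch itself may use processes instead.  It returns False rather than raising 
when a requirement is missing.

```python
import sys


def enable_parallel_compile(jobs=8):
    """Try to install a parallel CCompiler.compile; return True on success.

    Hypothetical helper for illustration only. It leaves distutils untouched
    when not on Linux, when the imports fail, or when the private hooks it
    relies on are missing.
    """
    if not sys.platform.startswith("linux"):
        return False
    try:
        from distutils.ccompiler import CCompiler
        from multiprocessing.pool import ThreadPool
    except ImportError:
        return False
    if not (hasattr(CCompiler, "_setup_compile")
            and hasattr(CCompiler, "_get_cc_args")):
        return False

    def parallel_compile(self, sources, output_dir=None, macros=None,
                         include_dirs=None, debug=0, extra_preargs=None,
                         extra_postargs=None, depends=None):
        # Same preparation steps as the stock sequential implementation.
        macros, objects, extra_postargs, pp_opts, build = self._setup_compile(
            output_dir, macros, include_dirs, sources, depends, extra_postargs)
        cc_args = self._get_cc_args(pp_opts, debug, extra_preargs)

        def compile_one(obj):
            # Objects absent from the build map are up to date already.
            if obj in build:
                src, ext = build[obj]
                self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)

        pool = ThreadPool(jobs)
        try:
            pool.map(compile_one, objects)
        finally:
            pool.close()
            pool.join()
        return objects

    CCompiler.compile = parallel_compile
    return True
```

A setup.py could call enable_parallel_compile() before setup() and carry on 
with the normal sequential build whenever it returns False.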

In addition to the distutils changes this patch also includes changes to the 
wrapper class generation to make it more cache friendly, with the target being 
that no changes in the wrapped code equals no changes in the wrapper code.  A 
small change to the wrapped code should therefore leave most wrapper files 
untouched, so with a tool such as ccache the rebuild time would be significantly 
reduced (to almost 1/n, where n is the number of files and only one has changed).
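
To make the "no changes in, no changes out" goal concrete, here is a minimal 
sketch of deterministic generation.  The render_wrapper function and its 
output format are invented for illustration and are not JCC's generator; the 
point is only that sorting before emitting makes the generated text 
independent of the order in which methods were discovered, so a content-based 
cache sees identical input.

```python
def render_wrapper(class_name, methods):
    """Render a (hypothetical) wrapper stub with methods in a stable order.

    `methods` is any iterable of (name, signature) pairs; duplicates are
    dropped and the remainder is sorted so the output is reproducible.
    """
    lines = ["class %s {" % class_name]
    for name, signature in sorted(set(methods)):
        lines.append("    %s%s;" % (name, signature))
    lines.append("};")
    return "\n".join(lines)


# The same methods discovered in two different orders.
first = render_wrapper("Document", [("get", "(String)"), ("add", "(Field)")])
second = render_wrapper("Document", [("add", "(Field)"), ("get", "(String)")])
```

Here first and second are the same string, which is what allows a compiler 
cache keyed on file content to reuse the earlier object file.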

Obviously the maintainers would have to assess this risk and decide whether 
they would like to accept the patch or not.  Code has only been tested on Linux 
with Python 2.7.5 but should fail gracefully and disable parallelisation if one 
of the requirements hasn't been met (not on Linux, no multiprocessing support, 
or monkey patching somehow fails).  The change to caching should still benefit 
everyone regardless.

Please note that an additional dependency on orderedset has been added to 
achieve the more deterministic ordering - This may not be desirable (i.e. 
another package might be desired, such as ordered-set, or the code might be 
inlined into the package instead), as per maintainer comments.
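
If inlining were preferred over an extra dependency, something along the 
following lines might be enough.  This is only a sketch with a made-up class 
name (SimpleOrderedSet), it covers just insertion, membership, iteration and 
length, and it is not the API of the orderedset or ordered-set packages.

```python
from collections import OrderedDict


class SimpleOrderedSet(object):
    """A tiny insertion-ordered set backed by an OrderedDict (illustrative)."""

    def __init__(self, items=()):
        self._items = OrderedDict()
        for item in items:
            self.add(item)

    def add(self, item):
        # Re-adding an existing item keeps its original position.
        self._items[item] = None

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


names = SimpleOrderedSet(["b", "a", "b", "c"])
```

Iterating over names always yields the items in first-insertion order, unlike 
a plain set, whose iteration order is not something generated code should 
depend on.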

--- [following repeated from mailing list] ---

Performance Statistics :-

The following are some quick and dirty statistics for building the jcc pylucene 
itself (incl. java lucene which accounts for about 30-ish seconds upfront) - 
The JCC files are split using --files 8, and each build is preceded with a make 
clean:

Serial (unpatched):

real    5m1.502s
user    5m22.887s
sys     0m7.749s

Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):

real    1m37.382s
user    7m16.658s
sys     0m8.697s
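
For reference, the two sets of figures above can be compared with a few lines 
of arithmetic.  The helper name to_seconds is just for this example; the 
timings are the ones quoted above.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as '5m1.502s' into seconds."""
    match = re.match(r"^(\d+)m(\d+(?:\.\d+)?)s$", stamp)
    if match is None:
        raise ValueError("unrecognised time value: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


serial_real = to_seconds("5m1.502s")
parallel_real = to_seconds("1m37.382s")
serial_user = to_seconds("5m22.887s")
parallel_user = to_seconds("7m16.658s")

# Wall-clock improvement, and how much more CPU time the parallel run used.
speedup = serial_real / parallel_real
cpu_ratio = parallel_user / serial_user

print("wall-clock speedup: %.2fx" % speedup)
print("user CPU time ratio: %.2fx" % cpu_ratio)
```

So the parallel build finishes roughly three times sooner in wall-clock terms 
while spending about a third more user CPU time, which is not surprising with 
8 jobs on 4 physical cores.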

Furthermore, some additional changes were made to the wrapped file generation 
to make the generated code more ccache friendly (additional deterministic 
sorting for methods and some usage of an ordered set).  With these in place and 
the CC and CCACHE_COMPILERCHECK environment variables set to ccache gcc and 
content respectively, and ensuring ccache is installed, subsequent 
compilation time is reduced again as follows:

Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache 
enabled):

real    0m43.051s
user    1m10.392s
sys     0m4.547s

Nothing changed between these runs, so for a realistic run in which changes 
occur the figure will be somewhere between 0m43.051s and 1m37.382s, 
depending on how drastic the change was. If many changes are expected and you 
want to keep it more cache friendly then using a higher --files would probably 
work (to an extent), or ideally use --files separate, although it doesn't 
currently work for me (need to investigate).
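
For anyone driving the build from a script, the ccache settings mentioned 
above could be applied along these lines.  The helper name ccache_environment 
is made up for the example, and it assumes ccache and gcc are already 
installed and on the PATH.

```python
import os


def ccache_environment(base=None):
    """Return a copy of the environment with the ccache settings applied.

    Only CC and CCACHE_COMPILERCHECK are set, using the values given in the
    description above; the mapping passed in is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["CC"] = "ccache gcc"
    env["CCACHE_COMPILERCHECK"] = "content"
    return env


env = ccache_environment({"PATH": "/usr/bin"})
```

The resulting mapping can then be passed to the build, for example with 
subprocess.call(["make"], env=ccache_environment()).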

We're mostly utilising the PyLucene build as a test bed since it is repeatable 
for others, rather than just showing numbers for our own application compilations; 
we also use it to run the unit test suite after changes to JCC itself to ensure 
it still works as intended for PyLucene.  For illustrative purposes though our 
application takes 1m53s to compile with JCC from scratch serially, 0m31s in 
parallel (8 jobs), 0m14s in parallel with ccache enabled and minimal changes, 
and 0m8s with 

[jira] [Updated] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-07-21 Thread Lee Skillen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lee Skillen updated PYLUCENE-31:


Attachment: feature-parallel-build.patch


[jira] [Updated] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-07-21 Thread Lee Skillen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lee Skillen updated PYLUCENE-31:


Attachment: (was: feature-parallel-build.patch)


[jira] [Updated] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-07-21 Thread Lee Skillen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lee Skillen updated PYLUCENE-31:


Attachment: feature-parallel-build.patch

Corrected out-of-date patch.


[jira] [Updated] (PYLUCENE-30) JCC: Through-Layer Python Exception Support

2014-07-14 Thread Lee Skillen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PYLUCENE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lee Skillen updated PYLUCENE-30:


Attachment: feature-thru-exception-3.patch

 JCC: Through-Layer Python Exception Support
 ---

 Key: PYLUCENE-30
 URL: https://issues.apache.org/jira/browse/PYLUCENE-30
 Project: PyLucene
  Issue Type: Improvement
 Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
 JCC version 2.20 (svn trunk)
Reporter: Lee Skillen
  Labels: exception, jcc, python
 Attachments: feature-thru-exception-3.patch


 Add the capability to throw and re-capture the original Python exception when 
 it is thrown from the PythonVM layer (e.g. in an extension), passed through 
 the JavaVM, and re-caught within the host PythonVM.  Informally, this is 
 called through-layer Python exception support.
 Andi Vajda and I have worked together to add support for this.  The original 
 patch was submitted to the mailing list on Friday, 4th July 2014.  The latest 
 patch, which incorporates code suggested by Andi, was posted to the list on 
 Thursday, 10th July (this patch will also be attached to this issue).
 See the "JCC Project Extensions" email thread on the mailing list for more 
 details.
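
 To make the idea concrete, here is a minimal pure-Python simulation of a 
 through-layer exception.  It is only a sketch of the concept and not the 
 code in the attached patch: ForeignLayerError, foreign_layer and host_call 
 are hypothetical names, and the "foreign layer" merely stands in for the 
 JavaVM.

```python
class ForeignLayerError(Exception):
    """Stands in for a Java-side throwable carrying the Python exception."""

    def __init__(self, original):
        super(ForeignLayerError, self).__init__("python error: %r" % (original,))
        self.original = original


def foreign_layer(callback):
    """Stands in for the JavaVM layer calling back into a Python extension."""
    try:
        return callback()
    except Exception as exc:
        # Keep a reference to the original exception while crossing the layer.
        raise ForeignLayerError(exc)


def host_call(callback):
    """Host-side entry point: unwrap and re-raise the original exception."""
    try:
        return foreign_layer(callback)
    except ForeignLayerError as wrapper:
        raise wrapper.original
```

 With this in place, a caller of host_call() can catch the exception type 
 raised inside the callback rather than a generic wrapper error.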



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PYLUCENE-30) JCC: Through-Layer Python Exception Support

2014-07-14 Thread Lee Skillen (JIRA)
Lee Skillen created PYLUCENE-30:
---

 Summary: JCC: Through-Layer Python Exception Support
 Key: PYLUCENE-30
 URL: https://issues.apache.org/jira/browse/PYLUCENE-30
 Project: PyLucene
  Issue Type: Improvement
 Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
JCC version 2.20 (svn trunk)
Reporter: Lee Skillen
 Attachments: feature-thru-exception-3.patch

Add the capability to throw and re-capture the original Python exception when 
it is thrown from the PythonVM layer (e.g. in an extension), passed through the 
JavaVM, and re-caught within the host PythonVM.  Informally, this is called 
through-layer Python exception support.

Andi Vajda and I have worked together to add support for this.  The original 
patch was submitted to the mailing list on Friday, 4th July 2014.  The latest 
patch, which incorporates code suggested by Andi, was posted to the list on 
Thursday, 10th July (this patch will also be attached to this issue).

See the "JCC Project Extensions" email thread on the mailing list for more 
details.





[jira] [Updated] (PYLUCENE-30) JCC: Through-Layer Python Exception Support

2014-07-14 Thread Lee Skillen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PYLUCENE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lee Skillen updated PYLUCENE-30:


Attachment: jccthrutest.tgz

Separate test simulation of the through-layer exception problem (not really 
needed now that a test case was added to the patch, but might still be useful 
since it doesn't require pylucene building).

 JCC: Through-Layer Python Exception Support
 ---

 Key: PYLUCENE-30
 URL: https://issues.apache.org/jira/browse/PYLUCENE-30
 Project: PyLucene
  Issue Type: Improvement
 Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
 JCC version 2.20 (svn trunk)
Reporter: Lee Skillen
  Labels: exception, jcc, python
 Attachments: feature-thru-exception-3.patch, jccthrutest.tgz


 Add the capability to throw and re-capture the original Python exception when 
 it is thrown from the PythonVM layer (e.g. in an extension), passed through 
 the JavaVM, and re-caught within the host PythonVM.  Informally, this is 
 called through-layer Python exception support.
 Andi Vajda and I have worked together to add support for this.  The original 
 patch was submitted to the mailing list on Friday, 4th July 2014.  The latest 
 patch, which incorporates code suggested by Andi, was posted to the list on 
 Thursday, 10th July (this patch will also be attached to this issue).
 See the "JCC Project Extensions" email thread on the mailing list for more 
 details.





[jira] [Comment Edited] (PYLUCENE-30) JCC: Through-Layer Python Exception Support

2014-07-14 Thread Lee Skillen (JIRA)

[ 
https://issues.apache.org/jira/browse/PYLUCENE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060671#comment-14060671
 ] 

Lee Skillen edited comment on PYLUCENE-30 at 7/14/14 2:20 PM:
--

Added jccthrutest.tgz, which is a separate test simulation of the through-layer 
exception problem (not really needed now that a test case was added to the 
patch, but might still be useful since it doesn't require pylucene building).


was (Author: lskillen):
Separate test simulation of the through-layer exception problem (not really 
needed now that a test case was added to the patch, but might still be useful 
since it doesn't require pylucene building).



