Hi,

Apologies in advance for the long post. It boils down to this: is there any 
interest from the sage community in participating in the development of a 
python distribution for large-scale distributed-memory parallel machines? 
I'm posting this on behalf of (but not representing) a group of government 
scientists who are trying to work toward a common python distribution on 
the government systems we use. The reasons we're doing this are: 1) we 
can't trust the system python on many HPC systems, if it even exists; 2) 
because of 1), almost all of us spend too much time building and maintaining 
our own python "stack" based on some mixture of make, cmake, autoconf, and/or 
the sage spkg system; and 3) our community suffers because we can't always 
share python modules and scripts on these systems, since we're not working 
from equivalent python environments.

Here's what I think we need:

1) A standard that specifies a python version and a list of python 
packages and their dependencies. This allows for-profit vendors to 
build to our standard.

2) A build system that allows extensive configuration of the entire system 
but with enough granularity that the format of a package is standardized and 
relatively straightforward. On the other hand, the whole system must be 
designed such that it can be built repeatedly from scratch without any 
interactive steps.

3) A testing system that is simple enough that the community can easily 
contribute tests to ensure that the community python is reliable for their 
needs.

4) A framework for making this environment extensible without requiring 
users to fork it and create yet more distributions.

Here's a straw man:

1) Standard:

Python 2.7.2  PLUS:

   - numpy *
   - scipy
   - matplotlib *
   - vtk (python wrappers + C++ libs) *
   - elementtree *
   - ctypes *
   - readline (i.e. a functional readline extension module) *
   - swig
   - mpi4py *
   - petsc4py *
   - pympi
   - nose *
   - pytables *
   - basemap
   - cython *
   - sympy *
   - pycuda
   - pyopencl
   - IPython *
   - wxpython
   - PyQt  *
   - pygtk
   - PyTrilinos
   - virtualenv *
   - Pandas
   - numexpr *
   - pygrib

Note:
*Our group has these in the python stack we build for our PDE solver 
framework (http://proteus.usace.army.mil), which we build on a range of 
machines at 4 major supercomputing centers. 

The main issue I see with 1) is that this is somewhat different from the 
sage package list. We would need many optional sage packages but wouldn't 
need some of the standard sage packages.
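
One way to make the standard in 1) checkable by machine would be a small manifest plus a script that reports which entries import cleanly on a given system. The sketch below is only an illustration: STANDARD and check_standard are hypothetical names, and the manifest lists only modules that ship with CPython so that it runs anywhere. A real manifest would name numpy, mpi4py, and the rest of the list above.

```python
import importlib
import sys

# Hypothetical excerpt of the standard: import name -> package name in the list.
# Only modules bundled with CPython appear here so the sketch runs anywhere.
STANDARD = {
    "ctypes": "ctypes",
    "xml.etree.ElementTree": "elementtree",
    "readline": "readline",
}


def check_standard(manifest):
    """Return (present, missing) lists of package names from the manifest."""
    present, missing = [], []
    for module_name, package in sorted(manifest.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(package)
        else:
            present.append(package)
    return present, missing


if __name__ == "__main__":
    present, missing = check_standard(STANDARD)
    print("python %d.%d.%d" % sys.version_info[:3])
    print("present: " + ", ".join(present))
    print("missing: " + (", ".join(missing) or "none"))
```

Running the same script on each machine would give a quick view of how far a given installation is from the agreed list.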

2) Build System: 

a. Use cmake* for the top level configuration, storing the part relevant for 
each package in a subdirectory for each package (call it package_name_Config 
e.g. numpyConfig, petsc4pyConfig, ...)

b. store each package as an spkg** that meets sage community standards 
except that spkg-install will rely on information from package_name_Config 
(maybe it would be OK to edit files in package_name_Config located INSIDE 
package_name_version.spkg during the interactive configuration step?)  

c. each package will still get built with its native build system***

Notes:

*Our group simply uses make instead of cmake, with a top level Makefile 
containing 'editConfig' and 'newConfig' targets that allow you to edit and 
copy existing configurations
**Our group only produces a top level spkg, but I think we could easily 
generate a finer grained set of spkg's for ones that don't already exist
***Our group does this (i.e. we don't rewrite upstream build systems).  I 
think spkg's also use the native build system in most cases, right?

The main issue with 2) (the build system) is that building on HPC systems 
requires extensive configuration of individual packages: numpy needs to get 
built with the right vendor blas/lapack and potentially the correct, 
non-gcc, optimizing compilers (maybe even a mixture of gcc and some vendor 
fortran). Likewise petsc4py might need to use PETSc libraries installed as 
part of the HPC baseline configuration rather than building the source 
included with this distribution. My impression is that sage very reasonably 
opted to focus on the web notebook and a gnu-based linux environment so the 
spkg system alone doesn't fully meet the needs of the HPC community. We need 
the ability to specify different compilers for different packages and to do 
a range of things from building every dependency to building only python 
wrappers for many dependencies.
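
To make the per-package configuration idea concrete, here is a minimal sketch of how machine-specific overrides could be layered on top of defaults and handed to an install script as environment variables. Everything in it is an assumption made for illustration: the DEFAULTS and MACHINE_OVERRIDES tables, the placeholder compiler and library names, and the USE_SYSTEM and EXTRA_LIBS variables are hypothetical, not part of the spkg format or of our current Makefiles.

```python
import copy

# Hypothetical defaults: GNU compilers, build the bundled source of everything.
DEFAULTS = {"cc": "gcc", "fc": "gfortran", "use_system": False, "libs": []}

# Hypothetical overrides for one machine. The values are placeholders, not
# real compiler or library names.
MACHINE_OVERRIDES = {
    # numpy: vendor fortran plus the vendor blas/lapack
    "numpy": {"fc": "vendor-fortran", "libs": ["vendor_blas", "vendor_lapack"]},
    # petsc4py: build only the wrappers against the center's PETSc install
    "petsc4py": {"use_system": True},
}


def resolve(package, overrides):
    """Merge the overrides for one package onto a copy of the defaults."""
    cfg = copy.deepcopy(DEFAULTS)
    cfg.update(overrides.get(package, {}))
    return cfg


def build_env(cfg):
    """Environment variables an install script could read (names assumed)."""
    return {
        "CC": cfg["cc"],
        "FC": cfg["fc"],
        "USE_SYSTEM": "1" if cfg["use_system"] else "0",
        "EXTRA_LIBS": " ".join(cfg["libs"]),
    }


if __name__ == "__main__":
    for name in ("numpy", "petsc4py", "scipy"):
        print(name, build_env(resolve(name, MACHINE_OVERRIDES)))
```

Packages without an entry, such as scipy here, fall back to the defaults, so a new machine only has to describe where it differs.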

3) buildbot + nose and a package_nameTest directory for community supplied 
tests of each package in addition to the packages' own tests. This way users 
only have to add test_NAME.py files to 
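
As an illustration of how small a community-supplied test could be, here is a sketch of a hypothetical test_elementtree.py. It assumes nose's default discovery, which collects functions whose names start with test_ from modules whose names start with test_. The file name and the checks themselves are made up for this example. elementtree is used because it ships with Python 2.5 and later as xml.etree.ElementTree, so the file runs without any extra packages.

```python
# test_elementtree.py: hypothetical community test for the elementtree package
import xml.etree.ElementTree as ET


def test_parse_nodes():
    # Parse a small document and check that the children come back in order.
    root = ET.fromstring("<mesh><node id='1'/><node id='2'/></mesh>")
    assert root.tag == "mesh"
    assert [n.get("id") for n in root.findall("node")] == ["1", "2"]


def test_build_and_serialize():
    # Build a tree in memory and check the serialized form.
    root = ET.Element("run")
    ET.SubElement(root, "step").text = "42"
    text = ET.tostring(root).decode("ascii")
    assert "<step>42</step>" in text
```

Plain assert statements keep the barrier to contributing low: no test classes or fixtures are needed for checks of this kind.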

4) virtualenv + pip should allow users to extend the python installation 
into their private environment, where they can update and add new packages 
as necessary.  An issue here is that it wouldn't allow a per-user sage 
environment, so I'm not sure whether users could also install spkg's or even 
use their modified python environment from sage.
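
The layering idea in 4) can be sketched as follows. Note the substitution: this uses the venv module from the Python 3 standard library rather than virtualenv itself, purely so the example runs without extra packages; virtualenv offers the same behaviour through its --system-site-packages option. The make_user_env name is hypothetical, and pip is left out of the new environment to keep the sketch independent of the network.

```python
import os
import tempfile
import venv


def make_user_env(path):
    """Create a private environment that can still see the shared packages.

    Returns the path of the pyvenv.cfg file that records the settings.
    """
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(path)
    return os.path.join(path, "pyvenv.cfg")


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "myenv")
    with open(make_user_env(target)) as handle:
        print(handle.read())
```

With system site packages enabled, the user's environment inherits everything in the shared installation, and anything installed into the private environment takes precedence over the shared copy.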

Anyway, I'd be grateful for any input, regardless of whether this project 
seems like a good fit for more formal participation from the sage community.

Thanks,
Chris


-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org
