since developing object oriented software is so cumbersome in C and we are all resistant to doing it in C++

2009-12-07 Thread Jed Brown
On Mon, 7 Dec 2009 10:53:10 -0500, Alex Peyser peyser.alex at gmail.com wrote: I've had endless trouble integrating petsc into my system because the object-oriented C is both only partially object oriented while simultaneously making access to the underlying functionality obscure. Okay, I'm

2009-12-07 Thread Chris Kees
This whole thread has been really interesting. I'm interested in Barry's suggestion, and I know a handful of people not on the list who would also be interested. I've got a couple of questions/responses to the MatSetValues, point-wise physics, and support tools issues that were raised.

2009-12-07 Thread Matthew Knepley
On Mon, Dec 7, 2009 at 3:05 PM, Chris Kees christopher.e.kees at usace.army.mil wrote: This whole thread has been really interesting. I'm interested in Barry's suggestion, and I know a handful of people not on the list who would also be interested. I've got a couple of questions/responses

2009-12-07 Thread Barry Smith
On Dec 7, 2009, at 9:53 AM, Alex Peyser wrote: Have y'all considered a different approach? I've had endless trouble integrating petsc into my system because the object-oriented C is both only partially object oriented while simultaneously making access to the underlying functionality

2009-12-06 Thread Jed Brown
On Sat, 5 Dec 2009 16:50:38 -0600, Matthew Knepley knepley at gmail.com wrote: You assign a few threads per element to calculate the FEM integral. You could maintain this unassembled if you only need actions. You can also store it with much less memory as just values at quadrature points.
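
The unassembled idea quoted above can be illustrated with a minimal sketch. It assumes 1D linear elements of uniform width `h`, one quadrature point per element, and a Laplacian-type operator; the function name `apply_unassembled` and the argument `k_at_quad` are hypothetical and do not come from the thread or from PETSc. Only the coefficient values at the quadrature points are stored, and the operator action is computed element by element without ever forming the matrix.

```python
def apply_unassembled(u, k_at_quad, h):
    """Return y = K u for 1D linear elements without forming K.

    u          nodal values (len = number of elements + 1)
    k_at_quad  coefficient at the single quadrature point of each element
    h          uniform element width (an assumption of this sketch)
    """
    y = [0.0] * len(u)
    for e, k in enumerate(k_at_quad):
        # Gradient of the linear interpolant on element e.
        grad = (u[e + 1] - u[e]) / h
        # Quadrature weight h times the coefficient times the gradient.
        flux = k * grad * h
        # Test-function gradients are -1/h (left node) and +1/h (right node).
        y[e] -= flux / h
        y[e + 1] += flux / h
    return y


if __name__ == "__main__":
    print(apply_unassembled([0.0, 1.0, 4.0], [1.0, 1.0], 1.0))
```

With unit coefficients this matches multiplying by the assembled tridiagonal stiffness matrix, while the stored data is one value per quadrature point.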

2009-12-05 Thread Matthew Knepley
On Fri, Dec 4, 2009 at 10:42 PM, Barry Smith bsmith at mcs.anl.gov wrote: Suggestion: 1) Discard PETSc 2) Develop a general Py{CL, CUDA, OpenMP-C} system that dispatches tasks onto GPUs and multi-core systems (generally we would have one python process per compute node and local

2009-12-05 Thread Jed Brown
This is an interesting proposal. Two thoughts: Residual and Jacobian evaluation cannot be written in Python (though it can be prototyped there). After a discretization is chosen, the physics is usually representable as a tiny kernel (Riemann solvers/pointwise operation at quadrature points),

2009-12-05 Thread Matthew Knepley
On Sat, Dec 5, 2009 at 1:06 PM, Jed Brown jed at 59a2.org wrote: This is an interesting proposal. Two thoughts: Residual and Jacobian evaluation cannot be written in Python (though it can be prototyped there). After a discretization is chosen, the physics is usually representable as a tiny

2009-12-05 Thread Jed Brown
On Sat, 5 Dec 2009 13:09:33 -0600, Matthew Knepley knepley at gmail.com wrote: Then kernels are moved to an accelerator. These kernels necessarily involve user code (physics). It's a lot to ask users to maintain two versions of their physics, one which is debuggable and another which is fast

2009-12-05 Thread Brad Aagaard
As someone who has a finite-element code built upon PETSc/Sieve with the top-level code in Python, I am in favor of Barry's approach. As Matt mentions debugging multi-languages is more complex. Unit testing helps solve some of this because tests associated with the low-level code involve only

2009-12-05 Thread Jed Brown
Somehow this drifted off the list, hopefully the deep citations provide sufficient context. On Sat, 5 Dec 2009 13:42:31 -0600, Matthew Knepley knepley at gmail.com wrote: On Sat, Dec 5, 2009 at 1:32 PM, Jed Brown jed at 59a2.org wrote: On Sat, 5 Dec 2009 13:20:20 -0600, Matthew Knepley

2009-12-05 Thread Dima Karpeyev
Cython can accelerate almost any Python code nearly immediately (although it supports a somewhat restricted subset of Python). This is simply due to converting it to equivalent C code that is compiled and runs within CPython. Then, chunks of the Python code can be explicitly typed, which can

2009-12-05 Thread Matthew Knepley
On Sat, Dec 5, 2009 at 2:29 PM, Ethan Coon etc2103 at columbia.edu wrote: I'm a big fan of Barry's approach as well. However, the current state of debugging tools is not up to snuff for this type of model. In using petsc4py regularly, debugging cython and python (user-defined) functions

2009-12-05 Thread Ethan Coon
This is a very interesting issue. Suppose you write the RHSFunction in Python and pass to SNES. Are you saying that pdb cannot stop in that method when you step over SNESSolve() in Python? That would suck. If on the other hand, you passed in C, I can see how you are relegated to obscure

2009-12-05 Thread Jed Brown
On Fri, 4 Dec 2009 22:42:35 -0600, Barry Smith bsmith at mcs.anl.gov wrote: generally we would have one python process per compute node and local parallelism would be done via the low-level kernels to the cores and/or GPUs. I think one MPI process per node is fine for MPI performance on good

2009-12-05 Thread Matthew Knepley
On Sat, Dec 5, 2009 at 3:50 PM, Jed Brown jed at 59a2.org wrote: On Fri, 4 Dec 2009 22:42:35 -0600, Barry Smith bsmith at mcs.anl.gov wrote: generally we would have one python process per compute node and local parallelism would be done via the low-level kernels to the cores and/or GPUs.

2009-12-05 Thread Jed Brown
On Sat, 5 Dec 2009 16:02:38 -0600, Matthew Knepley knepley at gmail.com wrote: I need to understand better. You are asking about the case where we have many GPUs and one CPU? If it's always one or two GPUs per CPU I do not see the problem. Barry initially proposed one Python thread per node,

2009-12-05 Thread Matthew Knepley
On Sat, Dec 5, 2009 at 4:25 PM, Jed Brown jed at 59a2.org wrote: On Sat, 5 Dec 2009 16:02:38 -0600, Matthew Knepley knepley at gmail.com wrote: I need to understand better. You are asking about the case where we have many GPUs and one CPU? If it's always one or two GPUs per CPU I do not

2009-12-05 Thread Matthew Knepley
On Sat, Dec 5, 2009 at 6:01 PM, Jed Brown jed at 59a2.org wrote: On Sat, 5 Dec 2009 16:50:38 -0600, Matthew Knepley knepley at gmail.com wrote: You assign a few threads per element to calculate the FEM integral. You could maintain this unassembled if you only need actions. You can also

2009-12-04 Thread Barry Smith
Suggestion: 1) Discard PETSc 2) Develop a general Py{CL, CUDA, OpenMP-C} system that dispatches tasks onto GPUs and multi-core systems (generally we would have one python process per compute node and local parallelism would be done via the low-level kernels to the cores and/or GPUs.)
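
To make the dispatch idea in this suggestion concrete, here is a rough sketch under stated assumptions. Everything in it is hypothetical: the names `BACKENDS`, `dispatch`, `run_serial`, `run_threads`, and `axpy_chunk` are invented for illustration, and a standard-library thread pool merely stands in for the low-level kernels (PyCUDA, PyOpenCL, or OpenMP-C) that the proposal has in mind. It shows only the shape of the design: one Python process chooses a backend and hands it small kernels to run over chunks of data.

```python
from concurrent.futures import ThreadPoolExecutor


def run_serial(kernel, chunks):
    """Reference backend: apply the kernel to each chunk in order."""
    return [kernel(chunk) for chunk in chunks]


def run_threads(kernel, chunks, workers=4):
    """Stand-in for a multi-core backend (not a real GPU or OpenMP path)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input chunks.
        return list(pool.map(kernel, chunks))


# Hypothetical registry; a real system would add GPU backends here.
BACKENDS = {"serial": run_serial, "threads": run_threads}


def dispatch(kernel, chunks, backend="serial"):
    """Run `kernel` over `chunks` on the named backend."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return BACKENDS[backend](kernel, chunks)


def axpy_chunk(chunk):
    """Tiny example kernel: a*x + y on one chunk."""
    a, x, y = chunk
    return [a * xi + yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    work = [(2.0, [1.0, 2.0], [3.0, 4.0]), (1.0, [0.0], [5.0])]
    print(dispatch(axpy_chunk, work, backend="threads"))
```

Keeping a serial backend alongside the fast one lets the same kernel be run in a debuggable path and an accelerated path with identical results, which bears on the debugging concerns raised elsewhere in the thread.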