Re: [Numpy-discussion] Matlab vs. Python (Was: Re: [SciPy-Dev] Good-bye, sort of (John Hunter)) (Sturla Molden)
Hi everyone, I've been pretty happy with how Spyder went along when I tried it. I use Emacs, but I think Spyder is perfectly usable for someone used to Matlab. A few of my HPC-centric reasons (shamelessly copy-pasted because I'm lazy right now):

1. Python is an expressive, full-fledged, general-purpose application language. There is slightly more boilerplate for math-related operations (i.e. creating a matrix is not as simple as A = [1, 2, 3; 4, 5, 6] but rather A = numpy.array([[1, 2, 3], [4, 5, 6]])). On the other hand, everything other than math can be expressed without causing severe nausea and vomiting to the user.
2. The package-module structure of Python gives me a significantly less cluttered workspace.
3. There's built-in support for automatically generating meaningful and useful documentation.
4. The wonderful FFI support allows me to easily work with external C code. Writing MEX functions is a total mess; I hate that. On the other hand, it's easy for me to integrate Python with a custom-compiled version of UMFPACK or any other solver, and wrappers can be generated automatically with SWIG for minimal effort.
5. There is a Matlab wrapper called mlabwrap, so legacy code written in Matlab is not lost effort.
6. I can use Emacs for my development rather than choosing between a) working in a half-assed environment without code completion or b) working with Matlab's incredibly slow and sloppy user interface on Unix systems.
7. I have built-in support for primitives like linked lists, queues, stacks and tuples.
8. I have standards-compliant support for MPI that does not look alien in Python (there's support for that in Matlab too, but it feels like you're coding on Mars in a library written somewhere in the Andromeda galaxy).
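As a quick illustration of point 7, the standard library covers these primitives out of the box -- collections.deque doubles as an efficient queue and stack, and lists and tuples are built in. A minimal sketch (my own toy values, not from any real workload):

```python
from collections import deque

# A deque works as both a FIFO queue and a LIFO stack.
queue = deque()
queue.append("job1")        # enqueue at the right
queue.append("job2")
first = queue.popleft()     # dequeue from the left -> "job1"

stack = deque()
stack.append(1)             # push
stack.append(2)
top = stack.pop()           # pop -> 2

# Tuples (immutable) and lists are language built-ins as well.
point = (3.0, 4.0)
values = [point[0], point[1], first, top]
```

No toolboxes, no cell-array contortions -- these are first-class types.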
This is extremely important to me right now because the algorithm we're implementing will be presented at a conference in September, and I want an implementation that uses standard tools available to every HPC developer. On the other hand, I will be able to further optimize communication (if that turns out to be required) by using a good distributed-object library like PyRO.

Some details about what exactly I am using it for are on my blog at http://zencoding.org/archives/137#more-137, although it's rather sketchy -- I plan to write a slightly more detailed document about my experience with Python for HPC and how it compares against Matlab's PCS and DCS. I've been banging my head against Matlab for a while, so I'll gladly write a thing or two on the wiki if you think this sort of use case is relevant.

One other observation -- Matlab does have some support for parallel processing via PCS, but you have to pay for that, and it's not too flexible. There's also DCS for distributed computing. Some of my colleagues have been using those and aren't too happy about their flexibility just yet. MathWorks is also not too forthcoming about which particular parts of Matlab are parallelized, so we've been randomly stumbling upon parts that actually were (e.g. UMFPACK is linked against a multithreaded BLAS) even though we thought they weren't. Their documentation is detailed, but when it comes to under-the-hood details and optimization methods, it's a bad joke. If you need Matlab as a glorified handheld calculator or as a prototyping tool, it's great, but writing full-fledged apps in it is painful.

Best regards,
Alexandru Lazar, Numerical Modeling Laboratory, Politehnica University of Bucharest

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] (no subject)
Hello everyone, I'm currently planning to use a Python-based infrastructure for our HPC project. I've previously used NumPy and SciPy for basic scientific computing tasks, but performance hasn't been quite an issue for me until now. At the moment I'm not too sure what to do next, though, and I was hoping that someone with more experience in performance-related issues could point me to a way out of this. The trouble lies in the following piece of code:

===
w = 2 * math.pi * f
M = A - (1j*w*E)
n = M.shape[1]
B1 = numpy.zeros(n)
B2 = numpy.zeros(n)
B1[n-2] = 1.0
B2[n-1] = 1.0
# --- slow part starts here ---
umfpack.numeric(M)
x1 = umfpack.solve(um.UMFPACK_A, M, B1, autoTranspose=False)
x2 = umfpack.solve(um.UMFPACK_A, M, B2, autoTranspose=False)
solution = scipy.array([[x1[n-2], x2[n-2]], [x1[n-1], x2[n-1]]])
return solution
===

This isn't really too much -- it's generating a small
[Numpy-discussion] UMFPACK interface is unexpectedly slow
Hello everyone, First of all, let me apologize for my earlier message; I made the mistake of trying to indent my code using SquirrelMail's horrible interface -- and pressing Tab and Space resulted in sending my (incomplete) e-mail to the list. Cursed be Opera's keyboard shortcuts now :-).

I'm currently planning to use a Python-based infrastructure for our HPC project. I've previously used NumPy and SciPy for basic scientific computing tasks, so performance hasn't been quite an issue for me until now. At the moment I'm not too sure what to do next, though, and I was hoping that someone with more experience in performance-related issues could point me to a way out of this. The trouble lies in the following piece of code:

===
w = 2 * math.pi * f
M = A - (1j*w*E)
n = M.shape[1]
B1 = numpy.zeros(n)
B2 = numpy.zeros(n)
B1[n-2] = 1.0
B2[n-1] = 1.0
# --- slow part starts here ---
umfpack.numeric(M)
x1 = umfpack.solve(um.UMFPACK_A, M, B1, autoTranspose=False)
x2 = umfpack.solve(um.UMFPACK_A, M, B2, autoTranspose=False)
solution = scipy.array([[x1[n-2], x2[n-2]], [x1[n-1], x2[n-1]]])
return solution
===

This isn't really too much -- it's generating a system matrix via operations that take little time, as I was expecting. Trouble is, the solve part takes significantly more time than Octave -- about 4 times. I'm using the stock version of UMFPACK in Ubuntu's repository; it's compiled against standard BLAS, so it's fairly slow, but so is Octave's -- so the problem isn't there. I'm obviously doing something wrong related to memory management here, because memory consumption is also rocketing, but I'm not sure what exactly it is that I'm doing wrong. Could you point me towards some relevant documentation describing what I could do to improve the performance, or give me some hint related to that?

Best regards,
Alexandru Lazar
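For comparison, the same factorize-once/solve-twice pattern can be written against scipy.sparse.linalg directly. Note this is only a sketch: splu wraps SuperLU rather than UMFPACK, and M below is a small stand-in matrix, not the real system matrix from the code above.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small stand-in for the complex system matrix M = A - 1j*w*E.
n = 4
M = sp.eye(n, format="csc", dtype=complex) * (2.0 + 0.5j)

B1 = np.zeros(n, dtype=complex)
B2 = np.zeros(n, dtype=complex)
B1[n-2] = 1.0
B2[n-1] = 1.0

# Factorize once, then reuse the LU factors for both right-hand sides;
# re-factorizing per solve is where time is usually lost.
lu = spla.splu(M)
x1 = lu.solve(B1)
x2 = lu.solve(B2)

solution = np.array([[x1[n-2], x2[n-2]],
                     [x1[n-1], x2[n-1]]])
```

The point of the sketch is only the structure: one symbolic/numeric factorization, many cheap triangular solves.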
Re: [Numpy-discussion] UMFPACK interface is unexpectedly slow
I hope I won't get identified as a spam bot :-). While I have not resolved the problem itself, this is an issue that I cannot reproduce on our cluster. I wanted to get back with some actual timings from the real hardware we are going to be using and some details about the matrices, so as not to chase ghosts, and this proved to be a headache saver. It's still baffling, because on the cluster I also used stock packages (albeit from Fedora, which is what our system administrator insists on using) rather than my hand-compiled and optimized GotoBLAS and UMFPACK. In the 4 hours I've been struggling with this, it didn't even occur to me to try to reproduce it on another system, because I assumed that using stock packages was giving me the uniformity I required. It seems I was wrong. Nonetheless, I think it's safe to assume in this case that the problem is not in NumPy or my code, and it would be wiser to bring this up in Ubuntu's bug tracker.

Thanks for your patience,
Alexandru

On Thu, July 22, 2010 4:10 am, Ioan-Alexandru Lazar wrote:
> [snip]