Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
Robert Kern wrote:

No, I think you're right. Using SIMD to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of. Everyone else uses SIMD to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements.

Then you should pick up a book on parallel computing.

It is common to differentiate between four classes of computers: SISD, MISD, SIMD, and MIMD machines.

A SISD system is the classical von Neumann machine. A MISD system is a pipelined von Neumann machine, for example the x86 processor.

A SIMD system is one that has one CPU dedicated to control, and a large collection of subordinate ALUs for computation. Each ALU has a small amount of private memory. The IBM Cell processor is the typical SIMD machine.

A special class of SIMD machines are the so-called vector machines, of which the most famous is the Cray C90. The MMX and SSE instructions in Intel Pentium processors are an example of vector instructions. Some computer scientists regard vector machines as a subtype of MISD systems, orthogonal to pipelines, because there are no subordinate ALUs with private memory.

MIMD systems have multiple independent CPUs. MIMD systems come in two categories: shared-memory processors (SMP) and distributed-memory machines (also called cluster computers). The dual- and quad-core x86 processors are shared-memory MIMD machines.

Many people associate the word SIMD with SSE due to Intel marketing. But to the extent that vector machines are MISD orthogonal to pipelined von Neumann machines, SSE cannot be called SIMD.

NumPy is a software simulated vector machine, usually executed on MISD hardware. To the extent that vector machines (such as SSE and C90) are SIMD, we must call NumPy an object-oriented SIMD library.

S.M.
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
OK, I should have said Object-oriented SIMD API that is implemented using hardware SIMD instructions.

No, I think you're right. Using SIMD to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of. Everyone else uses SIMD to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements.

I agree with Sturla. For instance, nVidia GPUs do SIMD computations with blocks of 16 values at a time, but the hardware behind them can't compute on that much data at once. It's SIMD from our point of view, just like Numpy is ;)

Matthieu
--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
Matthieu Brucher wrote:

I agree with Sturla, for instance nVidia GPUs do SIMD computations with blocks of 16 values at a time, but the hardware behind can't compute on so much data at a time. It's SIMD from our point of view, just like Numpy does ;)

A computer with a CPU and a GPU is a SIMD machine by definition, due to the single CPU and the multiple ALUs in the GPU, which are subordinate to the CPU. But with modern computers, these classifications become a bit unclear.

S.M.
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
Mathieu Blondel wrote:

Peter Norvig suggested merging Numpy into Cython, but he didn't mention SIMD as the reason (this one is from me).

I don't know what Norvig said or meant. However:

There is NumPy support in Cython. Cython has a general syntax applicable to any PEP 3118 buffer. (As NumPy is not yet PEP 3118 compliant, NumPy arrays are converted to Py_buffer structs behind the scenes.)

Support for optimized vector expressions might be added later. Currently, slicing works as with NumPy in Python, producing slice objects and invoking NumPy's own code, instead of being converted to fast inlined C.

The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, replacing the current C source. That might be what Norvig meant if he suggested merging NumPy into Cython.

S.M.
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Thu, Oct 22, 2009 at 5:05 PM, Sturla Molden stu...@molden.no wrote:

Mathieu Blondel wrote:

The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, replacing the current C source. That might be what Norvig meant if he suggested merging NumPy into Cython.

As I wrote earlier in this thread, I confused Cython and CPython. PN was suggesting to include Numpy in the CPython distribution (not Cython). The reason why was also given earlier.

Mathieu
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
Mathieu Blondel wrote:

As I wrote earlier in this thread, I confused Cython and CPython. PN was suggesting to include Numpy in the CPython distribution (not Cython). The reason why was also given earlier.

First, that would currently not be possible, as NumPy does not support Py3k. Second, the easiest way to port NumPy to Py3k is Cython, which would prevent adoption in the Python standard library. At the least, they would have to change their current policy. Also, with NumPy in the standard library, any modification to NumPy would require a PEP.

But Python should have a PEP 3118 compliant buffer object in the standard library, which NumPy could subclass.

S.M.
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
2009/10/21 Neal Becker ndbeck...@gmail.com

... I once wrote a module that replaces the built-in transcendental functions of numpy by optimized versions from Intel's vector math library. If someone is interested, I can publish it. In my experience it was of little use since real world problems are limited by memory bandwidth. Therefore extending numexpr with optimized transcendental functions was the better solution. Afterwards I discovered that I could have saved the effort of the first approach since gcc is able to use optimized functions from Intel's vector math library or AMD's math core library, see the docs of -mveclibabi. You just need to recompile numpy with proper compiler arguments.

I'm interested. I'd like to try AMD rather than intel, because AMD is easier to obtain. I'm running on intel machine, I hope that doesn't matter too much. What exactly do I need to do?

I once tried to recompile numpy with AMD's ACML. Unfortunately I lost the settings after an upgrade. What I remember: install ACML (and read the docs ;-) ), mess with the compiler args (-mveclibabi and related), link with the ACML. Then you get faster pow/sin/cos/exp. The transcendental functions of ACML also work with Intel processors with the same performance. I did not try the Intel SVML, which belongs to the Intel compilers.

This is different from the first approach, which is a small wrapper for Intel's VML, put into a python module, which can inject its ufuncs (via numpy.set_numeric_ops) into numpy. If you want I can send the package by private email.

I see that numpy/site.cfg has an MKL section. I'm assuming I should not touch that, but just mess with gcc flags?

This is for using the lapack provided by Intel's MKL. These settings are not related to the above mentioned compiler options.
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
Robert Kern wrote:

On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel math...@mblondel.org wrote:

On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden stu...@molden.no wrote:

Mathieu Blondel wrote:

Hello, About one year ago, a high-level, object-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations.

I think you are confusing SIMD with Intel's MMX/SSE instruction set.

OK, I should have said Object-oriented SIMD API that is implemented using hardware SIMD instructions.

No, I think you're right. Using SIMD to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of. Everyone else uses SIMD to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements.

BTW, is there any term for this latter concept that's not SIMD or vector operation? It would be good to have a word to distinguish this concept from both CPU instructions and linear algebra.

(Personally I think describing NumPy as SIMD and using SSE/MMX for CPU instructions makes best sense, but I'm happy to yield to conventions...)

Dag Sverre
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Oct 22, 2009, at 1:35 AM, Sturla Molden wrote:

Robert Kern wrote:

No, I think you're right. Using SIMD to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of. Everyone else uses SIMD to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements.

Then you should pick up a book on parallel computing.

It is common to differentiate between four classes of computers: SISD, MISD, SIMD, and MIMD machines.

A SISD system is the classical von Neumann machine. A MISD system is a pipelined von Neumann machine, for example the x86 processor.

A SIMD system is one that has one CPU dedicated to control, and a large collection of subordinate ALUs for computation. Each ALU has a small amount of private memory. The IBM Cell processor is the typical SIMD machine.

A special class of SIMD machines are the so-called vector machines, of which the most famous is the Cray C90. The MMX and SSE instructions in Intel Pentium processors are an example of vector instructions. Some computer scientists regard vector machines as a subtype of MISD systems, orthogonal to pipelines, because there are no subordinate ALUs with private memory.

MIMD systems have multiple independent CPUs. MIMD systems come in two categories: shared-memory processors (SMP) and distributed-memory machines (also called cluster computers). The dual- and quad-core x86 processors are shared-memory MIMD machines.

Many people associate the word SIMD with SSE due to Intel marketing. But to the extent that vector machines are MISD orthogonal to pipelined von Neumann machines, SSE cannot be called SIMD.

NumPy is a software simulated vector machine, usually executed on MISD hardware. To the extent that vector machines (such as SSE and C90) are SIMD, we must call NumPy an object-oriented SIMD library.

This is not the terminology I am familiar with.
Calling NumPy an object-oriented SIMD library is very confusing for me. I worked in the parallel computer world for a while (back in the dark ages) and this terminology would have been confusing to everyone I dealt with. I've also read many parallel computing books. In my experience SIMD refers to hardware, not software.

There is no reason that NumPy can't be written to run great (get good speed-ups) on an 8-core shared memory system. That would be a MIMD system, and there's nothing about it that doesn't fit with the NumPy abstraction. And, although SIMD can be a subset of MIMD, there are things that can be done in NumPy that can be parallelized on MIMD machines but not on SIMD machines (e.g. the NumPy vector type is flexible enough that it can store a list of tasks, and the operations on that vector can be parallelized easily on a shared memory MIMD machine - task parallelism - but not on a SIMD machine).

If we say that NumPy is a software simulated vector machine or an object-oriented SIMD library we are pigeonholing NumPy in a way which is too limiting and isn't accurate. As a user it feels to me that NumPy is built around various algebra abstractions, many of which map well onto vector machine operations. That means that many of the operations are amenable to efficient implementation on SIMD hardware. But, IMO, one of the nice features of NumPy is that it is built around high-level operations, and I would hate to see the project go down a path which insists that everything in NumPy be efficient on all SIMD hardware.

Of course, I would also love to see implementations which take as much advantage of available HW as possible (e.g. exploit SIMD HW if available).

That's my $0.02, worth only a couple cents less than that.

-robert
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Thu, Oct 22, 2009 at 02:35, Sturla Molden stu...@molden.no wrote:

Robert Kern wrote:

No, I think you're right. Using SIMD to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of. Everyone else uses SIMD to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements.

Then you should pick up a book on parallel computing.

I would be delighted to see a reference to one that refers to a high level language's API as SIMD. Please point one out to me. It's certainly not any of the ones I have available to me.

It is common to differentiate between four classes of computers: SISD, MISD, SIMD, and MIMD machines.

A SISD system is the classical von Neumann machine. A MISD system is a pipelined von Neumann machine, for example the x86 processor.

A SIMD system is one that has one CPU dedicated to control, and a large collection of subordinate ALUs for computation. Each ALU has a small amount of private memory. The IBM Cell processor is the typical SIMD machine.

A special class of SIMD machines are the so-called vector machines, of which the most famous is the Cray C90. The MMX and SSE instructions in Intel Pentium processors are an example of vector instructions. Some computer scientists regard vector machines as a subtype of MISD systems, orthogonal to pipelines, because there are no subordinate ALUs with private memory.

MIMD systems have multiple independent CPUs. MIMD systems come in two categories: shared-memory processors (SMP) and distributed-memory machines (also called cluster computers). The dual- and quad-core x86 processors are shared-memory MIMD machines.

Many people associate the word SIMD with SSE due to Intel marketing. But to the extent that vector machines are MISD orthogonal to pipelined von Neumann machines, SSE cannot be called SIMD.
That's a fair point, but unrelated to whether or not numpy can be labeled SIMD. These all refer to hardware.

NumPy is a software simulated vector machine, usually executed on MISD hardware. To the extent that vector machines (such as SSE and C90) are SIMD, we must call NumPy an object-oriented SIMD library.

numpy does not simulate anything. It is an object-oriented library. If numpy could be said to simulate a vector machine, then just about any object-oriented library that overloads operators could. It creates a false equivalence between numpy and software that actually does simulate hardware.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Thu, Oct 22, 2009 at 06:20, Dag Sverre Seljebotn da...@student.matnat.uio.no wrote:

Robert Kern wrote:

On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel math...@mblondel.org wrote:

On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden stu...@molden.no wrote:

Mathieu Blondel wrote:

Hello, About one year ago, a high-level, object-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations.

I think you are confusing SIMD with Intel's MMX/SSE instruction set.

OK, I should have said Object-oriented SIMD API that is implemented using hardware SIMD instructions.

No, I think you're right. Using SIMD to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of. Everyone else uses SIMD to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements.

BTW, is there any term for this latter concept that's not SIMD or vector operation? It would be good to have a word to distinguish this concept from both CPU instructions and linear algebra.

Of course, vector instruction and vectorized operation sometimes also refer to the CPU instructions. :-)

I don't think you will get much better than vectorized operation, though. While it's ambiguous, it has a long history in the high level language world thanks to Matlab.

(Personally I think describing NumPy as SIMD and using SSE/MMX for CPU instructions makes best sense, but I'm happy to yield to conventions...)

Well, SSE/MMX is also too limiting. Altivec instructions are also in the same class, and we should be able to use them on PPC platforms. Regardless of the origin of the term, SIMD is used to refer to all of these instructions in common practice.
Sturla may be right in some prescriptive sense, but descriptively, he's quite wrong.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
Robert Kern wrote:

I would be delighted to see a reference to one that refers to a high level language's API as SIMD. Please point one out to me. It's certainly not any of the ones I have available to me.

Numerical Recipes in Fortran 90, pages 964 and 985-986, describes the syntax of Fortran 90 and 95 as SIMD.

Peter Pacheco's book on MPI describes the difference between von Neumann machines and vector machines as analogous to the difference between Fortran 77 and Fortran 90 (with an example from Fortran 90 array slicing). He is ambiguous as to whether vector machines really are SIMD, or more related to pipelined von Neumann machines.

Grama et al., Introduction to Parallel Computing, describes SIMD as an architecture, but it is more or less clear that they mean hardware. They do say the Fortran 90 where statement is a primitive used to support selective execution on SIMD processors, as conditional execution (if statements) is detrimental to performance.

So at least we here have three books claiming that Fortran is a language with special primitives for SIMD processors.

That's a fair point, but unrelated to whether or not numpy can be labeled SIMD. These all refer to hardware.

Actually I don't think the distinction is that important, as we are talking about Turing machines. Also, a lot of what we call hardware is actually implemented as software on the chip: the most extreme example would be Transmeta, which emulated x86 processors completely in software. The vague distinction between hardware and software is why we get patents on software in Europe, although pure software patents are prohibited. One can always argue that the program and the computer together constitute a physical device, and that circumventing patents by moving hardware into software should not be allowed. The distinction between hardware and software is not as clear as programmers tend to believe.
Another thing is that performance issues for vector machines and vector languages (Fortran 90, Matlab, NumPy) are similar. Precisely the same situations that make NumPy and Matlab code slow are detrimental on SIMD/vector hardware. That would for example be long for loops with conditional if statements. On the other hand, vectorized operations over arrays, possibly using where/find masks, are fast. So although NumPy is not executed on a vector machine like the Cray C90, it certainly behaves like one performance-wise.

I'd say that a MIMD machine running NumPy is a Turing machine emulating a SIMD/vector machine.

And now I am done with this stupid discussion...

Sturla Molden
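To make the loop-versus-mask contrast above concrete, here is a minimal sketch in Python. The function names and the toy operation (square root of the non-negative elements, zero elsewhere) are illustrative assumptions, not anything posted in the thread; the point is only the shape of the two styles.

```python
import numpy as np

def sqrt_nonneg_loop(x):
    """Element-by-element loop with an if statement per element."""
    out = np.empty_like(x)
    for i in range(x.size):
        if x[i] < 0.0:
            out[i] = 0.0
        else:
            out[i] = np.sqrt(x[i])
    return out

def sqrt_nonneg_masked(x):
    """Same result written as whole-array operations selected by a mask."""
    mask = x >= 0.0
    out = np.zeros_like(x)
    out[mask] = np.sqrt(x[mask])
    return out

x = np.linspace(-1.0, 1.0, 11)
loop_result = sqrt_nonneg_loop(x)
masked_result = sqrt_nonneg_masked(x)
```

Both functions produce the same array; the masked form hands the per-element work to NumPy's compiled loops, which is the style the message describes as fast.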
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Tue, Oct 20, 2009 at 11:44 PM, Mathieu Blondel math...@mblondel.org wrote:

Hello, About one year ago, a high-level, object-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations. You can have a look at the following pages for further details:

http://tirania.org/blog/archive/2008/Nov-03.html (blog post)
http://go-mono.com/docs/index.aspx?tlin...@n%3amono.simd (API reference)

It seems to me that such an API would possibly be a great fit in Numpy too. It would also be possible to add classes that don't directly map to SIMD types. For example, Vector8f can easily be implemented in terms of 2 Vector4f. In addition to vectors, additional API may be added to support operations on matrices of fixed width or height.

I searched the archives for similar discussions but I only found a discussion about memory-alignment, so I hope I am not restarting an existing discussion here. Memory-alignment is an important related issue since non-aligned movs can tank the performance.

Any thoughts? I don't know the Numpy code base yet but I'm willing to help if such an effort is started.

The licenses look all hodge-podge:

- The C# compiler is dual-licensed under the MIT/X11 license and the GNU General Public License (GPL, http://www.opensource.org/licenses/gpl-license.html).
- The tools are released under the terms of the GNU General Public License (GPL, http://www.opensource.org/licenses/gpl-license.html).
- The runtime libraries are under the GNU Library GPL 2.0 (LGPL 2.0, http://www.gnu.org/copyleft/library.html#TOC1).
- The class libraries are released under the terms of the MIT X11 license (http://www.opensource.org/licenses/mit-license.html).
- ASP.NET MVC and ASP.NET AJAX client software are released by Microsoft under the open source Microsoft Permissive License (http://www.opensource.org/licenses/ms-pl.html).

However, if the good stuff is in the class libraries, that looks OK. But that still leaves it in C#, no? You could have a looksie to see how it would fit into, say, Cython. I don't know where it would go in numpy, maybe some of the vector bits would be suitable for some generalized ufuncs.

Apart from that, I believe ATLAS can already make use of SIMD, but I have no idea how far it goes in using the full feature set.

Chuck
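The remark quoted above, that a Vector8f can be implemented in terms of two Vector4f, can be sketched in plain Python. This is only an illustration of the composition idea: the class and method names mirror the Mono names mentioned in the thread, but the code below is a hypothetical NumPy-backed stand-in and does not issue any hardware SIMD instructions itself.

```python
import numpy as np

class Vector4f:
    """Fixed-width vector of 4 single-precision floats (illustrative only)."""

    def __init__(self, values):
        self.data = np.asarray(values, dtype=np.float32)
        if self.data.shape != (4,):
            raise ValueError("Vector4f needs exactly 4 values")

    def __add__(self, other):
        return Vector4f(self.data + other.data)

    def sqrt(self):
        return Vector4f(np.sqrt(self.data))

class Vector8f:
    """8-wide vector built from a low and a high Vector4f half."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    @classmethod
    def from_values(cls, values):
        v = np.asarray(values, dtype=np.float32)
        return cls(Vector4f(v[:4]), Vector4f(v[4:]))

    def __add__(self, other):
        # Each 8-wide operation is just two 4-wide operations.
        return Vector8f(self.lo + other.lo, self.hi + other.hi)

    def to_array(self):
        return np.concatenate([self.lo.data, self.hi.data])

a = Vector8f.from_values(range(8))
b = Vector8f.from_values([1.0] * 8)
total = (a + b).to_array()
```

In a real implementation each Vector4f operation would presumably map to one 128-bit instruction, so the 8-wide type costs two instructions per operation; that mapping is an assumption here and is not shown by this sketch.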
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Wed, Oct 21, 2009 at 6:12 PM, Francesc Alted fal...@pytables.org wrote:

On Wednesday 21 October 2009 07:44:39 Mathieu Blondel wrote:

Hello, About one year ago, a high-level, object-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations. [clip]

It is important to stress that all the above operations, except probably sqrt, are memory-bound operations, and that implementing them for numpy would not represent a significant improvement at all. This is because numpy is a package that works mainly with arrays in an element-wise way, and in this scenario, the time to transmit data to the CPU dominates, by and large, over the time to perform operations.

Is it general, or just for simple operations in numpy and ufunc? I remember that for music software, SIMD used to matter a lot, even for simple bus mixing (which is basically ax+by with a, b scalars and x, y the input arrays).

Do you have any interest in adding SIMD to some core numpy (transcendental functions)? If so, I would try to go back to the problem of runtime SSE detection and loading of optimized shared libraries in a cross-platform way - that's something which should be done at some point in numpy, and people requiring it would be a good incentive.

David
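As a small illustration of the bus-mixing expression ax+by mentioned above, the sketch below writes it two ways in NumPy: the plain expression, which allocates temporary arrays for the intermediate products, and a variant that reuses caller-supplied buffers through the ufunc out= argument. The function names and buffer arguments are assumptions made for this example; no timing claims are implied, since whether the second form helps depends on array size and memory bandwidth.

```python
import numpy as np

def mix_naive(a, x, b, y):
    """a*x + b*y; NumPy allocates temporaries for a*x and b*y."""
    return a * x + b * y

def mix_buffered(a, x, b, y, out, scratch):
    """Same arithmetic, writing into preallocated buffers instead."""
    np.multiply(x, a, out=out)        # out = a*x
    np.multiply(y, b, out=scratch)    # scratch = b*y
    np.add(out, scratch, out=out)     # out = a*x + b*y
    return out

x = np.arange(8, dtype=np.float64)
y = np.ones(8, dtype=np.float64)
out = np.empty_like(x)
scratch = np.empty_like(x)

naive = mix_naive(0.5, x, 2.0, y)
buffered = mix_buffered(0.5, x, 2.0, y, out, scratch)
```

Each element involves two multiplies and one add against two loads and one store, which is the kind of low arithmetic intensity the memory-bound argument in this message refers to.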
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
Is it general, or just for simple operations in numpy and ufunc? I remember that for music software, SIMD used to matter a lot, even for simple bus mixing (which is basically ax+by with a, b scalars and x, y the input arrays).

Indeed, it shouldn't :| I think the main reason might not be SIMD, but the additional hypothesis you put on the arrays (aliasing). This way, today's compilers may not even need the actual SIMD instructions. I have the same opinion as Francesc: it would only be useful for operations that need more computation than load/store.

Matthieu
--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Wed, Oct 21, 2009 at 2:14 PM, Pauli Virtanen pav...@iki.fi wrote:

Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote: [clip]

Do you have any interest in adding SIMD to some core numpy (transcendental functions)? If so, I would try to go back to the problem of runtime SSE detection and loading of optimized shared libraries in a cross-platform way - that's something which should be done at some point in numpy, and people requiring it would be a good incentive.

I don't personally have a lot of interest in implementing this for numpy. But in case anyone does, I find the following library very interesting: http://gruntthepeon.free.fr/ssemath/ Perhaps there could be other (free) implementations...

Optimized transcendental functions could be interesting. For example for tanh, call overhead is overcome already for ~30-element arrays.

Since these are ufuncs, I suppose the SSE implementations could just be put in a separate module, which is always compiled. Before importing the module, we could simply check from Python side that the CPU supports the necessary instructions. If everything is OK, the accelerated implementations would then just replace the Numpy routines.

This type of project could probably also be started outside Numpy, and just monkey-patch the Numpy routines on import.

--
Pauli Virtanen

Anyone seen the corepy numpy gsoc project? http://numcorepy.blogspot.com/

It implements a number of functions with the corepy runtime assembler. The project showed nice simd speedups for numpy.

I've been following the liborc project... which is a runtime assembler that uses a generic assembly language and supports many different simd assembly languages (eg SSE, MMX, ARM, Altivec). It's the replacement for the liboil library (used in gstreamer etc). http://code.entropywave.com/projects/orc/

cu!
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer gregor.thalham...@gmail.com wrote:

I once wrote a module that replaces the built-in transcendental functions of numpy by optimized versions from Intel's vector math library. If someone is interested, I can publish it. In my experience it was of little use since real world problems are limited by memory bandwidth. Therefore extending numexpr with optimized transcendental functions was the better solution. Afterwards I discovered that I could have saved the effort of the first approach since gcc is able to use optimized functions from Intel's vector math library or AMD's math core library, see the docs of -mveclibabi. You just need to recompile numpy with proper compiler arguments.

Do you have a link to the documentation for -mveclibabi? I can't find this anywhere and I'm *very* interested.

Ryan
--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Wed, Oct 21, 2009 at 1:23 PM, Ryan May rma...@gmail.com wrote:

On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer gregor.thalham...@gmail.com wrote:

I once wrote a module that replaces the built-in transcendental functions of numpy by optimized versions from Intel's vector math library. If someone is interested, I can publish it. In my experience it was of little use since real world problems are limited by memory bandwidth. Therefore extending numexpr with optimized transcendental functions was the better solution. Afterwards I discovered that I could have saved the effort of the first approach since gcc is able to use optimized functions from Intel's vector math library or AMD's math core library, see the docs of -mveclibabi. You just need to recompile numpy with proper compiler arguments.

Do you have a link to the documentation for -mveclibabi? I can't find this anywhere and I'm *very* interested.

Ah, there it is. Google doesn't come up with much, but the PDF manual does have it: http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc.pdf

(It helps when you don't mis-type your search in the PDF.)

Ryan
--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Sent from Norman, Oklahoma, United States
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
...
> I once wrote a module that replaces the built-in transcendental functions of numpy with optimized versions from Intel's vector math library. If someone is interested, I can publish it. In my experience it was of little use, since real-world problems are limited by memory bandwidth. Therefore extending numexpr with optimized transcendental functions was the better solution. Afterwards I discovered that I could have saved the effort of the first approach, since gcc is able to use optimized functions from Intel's vector math library or AMD's math core library; see the docs for -mveclibabi. You just need to recompile numpy with the proper compiler arguments.

I'm interested. I'd like to try AMD's library rather than Intel's, because AMD's is easier to obtain. I'm running on an Intel machine; I hope that doesn't matter too much.

What exactly do I need to do? I see that numpy/site.cfg has an MKL section. I'm assuming I should not touch that, but just mess with the gcc flags?
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:
> Since these are ufuncs, I suppose the SSE implementations could just be put in a separate module, which is always compiled. Before importing the module, we could simply check from the Python side that the CPU supports the necessary instructions. If everything is OK, the accelerated implementations would then just replace the Numpy routines.

Am I mistaken, or wasn't that sort of the goal of Andrew Friedley's CorePy work this summer?

Looking at his slides again, the speedups are rather impressive. I wonder if these could be usefully integrated into numpy itself?

David
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
sigh; yet another email dropped by the list.

David Warde-Farley wrote:
> On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:
> > Since these are ufuncs, I suppose the SSE implementations could just be put in a separate module, which is always compiled. Before importing the module, we could simply check from the Python side that the CPU supports the necessary instructions. If everything is OK, the accelerated implementations would then just replace the Numpy routines.
>
> Am I mistaken, or wasn't that sort of the goal of Andrew Friedley's CorePy work this summer?
>
> Looking at his slides again, the speedups are rather impressive. I wonder if these could be usefully integrated into numpy itself?

Yes, my GSoC project is closely related, though I didn't do the CPU detection part; that'd be easy to do. Also, I wrote my code specifically for 64-bit x86. I didn't focus so much on the transcendental functions, though they wouldn't be too hard to implement. There's also the possibility of providing implementations with differing tradeoffs between accuracy and performance.

I think the blog link got posted already, but here's the relevant info:

http://numcorepy.blogspot.com
http://www.corepy.org/wiki/index.php?title=CoreFunc

I talked about this in my SciPy talk and upcoming paper as well.

Also, people have just been talking about x86 in this thread; other architectures could be supported too, e.g. PPC/Altivec or even Cell SPU and other accelerators. I actually wrote a quick and dirty implementation of addition and vector normalization ufuncs for Cell SPU recently. The basic result is that overall performance is very roughly comparable to a similar-speed x86 chip, but this is a huge win over just running on the extremely slow Cell PPC cores.

Andrew
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Wed, Oct 21, 2009 at 10:14 PM, Pauli Virtanen pav...@iki.fi wrote:
> This type of project could probably also be started outside Numpy, and just monkey-patch the Numpy routines on import.

I think I would prefer this approach as a first shot. I will look into adding a small C library plus a Python wrapper to find out which SIMD instructions are available to numpy. Then people can reuse this for whatever approach they prefer.

David
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
Mathieu Blondel wrote:
> Hello, About one year ago, a high-level, object-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations.

I think you are confusing SIMD with Intel's MMX/SSE instruction set.

SIMD means single instruction, multiple data. NumPy is inherently an object-oriented SIMD API:

    array1[:] = array2 + array3

is a SIMD instruction by definition.

SIMD instructions in hardware for length-4 vectors are mostly useful for 3D graphics. But they are not used a lot for that purpose, because GPUs are becoming common. SSE is mostly for rendering 3D graphics without a GPU. There is nothing that prevents NumPy from having a Vector4f dtype that internally stores four float32 values and is aligned on 16-byte boundaries. But it would not be faster than the current float32 dtype. Do you know why?

The reason is that memory access is slow and computation is fast. Modern CPUs are starved for data. The speed of NumPy is not limited by not using MMX/SSE whenever possible. It is limited by having to create and delete temporary arrays all the time. You are suggesting optimizing in the wrong place. There is a lot that can be done to speed up computation: there are optimized BLAS libraries like ATLAS and MKL, and NumPy uses BLAS for things like matrix multiplication. There is OpenMP for better performance on multicores. There are OpenCL and CUDA for moving computation from the CPU to the GPU. But the main boost you get from going from NumPy to hand-written C or Fortran comes from reduced memory use.

> Memory-alignment is an important related issue since non-aligned movs can tank the performance.

You can align an ndarray on a 16-byte boundary like this:

    import numpy

    def aligned_array(N, dtype):
        itemsize = numpy.dtype(dtype).itemsize
        # Over-allocate raw bytes so a 16-byte-aligned start always fits.
        tmp = numpy.zeros(N * itemsize + 16, dtype=numpy.uint8)
        address = tmp.__array_interface__['data'][0]
        offset = (16 - address % 16) % 16
        # Take N items' worth of bytes (not N bytes) before viewing as dtype.
        return tmp[offset:offset + N * itemsize].view(dtype=dtype)

Sturla Molden
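Sturla's remark that NumPy is held back by creating and deleting temporary arrays can be made concrete with a small sketch. The function names `sum3_naive` and `sum3_inplace` are made up for this illustration. The `out=` argument of ufuncs is standard NumPy, but whether a given NumPy version really allocates the intermediate in the naive form is an assumption; treat this as a picture of the idea, not a benchmark.

```python
import numpy as np


def sum3_naive(a, b, c):
    """a + b + c: conceptually builds (a + b) as a temporary, then the result."""
    return a + b + c


def sum3_inplace(a, b, c, out=None):
    """Same sum, reusing one output buffer through the ufunc out= argument."""
    if out is None:
        out = np.empty_like(a)
    np.add(a, b, out=out)    # out = a + b, written straight into the buffer
    np.add(out, c, out=out)  # out = out + c, still the same buffer
    return out


a = np.arange(5, dtype=np.float64)
b = np.ones(5)
c = np.full(5, 2.0)
buf = np.empty_like(a)              # allocated once, reusable across calls
result = sum3_inplace(a, b, c, out=buf)
```

Both functions give the same values; the second touches less memory when `buf` is reused in a loop, which is the kind of saving the thread attributes to numexpr and to hand-written C or Fortran.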
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden stu...@molden.no wrote:
> Mathieu Blondel wrote:
> > Hello, About one year ago, a high-level, object-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations.
>
> I think you are confusing SIMD with Intel's MMX/SSE instruction set.

OK, I should have said "object-oriented SIMD API that is implemented using hardware SIMD instructions". And when an ISA doesn't allow a specific operation to be performed in a single instruction (say, the absolute value of the differences), the operation can be implemented in terms of other instructions.

> SIMD instructions in hardware for length-4 vectors are mostly useful for 3D graphics. But they are not used a lot for that purpose, because GPUs are becoming common. SSE is mostly for rendering 3D graphics without a GPU. There is nothing that prevents NumPy from having a Vector4f dtype that internally stores four float32 values and is aligned on 16-byte boundaries. But it would not be faster than the current float32 dtype. Do you know why?

Yes, I know, because this has already been explained in this very thread by someone before you!

Mathieu
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel math...@mblondel.org wrote:
> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden stu...@molden.no wrote:
> > Mathieu Blondel wrote:
> > > Hello, About one year ago, a high-level, object-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations.
> >
> > I think you are confusing SIMD with Intel's MMX/SSE instruction set.
>
> OK, I should have said "object-oriented SIMD API that is implemented using hardware SIMD instructions".

No, I think you're right. Using SIMD to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of. Everyone else uses SIMD to describe hardware instructions, not the application of a single syntactical element of a high-level language to a non-trivial data structure containing lots of atomic data elements.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
[Numpy-discussion] Objected-oriented SIMD API for Numpy
Hello,

About one year ago, a high-level, object-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats, and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, and shuffle directly using SIMD operations.

You can have a look at the following pages for further details:

http://tirania.org/blog/archive/2008/Nov-03.html (blog post)
http://go-mono.com/docs/index.aspx?tlin...@n%3amono.simd (API reference)

It seems to me that such an API would possibly be a great fit for Numpy too. It would also be possible to add classes that don't directly map to SIMD types. For example, Vector8f can easily be implemented in terms of two Vector4f. In addition to vectors, further APIs could be added to support operations on matrices of fixed width or height.

I searched the archives for similar discussions but only found a discussion about memory alignment, so I hope I am not restarting an existing discussion here. Memory alignment is an important related issue, since non-aligned movs can tank performance.

Any thoughts? I don't know the Numpy code base yet but I'm willing to help if such an effort is started.

Thanks,
Mathieu
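To make the "Vector8f in terms of two Vector4f" remark concrete, here is a minimal sketch in Python. The class names are borrowed from the Mono API mentioned above, but these Python classes are hypothetical: they are not part of NumPy or Mono, and they issue no SIMD instructions themselves. NumPy's float32 arrays stand in for the hardware registers, purely to show how the composition would look.

```python
import numpy as np


class Vector4f:
    """Hypothetical 4-float vector; a float32 array stands in for a register."""

    def __init__(self, values):
        self.data = np.asarray(values, dtype=np.float32)
        if self.data.shape != (4,):
            raise ValueError("Vector4f needs exactly 4 values")

    def __add__(self, other):
        return Vector4f(self.data + other.data)

    def sqrt(self):
        return Vector4f(np.sqrt(self.data))


class Vector8f:
    """Hypothetical 8-float vector built from two Vector4f halves."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi

    def __add__(self, other):
        # Each operation is forwarded to the two 4-wide halves.
        return Vector8f(self.lo + other.lo, self.hi + other.hi)

    def sqrt(self):
        return Vector8f(self.lo.sqrt(), self.hi.sqrt())

    def to_array(self):
        return np.concatenate([self.lo.data, self.hi.data])


v = Vector8f(Vector4f([1, 4, 9, 16]), Vector4f([25, 36, 49, 64]))
doubled = v + v
roots = v.sqrt()
```

Every Vector8f operation costs two Vector4f operations, so the wider type adds convenience rather than speed unless the hardware offers wider registers.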