Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Robert Kern wrote:
 No, I think you're right. Using SIMD to refer to numpy-like
 operations is an abuse of the term not supported by any outside
 community that I am aware of. Everyone else uses SIMD to describe
 hardware instructions, not the application of a single syntactical
 element of a high level language to a non-trivial data structure
 containing lots of atomic data elements.
   
Then you should pick up a book on parallel computing.

It is common to differentiate between four classes of computers: SISD, 
MISD, SIMD, and MIMD machines.

A SISD system is the classical von Neumann machine. A MISD system is a
pipelined von Neumann machine, for example the x86 processor.

A SIMD system is one that has one CPU dedicated to control, and a large 
collection of subordinate ALUs for computation. Each ALU has a small 
amount of private memory. The IBM Cell processor is the typical SIMD 
machine.

A special class of SIMD machines are the so-called vector machines, of 
which the most famous is the Cray C90. The MMX and SSE instructions in 
Intel Pentium processors are an example of vector instructions. Some 
computer scientists regard vector machines as a subtype of MISD systems,
orthogonal to pipelines, because there are no subordinate ALUs with
private memory.

MIMD systems have multiple independent CPUs. MIMD systems come in two
categories: shared-memory processors (SMP) and distributed-memory 
machines (also called cluster computers). The dual- and quad-core x86 
processors are shared-memory MIMD machines.

Many people associate the word SIMD with SSE due to Intel marketing. But 
to the extent that vector machines are MISD orthogonal to pipelined von
Neumann machines, SSE cannot be called SIMD.

NumPy is a software simulated vector machine, usually executed on MISD 
hardware. To the extent that vector machines (such as SSE and C90) are 
SIMD, we must call NumPy an object-oriented SIMD library.


S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Matthieu Brucher
 OK, I should have said Object-oriented SIMD API that is implemented
 using hardware SIMD instructions.

 No, I think you're right. Using SIMD to refer to numpy-like
 operations is an abuse of the term not supported by any outside
 community that I am aware of. Everyone else uses SIMD to describe
 hardware instructions, not the application of a single syntactical
 element of a high level language to a non-trivial data structure
 containing lots of atomic data elements.

I agree with Sturla. For instance, nVidia GPUs do SIMD computations
with blocks of 16 values at a time, but the underlying hardware can't
compute on that much data at once. It's SIMD from our point of view,
just as Numpy is ;)

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Matthieu Brucher wrote:
 I agree with Sturla. For instance, nVidia GPUs do SIMD computations
 with blocks of 16 values at a time, but the underlying hardware can't
 compute on that much data at once. It's SIMD from our point of view,
 just as Numpy is ;)

   
A computer with a CPU and a GPU is a SIMD machine by definition, due to 
the single CPU and the multiple ALUs in the GPU, which are subordinate 
to the CPU. But with modern computers, these classifications become a
bit unclear.

S.M.






Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Mathieu Blondel wrote:
 Peter Norvig suggested to merge Numpy into Cython but he didn't
 mention SIMD as the reason (this one is from me). 

I don't know what Norvig said or meant.

However:

There is NumPy support in Cython. Cython has a general syntax applicable 
to any PEP 3118 buffer. (As NumPy is not yet PEP 3118 compliant, NumPy 
arrays are converted to Py_buffer structs behind the scenes.)
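
For readers who have not met PEP 3118, here is a minimal sketch of the buffer protocol using only the Python 3 standard library (array.array and memoryview). It is not Cython syntax and not code from this thread; it only illustrates the kind of typed, zero-copy memory description that a Py_buffer struct carries.

```python
from array import array

# array.array exports the PEP 3118 buffer interface, so memoryview can wrap it.
data = array("d", [1.0, 2.0, 3.0, 4.0])
view = memoryview(data)

# The view describes the memory layout (element type, size, shape, strides)
# without copying the underlying data.
print(view.format, view.itemsize, view.shape, view.strides)

# Writing through the view changes the exporter: both share the same memory.
view[0] = 10.0

# Slicing a memoryview gives another view onto the same memory, not a copy.
tail = view[2:]
tail[0] = -1.0
print(data.tolist())
```

A consumer such as Cython reads the same fields (format, shape, strides) to generate direct indexing code for any object that exports a buffer.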

Support for optimized vector expressions might be added later. 
Currently, slicing works as with NumPy in Python, producing slice 
objects and invoking NumPy's own code, instead of being converted to 
fast inlined C.

The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, 
replacing the current C source. That might be what Norvig meant if he 
suggested merging NumPy into Cython.


S.M.


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Mathieu Blondel
On Thu, Oct 22, 2009 at 5:05 PM, Sturla Molden stu...@molden.no wrote:
 Mathieu Blondel wrote:

 The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k,
 replacing the current C source. That might be what Norvig meant if he
 suggested merging NumPy into Cython.

As I wrote earlier in this thread, I confused Cython and CPython. PN
was suggesting to include Numpy in the CPython  distribution (not
Cython). The reason why was also given earlier.

Mathieu


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Mathieu Blondel wrote:
 As I wrote earlier in this thread, I confused Cython and CPython. PN
 was suggesting to include Numpy in the CPython  distribution (not
 Cython). The reason why was also given earlier.

   
First, that would currently not be possible, as NumPy does not support
Py3k. Second, the easiest way to port NumPy to Py3k is Cython, which
would prevent adoption in the Python standard library. At the very least,
they would have to change their current policy. Also, with NumPy in the
standard library, any modification to NumPy would require a PEP.

But Python should have a PEP 3118 compliant buffer object in the 
standard library, which NumPy could subclass.

S.M.







Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Gregor Thalhammer
2009/10/21 Neal Becker ndbeck...@gmail.com

 ...
  I once wrote a module that replaces the built-in transcendental
  functions of numpy with optimized versions from Intel's vector math
  library. If someone is interested, I can publish it. In my experience it
  was of little use, since real-world problems are limited by memory
  bandwidth. Therefore extending numexpr with optimized transcendental
  functions was the better solution. Afterwards I discovered that I could
  have saved the effort of the first approach, since gcc is able to use
  optimized functions from Intel's vector math library or AMD's math core
  library; see the docs for -mveclibabi. You just need to recompile numpy
  with the proper compiler arguments.
 

 I'm interested.  I'd like to try AMD rather than Intel, because AMD is
 easier to obtain.  I'm running on an Intel machine; I hope that doesn't matter
 too much.

 What exactly do I need to do?

I once tried to recompile numpy with AMD's ACML. Unfortunately I lost the
settings after an upgrade. What I remember: install ACML (and read the docs
;-) ), mess with the compiler args (-mveclibabi and related), and link with
ACML. Then you get faster pow/sin/cos/exp. The transcendental functions of
ACML also work on Intel processors with the same performance. I did not
try the Intel SVML, which belongs to the Intel compilers.
This is different from the first approach, which is a small wrapper for
Intel's VML, put into a Python module, which can inject its ufuncs (via
numpy.set_numeric_ops) into numpy. If you want, I can send the package by
private email.


 I see that numpy/site.cfg has an MKL section.  I'm assuming I should not
 touch that, but just mess with gcc flags?

This is for using the LAPACK provided by Intel's MKL. These settings are not
related to the above-mentioned compiler options.




Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Dag Sverre Seljebotn
Robert Kern wrote:
 On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel math...@mblondel.org wrote:
   
 On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden stu...@molden.no wrote:
 
 Mathieu Blondel wrote:
   
 Hello,

 About one year ago, a high-level, objected-oriented SIMD API was added
 to Mono. For example, there is a class Vector4f for vectors of 4
 floats and this class implements methods such as basic operators,
 bitwise operators, comparison operators, min, max, sqrt, shuffle
 directly using SIMD operations.
 
 I think you are confusing SIMD with Intel's MMX/SSE instruction set.
   
 OK, I should have said Object-oriented SIMD API that is implemented
 using hardware SIMD instructions.
 

 No, I think you're right. Using SIMD to refer to numpy-like
 operations is an abuse of the term not supported by any outside
 community that I am aware of. Everyone else uses SIMD to describe
 hardware instructions, not the application of a single syntactical
 element of a high level language to a non-trivial data structure
 containing lots of atomic data elements.
   
BTW, is there any term for this latter concept that's not SIMD or 
vector operation? It would be good to have a word to distinguish this 
concept from both CPU instructions and linear algebra.

(Personally I think describing NumPy as SIMD and using SSE/MMX for CPU
instructions makes the most sense, but I'm happy to yield to conventions...)

Dag Sverre



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Robert Ferrell

On Oct 22, 2009, at 1:35 AM, Sturla Molden wrote:

 Robert Kern wrote:
 No, I think you're right. Using SIMD to refer to numpy-like
 operations is an abuse of the term not supported by any outside
 community that I am aware of. Everyone else uses SIMD to describe
 hardware instructions, not the application of a single syntactical
 element of a high level language to a non-trivial data structure
 containing lots of atomic data elements.

 Then you should pick up a book on parallel computing.

 It is common to differentiate between four classes of computers: SISD,
 MISD, SIMD, and MIMD machines.

 A SISD system is the classical von Neumann machine. A MISD system is a
 pipelined von Neumann machine, for example the x86 processor.

 A SIMD system is one that has one CPU dedicated to control, and a large
 collection of subordinate ALUs for computation. Each ALU has a small
 amount of private memory. The IBM Cell processor is the typical SIMD
 machine.

 A special class of SIMD machines are the so-called vector machines, of
 which the most famous is the Cray C90. The MMX and SSE instructions in
 Intel Pentium processors are an example of vector instructions. Some
 computer scientists regard vector machines as a subtype of MISD systems,
 orthogonal to pipelines, because there are no subordinate ALUs with
 private memory.

 MIMD systems have multiple independent CPUs. MIMD systems come in two
 categories: shared-memory processors (SMP) and distributed-memory
 machines (also called cluster computers). The dual- and quad-core x86
 processors are shared-memory MIMD machines.

 Many people associate the word SIMD with SSE due to Intel marketing. But
 to the extent that vector machines are MISD orthogonal to pipelined von
 Neumann machines, SSE cannot be called SIMD.

 NumPy is a software simulated vector machine, usually executed on MISD
 hardware. To the extent that vector machines (such as SSE and C90) are
 SIMD, we must call NumPy an object-oriented SIMD library.

This is not the terminology I am familiar with.  Calling NumPy an   
object-oriented SIMD library is very confusing for me.  I worked in  
the parallel computer world for a while (back in the dark ages) and  
this terminology would have been confusing to everyone I dealt with.   
I've also read many parallel computing books.  In my experience SIMD  
refers to hardware, not software.  There is no reason that NumPy can't  
be written to run great (get good speed-ups) on an 8-core shared  
memory system.  That would be a MIMD system, and there's nothing about  
it that doesn't fit with the NumPy abstraction.  And, although SIMD  
can be a subset of MIMD, there are things that can be done in NumPy
that can be parallelized on MIMD machines but not on SIMD machines (e.g.
the NumPy vector type is flexible enough that it can store a list of tasks,
and the operations on that vector can be parallelized easily on a  
shared memory MIMD machine - task parallelism - but not on a SIMD  
machine).

If we say that NumPy is a software simulated vector machine or an
object-oriented SIMD library, we are pigeonholing NumPy in a way which
is too limiting and isn't accurate.  As a user it feels to me that
NumPy is built around various algebra abstractions, many of which map
well onto vector machine operations.  That means that many of the
operations are amenable to efficient implementation on SIMD hardware.
But, IMO, one of the nice features of NumPy is that it is built around
high-level operations, and I would hate to see the project go down a path
which insists that everything in NumPy be efficient on all SIMD
hardware.

Of course, I would also love to see implementations which take as much  
advantage of available HW as possible (e.g. exploit SIMD HW if  
available).

That's my $0.02, worth only a couple cents less than that.

-robert



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Robert Kern
On Thu, Oct 22, 2009 at 02:35, Sturla Molden stu...@molden.no wrote:
 Robert Kern wrote:
 No, I think you're right. Using SIMD to refer to numpy-like
 operations is an abuse of the term not supported by any outside
 community that I am aware of. Everyone else uses SIMD to describe
 hardware instructions, not the application of a single syntactical
 element of a high level language to a non-trivial data structure
 containing lots of atomic data elements.

 Then you should pick up a book on parallel computing.

I would be delighted to see a reference to one that refers to a high
level language's API as SIMD. Please point one out to me. It's
certainly not any of the ones I have available to me.

 It is common to differentiate between four classes of computers: SISD,
 MISD, SIMD, and MIMD machines.

 A SISD system is the classical von Neumann machine. A MISD system is a
 pipelined von Neumann machine, for example the x86 processor.

 A SIMD system is one that has one CPU dedicated to control, and a large
 collection of subordinate ALUs for computation. Each ALU has a small
 amount of private memory. The IBM Cell processor is the typical SIMD
 machine.

 A special class of SIMD machines are the so-called vector machines, of
 which the most famous is the Cray C90. The MMX and SSE instructions in
 Intel Pentium processors are an example of vector instructions. Some
 computer scientists regard vector machines as a subtype of MISD systems,
 orthogonal to pipelines, because there are no subordinate ALUs with
 private memory.

 MIMD systems have multiple independent CPUs. MIMD systems come in two
 categories: shared-memory processors (SMP) and distributed-memory
 machines (also called cluster computers). The dual- and quad-core x86
 processors are shared-memory MIMD machines.

 Many people associate the word SIMD with SSE due to Intel marketing. But
 to the extent that vector machines are MISD orthogonal to pipelined von
 Neumann machines, SSE cannot be called SIMD.

That's a fair point, but unrelated to whether or not numpy can be
labeled SIMD. These all refer to hardware.

 NumPy is a software simulated vector machine, usually executed on MISD
 hardware. To the extent that vector machines (such as SSE and C90) are
 SIMD, we must call NumPy an object-oriented SIMD library.

numpy does not simulate anything. It is an object-oriented library.
If numpy could be said to simulate a vector machine, then just about
any object-oriented library that overloads operators could. It creates
a false equivalence between numpy and software that actually does
simulate hardware.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Robert Kern
On Thu, Oct 22, 2009 at 06:20, Dag Sverre Seljebotn
da...@student.matnat.uio.no wrote:
 Robert Kern wrote:
 On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel math...@mblondel.org wrote:

 On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden stu...@molden.no wrote:

 Mathieu Blondel wrote:

 Hello,

 About one year ago, a high-level, objected-oriented SIMD API was added
 to Mono. For example, there is a class Vector4f for vectors of 4
 floats and this class implements methods such as basic operators,
 bitwise operators, comparison operators, min, max, sqrt, shuffle
 directly using SIMD operations.

 I think you are confusing SIMD with Intel's MMX/SSE instruction set.

 OK, I should have said Object-oriented SIMD API that is implemented
 using hardware SIMD instructions.


 No, I think you're right. Using SIMD to refer to numpy-like
 operations is an abuse of the term not supported by any outside
 community that I am aware of. Everyone else uses SIMD to describe
 hardware instructions, not the application of a single syntactical
 element of a high level language to a non-trivial data structure
 containing lots of atomic data elements.

 BTW, is there any term for this latter concept that's not SIMD or
 vector operation? It would be good to have a word to distinguish this
 concept from both CPU instructions and linear algebra.

Of course, "vector instruction" and "vectorized operation" sometimes
also refer to the CPU instructions. :-)

I don't think you will get much better than "vectorized operation",
though. While it's ambiguous, it has a long history in the high level
language world thanks to Matlab.

 (Personally I think describing NumPy as SIMD and using SSE/MMX for CPU
 instructions makes the most sense, but I'm happy to yield to conventions...)

Well, SSE/MMX is also too limiting. Altivec instructions are also in
the same class, and we should be able to use them on PPC platforms.
Regardless of the origin of the term, SIMD is used to refer to all
of these instructions in common practice. Sturla may be right in some
prescriptive sense, but descriptively, he's quite wrong.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Robert Kern wrote:
 I would be delighted to see a reference to one that refers to a high
 level language's API as SIMD. Please point one out to me. It's
 certainly not any of the ones I have available to me.

   
Numerical Recipes in Fortran 90, pages 964 and 985-986, describes the
syntax of Fortran 90 and 95 as SIMD.

Peter Pacheco's book on MPI describes the difference between von Neumann
machines and vector machines as analogous to the difference between
Fortran 77 and Fortran 90 (with an example from Fortran 90 array slicing).
He is ambiguous as to whether vector machines really are SIMD, or more
related to pipelined von Neumann machines.

Grama et al., Introduction to Parallel Computing, describes SIMD as an
architecture, but it is more or less clear that they mean hardware.
They do say the Fortran 90 where statement is a primitive used to
support selective execution on SIMD processors, as conditional execution
(if statements) is detrimental to performance.

So here we have at least three books claiming that Fortran is a language
with special primitives for SIMD processors.


 That's a fair point, but unrelated to whether or not numpy can be
 labeled SIMD. These all refer to hardware.
   
Actually I don't think the distinction is that important, as we are
talking about Turing machines. Also, a lot of what we call hardware is
actually implemented as software on the chip: the most extreme example
would be Transmeta, which emulated x86 processors entirely in software.
The vague distinction between hardware and software is why we get
patents on software in Europe, although pure software patents are
prohibited. One can always argue that the program and the computer
together constitute a physical device, and that circumventing patents by
moving hardware into software should not be allowed. The distinction
between hardware and software is not as clear as programmers tend to
believe.

Another thing is that performance issues for vector machines and vector
languages (Fortran 90, Matlab, NumPy) are similar. Precisely the same
situations that make NumPy and Matlab code slow are detrimental on
SIMD/vector hardware. That would for example be long for loops with
conditional if statements. On the other hand, vectorized operations over
arrays, possibly using where/find masks, are fast. So although NumPy is
not executed on a vector machine like the Cray C90, it certainly behaves
like one performance-wise.
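
The contrast between a loop with a conditional and a masked array operation can be sketched as follows. This is only an illustration, assuming a current NumPy; the function names are made up for the example and no timings are claimed.

```python
import numpy as np


def sqrt_or_zero_loop(x):
    """Element-by-element loop with an if statement (the slow pattern)."""
    out = np.empty_like(x)
    for i in range(x.size):
        if x[i] < 0.0:
            out[i] = 0.0
        else:
            out[i] = np.sqrt(x[i])
    return out


def sqrt_or_zero_masked(x):
    """Same result using a where mask over whole arrays."""
    # np.abs avoids taking the square root of negative entries; those
    # positions are replaced by 0.0 through the mask anyway.
    return np.where(x < 0.0, 0.0, np.sqrt(np.abs(x)))


x = np.array([-1.0, 0.0, 4.0, 9.0])
print(sqrt_or_zero_loop(x))
print(sqrt_or_zero_masked(x))
```

Both functions compute the same values; the masked version evaluates both branches for every element and selects afterwards, which is the same trade-off that a Fortran 90 where statement makes.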

I'd say that a MIMD machine running NumPy is a Turing machine emulating 
a SIMD/vector machine.

And now I am done with this stupid discussion...


Sturla Molden


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Charles R Harris
On Tue, Oct 20, 2009 at 11:44 PM, Mathieu Blondel math...@mblondel.org wrote:

 Hello,

 About one year ago, a high-level, objected-oriented SIMD API was added
 to Mono. For example, there is a class Vector4f for vectors of 4
 floats and this class implements methods such as basic operators,
 bitwise operators, comparison operators, min, max, sqrt, shuffle
 directly using SIMD operations. You can have a look at the following
 pages for further details:

 http://tirania.org/blog/archive/2008/Nov-03.html (blog post)
 http://go-mono.com/docs/index.aspx?tlin...@n%3amono.simd (API reference)

 It seems to me that such an API would possibly be a great fit in Numpy
 too. It would also be possible to add classes that don't directly map
 to SIMD types. For example, Vector8f can easily be implemented in
 terms of 2 Vector4f. In addition to vectors, additional API may be
 added to support operations on matrices of fixed width or height.
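
The composition mentioned above (a Vector8f built from two Vector4f) can be sketched in plain Python. The class names come from the message; everything else here is a hypothetical illustration, not the Mono.Simd API and not a proposed Numpy API. A real implementation would map each Vector4f operation to one packed hardware instruction.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vector4f:
    """Hypothetical stand-in for a type backed by one 4-float SIMD register."""
    lanes: Tuple[float, float, float, float]

    def __add__(self, other: "Vector4f") -> "Vector4f":
        # In a real binding this would be a single packed add.
        return Vector4f(tuple(a + b for a, b in zip(self.lanes, other.lanes)))

    def max(self, other: "Vector4f") -> "Vector4f":
        # Lane-wise maximum of the two vectors.
        return Vector4f(tuple(max(a, b) for a, b in zip(self.lanes, other.lanes)))


@dataclass(frozen=True)
class Vector8f:
    """8-float vector with no direct register type: two Vector4f halves."""
    lo: Vector4f
    hi: Vector4f

    def __add__(self, other: "Vector8f") -> "Vector8f":
        # Each half is delegated to the 4-wide operation.
        return Vector8f(self.lo + other.lo, self.hi + other.hi)

    def max(self, other: "Vector8f") -> "Vector8f":
        return Vector8f(self.lo.max(other.lo), self.hi.max(other.hi))


a = Vector8f(Vector4f((1.0, 2.0, 3.0, 4.0)), Vector4f((5.0, 6.0, 7.0, 8.0)))
b = Vector8f(Vector4f((8.0, 7.0, 6.0, 5.0)), Vector4f((4.0, 3.0, 2.0, 1.0)))
print((a + b).lo.lanes, (a + b).hi.lanes)
print(a.max(b).lo.lanes, a.max(b).hi.lanes)
```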

 I searched the archives for similar discussions but I only found a
 discussion about memory alignment, so I hope I am not restarting an
 existing discussion here. Memory alignment is an important related issue
 since non-aligned movs can tank the performance.

 Any thoughts? I don't know the Numpy code base yet but I'm willing to
 help if such an effort is started.


The licenses look all hodge-podge:


   - The C# compiler is dual-licensed under the MIT/X11 license and the GNU
   General Public License (GPL),
   http://www.opensource.org/licenses/gpl-license.html

   - The tools are released under the terms of the GNU General Public
   License (GPL), http://www.opensource.org/licenses/gpl-license.html

   - The runtime libraries are under the GNU Library GPL 2.0 (LGPL 2.0),
   http://www.gnu.org/copyleft/library.html#TOC1

   - The class libraries are released under the terms of the MIT X11
   license, http://www.opensource.org/licenses/mit-license.html

   - ASP.NET MVC and ASP.NET AJAX client software are released by Microsoft
   under the open source Microsoft Permissive License,
   http://www.opensource.org/licenses/ms-pl.html

However, if the good stuff is in the class libraries, that looks OK. But
that still leaves it in C#, no? You could have a looksie to see how it would
fit into, say, Cython. I don't know where it would go in numpy, maybe some
of the vector bits would be suitable for some generalized ufuncs. Apart from
that, I believe ATLAS can already make use of SIMD, but I have no idea how
far it goes in using the full feature set.

Chuck


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Cournapeau
On Wed, Oct 21, 2009 at 6:12 PM, Francesc Alted fal...@pytables.org wrote:
 On Wednesday 21 October 2009 07:44:39, Mathieu Blondel wrote:
 Hello,

 About one year ago, a high-level, objected-oriented SIMD API was added
 to Mono. For example, there is a class Vector4f for vectors of 4
 floats and this class implements methods such as basic operators,
 bitwise operators, comparison operators, min, max, sqrt, shuffle
 directly using SIMD operations.
 [clip]

 It is important to stress that all of the above operations, except probably
 sqrt, are memory-bound, and that implementing them for numpy
 would not represent a significant improvement at all.


 This is because numpy is a package that works mainly with arrays in an
 element-wise way, and in this scenario, the time to transmit data to CPU
 dominates, by and large, over the time to perform operations.

Is that true in general, or just for simple operations in numpy and ufuncs? I
remember that for music software, SIMD used to matter a lot, even for
simple bus mixing (which is basically ax+by with a, b scalars and x,
y the input arrays).
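
For concreteness, the ax+by mixing operation looks like this in NumPy. This is a minimal sketch with hypothetical function names; the second variant uses the out= argument of ufuncs to reuse preallocated buffers, which reduces memory traffic from temporaries but does not by itself change which CPU instructions are used.

```python
import numpy as np


def mix_simple(a, x, b, y):
    """a*x + b*y; allocates temporaries for a*x and b*y plus the result."""
    return a * x + b * y


def mix_inplace(a, x, b, y, out, tmp):
    """Same computation, writing into caller-provided buffers."""
    np.multiply(x, a, out=out)   # out = a*x
    np.multiply(y, b, out=tmp)   # tmp = b*y
    np.add(out, tmp, out=out)    # out = a*x + b*y
    return out


x = np.arange(4.0)               # [0, 1, 2, 3]
y = np.ones(4)
out = np.empty(4)
tmp = np.empty(4)
print(mix_simple(2.0, x, 3.0, y))
print(mix_inplace(2.0, x, 3.0, y, out, tmp))
```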

Do you have any interest in adding SIMD to some core numpy functions
(transcendental functions)? If so, I would try to go back to the
problem of runtime SSE detection and loading of an optimized shared
library in a cross-platform way - that's something which should be
done at some point in numpy, and people requiring it would be a good
incentive.

David


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Matthieu Brucher
 Is that true in general, or just for simple operations in numpy and ufuncs? I
 remember that for music software, SIMD used to matter a lot, even for
 simple bus mixing (which is basically ax+by with a, b scalars and x,
 y the input arrays).

Indeed, it shouldn't :| I think the main reason might not be SIMD, but
the additional assumptions you put on the arrays (aliasing). This way,
today's compilers may not even need the actual SIMD instructions.
I have the same opinion as Francesc: it would only be useful for
operations that need more computation than loads/stores.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread René Dudfield
On Wed, Oct 21, 2009 at 2:14 PM, Pauli Virtanen pav...@iki.fi wrote:

 Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote:
 [clip]
  Do you have any interest in adding SIMD to some core numpy functions
  (transcendental functions)? If so, I would try to go back to the
  problem of runtime SSE detection and loading of an optimized shared
  library in a cross-platform way - that's something which should be done
  at some point in numpy, and people requiring it would be a good
  incentive.
 
  I don't personally have a lot of interest in implementing this for numpy.
  But in case anyone does, I find the following library:
 
  http://gruntthepeon.free.fr/ssemath/
 
  very interesting.  Perhaps there could be other (free)
  implementations...

 Optimized transcendental functions could be interesting. For example for
 tanh, call overhead is overcome already for ~30-element arrays.

 Since these are ufuncs, I suppose the SSE implementations could just be
 put in a separate module, which is always compiled. Before importing the
 module, we could simply check from Python side that the CPU supports the
 necessary instructions. If everything is OK, the accelerated
 implementations would then just replace the Numpy routines.

 This type of project could probably also be started outside Numpy, and
 just monkey-patch the Numpy routines on import.
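
The "check the CPU from the Python side, then swap in the accelerated routine" idea can be sketched with the standard library alone. Everything here is an assumption made for illustration: the function names are hypothetical, the feature check reads /proc/cpuinfo and therefore only works on Linux/x86 (elsewhere it returns an empty set), and the "accelerated" routine is a placeholder with no SSE code behind it.

```python
import math


def cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags, or an empty set if unavailable."""
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def tanh_reference(values):
    """Portable fallback implementation."""
    return [math.tanh(v) for v in values]


def tanh_accelerated(values):
    """Placeholder: a real module would call compiled SSE code here."""
    return [math.tanh(v) for v in values]


def select_tanh(flags):
    """Pick the accelerated routine only if the CPU reports SSE2."""
    return tanh_accelerated if "sse2" in flags else tanh_reference


# Decided once at import time; callers just use `tanh`.
tanh = select_tanh(cpu_flags())
print(tanh.__name__, tanh([0.0, 1.0]))
```

Replacing Numpy's own routines on import would follow the same selection logic, with the assignment targeting the Numpy function instead of a module-level name.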

 --
 Pauli Virtanen




Anyone seen the corepy numpy gsoc project?
http://numcorepy.blogspot.com/

It implements a number of functions with the corepy runtime assembler.  The
project showed nice SIMD speedups for numpy.


I've been following the liborc project... which is a runtime assembler that
uses a generic assembly language and supports many different SIMD instruction
sets (e.g. SSE, MMX, ARM, Altivec).  It's the replacement for the liboil
library (used in gstreamer etc).
http://code.entropywave.com/projects/orc/


cu!


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Ryan May
On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer
gregor.thalham...@gmail.com wrote:
 I once wrote a module that replaces the built-in transcendental
 functions of numpy with optimized versions from Intel's vector math
 library. If someone is interested, I can publish it. In my experience it
 was of little use, since real-world problems are limited by memory
 bandwidth. Therefore extending numexpr with optimized transcendental
 functions was the better solution. Afterwards I discovered that I could
 have saved the effort of the first approach, since gcc is able to use
 optimized functions from Intel's vector math library or AMD's math core
 library; see the docs for -mveclibabi. You just need to recompile numpy
 with the proper compiler arguments.

Do you have a link to the documentation for -mveclibabi?  I can't find
this anywhere and I'm *very* interested.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Ryan May
On Wed, Oct 21, 2009 at 1:23 PM, Ryan May rma...@gmail.com wrote:
 On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer
 gregor.thalham...@gmail.com wrote:
 I once wrote a module that replaces the built-in transcendental
 functions of numpy by optimized versions from Intel's vector math
 library. If someone is interested, I can publish it. In my experience it
 was of little use since real-world problems are limited by memory
 bandwidth. Therefore extending numexpr with optimized transcendental
 functions was the better solution. Afterwards I discovered that I could
 have saved the effort of the first approach since gcc is able to use
 optimized functions from Intel's vector math library or AMD's math core
 library, see the docs of -mveclibabi. You just need to recompile numpy
 with proper compiler arguments.

 Do you have a link to the documentation for -mveclibabi?  I can't find
 this anywhere and I'm *very* interested.

Ah, there it is.  Google doesn't come up with much, but the PDF manual
does have it:
http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc.pdf

(It helps when you don't mis-type your search in the PDF).

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Neal Becker
...
 I once wrote a module that replaces the built-in transcendental
 functions of numpy by optimized versions from Intel's vector math
 library. If someone is interested, I can publish it. In my experience it
 was of little use since real-world problems are limited by memory
 bandwidth. Therefore extending numexpr with optimized transcendental
 functions was the better solution. Afterwards I discovered that I could
 have saved the effort of the first approach since gcc is able to use
 optimized functions from Intel's vector math library or AMD's math core
 library, see the docs of -mveclibabi. You just need to recompile numpy
 with proper compiler arguments.
 

I'm interested.  I'd like to try AMD rather than Intel, because AMD is 
easier to obtain.  I'm running on an Intel machine; I hope that doesn't matter 
too much.

What exactly do I need to do?

I see that numpy/site.cfg has an MKL section.  I'm assuming I should not 
touch that, but just mess with gcc flags?



Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Warde-Farley

On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:

 Since these are ufuncs, I suppose the SSE implementations could just be
 put in a separate module, which is always compiled. Before importing the
 module, we could simply check from Python side that the CPU supports the
 necessary instructions. If everything is OK, the accelerated
 implementations would then just replace the Numpy routines.

Am I mistaken or wasn't that sort of the goal of Andrew Friedley's  
CorePy work this summer?

Looking at his slides again, the speedups are rather impressive. I  
wonder if these could be usefully integrated into numpy itself?

David


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Andrew Friedley
sigh; yet another email dropped by the list.

David Warde-Farley wrote:
 On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:
 
 Since these are ufuncs, I suppose the SSE implementations could just be
 put in a separate module, which is always compiled. Before importing the
 module, we could simply check from Python side that the CPU supports the
 necessary instructions. If everything is OK, the accelerated
 implementations would then just replace the Numpy routines.
 
 Am I mistaken or wasn't that sort of the goal of Andrew Friedley's  
 CorePy work this summer?
 
 Looking at his slides again, the speedups are rather impressive. I  
 wonder if these could be usefully integrated into numpy itself?

Yes, my GSoC project is closely related, though I didn't do the CPU 
detection part; that would be easy to add.  Also, I wrote my code specifically 
for 64-bit x86.

I didn't focus so much on the transcendental functions, though they 
wouldn't be too hard to implement.  There's also the possibility to 
provide implementations with differing tradeoffs between accuracy and 
performance.

I think the blog link got posted already, but here's relevant info:

http://numcorepy.blogspot.com
http://www.corepy.org/wiki/index.php?title=CoreFunc

I talked about this in my SciPy talk and upcoming paper, as well.

Also, people have just been talking about x86 in this thread -- other 
architectures could be supported too, e.g. PPC/AltiVec or even Cell SPU 
and other accelerators.  I actually wrote a quick/dirty implementation 
of addition and vector normalization ufuncs for Cell SPU recently. The basic 
result is that overall performance is very roughly comparable to a 
similar-speed x86 chip, but this is a huge win over just running on the 
extremely slow Cell PPC cores.

Andrew


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread David Cournapeau
On Wed, Oct 21, 2009 at 10:14 PM, Pauli Virtanen pav...@iki.fi wrote:


 This type of project could probably also be started outside Numpy, and
 just monkey-patch the Numpy routines on import.

I think I would prefer this approach as a first shot. I will look into
adding a small C library + wrapper in Python to report which SIMD
instructions are available to numpy. Then people can reuse this for
whatever approach they prefer.
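
A rough sketch of what the Python side of such a check could look like. This is not the C library proposed above: it swaps in a pure-Python read of /proc/cpuinfo, so it assumes Linux on x86, and the names parse_cpu_flags, cpu_flags and has_simd are hypothetical helpers made up for illustration.

```python
def parse_cpu_flags(cpuinfo_text):
    # Collect the tokens from the first "flags" line of /proc/cpuinfo text.
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def cpu_flags(path="/proc/cpuinfo"):
    # Assumption: Linux. Where the file is missing, report no flags at all.
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except OSError:
        return set()


def has_simd(name, flags=None):
    # True if the named instruction set (e.g. "sse2") is in the flag set.
    flags = cpu_flags() if flags is None else flags
    return name in flags


# Parse a hand-written sample instead of relying on the local machine.
sample = "processor\t: 0\nflags\t\t: fpu mmx sse sse2\n"
sample_flags = parse_cpu_flags(sample)
```

An accelerated module could then be imported only when has_simd("sse2") is true, leaving the stock Numpy routines in place otherwise.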

David


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Sturla Molden
Mathieu Blondel skrev:
 Hello,

 About one year ago, a high-level, object-oriented SIMD API was added
 to Mono. For example, there is a class Vector4f for vectors of 4
 floats and this class implements methods such as basic operators,
 bitwise operators, comparison operators, min, max, sqrt, shuffle
 directly using SIMD operations.

I think you are confusing SIMD with Intel's MMX/SSE instruction set.

SIMD means single instruction, multiple data. NumPy is inherently 
an object-oriented SIMD API:

  array1[:] = array2 + array3

is a SIMD instruction by definition.

SIMD instructions in hardware for length-4 vectors are mostly useful for 
3D graphics. But they are not used a lot for that purpose, because GPUs 
are getting common. SSE is mostly for rendering 3D graphics without a 
GPU. There is nothing that prevents NumPy from having a Vector4f dtype, 
that internally stores four float32 and is aligned at 16 byte 
boundaries. But it would not be faster than the current float32 dtype. 
Do you know why?

The reason is that memory access is slow, and computation is fast. 
Modern CPUs are starved. The speed of NumPy is not limited by not using 
MMX/SSE whenever possible. It is limited by having to create and 
delete temporary arrays all the time. You are suggesting to optimize in 
the wrong place. There is a lot that can be done to speed up 
computation: There are optimized BLAS libraries like ATLAS and MKL. 
NumPy uses BLAS for things like matrix multiplication. There is OpenMP 
for better performance on multicores. There are OpenCL and CUDA for 
moving computation from CPUs to GPUs. But the main boost you get from 
going from NumPy to hand-written C or Fortran comes from reduced memory use.
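
A minimal sketch of the temporary-array point, assuming three equal-length float32 arrays. The names a, b, c, out and both helper functions are illustrative and not from the thread. The first form allocates an intermediate array for a + b before adding c; the second writes into one preallocated buffer through the ufunc out argument.

```python
import numpy as np


def sum_with_temporaries(a, b, c):
    # (a + b) allocates a temporary; adding c then allocates the result.
    return a + b + c


def sum_in_place(a, b, c, out):
    # Write a + b straight into `out`, then add c into the same buffer.
    np.add(a, b, out=out)
    np.add(out, c, out=out)
    return out


n = 1000
a = np.ones(n, dtype=np.float32)
b = np.full(n, 2.0, dtype=np.float32)
c = np.full(n, 3.0, dtype=np.float32)
out = np.empty(n, dtype=np.float32)

r1 = sum_with_temporaries(a, b, c)
r2 = sum_in_place(a, b, c, out)
```

Both calls compute the same values; the difference is only in how many arrays get allocated and freed along the way, which is the cost described above.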

 existing discussion here. Memory-alignment is an important related issue
 since non-aligned movs can tank the performance.

   

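You can align an ndarray on a 16-byte boundary like this:

import numpy

def aligned_array(N, dtype):
    # Return N elements of `dtype` whose data starts on a 16-byte boundary.
    itemsize = numpy.dtype(dtype).itemsize
    # Over-allocate by 16 bytes so an aligned start always exists.
    tmp = numpy.zeros(N * itemsize + 16, dtype=numpy.uint8)
    address = tmp.__array_interface__['data'][0]
    offset = (16 - address % 16) % 16
    # Slice N * itemsize bytes, then reinterpret them as `dtype`.
    return tmp[offset:offset + N * itemsize].view(dtype=dtype)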
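
One way to check the result, as a sketch: the helper is_aligned is a hypothetical name, and the block builds its own over-allocated byte buffer so that it runs on its own. It compares a view that starts on a 16-byte boundary with one shifted by a single byte.

```python
import numpy as np


def is_aligned(arr, boundary=16):
    # The first item of __array_interface__['data'] is the buffer address.
    address = arr.__array_interface__['data'][0]
    return address % boundary == 0


# Over-allocate raw bytes, then slice from the first 16-byte-aligned byte.
n = 8
itemsize = np.dtype(np.float32).itemsize
raw = np.zeros(n * itemsize + 16, dtype=np.uint8)
start = (16 - raw.__array_interface__['data'][0] % 16) % 16

aligned = raw[start:start + n * itemsize].view(np.float32)
# Shifting the start by one byte gives a deliberately misaligned view.
misaligned = raw[start + 1:start + 1 + n * itemsize].view(np.float32)
```

Here is_aligned(aligned) holds and is_aligned(misaligned) does not, while both views have n elements.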


Sturla Molden












Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Mathieu Blondel
On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden stu...@molden.no wrote:
 Mathieu Blondel skrev:
 Hello,

 About one year ago, a high-level, object-oriented SIMD API was added
 to Mono. For example, there is a class Vector4f for vectors of 4
 floats and this class implements methods such as basic operators,
 bitwise operators, comparison operators, min, max, sqrt, shuffle
 directly using SIMD operations.
 I think you are confusing SIMD with Intel's MMX/SSE instruction set.

OK, I should have said "object-oriented SIMD API that is implemented
using hardware SIMD instructions".

And when an ISA cannot perform a specific operation in a single
instruction (say, the absolute value of the differences), the
operation can be implemented in terms of other instructions.

 SIMD instructions in hardware for length-4 vectors are mostly useful for
 3D graphics. But they are not used a lot for that purpose, because GPUs
 are getting common. SSE is mostly for rendering 3D graphics without a
 GPU. There is nothing that prevents NumPy from having a Vector4f dtype,
 that internally stores four float32 and is aligned at 16 byte
 boundaries. But it would not be faster than the current float32 dtype.
 Do you know why?

Yes, I know, because this has already been explained in this very thread
by someone before you!


Mathieu


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Robert Kern
On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel math...@mblondel.org wrote:
 On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden stu...@molden.no wrote:
 Mathieu Blondel skrev:
 Hello,

 About one year ago, a high-level, object-oriented SIMD API was added
 to Mono. For example, there is a class Vector4f for vectors of 4
 floats and this class implements methods such as basic operators,
 bitwise operators, comparison operators, min, max, sqrt, shuffle
 directly using SIMD operations.
 I think you are confusing SIMD with Intel's MMX/SSE instruction set.

 OK, I should have said Object-oriented SIMD API that is implemented
 using hardware SIMD instructions.

No, I think you're right. Using SIMD to refer to numpy-like
operations is an abuse of the term not supported by any outside
community that I am aware of. Everyone else uses SIMD to describe
hardware instructions, not the application of a single syntactical
element of a high level language to a non-trivial data structure
containing lots of atomic data elements.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco