Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Matthew Brett
Hi, Note that the plug-in idea is just my own idea; it is not something agreed on by anyone else. So maybe it won't be done for numpy 1.1, or at all. It depends on the main maintainers of numpy. I'm +3 for the plugin idea - it would have huge benefits for installation and automatic

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Gnata Xavier
David Cournapeau wrote: Gnata Xavier wrote: OK, I will try to see what I can do, but it is clear that we do need the plug-in system first (read: before the threads, in the numpy release plan). During the development of 1.1, I will try to find some time to understand where I should put some pragmas into

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Joe Harrington
A couple of thoughts on parallelism: 1. Can someone come up with a small set of cases and time them on numpy, IDL, Matlab, and C, using various parallel schemes, for each of a representative set of architectures? We're comparing a benchmark to itself on different architectures, rather than

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Gnata Xavier
A couple of thoughts on parallelism: 1. Can someone come up with a small set of cases and time them on numpy, IDL, Matlab, and C, using various parallel schemes, for each of a representative set of architectures? We're comparing a benchmark to itself on different architectures, rather than

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Matthieu Brucher
It is a real problem in some communities, like astronomers and image-processing people, but the lack of documentation is the first one, that is true. Even in those communities, I think that a lot could be done at a higher level, like what IPython1 does (task parallelism). Matthieu -- French

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Robert Kern
On Sat, Mar 22, 2008 at 4:25 PM, Charles R Harris [EMAIL PROTECTED] wrote: On Sat, Mar 22, 2008 at 2:59 PM, Robert Kern [EMAIL PROTECTED] wrote: On Sat, Mar 22, 2008 at 2:04 PM, Charles R Harris [EMAIL PROTECTED] wrote: Maybe it's time to revisit the template subsystem I pulled out of

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Gnata Xavier
Matthieu Brucher wrote: It is a real problem in some communities, like astronomers and image-processing people, but the lack of documentation is the first one, that is true. Even in those communities, I think that a lot could be done at a higher level, like what IPython1

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Charles R Harris
On Mon, Mar 24, 2008 at 10:35 AM, Robert Kern [EMAIL PROTECTED] wrote: On Sat, Mar 22, 2008 at 4:25 PM, Charles R Harris [EMAIL PROTECTED] wrote: On Sat, Mar 22, 2008 at 2:59 PM, Robert Kern [EMAIL PROTECTED] wrote: On Sat, Mar 22, 2008 at 2:04 PM, Charles R Harris [EMAIL

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread David Cournapeau
Matthew Brett wrote: I'm +3 for the plugin idea - it would have huge benefits for installation and automatic optimization. What needs to be done? Who could do it? The main issues are portability and reliability, I think. All OSes supported by numpy have more or less a dynamic library loading
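
The plug-in scheme being discussed would presumably come down to loading an optimized core at runtime via the platform's dynamic loader. A minimal sketch using POSIX dlopen follows; the library and symbol names are purely illustrative, not an existing numpy interface:

    /* Hypothetical runtime selection of an optimized inner loop.
     * "libnumpy_core_sse2.so" and "ufunc_add_double" are made-up names
     * used only to illustrate the dlopen/dlsym pattern. */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef void (*loop_t)(const double *, const double *, double *, long);

    int main(void)
    {
        void *handle = dlopen("libnumpy_core_sse2.so", RTLD_NOW);
        loop_t loop = NULL;

        if (handle != NULL) {
            loop = (loop_t)dlsym(handle, "ufunc_add_double");
        }
        if (loop == NULL) {
            printf("optimized core not found, falling back to generic loops\n");
        }
        return 0;
    }

On Windows the same pattern would go through LoadLibrary/GetProcAddress, which is part of the portability question raised above.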

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Robert Kern
On Mon, Mar 24, 2008 at 12:12 PM, Gnata Xavier [EMAIL PROTECTED] wrote: Well, it is not that easy. We have several numpy codes that go like this: 1) open a large data file to get a numpy array 2) perform computations on this array (I'm only talking of the numpy part here. scipy is

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-24 Thread Gnata Xavier
Robert Kern wrote: On Mon, Mar 24, 2008 at 12:12 PM, Gnata Xavier [EMAIL PROTECTED] wrote: Well, it is not that easy. We have several numpy codes that go like this: 1) open a large data file to get a numpy array 2) perform computations on this array (I'm only talking of the numpy

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Charles R Harris
On Sat, Mar 22, 2008 at 10:59 PM, David Cournapeau [EMAIL PROTECTED] wrote: Charles R Harris wrote: It looks like memory access is the bottleneck, otherwise running 4 floats through in parallel should go a lot faster. I need to modify the program a bit and see how it works for doubles.

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread David Cournapeau
Charles R Harris wrote: Yep, but I expect the compilers to take care of alignment, say by inserting a few single ops when needed. The other solution would be to have aligned allocators (it won't solve all cases, of course). Because the compilers will never be able to take care of the cases

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Emanuele Olivetti
James Philbin wrote: OK, I've written a simple benchmark which implements an elementwise multiply (A=B*C) in three different ways (standard C, intrinsics, hand-coded assembly). On the face of things the results seem to indicate that the vectorization works best on medium-sized inputs. If

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread James Philbin
Wow, a much more varied set of results than I was expecting. Could someone who has gcc 4.3 installed compile it with: gcc -msse -O2 -ftree-vectorize -ftree-vectorizer-verbose=5 -S vec_bench.c -o vec_bench.s And attach vec_bench.s and the verbose output from gcc. James
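
For anyone wanting to reproduce this without the attached file, the loop vec_bench.c times is essentially a contiguous elementwise multiply; a stand-in like the following (not the posted source itself) is enough to see whether the vectorizer fires and what the verbose output reports:

    /* Minimal stand-in for the benchmarked kernel.  Compile with
     *   gcc -msse -O2 -ftree-vectorize -ftree-vectorizer-verbose=5 -S mul.c
     * and check the verbose output for the "LOOP VECTORIZED" note. */
    void mul(const float *b, const float *c, float *a, int n)
    {
        int i;
        for (i = 0; i < n; i++) {
            a[i] = b[i] * c[i];
        }
    }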

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread David Cournapeau
Gnata Xavier wrote: Hi, I have a very limited knowledge of OpenMP, but please consider this test case: Honestly, if it were that simple, it would already have been done a long time ago. The problem is that your test case is not even remotely close to how things have to be done in
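
For context, the kind of test case being debated is a single pragma on a contiguous loop, roughly like the sketch below (illustrative only; numpy's actual ufunc inner loops are strided and more involved):

    /* Minimal OpenMP example of the kind under discussion.
     * Build with: gcc -fopenmp add.c */
    void add(const double *b, const double *c, double *a, long n)
    {
        long i;
    #pragma omp parallel for
        for (i = 0; i < n; i++) {
            a[i] = b[i] + c[i];
        }
    }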

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Francesc Altet
On Sunday 23 March 2008, Charles R Harris wrote: gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) cpu: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Problem size Simple Intrin Inline 100 0.0002ms (100.0%) 0.0001ms ( 68.7%)

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Francesc Altet
On Sunday 23 March 2008, David Cournapeau wrote: Gnata Xavier wrote: Hi, I have a very limited knowledge of OpenMP, but please consider this test case: Honestly, if it were that simple, it would already have been done a long time ago. The problem is that your test case is not even

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread David Cournapeau
Francesc Altet wrote: Why not? IMHO, complex operations requiring a great deal of operations per word, like trigonometric, exponential, etc..., are the best candidates to take advantage of several cores or even SSE instructions (not sure whether SSE supports this sort of operations,

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Matthieu Brucher
I find the example of SSE rather enlightening: in theory, you should expect a 100-300% speed increase using SSE, but even with pure C code in a controlled setting, on one platform (Linux + gcc), with varying recent CPUs, the results are fundamentally different. So what would happen in numpy,

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Gnata Xavier
David Cournapeau wrote: Francesc Altet wrote: Why not? IMHO, complex operations requiring a great deal of operations per word, like trigonometric, exponential, etc..., are the best candidates to take advantage of several cores or even SSE instructions (not sure whether SSE supports

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread David Cournapeau
Gnata Xavier wrote: Well, of course my goal was not to say that my simple test case can be copied/pasted into numpy :) Of course it is one of the best cases for using OpenMP. Of course pragmas can be more complex than that (you can mark variables that can/cannot be shared, for instance). The size

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Matthieu Brucher
If the performance is so bad, OK, forget about it, but it would be sad, because next-generation CPUs will not be more powerful; they will only have more than one or two cores on the same chip. I don't think this is the worst that will happen. The worst is what has been seen for

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Scott Ransom
Hi David et al, Very interesting. I thought that the 64-bit gcc's automatically aligned memory on 16-bit (or 32-bit) boundaries. But apparently not. Because running your code certainly made the intrinsic code quite a bit faster. However, another thing that I noticed was that the simple code

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread David Cournapeau
Scott Ransom wrote: Hi David et al, Very interesting. I thought that the 64-bit gcc's automatically aligned memory on 16-bit (or 32-bit) boundaries. Note that I am talking about bytes, not bits. Default alignment depends on many parameters, like the OS and C runtime. For example, on Mac OS X,
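
A quick way to see what alignment malloc actually hands out on a given platform (16 bytes is what the aligned SSE loads need) is a sketch like this:

    /* Print whether a few malloc'ed blocks happen to be 16-byte aligned.
     * SSE's aligned loads (movaps) require 16-byte alignment, which
     * malloc does not guarantee on every OS/C runtime. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    int main(void)
    {
        int i;
        for (i = 0; i < 4; i++) {
            void *p = malloc(1000 * sizeof(float));
            printf("block %d %s 16-byte aligned\n", i,
                   ((uintptr_t)p % 16 == 0) ? "is" : "is NOT");
            free(p);
        }
        return 0;
    }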

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Charles R Harris
On Sun, Mar 23, 2008 at 6:41 AM, Francesc Altet [EMAIL PROTECTED] wrote: On Sunday 23 March 2008, Charles R Harris wrote: gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) cpu: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Problem size Simple

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread James Philbin
OK, I'm really impressed with the improvements in vectorization for gcc 4.3. It really seems like it's able to work with real loops, which wasn't the case with 4.1. I think Chuck's right that we should simply special-case contiguous data and allow the auto-vectorizer to do the rest. Something like
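
The special-casing suggested here amounts to checking the strides once and then running a plain unit-stride loop that the auto-vectorizer can handle; a hedged sketch (simplified signature, not the actual generated ufunc loop):

    /* Sketch: dispatch on strides, fast contiguous path vs. generic
     * strided fallback.  Signature simplified for illustration. */
    #include <stddef.h>

    void double_add(char *in1, long s1, char *in2, long s2,
                    char *out, long so, long n)
    {
        if (s1 == (long)sizeof(double) && s2 == (long)sizeof(double) &&
            so == (long)sizeof(double)) {
            const double *a = (const double *)in1;
            const double *b = (const double *)in2;
            double *c = (double *)out;
            long i;
            for (i = 0; i < n; i++) {
                c[i] = a[i] + b[i];          /* contiguous: vectorizable */
            }
        }
        else {
            long i;
            for (i = 0; i < n; i++) {        /* generic strided fallback */
                *(double *)out = *(double *)in1 + *(double *)in2;
                in1 += s1; in2 += s2; out += so;
            }
        }
    }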

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Anne Archibald
On 23/03/2008, David Cournapeau [EMAIL PROTECTED] wrote: Gnata Xavier wrote: Hi, I have a very limited knowledge of OpenMP, but please consider this test case: Honestly, if it were that simple, it would already have been done a long time ago. The problem is that your

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Matthieu Brucher
(And I suspect that OpenMP is smart enough to use single threads without locking when multiple threads won't help. Certainly all the information is available to OpenMP to make such decisions.) Unfortunately, I don't think there is such a thing. For instance, the number of threads used by MKL
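
OpenMP does at least give the programmer an explicit knob: the if clause on a parallel region disables threading below a chosen size. A small sketch (the 10000-element threshold is arbitrary, picked only for illustration):

    /* The OpenMP "if" clause keeps small problems on a single thread.
     * The threshold is arbitrary and would need tuning per machine.
     * Build with: gcc -fopenmp scale.c */
    void scale(double *a, long n, double k)
    {
        long i;
    #pragma omp parallel for if (n > 10000)
        for (i = 0; i < n; i++) {
            a[i] *= k;
        }
    }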

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread David Cournapeau
Anne Archibald wrote: Actually, there are a few places where a parallel for would serve to accelerate all ufuncs. There are build issues, yes, though they are mild; Maybe, maybe not. Anyway, I said that I would step in to resolve those issues if someone else does the coding. we would also

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread David Cournapeau
Gnata Xavier wrote: OK, I will try to see what I can do, but it is clear that we do need the plug-in system first (read: before the threads, in the numpy release plan). During the development of 1.1, I will try to find some time to understand where I should put some pragmas into ufuncs using a very

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread David Cournapeau
Matthieu Brucher wrote: Hi, It seems complicated to add OpenMP to the code; I don't think many people have the knowledge to do this, not to mention the fact that there are a lot of Python calls in the different functions. Yes, this makes potential optimizations harder, at least for someone

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread James Philbin
Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK libraries to use multiple cores for more complex calculations. In particular, doing some basic loop unrolling and SSE versions of the ufuncs would be beneficial. I
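
The basic loop unrolling mentioned here looks roughly like the following (a sketch; whether it beats what the compiler already does is exactly what the benchmarks later in the thread try to measure):

    /* Hand-unrolled (4x) elementwise multiply, with a scalar remainder
     * loop.  Purely illustrative. */
    void mul_unrolled(const float *b, const float *c, float *a, long n)
    {
        long i;
        for (i = 0; i + 3 < n; i += 4) {
            a[i]     = b[i]     * c[i];
            a[i + 1] = b[i + 1] * c[i + 1];
            a[i + 2] = b[i + 2] * c[i + 2];
            a[i + 3] = b[i + 3] * c[i + 3];
        }
        for (; i < n; i++) {
            a[i] = b[i] * c[i];
        }
    }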

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Neal Becker
James Philbin wrote: Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK libraries to use multiple cores for more complex calculations. In particular, doing some basic loop unrolling and SSE versions of the

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread James Philbin
gcc keeps advancing autovectorization. Is manual vectorization worth the trouble? Well, the way that the ufuncs are written at the moment, -ftree-vectorize will never kick in due to the non-constant strides. To get this to work, one has to special-case unit strides. Even with constant
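
The reason -ftree-vectorize cannot kick in is visible in the shape of the generic inner loop: byte pointers stepped by runtime strides. A simplified sketch of that pattern (not numpy's exact generated code):

    /* Simplified version of the generic ufunc inner-loop pattern.
     * Because the steps are runtime values rather than a known unit
     * stride, the auto-vectorizer has to leave this loop alone. */
    static void double_multiply_generic(char **args, long *dimensions,
                                        long *steps)
    {
        long i, n = dimensions[0];
        char *in1 = args[0], *in2 = args[1], *out = args[2];

        for (i = 0; i < n; i++) {
            *(double *)out = *(double *)in1 * *(double *)in2;
            in1 += steps[0];
            in2 += steps[1];
            out += steps[2];
        }
    }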

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Charles R Harris
On Sat, Mar 22, 2008 at 11:43 AM, Neal Becker [EMAIL PROTECTED] wrote: James Philbin wrote: Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK libraries to use multiple cores for more complex calculations. In

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Charles R Harris
On Sat, Mar 22, 2008 at 12:01 PM, James Philbin [EMAIL PROTECTED] wrote: gcc keeps advancing autovectorization. Is manual vectorization worth the trouble? Well, the way that the ufuncs are written at the moment, -ftree-vectorize will never kick in due to the non-constant strides. To

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Travis E. Oliphant
James Philbin wrote: Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK libraries to use multiple cores for more complex calculations. In particular, doing some basic loop unrolling and SSE versions of the ufuncs

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Travis E. Oliphant
Charles R Harris wrote: On Sat, Mar 22, 2008 at 11:43 AM, Neal Becker [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: James Philbin wrote: Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread James Philbin
OK, so a few questions: 1. I'm not familiar with the format of the code generators. Should I pull the special case out of the /**begin repeat blocks, or should I do a conditional inside the repeats (how does one do this?). 2. I don't have access to Windows + Visual C, so I will need some help testing for
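
For reference, the repeat blocks in numpy's generated C sources have roughly this shape; the exact markers and type lists should be checked against the real *.c.src files, and the loop body below is only a placeholder:

    /* Rough shape of a numpy code-generator repeat block (verify the
     * syntax against the actual .c.src sources); body is a placeholder. */
    /**begin repeat
     * #TYPE = FLOAT, DOUBLE#
     * #type = float, double#
     */
    static void
    @TYPE@_multiply_contig(const @type@ *in1, const @type@ *in2,
                           @type@ *out, long n)
    {
        long i;
        for (i = 0; i < n; i++) {
            out[i] = in1[i] * in2[i];
        }
    }
    /**end repeat**/

One way to handle question 1 would be to put the stride conditional inside the repeated body, so each generated type gets both the fast and the generic path.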

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Thomas Grill
On 22.03.2008, at 19:20, Travis E. Oliphant wrote: I think the thing to do is to special-case the code so that if the strides work for vectorization, then a different bit of code is executed and this current code is used as the final special-case. Something like this would be relatively

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Anne Archibald
On 22/03/2008, Thomas Grill [EMAIL PROTECTED] wrote: I've experimented with branching the ufuncs into different constant-stride and aligned/unaligned cases to be able to use SSE using compiler intrinsics. I expected a considerable gain, as I was using float32 with stride 1 most of the
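
The aligned/unaligned branching described here typically comes down to choosing between the aligned and unaligned load/store intrinsics; a hedged sketch for float32, stride 1 (compile with -msse):

    /* Branch on 16-byte alignment: movaps-style loads on the aligned
     * path, movups-style otherwise.  Illustrative only. */
    #include <stdint.h>
    #include <xmmintrin.h>

    void mul_sse(const float *b, const float *c, float *a, long n)
    {
        long i = 0;
        int aligned =
            (((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) % 16) == 0;

        if (aligned) {
            for (; i + 3 < n; i += 4)
                _mm_store_ps(a + i, _mm_mul_ps(_mm_load_ps(b + i),
                                               _mm_load_ps(c + i)));
        }
        else {
            for (; i + 3 < n; i += 4)
                _mm_storeu_ps(a + i, _mm_mul_ps(_mm_loadu_ps(b + i),
                                                _mm_loadu_ps(c + i)));
        }
        for (; i < n; i++) {
            a[i] = b[i] * c[i];   /* scalar remainder */
        }
    }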

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Anne Archibald
On 22/03/2008, Travis E. Oliphant [EMAIL PROTECTED] wrote: James Philbin wrote: Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK libraries to use multiple cores for more complex calculations. In

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread James Philbin
However, profiling revealed that hardly anything was gained, because of 1) non-alignment of the vectors (this _could_ be handled by shuffled loading of the values, though), 2) the fact that my application used relatively large vectors that wouldn't fit into the CPU cache, hence the memory

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Charles R Harris
On Sat, Mar 22, 2008 at 12:54 PM, Anne Archibald [EMAIL PROTECTED] wrote: On 22/03/2008, Travis E. Oliphant [EMAIL PROTECTED] wrote: James Philbin wrote: Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Travis E. Oliphant
Anne Archibald wrote: On 22/03/2008, Travis E. Oliphant [EMAIL PROTECTED] wrote: James Philbin wrote: Personally, I think that the time would be better spent optimizing routines for single-threaded code and relying on BLAS and LAPACK libraries to use multiple cores for more complex

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Robert Kern
On Sat, Mar 22, 2008 at 2:04 PM, Charles R Harris [EMAIL PROTECTED] wrote: Maybe it's time to revisit the template subsystem I pulled out of Django. I am still -lots on using the Django template system. Please, please, please, look at Jinja or another templating package that could be dropped in

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Charles R Harris
On Sat, Mar 22, 2008 at 2:59 PM, Robert Kern [EMAIL PROTECTED] wrote: On Sat, Mar 22, 2008 at 2:04 PM, Charles R Harris [EMAIL PROTECTED] wrote: Maybe it's time to revisit the template subsystem I pulled out of Django. I am still -lots on using the Django template system. Please, please,

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Stéfan van der Walt
On Sat, Mar 22, 2008 at 8:16 PM, Travis E. Oliphant [EMAIL PROTECTED] wrote: Perhaps we could drum up interest in a Need for Speed Sprint on NumPy sometime over the next few months. I guess we'd all like our computations to complete more quickly, as long as they still give valid results. I

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread James Philbin
OK, I've written a simple benchmark which implements an elementwise multiply (A=B*C) in three different ways (standard C, intrinsics, hand-coded assembly). On the face of things the results seem to indicate that the vectorization works best on medium-sized inputs. If people could post the results
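
The timing harness for such a benchmark is simple enough to sketch from the description alone (this is not the posted vec_bench.c; the sizes, repetition counts, and the plain-C kernel are illustrative):

    /* Minimal timing skeleton for an elementwise multiply benchmark. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void mul(const float *b, const float *c, float *a, long n)
    {
        long i;
        for (i = 0; i < n; i++)
            a[i] = b[i] * c[i];
    }

    int main(void)
    {
        long sizes[] = {100, 1000, 10000, 100000, 1000000};
        int s;
        for (s = 0; s < 5; s++) {
            long n = sizes[s], reps = 10000000 / n, r, i;
            float *a = malloc(n * sizeof(float));
            float *b = malloc(n * sizeof(float));
            float *c = malloc(n * sizeof(float));
            clock_t t0;
            for (i = 0; i < n; i++) { b[i] = 1.5f; c[i] = 2.5f; }
            t0 = clock();
            for (r = 0; r < reps; r++)
                mul(b, c, a, n);
            printf("n=%-8ld %.4f ms/call (check: %.1f)\n", n,
                   1000.0 * (double)(clock() - t0) / CLOCKS_PER_SEC / reps,
                   (double)a[0]);
            free(a); free(b); free(c);
        }
        return 0;
    }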

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Neal Becker
gcc --version gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. [EMAIL PROTECTED] ~]$ cat

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Charles R Harris
On Sat, Mar 22, 2008 at 5:03 PM, James Philbin [EMAIL PROTECTED] wrote: OK, I've written a simple benchmark which implements an elementwise multiply (A=B*C) in three different ways (standard C, intrinsics, hand-coded assembly). On the face of things the results seem to indicate that the

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Thomas Grill
Hi, here are my results: Intel Core 2 Duo, 2.16GHz, 667MHz bus, 4MB cache, running under OS X 10.5.2. Please note that the auto-vectorizer of gcc-4.3 is doing really well gr~~~ - gcc version 4.0.1 (Apple Inc. build 5465) xbook-2:temp thomas$ gcc -msse -O2 vec_bench.c -o

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Charles R Harris
On Sat, Mar 22, 2008 at 5:32 PM, Charles R Harris [EMAIL PROTECTED] wrote: On Sat, Mar 22, 2008 at 5:03 PM, James Philbin [EMAIL PROTECTED] wrote: OK, I've written a simple benchmark which implements an elementwise multiply (A=B*C) in three different ways (standard C, intrinsics, hand

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Charles R Harris
On Sat, Mar 22, 2008 at 6:34 PM, Charles R Harris [EMAIL PROTECTED] wrote: I've attached a double version. Compile with gcc -msse2 -mfpmath=sse -O2 vec_bench_dbl.c -o vec_bench_dbl Chuck #include <assert.h> #include <stdio.h> #include <stdlib.h> #include <math.h> #include <emmintrin.h> int sizes[6] =

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Scott Ransom
Here are results under 64-bit Linux using gcc-4.3 (which by default turns on the various SSE flags). Note that -O3 is significantly better than -O2 for the simple calls: nimrod:~$ cat /proc/cpuinfo | grep 'model name' | head -1 model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Neal Becker
Thomas Grill wrote: Hi, here are my results: Intel Core 2 Duo, 2.16GHz, 667MHz bus, 4MB cache, running under OS X 10.5.2. Please note that the auto-vectorizer of gcc-4.3 is doing really well gr~~~ - gcc version 4.0.1 (Apple Inc. build 5465) xbook-2:temp

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Charles R Harris
On Sat, Mar 22, 2008 at 7:35 PM, Scott Ransom [EMAIL PROTECTED] wrote: Here are results under 64-bit Linux using gcc-4.3 (which by default turns on the various SSE flags). Note that -O3 is significantly better than -O2 for the simple calls: nimrod:~$ cat /proc/cpuinfo | grep 'model name' |

Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread David Cournapeau
Charles R Harris wrote: It looks like memory access is the bottleneck; otherwise running 4 floats through in parallel should go a lot faster. I need to modify the program a bit and see how it works for doubles. I am not sure the benchmark is really meaningful: it does not use aligned