Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-12 Thread Andrew Straw
Fernando Perez wrote:
 On Wed, Feb 11, 2009 at 6:17 PM, David Cournapeau courn...@gmail.com wrote:
 
 Unfortunately, it does require some work, because hardy uses g77
 instead of gfortran, so the source package has to be different (once
 hardy is done, all the one below would be easy, though). I am not sure
 how to do that with PPA (the doc is not great).
 
 OK, thanks for the info.  This is already very useful.

What exactly is the expected problem and how would I verify that I'm not
getting hit by it?

The context is that I have built your packages on Hardy and got no FTBFS
errors in a clean sbuild machine and was about to upload this to my
lab's repo but saw your email. (Also, I have been using scipy
0.7.0rc2-ish apparently descended from the same Debian source package as
yours for a couple weeks and have noted no problems of late.) nosetest
scipy on an amd64 with lots of installed packages shows:

Ran 3211 tests in 356.572s

FAILED (SKIP=13, errors=9)

Where the errors are all known failures.


The same test on the i386 sbuild shows:

Ran 3211 tests in 312.464s

FAILED (SKIP=16, errors=204)

Most of the failures appear to be from weave/scxx and I suspect have to
do with some package not being installed -- if that's true, it's
probably a packaging bug and not a scipy bug.
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Gael Varoquaux
On Thu, Feb 12, 2009 at 12:42:37AM -0600, Robert Kern wrote:
 It is implemented using threads, with Windows native threads on
 Windows. I think Gaël really just meant threads there.

I guess so :). Once you reformulate my remark in proper terms, this is
indeed what comes out.

I guess all it means is that OpenMP is implemented using native OS
threads, and is nothing more than an abstraction on top of them.
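
(For what it is worth, a minimal illustration of that point -- not from the
original mail: compiled with gcc -fopenmp, the single pragma below makes the
OpenMP runtime spawn a team of native OS threads.)

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* the OpenMP "team" behind this pragma is just a set of native threads */
    #pragma omp parallel
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}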

Gaël
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-12 Thread David Cournapeau
Andrew Straw wrote:
 Fernando Perez wrote:
   
 On Wed, Feb 11, 2009 at 6:17 PM, David Cournapeau courn...@gmail.com wrote:

 
 Unfortunately, it does require some work, because hardy uses g77
 instead of gfortran, so the source package has to be different (once
 hardy is done, all the one below would be easy, though). I am not sure
 how to do that with PPA (the doc is not great).
   
 OK, thanks for the info.  This is already very useful.
 

 What exactly is the expected problem and how would I verify that I'm not
 getting hit by it?
   

I want to follow the official debian/ubuntu packages as closely as
possible. Ideally, any package produced on the PPA and later superseded by
an official package (from 1.2.1-1~ppaN to 1.2.1-1) should be identical to
the superseding package. Hardy's default Fortran ABI is g77, not gfortran,
so I have to use g77 for hardy - and the PPA packages, limited to
intrepid, use gfortran (since intrepid's ABI is gfortran's).

In your case, you built a package which uses the gfortran ABI on Hardy - it
works, but is not really acceptable for an official package - and thus,
when an upgrade from PPA to official happens, you won't get the same
package. In the rpm world, you can use conditionals on distribution
version/type in the spec file (which is the control + changelog +
rules in one file), but AFAIK, you can't do that with debian, or at
least I have not found the relevant doc.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PyArray_SETITEM with object arrays in Cython

2009-02-12 Thread Dag Sverre Seljebotn
Wes McKinney wrote:

 The general problem here is an indexed array (by dates or strings, for 
 example), that you want to conform to a new index. The arrays most of 
 the time contain floats but occasionally PyObjects. For some reason 
 the access and assignment is slow (this function can be faster by a 
 factor of 50 with C API macros, so clearly something is awry)-- let me 
 know if you see anything obviously wrong with this
Thanks for the example; I don't have time this week but I recorded it on 
the Cython trac and will hopefully get back to it soon.

http://trac.cython.org/cython_trac/ticket/209

Dag Sverre
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Francesc Alted
Hi Brian,

On Thursday 12 February 2009, Brian Granger wrote:
 Hi,

 This is relevant for anyone who would like to speed up array based
 codes using threads.

 I have a simple loop that I have implemented using Cython:

 def backstep(np.ndarray opti, np.ndarray optf,
              int istart, int iend, double p, double q):
     cdef int j
     cdef double *pi
     cdef double *pf
     pi = <double*>opti.data
     pf = <double*>optf.data

     with nogil:
         for j in range(istart, iend):
             pf[j] = (p*pi[j+1] + q*pi[j])

 I need to call this function *many* times and each call cannot be
 performed until the previous one is complete, as there are data
 dependencies.  But, I still want to parallelize a single call to this
 function across multiple cores (notice that I am releasing the GIL
 before I do the heavy lifting).

 I want to break my loop range(istart,iend) into pieces and have a
 thread do each piece.  The arrays have sizes 10^3 to 10^5.

 Things I have tried:
[clip]

If your problem is evaluating vector expressions just like the above
(i.e. without using transcendental functions like sin, exp, etc...),
usually the bottleneck is memory access, so using several threads is
simply not going to help you achieve better performance, but rather
the contrary (you have to deal with the additional thread overhead).
So, frankly, I would not waste more time trying to parallelize that.

As an example, in the recent support of VML in numexpr we have disabled 
the use of VML (as well as the OpenMP threading support that comes with 
it) in cases like yours, where only additions and multiplications are 
performed (these operations are very fast in modern processors, and the 
sole bottleneck for this case is the memory bandwidth, as I've said).  
However, in case of expressions containing operations like division or 
transcendental functions, then VML activates automatically, and you can 
make use of several cores if you want.  So, if you are in this case, 
and you have access to Intel MKL (the library that contains VML), you 
may want to give numexpr a try. 
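
(To make that concrete, a rough sketch -- the variable names below are
invented here, not taken from Brian's code -- of how the expression could be
fed to numexpr, with the j+1 indexing replaced by a shifted view:)

import numpy as np
import numexpr as ne

opti = np.random.rand(100001)          # placeholder data
p, q = 0.3, 0.7

# pf[j] = p*opti[j+1] + q*opti[j], evaluated as one vector expression
pf = ne.evaluate("p * hi + q * lo",
                 local_dict={'p': p, 'q': q,
                             'hi': opti[1:], 'lo': opti[:-1]})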

HTH,

-- 
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-12 Thread Andrew Straw
David Cournapeau wrote:
 Andrew Straw wrote:
   
 Fernando Perez wrote:
   
 
 On Wed, Feb 11, 2009 at 6:17 PM, David Cournapeau courn...@gmail.com 
 wrote:

 
   
 Unfortunately, it does require some work, because hardy uses g77
 instead of gfortran, so the source package has to be different (once
 hardy is done, all the one below would be easy, though). I am not sure
 how to do that with PPA (the doc is not great).
   
 
 OK, thanks for the info.  This is already very useful.
 
   
 What exactly is the expected problem and how would I verify that I'm not
 getting hit by it?
   
 

 I want to follow as closely as possible the official debian/ubuntu
 packages. Ideally, any package produced on the PPA superseded by an
 official package (from 1.2.1-1~ppaN to 1.2.1-1) should be identical to
 the superseding package. Hardy default fortran ABI is g77, not gfortran,
 so I have to use g77 for hardy - and the ppa package, limited to
 intrepid, use gfortran (since intrepid ABI is gfortran's).
   
(Warning: this email is a little over-detailed on the packaging details
front. Believe it or not, I'm not discussing the details of Debian
packaging for fun, but rather my questions have practical importance to
me -- I don't want to break all my lab's scipy installations. :)


This doesn't make sense to me. I built the .deb on a clean, minimal
sbuild (a chroot with only a very few basic packages installed, somewhat
mimicking Ubuntu's PPA builder). It built from your unmodified .dsc,
which auto-downloads the declared dependencies (and nothing else). It
passes the tests. To be very explicit -- I didn't specify to use g77 at
any point. (As implied by my previous statement of using your unmodified
.dsc, I used only the debian/rules and debian/control in your package.)
To understand your statement about identical, I will operationally
define identical for .debs to mean that they were built from the same
.dsc. Of course, in the case you describe above, it can't be _exactly_
the same .dsc because the version numbers in debian/changelog must
change and consequently so must the checksums and GPG signature in the
.dsc file, and presumably a different person will sign it. Also, there
will be timestamp differences and such for two .debs built from the
exact same .dsc, but we can ignore those. In this case, I don't see why
an official package, which meets this operational definition of
identical, wouldn't work on Hardy, as it would be built from nearly an
identical .dsc on nearly an identical clean build environment. (Of
course, there will never be an official package of this for Hardy, but
that's not the point.)

 In your case, you built a package which uses gfortran ABI on Hardy - it
 works, but is not really acceptable for an official package - and thus,
 when an upgrade from ppa to official happens, you won't get the same
 package.
Why is it not really acceptable? As long as it builds and works and
doesn't break anything, why would Ubuntu maintainers care if it uses the
gfortran ABI?

Also, in practical terms, what upgrade? A) Hardy will not upgrade
python-scipy. It's against policy for a released distribution to upgrade
software without a security reason. B) Imagining for a moment that there
would be an upgrade, why do you think it would break?
  In the rpm  world, you can use conditional on distribution
 version/type int the spec file (which is the control  + changelog  +
 rules in one file), but AFAIK, you can't do that with debian, or at
 least have not found the relevant doc.
   
I don't understand what you're saying. My understanding is that at the
beginning of each distribution (Hardy, Intrepid, Lenny), the maintainers
decide on a C++ (and I guess Fortran, but I'm not sure) ABI and fix the
toolchain to build this ABI. From then on, everything is built with this
ABI. And the point of a signed .dsc file and a clean sbuild/pbuilder is
that any .deb that gets built will be contingent on the files in
debian/* because that's cryptographically signed in the .dsc file. So,
if you trust the archive master and his computer (by trusting his keys
in your apt keyring), you trust that the .deb was built from the .dsc.
And the .dsc is signed by the maintainer. So there's a cryptographic
chain of trust to those control + changelog + rules files.

I'm still not sure I'm getting your worry, though...

Thanks,
Andrew
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Gregor Thalhammer
Brian Granger wrote:
 I am curious: would you know what would be different in numpy's case
 compared to matlab array model concerning locks ? Matlab, up to
 recently, only spreads BLAS/LAPACK on multi-cores, but since matlab 7.3
 (or 7.4), it also uses multicore for mathematical functions (cos,
 etc...). So at least in matlab's model, it looks like it can be useful.
 

 Good point.  Is it possible to tell what array size it switches over
 to using multiple threads?  Also, do you happen to iknow how Matlab is
 doing this?

   
Recent Matlab versions use Intel's Math Kernel Library, which performs
automatic multi-threading - also for mathematical functions like sin
etc., but not for addition, multiplication etc. It seems to me Matlab
itself does not take care of multi-threading. On
http://www.intel.com/software/products/mkl/data/vml/functions/_listfunc.html
you can have a look at the performance data of the MKL vectorized math
functions. At a vector length of around 100-1000, depending on the
function, precision and CPU architecture, they switch to multi-threading.

Gregor
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Francesc Alted
On Thursday 12 February 2009, Dag Sverre Seljebotn wrote:
 A quick digression:

 It would be interesting to see how a spec would look for integrating
 OpenMP natively into Cython for these kinds of purposes. Cython is
 still flexible as a language after all.

That would be really nice indeed.

 Avoiding language bloat is 
 also important, but it is difficult to know what kind of balance can
 be struck before some kind of spec is worked out. Has anyone managed
 to use OpenMP with Cython code in a nice way already?

 I couldn't do any work on it right now but it could sit in the
 pipeline for the year to come. Also I have a strong feeling that
 making the spec and language design issues would take more time than
 the actual implementation of it anyway.

I tend to agree with you.  As a matter of fact, doing (efficient)
parallelism is a very hairy thing, and understanding all (or just most)
of its issues may be the real problem for doing the port
(unless one can find some direct way to translate OpenMP directives
from Cython code to the C code, which would be wonderful).

-- 
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-12 Thread David Cournapeau
Andrew Straw wrote:
 (Warning: this email is a little over-detailed on the packaging details
 front. Believe it or not, I'm not discussing the details of Debian
 packaging for fun, but rather my questions have practical importance to
 me -- I don't want to break all my lab's scipy installations. :)

Well, the devil is in the details, especially for packaging :)



 This doesn't make sense to me. I built the .deb on an a clean, minimal
 sbuild (a chroot with only a very few basic packages installed, somewhat
 mimicing Ubuntu's PPA builder). It built from your unmodified .dsc,
 which auto-downloads the declared dependencies (and nothing else). It
 passes the tests. To be very explicit -- I didn't specify to use g77 at
 any point. (As implied by my previous statement of using your unmodified
 .dsc, I used only the debian/rules and debian/control in your package.)

Yes: your package will be almost the same as mine. It will use gfortran,
and not g77. There is nothing wrong with that per se, but I don't think
it is good practice in general. Here is an example which can break
things, to show that this is not just a red herring:
- install numpy from the PPA (gfortran ABI), pulling in gfortran as well
- later, g77 is installed (as a dependency, whatever)
- now, a user builds their own Python extension containing Fortran code,
or for example scipy - it uses g77 and does not work anymore (a quick way
to check which Fortran runtime a build actually links is sketched below).
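
(Illustration, not from the original mail: one quick way to see which
Fortran runtime a given scipy build actually links -- the path below is
just an example, adjust it to your Python version:

  ldd /usr/lib/python2.5/site-packages/scipy/linalg/flapack.so | grep -E 'gfortran|g2c'

A gfortran-ABI build pulls in libgfortran, while a g77-ABI build pulls in
libg2c instead.)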

My main rationale for providing the PPA is to avoid the never-ending queue
of emails about missing symbols, etc... because I am tired of it, because
it leaves a bad taste in users' mouths, and because I would like to deal
with more interesting issues.

 To understand your statement about identical, I will operationally
 define identical for .debs to mean that they were built from the same
 .dsc.

I have an even broader definition of identical: same control/rules/patches
is identical for me.

 In this case, I don't see why
 an official package, which meets this operational definition of
 identical, wouldn't work on Hardy, as it would be built from nearly an
 identical .dsc on nearly an identical clean build environment. (Of
 course, there will never be an official package of this for Hardy, but
 that's not the point.)

Because a package has to be well integrated with the rest of the system
- I think my example above shows the problem of using a fortran compiler
which has a different ABI than the default one.

 Why is it not really acceptable? As long as it builds and works and
 doesn't break anything, why would Ubuntu maintainers care if it uses the
 gfortran ABI?

The problem is that it is impossible to avoid breaking anything without
using the g77 ABI on hardy. I don't understand the argument that it is ok
not to follow the ABI as long as it does not break anything: if there were
no need to follow an ABI to avoid breaking anything, then there would be
no point in having an ABI in the first place :)


 Also, in practical terms, what upgrade? A) Hardy will not upgrade
 python-scipy. It's against policy for a released distribution to upgrade
 software without a security reason.

First, concerning the upgrade, I am just following the Ubuntu guide. To
quote it:


Ubuntu package names are suffixed by the version number of the package.
This allows Ubuntu to distinguish newer packages from older ones and so
remain up to date.

If you're creating an alternative version of a package already available
in Ubuntu's repositories, you should ensure that:

* your package supersedes the official Ubuntu version
* future Ubuntu versions will supersede your package.


As an example, I can think of backports.

  B) Imaging for a moment that there
 would be an upgrade, why do you think it would break?

Well, if the upgraded package uses the g77 ABI, and the old one the
gfortran ABI, this is quite likely to break some extensions built
against numpy/scipy, I believe.

  In the rpm  world, you can use conditional on distribution
 version/type int the spec file (which is the control  + changelog  +
 rules in one file), but AFAIK, you can't do that with debian, or at
 least have not found the relevant doc.
   
 I don't understand what you're saying.

One solution would be to be able to say in the control file: for hardy, do
this; for intrepid, do that. For example, with rpm packaging, I can do
the following:

%if 0%{?fedora_version} || 0%{?rhel_version} || 0%{?centos_version}
BuildRequires: gcc-gfortran python python-devel lapack3-devel >= 3.1
refblas3-devel
requires:   gcc-gfortran lapack3 >= 3.1 refblas3
%endif

%if 0%{?suse_version}
BuildRequires:  gcc-fortran python python-devel lapack3-devel >= 3.1
refblas3-devel
requires:   gcc-fortran lapack3 >= 3.1 refblas3
%endif

Notice the different name for the dependency (not that this is a problem
with Ubuntu/Debian, but I would still need different packages).

  And the point of a signed .dsc file and a clean sbuild/pbuilder is
 that any .deb that gets built will be contingent on the files in
 debian/* because that's cryptographically signed in the .dsc file. 

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Francesc Alted wrote:
 On Thursday 12 February 2009, Dag Sverre Seljebotn wrote:
   
 A quick digression:

 It would be interesting to see how a spec would look for integrating
 OpenMP natively into Cython for these kinds of purposes. Cython is
 still flexible as a language after all.
 

 That would be really nice indeed.

   
 Avoiding language bloat is 
 also important, but it is difficult to know what kind of balance can
 be struck before some kind of spec is worked out. Has anyone managed
 to use OpenMP with Cython code in a nice way already?

 I couldn't do any work on it right now but it could sit in the
 pipeline for the year to come. Also I have a strong feeling that
 making the spec and language design issues would take more time than
 the actual implementation of it anyway.
 

 I tend to aggree with you.  As a matter of fact, doing (efficient) 
 parallelism is a very hairy thing and understanding all (or just most 
 of them) of its issues maybe the real problem for doing the port 
 (unless one can find some direct way to translate OpenMP directives 
 from Cython code to the C code, which would be wonderful).
   
This is what I'm thinking. Example: This in Cython:

with cython.openmp.parallell(locals=(a,b)): # implies nogil?
    for i in cython.openmp.range(0, 10):
        ...

could simply translate into emitting the right #pragmas in generated C 
at the right locations.
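
(Purely illustrative -- this is not what Cython emits today -- the block
above might come out roughly as:)

void example(void)
{
    int i;
    double a, b;
    /* sketch of the C that the Cython construct above could generate */
    #pragma omp parallel private(a, b)
    {
        #pragma omp for
        for (i = 0; i < 10; i++) {
            /* translated loop body, using the thread-private a and b */
        }
    }
}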

FYI, I am one of the core Cython developers and can make such 
modifications in Cython itself as long as there's consensus on how it 
should look on the Cython mailing list.  My problem is that I don't 
really know OpenMP and have little experience with it, so I'm not the 
best person for creating a draft of how such high-level OpenMP
constructs should look in Cython.

Dag Sverre
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Gregor Thalhammer wrote:
 Recent Matlab versions use Intels Math Kernel Library, which performs 
 automatic multi-threading - also for mathematical functions like sin 
 etc, but not for  addition, multiplication etc.

It does if you have access to the parallel toolbox I mentioned earlier
in this thread (again, no experience with it, but I think it is
especially popular on clusters; in that case, though, it is not limited
to thread-based implementation).

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-12 Thread Andrew Straw
OK, I think you're concerned about compatibility of Python extensions
using fortran. We don't use any (that I know of), so I'm going to stop
worrying about this and upload .debs from your .dsc (or very close) to
my repository...

...except for one last question: If Hardy uses the g77 ABI but I'm
building scipy with gfortran, shouldn't there be an ABI issue with my
ATLAS? Shouldn't I get lots of test failures with scipy? I don't.

David Cournapeau wrote:
 My main rationale to provide PPA is to avoid the never-ending queue of
 emails about missing symbols, etc... because I am tired of it, because
 it gives a bad mouth taste to users, because I would like to deal with
 more interesting issues.
   
Well, I appreciate you doing that. Packaging is a thankless job... and
when everything works on your own computer, it's hard to work up the
motivation to make it work on others'.
 To understand your statement about identical, I will operationally
 define identical for .debs to mean that they were built from the same
 .dsc.
 

 I has an even larger definition of identical: same control/rules/patches
 is identical for me.
   
This is the only other point I want to make: if you're building from the
same .dsc, it means you're using the same control/rules/patches. (That's
why I brought up the checksums and the signatures.)

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages

2009-02-12 Thread David Cournapeau
Andrew Straw wrote:
 OK, I think you're concerned about compatibility of Python extensions
 using fortran. We don't use any (that I know of), so I'm going to stop
 worrying about this and upload .debs from your .dsc (or very close) to
 my repository...

 ...except for one last question: If Hardy uses the g77 ABI but I'm
 building scipy with gfortran, shouldn't there be an ABI issue with my
 ATLAS? Shouldn't I get lots of test failures with scipy? I don't.
   

Not for hardy, because Hardy has the transitional packages (e.g. it has
both gfortran and g77 ABI packages: libatlas3g* vs atlas3*). But this
wouldn't work for earlier distributions.

 Well, I appreciate you doing that. Packaging is a thankless job...

Reducing the number of support emails is enough for me as an incentive :)

 This is the only other point I want to make: if you're building from the
 same .dsc, it means you're using the same control/rules/patches. (That's
 why I brought up the checksums and the signatures.)
   

Yes, the problem is not how to avoid different packages from the .dsc,
but how to conditionally define some sections in the control and
rules files,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numexpr and numpy windows binaries built with MKL

2009-02-12 Thread Gregor Thalhammer
Hi all,

as Francesc announced, the latest release of Numexpr 1.2 can be built
with Intel's Math Kernel Library, which gives a big increase in
performance. Now the questions: Could somebody provide binaries for
Windows of Numexpr, linked with Intel's MKL? I know, there is the license
problem. The same question has been discussed for numpy. Is it still
true that Enthought is considering providing binaries based on MKL? See
https://svn.enthought.com/epd/ticket/216

Gregor

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 7:15 AM, David Cournapeau wrote:

 Since openmp also exists on windows, I doubt that it is required that
 openmp uses pthread :)

On Windows, MSVC uses Win32 threads and GCC (Cygwin and MinGW) uses 
pthreads. If you use OpenMP with MinGW, the executable becomes dependent 
on pthreadGC2.dll (an LGPL library from Redhat).

S.M.



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden

On 2/12/2009 11:30 AM, Dag Sverre Seljebotn wrote:

It would be interesting to see how a spec would look for integrating 
OpenMP natively into Cython for these kinds of purposes. Cython is still 
flexible as a language after all. Avoiding language bloat is also 
important, but it is difficult to know what kind of balance can be 
struck before some kind of spec is worked out. Has anyone managed to use 
OpenMP with Cython code in a nice way already?


Here is an example of SciPy's ckdtree.pyx modified to use OpenMP. It is 
basically twice as fast on my dual-core laptop. But as Cython mangles 
the names in the C code, I had to put the OpenMP part in a separate C file.


OpenMP does not need to be a part of the Cython language. It can be
special comments in the code as in Fortran. After all, #pragma omp
parallel is a comment in Cython.



Sturla Molden


/*  
 *  Parallel query for faster kd-tree searches on SMP computers.
 *  This function will release the GIL around the lengthy query 
loop and 
 *  use OpenMP for multithreading.
 *
 *  OpenMP is supported on GCC (since version 4.2) and MS Visual 
C++ 2005 
 *  and later. Mingw builds of GCC supports OpenMP through a pthread
 *  layer (pthreadGC2.dll). 
 *
 *  Compile with -fopenmp and link -lgomp -lpthread on GCC.
 *
 *  Written by Sturla Molden, Univ. Oslo, Jan. 24 2009  
 *
 *
 *  This code is released to public domain.
 *
 */


#include <Python.h>
#include <omp.h>

typedef void (*queryfun_t)(void *self, double *result_distances,
int*result_indices, double*x, int k, double eps, double p, double 
distance_upper_bound);

void parallel_query(queryfun_t query, int nthreads, int niter, void *self, 
double*result_distances, 
int *result_indices, double*x, int k, int m, double eps, double p, 
double distance_upper_bound)
{
int c;

/* allow the number of threads to be specified */
if (nthreads == 0) {
for (c=0; c<niter; c++) {
(*query)(self, result_distances + c*k, result_indices + 
c*k, x + c*m, k, 
  eps, p, distance_upper_bound);
}
} else {
/* parallel queries with GIL released */
Py_BEGIN_ALLOW_THREADS {
#pragma omp parallel for private(c) schedule(guided) 
num_threads(nthreads)
for (c=0; c<niter; c++) {
(*query)(self, result_distances + c*k, 
result_indices + c*k, x + c*m, k, 
   eps, p, distance_upper_bound);
}
} Py_END_ALLOW_THREADS
} 
}

ckdtree.pyx (attachment)
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Francesc Alted
On Thursday 12 February 2009, Dag Sverre Seljebotn wrote:
 FYI, I am one of the core Cython developers and can make such
 modifications in Cython itself as long as there's consensus on how it
 should look on the Cython mailing list.  My problem is that I don't
 really know OpenMP and have little experience with it, so I'm not the
 best person for creating a draft for how such high-level OpenMP
 constructs should look like in Cython.

I don't know OpenMP well enough either, but I'd say that on this list
there could be some people who could help.

At any rate, I really like the OpenMP approach and would much prefer to
have support for it in Cython rather than threading, MPI or whatever.
But the thing is: is OpenMP stable and mature enough to allow using it on
most common platforms?  I think that recent GCC compilers support
the latest incarnation (3.0) of the standard, but IIRC, MSVC 2008 only
supports the OpenMP 2.5 specification.  I'm not sure how this would affect
a possible implementation in Cython, though.

Cheers,

-- 
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Francesc Alted
On Thursday 12 February 2009, Sturla Molden wrote:
 OpenMP does not need to be a aprt of the Cython language. It can be
 special comments in the code as in Fortran. After all, #pragma omp
 parallel is a comment in Cython.

Hey!  That's very nice to know.  We already have OpenMP support in
Cython for free (or so it seems :-)

-- 
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 12:20 PM, David Cournapeau wrote:

 It does if you have access to the parallel toolbox I mentioned earlier
 in this thread (again, no experience with it, but I think it is
 specially popular on clusters; in that case, though, it is not limited
 to thread-based implementation).

As has been mentioned, Matlab is a safer language for parallel computing,
as arrays are immutable. There is almost no need for synchronization
of any sort, except barriers.

Maybe it is time to implement an immutable ndarray subclass?

With immutable arrays we can also avoid making temporary arrays in 
expressions like y = a*b + c. y just gets an expression and three 
immutable buffers. And then numexpr (or something like it) can take care 
of the rest.
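
(A minimal sketch of what such a subclass might look like -- only the
read-only part, none of the lazy-evaluation machinery, and not code from
this thread:)

import numpy as np

class ImmutableArray(np.ndarray):
    """Toy subclass whose data buffer is marked read-only."""
    def __new__(cls, data):
        obj = np.array(data).view(cls)   # own copy of the data
        obj.flags.writeable = False      # freeze the buffer
        return obj

a = ImmutableArray([1.0, 2.0, 3.0])
# a[0] = 5.0  would now raise an error because the array is not writeable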

As for Matlab, I have noticed that they are experimenting with CUDA now, 
to use nvidia's processors for hardware-acceleration. As even modest 
GPUs can yield hundreds of gigaflops, that is going to be hard to match 
(unless we make an ndarray that uses the GPU). But again, as the 
performance of GPUs comes from massive multithreading, immutability may 
be the key here as well.

Sturla Molden

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Sturla Molden wrote:
 On 2/12/2009 12:20 PM, David Cournapeau wrote:

   
 It does if you have access to the parallel toolbox I mentioned earlier
 in this thread (again, no experience with it, but I think it is
 specially popular on clusters; in that case, though, it is not limited
 to thread-based implementation).
 

 As has been mentioned, Matlab is a safer language for parallel computing 
   as arrays are immutable. There is almost no need for synchronization 
 of any sort, except barriers.

 Maybe it is time to implement an immutable ndarray subclass?
   

Is it possible at all? I mean an immutable array as a subclass of a
mutable array? I too would like some immutable arrays, but this sounds
like an enormous task (as much as I like numpy, I find numpy sometimes
harder to use than matlab for some problems because of the immutability
- maybe my own limitation, though),

 As for Matlab, I have noticed that they are experimenting with CUDA now, 
 to use nvidia's processors for hardware-acceleration.

Labview too; there was a small booth from NI at the last ICASSP, and they
were demoing some GPU-based operations (mostly blas/lapack/fft from what
I understood, the easy stuff).

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Francesc Alted wrote:
 I don't know OpenMP enough neither, but I'd say that in this list there 
 could be some people that could help.

 At any rate, I really like the OpenMP approach and prefer to have 
 support for it in Cython much better than threading, MPI or whatever.  
 But the thing is: is OpenMP stable, mature enough for allow using it in 
 most of common platforms?  I think that recent GCC compilers support 
 the latest incarnation (3.0) of the standard

The yet unreleased gcc 4.4 will support 3.0, and gcc 4.3 supports a
subset of 3.0, I believe:

http://openmp.org/wp/2008/11/openmp-30-status/

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Michael Abshoff
Sturla Molden wrote:
 On 2/12/2009 12:20 PM, David Cournapeau wrote:

Hi,

 It does if you have access to the parallel toolbox I mentioned earlier
 in this thread (again, no experience with it, but I think it is
 specially popular on clusters; in that case, though, it is not limited
 to thread-based implementation).
 
 As has been mentioned, Matlab is a safer language for parallel computing 
   as arrays are immutable. There is almost no need for synchronization 
 of any sort, except barriers.
 
 Maybe it is time to implement an immutable ndarray subclass?
 
 With immutable arrays we can also avoid making temporary arrays in 
 expressions like y = a*b + c. y just gets an expression and three 
 immutable buffers. And then numexpr (or something like it) can take care 
 of the rest.
 
 As for Matlab, I have noticed that they are experimenting with CUDA now, 
 to use nvidia's processors for hardware-acceleration. As even modest 
 GPUs can yield hundreds of gigaflops, 

Not even close. The current generation peaks at around 1.2 TFlops single
precision, 280 GFlops double precision for ATI's hardware. The main
problem with those numbers is that the memory on the graphics card
cannot feed the data fast enough into the GPU to achieve theoretical
peak. So those hundreds of GFlops are pure marketing :)

So in reality you might get anywhere from 20% to 60% (if you are lucky)
locally, before accounting for transfers from main memory to GPU memory
and so on. Given that recent Intel CPUs give you about 7 to 11 GFlops
double precision per core, and libraries like ATLAS give you that
performance today without the need to jump through hoops, these numbers
start to look a lot less impressive.

And Nvidia's numbers are lower than ATI's. NVidia's programming solution
is much more advanced and rounded out compared to ATI's, which is largely
in closed beta. OpenCL is mostly vaporware at this point.

  that is going to be hard to match
 (unless we make an ndarray that uses the GPU). But again, as the 
 performance of GPUs comes from massive multithreading, immutability may 
 be the key here as well.

I have a K10 system with two Tesla C1060 GPUs to play with and have 
thought about adding CUDABlas support to Numpy/Scipy, but it hasn't been 
a priority for me. My main interest here is finite field arithmetic by 
making FFPack via LinBox use CUDABlas. If anyone wants an account to 
make numpy/scipy optionally use CUDABlas feel free to ping me off list 
and I can set you up.

 Sturla Molden

Cheers,

Michael


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Sturla Molden wrote:
 On 2/12/2009 1:50 PM, Francesc Alted wrote:

   
 Hey!  That's very nice to know.  We already have OpenMP support in 
 Cython for free (or apparently it seems so :-)
 

 Not we don't, as variable names are different in C and Cython. But 
 adding support for OpenMP would not bloat the Cython language.
   
What I meant was bloat the Cython compiler, i.e. Cython has to know 
about OpenMP and there must be OpenMP-specific code in the compiler. The 
grammar etc. wouldn't change.

But I think I'll be able to lobby it in, given a good spec and some time 
to implement it (about a day of work, so I think this is likely to 
happen within a year).

As Cython must understand what is going on anyway, having things like 
with cython.openmp.parallell is actually a lot easier to implement in 
Cython than using comments/pragmas, and looks nicer too IMO. (It means 
that only code generation and not the parser must be modified.)

Dag Sverre
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
 I am curious: would you know what would be different in numpy's case
 compared to matlab array model concerning locks ? Matlab, up to
 recently, only spreads BLAS/LAPACK on multi-cores, but since matlab 7.3
 (or 7.4), it also uses multicore for mathematical functions (cos,
 etc...). So at least in matlab's model, it looks like it can be useful.
 I understand that numpy's model is more flexible (I don't think strided
 arrays can overlap in matlab for example, at least not from what you can
 see from the public API).

 cheers,

And the 'more flexible' is one of the biggest drawbacks. It's one of
the reasons Fortran is so fast compared to C: in Fortran you can't have
pointer aliases, so C will be slower. With Numpy, it is the same.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
Yes, it is. You have to link against pthread (at least with Linux ;))

You have to write a single parallel region if you don't want this
overhead (which is not possible with Python).

Matthieu

2009/2/12 Gael Varoquaux gael.varoqu...@normalesup.org:
 On Wed, Feb 11, 2009 at 11:52:40PM -0600, Robert Kern wrote:
  This seem like pretty heavy solutions though.

 From a programmer's perspective, it seems to me like OpenMP is a muck
 lighter weight solution than pthreads.

 From a programmer's perspective, because, IMHO, openmp is implemented
 using pthreads. I do have difficulties verifying this on the web, but
 documents I find hint to that.

 Gaël




-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
2009/2/12 Sturla Molden stu...@molden.no:
 On 2/12/2009 1:50 PM, Francesc Alted wrote:

 Hey!  That's very nice to know.  We already have OpenMP support in
 Cython for free (or apparently it seems so :-)

 Not we don't, as variable names are different in C and Cython. But
 adding support for OpenMP would not bloat the Cython language.

 Cython must exhange the variable names and leave the comment in C as a
 pragma to the C compiler.

 IMHO, OpenMP is the easiest way to use parallel computing in scientific
 projects. It is much easier than manually fiddling with threads,
 processes, MPI, etc. Just write code as you normally would, debug and
 verify. Then you spend five minutes inserting pragmas to the compiler,
 and et voilá you have a parallel program. The same code will then
 compile and run correctly for parallel or sequential execution,
 depending on a compiler switch (-fopenmp). You get load balancing for
 free, as that is built into OpenMP. OpenMPs API is so small that it just
 takes 10 minutes to learn.

 OpenMP currently works on SMPs (e.g. multicore CPUs), but there is work
 going on to port it to clusters as well.

In this matter, I think OpenMP will never be used. Besides, OpenMP
doesn't even support an unsigned int for a parallel loop variable.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Matthieu Brucher wrote:
 No - I have never seen deep explanation of the matlab model. The C api
 is so small that it is hard to deduce anything from it (except that the
 memory handling is not ref-counting-based, I don't know if it matters
 for our discussion of speeding up ufunc). I would guess that since two
 arrays cannot share data (COW-based), lock handling may be easier to
 deal with ? I am not really familiar with multi-thread programming (my
 only limited experience is for soft real-time programming for audio
 processing, where the issues are totally different, since latency
 matters as much if not more than throughput).
 

 It's not even a matter of multithread programming, in mono-core
 programming, the same issue can arise.
   

Which issue ?

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Outer join ?

2009-02-12 Thread bernhard.vo...@gmail.com
You might consider the groupby from the itertools module.

Do you have two keys only? I would prefer grouping on the first
column. For groupby you need to sort the array by the first column
first.

from itertools import groupby
import numpy

a.sort(order='col1')

# target array: first col are the unique dates, second col the values for
# key1, third col the values for key2
data = numpy.zeros(len(numpy.unique(a['col1'])),
                   dtype=dict(names=['dates', 'key1', 'key2'],
                              formats=[long, float, float]))

for i, (date, items) in enumerate(groupby(a, lambda item: item['col1'])):
    data[i]['dates'] = date
    for col1, col2, col3 in items:
        data[i][col2] = col3

Hope this works! Bernhard

On Feb 12, 6:24 am, A B python6...@gmail.com wrote:
 Hi,

 I have the following data structure:

 col1 | col2 | col3

 20080101|key1|4
 20080201|key1|6
 20080301|key1|5
 20080301|key2|3.4
 20080601|key2|5.6

 For each key in the second column, I would like to create an array
 where for all unique values in the first column, there will be either
 a value or zero if there is no data available. Like so:

 # 20080101, 20080201, 20080301, 20080601

 key1 - 4, 6, 5,    0
 key2 - 0, 0, 3.4, 5.6

 Ideally, the results would end up in a 2d array.

 What's the most efficient way to accomplish this? Currently, I am
 getting a list of uniq col1 and col2 values into separate variables,
 then looping through each unique value in col2

 a = loadtxt(...)

 dates = unique(a[:]['col1'])
 keys = unique(a[:]['col2'])

 for key in keys:
     b = a[where(a[:]['col2'] == key)]
     ???

 Thanks in advance.
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote:

 FYI, I am one of the core Cython developers and can make such 
 modifications in Cython itself as long as there's consensus on how it 
 should look on the Cython mailing list.  My problem is that I don't 
 really know OpenMP and have little experience with it, so I'm not the 
 best person for creating a draft for how such high-level OpenMP 
 constructs should look like in Cython.

I don't know the Cython internals, but I do know OpenMP. I mostly use it 
with Fortran.

The question is: Should OpenMP be comments in the Cython code (as they 
are in C and Fortran), or should OpenMP be special objects?

As for the GIL: No I don't think nogil should be implied. But Python 
objects should only be allowed as shared variables. Synchronization will 
then be as usual for shared variables in OpenMP (#pragma omp critical).

Here is my suggestion for the syntax. If you just follow a consistent
translation scheme, you don't need to know OpenMP in detail:


with openmp('parallel for', argument=iterable, ...):
-- insert pragma directly above for

with openmp(directive, argument=iterable, ...):
-- insert pragma and brackets

with openmp('atomic'): -- insert pragma directly

openmp('barrier') -- insert pragma directly


This by the way covers all of OpenMP. This is how it should translate:


with openmp('parallel for', private=(i,), shared=(n,),
            schedule='dynamic'):

    for i in range(n):
        pass

Compiles to:

#pragma omp parallel for \
private(i) \
shared(n) \
schedule(dynamic)
for (i=0; i<n; i++) {
   /* whatever */
}


with openmp('parallel sections',
            reduction=('+', k), private=(i, j)):

    with openmp('section'):
        i = foobar()

    with openmp('section'):
        j = foobar()

    k = i + j


Compiles to:

#pragma omp parallel sections\
reduction(+:k)\
private(i,j)
{
 #pragma omp section
 {
i = foobar();
 }

 #pragma omp section
 {
j = foobar();
 }

 k = i+j;
}


With Python objects, the programmer must synchronize access:


with openmp('parallel for', shared=(pyobj, n), private=(i,)):
    for i in range(n):
        with openmp('critical'):
            pyobj += i


#pragma omp parallel for \
shared(pyobj,n) \
private(i)
for (i=0; i<n; i++) {
#pragma omp critical
{
   pyobj += i;
}
}


Atomic and barriers:


with openmp('atomic'): i += j

#pragma omp atomic
i += j;


with openmp('parallel for', default='private', shared=(n,)):
    for i in range(n):
        openmp('barrier')


#pragma omp parallel for \
default(private) \
shared(n)
for (i=0; i<n; i++)
{
#pragma omp barrier
}


That is my suggestion. Easy to implement as you don't need to learn 
OpenMP first (not that it is difficult).


Sturla Molden

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Matthieu Brucher wrote:

 Sorry, I was refering to my last mail, but I sent so many in 5 minuts ;)
 In C, if you have to arrays (two pointers), the compiler can't make
 aggressive optimizations because they may intersect. With Fortran,
 this is not possible. In this matter, Numpy behaves like C (everyone
 heard about the different a[indices] += a[other_intersecting_indices]
 issues), Matlab is more like Fortran.
   

I think it is hard to know exactly, because we don't know how matlab is
implemented. It is possible to handle special cases for non-overlapping
arrays in numpy, once you are in C; I believe there are many codepaths
which we could optimize aggressively, using openMP, SSE, etc... It is
just a lot of work to do so manually, so the real problem is how this
can be handled as generally as possible using a small core.

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden

On 2/12/2009 1:44 PM, Sturla Molden wrote:


Here is an example of SciPy's ckdtree.pyx modified to use OpenMP.


It seems I managed to post an erroneous C file. :(

S.M.

/*  
 *  Parallel query for faster kd-tree searches on SMP computers.
 *  This function will release the GIL around the lengthy query 
loop and 
 *  use OpenMP for multithreading.
 *
 *  OpenMP is supported on GCC (since version 4.2) and MS Visual 
C++ 2005 
 *  and later. Mingw builds of GCC supports OpenMP through a pthread
 *  layer (pthreadGC2.dll). 
 *
 *  Compile with -fopenmp and link -lgomp -lpthread on GCC.
 *
 *  Written by Sturla Molden, Univ. Oslo, Jan. 24 2009  
 *
 *
 *  This code is released to public domain.
 *
 */


#include <Python.h>
#include <omp.h>

typedef void (*queryfun_t)(void *self, double *result_distances,
int*result_indices, double*x, int k, double eps, double p, double 
distance_upper_bound);

void parallel_query(queryfun_t query, int nthreads, int niter, void *self, 
double*result_distances, 
int *result_indices, double*x, int k, int m, double eps, double p, 
double distance_upper_bound)
{
int c;

/* allow the number of threads to be specified */
if (nthreads == 0) {
Py_BEGIN_ALLOW_THREADS {
#pragma omp parallel for private(c) schedule(guided)
for (c=0; c<niter; c++) {
(*query)(self, result_distances + c*k, 
result_indices + c*k, x + c*m, k, 
   eps, p, distance_upper_bound);
}
} Py_END_ALLOW_THREADS
} else {
/* parallel queries with GIL released */
Py_BEGIN_ALLOW_THREADS {
#pragma omp parallel for private(c) schedule(guided) 
num_threads(nthreads)
for (c=0; c<niter; c++) {
(*query)(self, result_distances + c*k, 
result_indices + c*k, x + c*m, k, 
   eps, p, distance_upper_bound);
}
} Py_END_ALLOW_THREADS
} 
}

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Michael Abshoff
David Cournapeau wrote:
 Matthieu Brucher wrote:
 For BLAS level 3, the MKL is parallelized (so matrix multiplication is).
   

Hi David,

 Same for ATLAS: thread support is one focus in the 3.9 serie, currently
 in development. 

ATLAS has had thread support for a long, long time. The 3.9 series just 
improves it substantially by using affinity when available and removes 
some long standing issues with allocation performance that one had to 
work around before by setting some defines at compile time.

  I have never used it, I don't know how it compare to the
 MKL,

It does compare quite well and is more or less on par with the latest 
MKL releases in the 3.9 series. 3.8.2 is maybe 10% to 15% slower on i7 
as well as Core2 cores than the MKL. One big advantage of ATLAS is that 
it tends to work when using it with numpy/scipy unlike the Intel MKL 
where one has to work around a bunch of oddities and jump through hoops 
to get it to work. It seems that Intel must rename at least one library 
in each release of the MKL to keep build system maintainers occupied :)

The big disadvantage of ATLAS is that Windows support is currently 
limited to 32 bits, but 3.9.5 and higher have SFU/SUA support, so 64 bit 
support is possible. Clint told me the main issue here was lack of 
access and it isn't too high on his priority list.

 David

Cheers,

Michael


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
2009/2/12 David Cournapeau da...@ar.media.kyoto-u.ac.jp:
 Matthieu Brucher wrote:
 No - I have never seen deep explanation of the matlab model. The C api
 is so small that it is hard to deduce anything from it (except that the
 memory handling is not ref-counting-based, I don't know if it matters
 for our discussion of speeding up ufunc). I would guess that since two
 arrays cannot share data (COW-based), lock handling may be easier to
 deal with ? I am not really familiar with multi-thread programming (my
 only limited experience is for soft real-time programming for audio
 processing, where the issues are totally different, since latency
 matters as much if not more than throughput).


 It's not even a matter of multithread programming, in mono-core
 programming, the same issue can arise.


 Which issue ?

Sorry, I was referring to my last mail, but I sent so many in 5 minutes ;)
In C, if you have two arrays (two pointers), the compiler can't make
aggressive optimizations because they may intersect. With Fortran,
this is not possible. In this matter, Numpy behaves like C (everyone has
heard about the various a[indices] += a[other_intersecting_indices]
issues); Matlab is more like Fortran.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
2009/2/12 David Cournapeau da...@ar.media.kyoto-u.ac.jp:
 Matthieu Brucher wrote:

 Sorry, I was refering to my last mail, but I sent so many in 5 minuts ;)
 In C, if you have to arrays (two pointers), the compiler can't make
 aggressive optimizations because they may intersect. With Fortran,
 this is not possible. In this matter, Numpy behaves like C (everyone
 heard about the different a[indices] += a[other_intersecting_indices]
 issues), Matlab is more like Fortran.


 I think it is hard to know exactly, because we don't know how matlab is
 implemented. It is possible to handle special cases for non overlapping
 arrays in numpy, once you are in C; I believe there are many codepath
 which we could optimize aggressively, using openMP, SSE, etc... It is
 just a lot of work to do so manually, so the real problem is how this
 can be handled as generally as possible using a small core.

 David

Indeed, Matlab may not benefit from this. At least, COW makes it
possible to optimize aggressively. Then, it's a matter of implementing
the loops in C or Fortran (or using C99 extensions and relying on the
compiler). In C89, you will have absolutely no benefit (because there
is no way to tell the compiler that there is no aliasing); in
Fortran, it will be optimized correctly.
The problem with optimized codepaths is that you have to detect them.
On the other hand, if you implement a numpy-compliant array that
explicitly forbids aliasing (you could add some additional tests to
ensure that the parents are different, or stuff like that), you can
make the compiler optimize the code better.
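
(To make the C99 point concrete -- an illustration, not code from the
thread: 'restrict' tells the compiler the two arrays can never alias,
which is what a Fortran compiler may assume for its array arguments.)

#include <stddef.h>

void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    /* with the restrict qualifiers the compiler is free to vectorize */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}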

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Outer join ?

2009-02-12 Thread A B
This is probably more than I need but I will definitely keep it as
reference. Thank you.

On 2/12/09, bernhard.vo...@gmail.com bernhard.vo...@gmail.com wrote:
 You might consider the groupby from the itertools module.

 Do you have only two keys? I would prefer grouping on the first
 column. For groupby you then need to sort the array by the first
 column.

 import numpy
 from itertools import groupby

 a.sort(order='col1')

 # target array: first col are unique dates, second col values for
 # key1, third col values for key2
 data = numpy.zeros(len(numpy.unique(a['col1'])),
                    dtype=dict(names=['dates', 'key1', 'key2'],
                               formats=[long, float, float]))

 for i, (date, items) in enumerate(groupby(a, lambda item: item['col1'])):
     data[i]['dates'] = date
     for col1, col2, col3 in items:
         data[i][col2] = col3

 Hope this works! Bernhard

 On Feb 12, 6:24 am, A B python6...@gmail.com wrote:
 Hi,

 I have the following data structure:

 col1 | col2 | col3

 20080101|key1|4
 20080201|key1|6
 20080301|key1|5
 20080301|key2|3.4
 20080601|key2|5.6

 For each key in the second column, I would like to create an array
 where for all unique values in the first column, there will be either
 a value or zero if there is no data available. Like so:

 # 20080101, 20080201, 20080301, 20080601

 key1 - 4, 6, 5,0
 key2 - 0, 0, 3.4, 5.6

 Ideally, the results would end up in a 2d array.

 What's the most efficient way to accomplish this? Currently, I am
 getting a list of uniq col1 and col2 values into separate variables,
 then looping through each unique value in col2

 a = loadtxt(...)

 dates = unique(a[:]['col1'])
 keys = unique(a[:]['col2'])

 for key in keys:
 b = a[where(a[:]['col2'] == key)]
 ???

 Thanks in advance.
 __
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Outer join ?

2009-02-12 Thread A B
On 2/11/09, Robert Kern robert.k...@gmail.com wrote:
 On Wed, Feb 11, 2009 at 23:24, A B python6...@gmail.com wrote:
 Hi,

 I have the following data structure:

 col1 | col2 | col3

 20080101|key1|4
 20080201|key1|6
 20080301|key1|5
 20080301|key2|3.4
 20080601|key2|5.6

 For each key in the second column, I would like to create an array
 where for all unique values in the first column, there will be either
 a value or zero if there is no data available. Like so:

 # 20080101, 20080201, 20080301, 20080601

 key1 - 4, 6, 5,0
 key2 - 0, 0, 3.4, 5.6

 Ideally, the results would end up in a 2d array.

 What's the most efficient way to accomplish this? Currently, I am
 getting a list of uniq col1 and col2 values into separate variables,
 then looping through each unique value in col2

 a = loadtxt(...)

 dates = unique(a[:]['col1'])
 keys = unique(a[:]['col2'])

 for key in keys:
b = a[where(a[:]['col2'] == key)]
???

 Take a look at setmember1d().

 --
 Robert Kern

Thanks. That's exactly what I need,  but I'm not sure about the next
step after I do

setmember1d(dates, b['date'])

and have the bool arr/mask ... How can I grow b to have 0 values for
the missing keys?
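
Something like this untested sketch is what I'm aiming for, I think
(using the column names from my first post, and assuming both dates
and b are sorted by date):

import numpy as np

row = np.zeros(len(dates))                # one output row, 0 where no data
mask = np.setmember1d(dates, b['col1'])   # True where that date occurs in b
row[mask] = b['col3']                     # fill the matching positions in order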
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Gael Varoquaux
On Thu, Feb 12, 2009 at 03:27:51PM +0100, Sturla Molden wrote:
 The question is: Should OpenMP be comments in the Cython code (as they 
 are in C and Fortran), or should OpenMP be special objects?

My two cents: go for cython objects/statements. Not only does code in
comments look weird and hackish, it also means you have to hack
the parser. Finally, cython can be used with valid Python syntax (at least
part of it) thanks to decorators giving the types; Ondrej Certik
implemented this for sympy. Having pragmas in comments would make it hard
to use the vanilla Python parser to read this code, or to do
metaprogramming on the pragmas.

The logic of Python is pretty much that everything is accessible to the
programmer, to be modified, e.g. for testing. Putting logic in comments
breaks this.

I like the use of the with statement for this purpose.

Gaël
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genloadtxt: dtype=None and unpack=True

2009-02-12 Thread Ryan May
On Wed, Feb 11, 2009 at 10:47 PM, Pierre GM pgmdevl...@gmail.com wrote:


 On Feb 11, 2009, at 11:38 PM, Ryan May wrote:

  Pierre,
 
  I noticed that using dtype=None with a heterogeneous set of data,
  trying to use unpack=True to get the columns into separate arrays
  (instead of a structured array) doesn't work.  I've attached a patch
  that, in the case of dtype=None, unpacks the fields in the final
  array into a list of separate arrays.  Does this seem like a good
  idea to you?

 Nope, as it breaks consistency: depending on some input parameters,
 you either get an array or a list. I think it's better to leave it as
 it is, maybe adding an extra line in the doc stating that
 unpack=True doesn't do anything for structured arrays.


Ah, I hadn't thought of that.  I was only thinking in terms of the behavior
of unpacking on return, not in the actual returned object.  You're right,
it's a bad idea.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?

2009-02-12 Thread Michael S. Gilbert
On Fri, 6 Feb 2009 18:18:44 -0500 Michael S. Gilbert wrote:
 BTW, there is a 64-bit version of the reference mersenne twister 
 implementation available [1].

I did some testing with this 64-bit implementation (mt19937-64). I've
found that it is actually slower than the 32-bit reference (mt19937ar)
on 64-bit systems (2.15s vs 2.25s to generate 1 ints). This is
likely because it generates 64-bit long long ints instead of 32-bit
long ints. However, it should be possible to break up each 64-bit int
into two 32-bit ints, then the runtime would appear to be almost twice
as fast. One other consideration to keep in mind is that the 64-bit
version is not stream-compatible with the 32-bit implementation (you
will get different sequences for the same input seed).

Would it be worth it to implement this in numpy in order to get an
almost 2x speedup on 64-bit machines?
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 5:24 PM, Gael Varoquaux wrote:

 My two cents: go for cython objects/statements. Not only does code in
 comments looks weird and a hack, but also it means to you have to hack
 the parser. 

I agree with this, particularly because Cython uses indentation as
syntax. With comments you would have to use 'end' tags like in Fortran:

!$omp parallel do private(i) shared(n)
do i = 1, n
   !$omp parallel do private(j) shared(i,n)
   do j = i, n
      ! whatever
   end do
   !$omp end parallel do
end do
!$omp end parallel do

But this is Fortran. It is meant to look bad. :)


 Finally cython can be used in a valid Python syntax (at least
 part of it) thanks to decorators giving the types. 

I don't think OpenMP will ever be used from pure Python.

But please, add an -openmp compiler switch to Cython. If it is not given,
all openmp statements should be ignored and translate to nothing.

S.M.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Integer cast problems

2009-02-12 Thread Ralph Kube
Hi there,
I have a little problem here with array indexing, hope you see the problem.
I use the following loop to calculate some integrals

import numpy as N
from scipy.integrate import quad
T = 1
dt = 0.005
L = 3
n = 2
ints = N.zeros([T/dt])

for t in N.arange(0, T, dt):
    a = quad(lambda x: -1*(1-4*(t**4))*N.exp(-t**4)*N.exp(-x**2)*N.cos(n*N.pi*(x-L)/(2*L)),
             -L, L)[0]
    ints[int(t/dt)] = a
    print t, N.int32(t/dt), t/dt, a, ints[int(t/dt)]

The output from the print statement looks like:

0.14 28 28.0 2.52124867251e-16 2.52124867251e-16
0.145 28 29.0 2.03015199575e-16 2.03015199575e-16
0.15 30 30.0 2.40857836418e-16 2.40857836418e-16
0.155 31 31.0 2.52191011339e-16 2.52191011339e-16

The same happens on the ipython prompt:

0.145 * 0.005 = 28.996
N.int32(0.145 * 0.005) = 28

Any ideas how to deal with this?

Cheers, Ralph
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integer cast problems

2009-02-12 Thread Ryan May
On Thu, Feb 12, 2009 at 11:21 AM, Ralph Kube ralphk...@googlemail.comwrote:

 Hi there,
 I have a little problem here with array indexing, hope you see the problem.
 I use the following loop to calculate some integrals

 import numpy as N
 from scipy.integrate import quad
 T = 1
 dt = 0.005
 L = 3
 n = 2
 ints = N.zeros([T/dt])

 for t in N.arange(0, T, dt):
     a = quad(lambda x: -1*(1-4*(t**4))*N.exp(-t**4)*N.exp(-x**2)*N.cos(n*N.pi*(x-L)/(2*L)),
              -L, L)[0]
     ints[int(t/dt)] = a
     print t, N.int32(t/dt), t/dt, a, ints[int(t/dt)]

 The output from the print statement looks like:

 0.14 28 28.0 2.52124867251e-16 2.52124867251e-16
 0.145 28 29.0 2.03015199575e-16 2.03015199575e-16
 0.15 30 30.0 2.40857836418e-16 2.40857836418e-16
 0.155 31 31.0 2.52191011339e-16 2.52191011339e-16

 The same happens on the ipython prompt:

 0.145 * 0.005 = 28.996
 N.int32(0.145 * 0.005) = 28

 Any ideas how to deal with this?


I'm assuming you mean 0.145 / 0.005 = 28.996

When you cast to an integer, it *truncates* the fractional part, and life
with floating point says that what should be an exact result won't
necessarily be exact.  Try using N.around.
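
For example:

import numpy as np

t, dt = 0.145, 0.005
print(t / dt)                  # just under 29.0 due to binary floating point
print(int(t / dt))             # 28 -- int() truncates toward zero
print(int(np.around(t / dt)))  # 29 -- round first, then convert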

Ryan


-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integer cast problems

2009-02-12 Thread Gregor Thalhammer
Ralph Kube wrote:
 Hi there,
 I have a little problem here with array indexing, hope you see the problem.
 I use the following loop to calculate some integrals

 ...
 0.145 * 0.005 = 28.996
 N.int32(0.145 * 0.005) = 28
Conversion to int truncates, it doesn't round. Try
N.int32(0.145 / 0.005 + 0.5)

Gregor
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integer cast problems

2009-02-12 Thread Keith Goodman
On Thu, Feb 12, 2009 at 9:21 AM, Ralph Kube ralphk...@googlemail.com wrote:
 The same happens on the ipython prompt:

 0.145 * 0.005 = 28.996
 N.int32(0.145 * 0.005) = 28

 Any ideas how to deal with this?

Do you want the answer to be 29? N.int32 truncates. If you want to
round instead, you can use the standard trick of adding 0.5:

>>> np.int32(0.5 + 0.145 / 0.005)
29

or

>>> np.round(0.145 / 0.005)
29.0
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?

2009-02-12 Thread Robert Kern
On Thu, Feb 12, 2009 at 10:58, Michael S. Gilbert
michael.s.gilb...@gmail.com wrote:
 On Fri, 6 Feb 2009 18:18:44 -0500 Michael S. Gilbert wrote:
 BTW, there is a 64-bit version of the reference mersenne twister
 implementation available [1].

 I did some testing with this 64-bit implementation (mt19937-64). I've
 found that it is actually slower than the 32-bit reference (mt19937ar)
 on 64-bit systems (2.15s vs 2.25s to generate 1 ints). This is
 likely because it generates 64-bit long long ints instead of 32-bit
 long ints. However, it should be possible to break up each 64-bit int
 into two 32-bit ints, then the runtime would appear to be almost twice
 as fast.

Why do you think that?

 One other consideration to keep in mind is that the 64-bit
 version is not stream-compatible with the 32-bit implementation (you
 will get different sequences for the same input seed).

 Would it be worth it to implement this in numpy in order to get an
 almost 2x speedup on 64-bit machines?

The incompatibility is a showstopper to replacing the PRNG on any platform.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Sturla Molden wrote:
 On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote:
 
 FYI, I am one of the core Cython developers and can make such 
 modifications in Cython itself as long as there's consensus on how it 
 should look on the Cython mailing list.  My problem is that I don't 
 really know OpenMP and have little experience with it, so I'm not the 
 best person for creating a draft for how such high-level OpenMP 
 constructs should look like in Cython.
 
 I don't know the Cython internals, but I do know OpenMP. I mostly use it 
 with Fortran.
 
 The question is: Should OpenMP be comments in the Cython code (as they 
 are in C and Fortran), or should OpenMP be special objects?

Statements more or less like C and Fortran, but disguised as Python 
syntax rather than pragmas/comments.

 As for the GIL: No I don't think nogil should be implied. But Python 
 objects should only be allowed as shared variables. Synchronization will 
 then be as usual for shared variables in OpenMP (#pragma omp critical).

Good point.

 Here is my suggestion for syntax. If you just follow a consistent 
 translation scheme, you don't need to know OpenMP in details. Here is a 
 suggestion:
 
 
 with openmp('parallel for', argument=iterable, ...):
 -- insert pragma directly above for
 
 with openmp(directive, argument=iterable, ...):
 -- insert pragma and brackets
 
 with openmp('atomic'): -- insert pragma directly
 
 openmp('barrier') -- insert pragma directly
 
 
 This by the way covers all of OpenMP. This is how it should translate:
 
 
 with openmp('parallel for', private=(i,), shared=(n,),

IMO there's a problem with using literal variable names here, because 
Python syntax implies that the value is passed. One shouldn't make 
syntax where private=(i,) is legal but private=(f(),) isn't. So I'd go 
for strings.

             schedule='dynamic'):

      for i in range(n):
          pass
 
 Compiles to:
 
 #pragma omp parallel for \
 private(i) \
 shared(n) \
 schedule(dynamic)
 for (i = 0; i < n; i++) {
/* whatever */
 }

Hmm... yes. Care would need to be taken though because Cython might in 
the future very well generate a while loop instead for such a 
statement under some circumstances, and that won't work with OpenMP. One 
should be careful with assuming what the C result will be of Cython 
code. That's why I proposed using an alternative construct which both 
does the OpenMP stuff and contains the for loop.

 With Python objects, the programmer must synchronize access:
 
 
 with openmp('parallel for', shared=(pyobj, n), private=(i,)):
     for i in range(n):
         with openmp('critical'):
             pyobj += i

We already have the nogil stuff, so using e.g. with openmp.critical: 
instead would leave the way open for checking correctness in this area.

Anyway, thanks for the input. I've made it a ticket and might get back 
to it when I've got more time through a discussion on the Cython list.

http://trac.cython.org/cython_trac/ticket/211

-- 
Dag Sverre
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Dag Sverre Seljebotn wrote:
 Hmm... yes. Care would need to be taken though because Cython might in 
 the future very well generate a while loop instead for such a 
 statement under some circumstances, and that won't work with OpenMP. One 
 should be careful with assuming what the C result will be of Cython 
 code. That's why I proposed using an alternative construct which both 
 does the OpenMP stuff and contains the for loop.

As a matter of fact, the next Cython release might prepend all 
for-in-range-loops with an if-test under some circumstances, in order to 
better match Python looping semantics (if the loop isn't executed, the 
loop counter should never be written to -- in contrast with C). So this 
is already happening. OpenMP might need different loop semantics and so 
calls for a different construct.

-- 
Dag Sverre
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
 If your problem is evaluating vector expressions just like the above
 (i.e. without using transcendental functions like sin, exp, etc...),
 usually the bottleneck is on memory access, so using several threads is
 simply not going to help you achieve better performance, but rather
 the contrary (you have to deal with the additional thread overhead).
 So, frankly, I would not waste more time trying to paralelize that.

I had a feeling this would be the case; I just haven't been sure at
what point this comes into play.  I really need to do some tests to
understand exactly how CPU load and memory bandwidth interplay in
these situations.  I have worked with GPUs before and often the reason
the GPU is faster than the CPU is simply the higher memory bandwidth.

 As an example, in the recent support of VML in numexpr we have disabled
 the use of VML (as well as the OpenMP threading support that comes with
 it) in cases like yours, where only additions and multiplications are
 performed (these operations are very fast in modern processors, and the
 sole bottleneck for this case is the memory bandwidth, as I've said).
 However, in case of expressions containing operations like division or
 transcendental functions, then VML activates automatically, and you can
 make use of several cores if you want.  So, if you are in this case,
 and you have access to Intel MKL (the library that contains VML), you
 may want to give numexpr a try.

OK, this is very interesting indeed.  I didn't know that numexpr has
support for VML, which has OpenMP support.  I will definitely have a
look at this.  Thanks!
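
For reference, a minimal sketch of the kind of use I have in mind
(assuming numexpr is installed, ideally built against MKL so VML can
kick in for the transcendental part):

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)
# plain adds/multiplies stay memory-bound; exp/sin can be dispatched to VML
c = ne.evaluate("exp(a) / (1 + sin(b))")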

Brian

 HTH,

 --
 Francesc Alted
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
 Recent Matlab versions use Intels Math Kernel Library, which performs
 automatic multi-threading - also for mathematical functions like sin
 etc, but not for  addition, multiplication etc. It seems to me Matlab
 itself does not take care of multi-threading. On
 http://www.intel.com/software/products/mkl/data/vml/functions/_listfunc.html
 you can have a look at the performance data of the MKL vectorized math
 functions. Around a vector length between 100-1000, depending on which
 function, precision, cpu architecture, they switch to multi-threading.

Fantastic, thanks for this pointer!

Brian
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
 At any rate, I really like the OpenMP approach and prefer to have
 support for it in Cython much better than threading, MPI or whatever.
 But the thing is: is OpenMP stable, mature enough for allow using it in
 most of common platforms?  I think that recent GCC compilers support
 the latest incarnation (3.0) of the standard, but IIRC, MSVC 2008 only
 supports OpenMP 2.5 specification.  I'm not sure how this would affect
 a possible implementation in Cython, tough.

OpenMP has been around for a long time.  But in other ways, I don't
consider it to be completely stable and mature - mainly because free
compilers (like gcc) haven't supported OpenMP for very long.  For
example, I am running OS X Leopard, which is back at gcc 4.0.1 - well
before OpenMP support was added.  But this will hopefully change in the
next few years as everyone moves to more recent versions of gcc.

But, Cython could simply do a test of the compiler to see if it
supports OpenMP and what version it supports.

Brian
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?

2009-02-12 Thread Michael S. Gilbert
On Thu, 12 Feb 2009 13:18:26 -0600 Robert Kern wrote:
  I did some testing with this 64-bit implementation (mt19937-64). I've
  found that it is actually slower than the 32-bit reference (mt19937ar)
  on 64-bit systems (2.15s vs 2.25s to generate 1 ints). This is
  likely because it generates 64-bit long long ints instead of 32-bit
  long ints. However, it should be possible to break up each 64-bit int
  into two 32-bit ints, then the runtime would appear to be almost twice
  as fast.
 
 Why do you think that?

You could also think of it the other way (in terms of generating 64-bit
ints). Instead of generating two 32-bit ints and concatenating them
for a 64-bit int, you can just directly generate the 64-bit int. Since
the 64-bit int requires only slightly more time to generate than either
of the 32-bit ints individually, an almost 2x speedup is achieved.
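
Just to sketch the bookkeeping (not the generator itself): each 64-bit
draw would simply be split into its two 32-bit halves, e.g.

import numpy as np

x = np.uint64(0x123456789ABCDEF0)            # one 64-bit draw
lo = np.uint32(x & np.uint64(0xFFFFFFFF))    # low 32 bits
hi = np.uint32(x >> np.uint64(32))           # high 32 bits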

  One other consideration to keep in mind is that the 64-bit
  version is not stream-compatible with the 32-bit implementation (you
  will get different sequences for the same input seed).
 
  Would it be worth it to implement this in numpy in order to get an
  almost 2x speedup on 64-bit machines?
 
 The incompatibility is a showstopper to replacing the PRNG on any platform.

Why is stream-compatibility such a stringent requirement?  It seems
like this constraint majorly limits your ability to adopt new/better
technologies and reduces your flexibility to make changes to your code
as needed.  What end-use applications require stream-compatibility?  The
only thing I can think of is verification/regression testing for numpy,
but those test cases could be updated to account for the break in
compatibility (and specifically using the reference implementation's
expected output).  Wouldn't documenting the change in
behavior be sufficient?

Regards,
Mike
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
Wow, interesting thread.  Thanks everyone for the ideas.  A few more comments:

GPUs/CUDA:

* Even though there is a bottleneck between main memory and GPU
memory, as Nathan mentioned, the much larger memory bandwidth on a GPU
often makes GPUs great for memory bound computations...as long as you
can leave your data on the GPU for most of the computation.  In my
case I can do this and this is something I am pursuing as well.

OpenMP:

I don't really like OpenMP (pragma?!?), but it would be very nice if
Cython had optional support for OpenMP that didn't use comments.

Other ideas:

What I would really like is a nice, super fast *library* built on top
of pthreads that made it possible to do OpenMP-like things in Cython,
but without depending on having an OpenMP compiler.  Basically a
fancy, fast thread pool implementation in Cython.
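
Just to sketch the shape of the idea with plain Python threads (this is
not the fast Cython pool I mean, and it only really pays off if func
releases the GIL internally, e.g. a Cython or C routine that drops it):

import threading
import numpy as np

def parallel_apply(func, arr, nthreads=4):
    # split the work into chunks, run each chunk in its own thread
    chunks = np.array_split(arr, nthreads)
    out = [None] * nthreads
    def worker(i):
        out[i] = func(chunks[i])
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return np.concatenate(out)

result = parallel_apply(np.sin, np.linspace(0, 10, 1000000))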

And a question:

With the new Numpy support in Cython, does Cython release the GIL if
it can when running through loops over numpy arrays?  Does
Cython call into the C API during these sections?

Cheers,

Brian
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Brian Granger wrote:
 And a question:

 With the new Numpy support in Cython, does Cython release the GIL if
 it can when running through loops over numpy arrays?  Does
 Cython call into the C API during these sections?

You know, I thought of the exact same thing when reading your post. No,
you need the GIL currently, but that's something I'd like to fix.

Ideally, it would be something like this:

cdef int i, s = 0, n = ...
cdef np.ndarray[int] arr = ...  # will require the GIL
with nogil:
    for i in range(n):
        s += arr[i]  # does not require GIL

The only Python operation needed on the last line is throwing an exception
if one is out of bounds; but I can always insert code to reacquire the GIL
in that exceptional case. (And in most cases like this the user will turn
off bounds-checking anyway.)

http://trac.cython.org/cython_trac/ticket/210 will contain progress on this.

Dag Sverre

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?

2009-02-12 Thread Robert Kern
On Thu, Feb 12, 2009 at 14:17, Michael S. Gilbert
michael.s.gilb...@gmail.com wrote:
 On Thu, 12 Feb 2009 13:18:26 -0600 Robert Kern wrote:
  I did some testing with this 64-bit implementation (mt19937-64). I've
  found that it is actually slower than the 32-bit reference (mt19937ar)
  on 64-bit systems (2.15s vs 2.25s to generate 1 ints). This is
  likely because it generates 64-bit long long ints instead of 32-bit
  long ints. However, it should be possible to break up each 64-bit int
  into two 32-bit ints, then the runtime would appear to be almost twice
  as fast.

 Why do you think that?

 You could also think of it the other way (in terms of generating 64-bit
 ints). Instead of generating two 32-bit ints and concatenating them
 for a 64-bit int, you can just directly generate the 64-bit int. Since
 the 64-bit int requires only slightly more time to generate than either
 of the 32-bit ints individually, an almost 2x speedup is achieved.

<shrug> I'll believe it when I see it.

  One other consideration to keep in mind is that the 64-bit
  version is not stream-compatible with the 32-bit implementation (you
  will get different sequences for the same input seed).
 
  Would it be worth it to implement this in numpy in order to get an
  almost 2x speedup on 64-bit machines?

 The incompatibility is a showstopper to replacing the PRNG on any platform.

 Why is stream-compatibility such a stringent requirement?  It seems
 like this constraint majorly limits your ability to adopt new/better
 technologies and reduces your flexibility to make changes to your code
 as needed.  What end-use applications require stream-compatibility?  The
 only thing I can think of is verification/regression testing for numpy,
 but those test cases could be updated to account for the break in
 compatibility (and specifically using the reference implementation's
 expected output).  Wouldn't documenting the change in
 behavior be sufficient?

Some people don't think so. People have asked for more stringent
compatibility than we can already provide (i.e. replicability even in
the face of bug fixes). People use these as inputs to their scientific
simulations. I'm not going to intentionally make their lives harder
than that.

Bruce Southey was working on exposing the innards a bit so that you
could make use of a different core PRNG while reusing the
numpy-specific stuff in RandomState. That would be the approach to
apply different technologies.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?

2009-02-12 Thread Michael S. Gilbert
On Thu, 12 Feb 2009 14:32:02 -0600 Robert Kern wrote:
 You could also think of it the other way (in terms of generating 64-bit
 ints). Instead of generating two 32-bit rints and concatenating them
 for a 64-bit int, you can just directly generate the 64-bit int. Since
 the 64-bit int requires only slightly more time to generate than either
 of the 32-bit ints individually, an almost 2x speedup is achieved.

<shrug> I'll believe it when I see it.

I'll put together a proof of concept when I have the time.

 Some people don't think so. People have asked for more stringent
 compatibility than we can already provide (i.e. replicability even in
 the face of bug fixes). People use these as inputs to their scientific
 simulations. I'm not going to intentionally make their lives harder
 than that.
 
 Bruce Southey was working on exposing the innards a bit so that you
 could make use the a different core PRNG while reusing the
 numpy-specific stuff in RandomState. That would be the approach to
 apply different technologies.

This would be very useful.  A backward-compatibility=<numpy version
number> flag could be offered to fall back to a particular
implementation, so that you can push forward and still offer
compatibility as needed.

Regards,
Mike
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
 You know, I thought of the exact same thing when reading your post. No,
 you need the GIL currently, but that's something I'd like to fix.

 Ideally, it would be something like this:

 cdef int i, s = 0, n = ...
 cdef np.ndarray[int] arr = ...  # will require the GIL
 with nogil:
     for i in range(n):
         s += arr[i]  # does not require GIL

Yep, that would be fantastic!!!  While it doesn't make the code
multithreaded, having this sure would make it easy to get code ready
for multithreading.

 The only Python operation needed on the last line is throwing an exception
 if one is out of bounds; but I can always insert code to reacquire the GIL
 in that exceptional case. (And in most cases like this the user will turn
 off bounds-checking anyway.)

 http://trac.cython.org/cython_trac/ticket/210 will contain progress on this.

Great, I will watch this.  I am more than willing to test this as well
as it is something I would use often.

Cheers,

Brian
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] umath build failures @ trunk on some platforms?

2009-02-12 Thread Pauli Virtanen
Hi,

The Buildbot (up once again) is showing build failures on some platforms:


http://buildbot.scipy.org/builders/Windows_XP_x86_64_MSVC/builds/875/steps/shell/logs/stdio

http://buildbot.scipy.org/builders/Linux_SPARC_64_Debian/builds/423/steps/shell/logs/stdio

Are these a sign of some bug (in the new math config system?), or
a glitch in the buildslaves only?

-- 
Pauli Virtanen

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] umath build failures @ trunk on some platforms?

2009-02-12 Thread Charles R Harris
On Thu, Feb 12, 2009 at 2:23 PM, Pauli Virtanen p...@iki.fi wrote:

 Hi,

 The Buildbot (up once again) is showing build failures on some platforms:


 http://buildbot.scipy.org/builders/Windows_XP_x86_64_MSVC/builds/875/steps/shell/logs/stdio

 http://buildbot.scipy.org/builders/Linux_SPARC_64_Debian/builds/423/steps/shell/logs/stdio



 Are these a sign of some bug (in the new math config system?), or
 a glitch in the buildslaves only?


They are old problems. The Debian problem comes from missing prototypes, the
Windows problem from a missing function. I had workarounds for these, but
David removed them and wants to solve them differently. The Debian problem
probably won't be solved at all because it is a distro problem; you might
try an upgrade.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
 Sturla Molden wrote:

 IMO there's a problem with using literal variable names here, because
 Python syntax implies that the value is passed. One shouldn't make
 syntax where private=(i,) is legal but private=(f(),) isn't.

The latter would be illegal in OpenMP as well. OpenMP pragmas only take
variable names, not function calls.

S.M.




___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Filling gaps

2009-02-12 Thread A B
Hi,
Are there any routines to fill in the gaps in an array? The simplest
would be by carrying the last known observation forward:
0,0,10,8,0,0,7,0
0,0,10,8,8,8,7,7
Or by somehow interpolating the missing values based on the previous
and next known observations (mean).
Thanks.
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Filling gaps

2009-02-12 Thread Pierre GM

On Feb 12, 2009, at 8:22 PM, A B wrote:

 Hi,
 Are there any routines to fill in the gaps in an array. The simplest
 would be by carrying the last known observation forward.
 0,0,10,8,0,0,7,0
 0,0,10,8,8,8,7,7
 Or by somehow interpolating the missing values based on the previous
 and next known observations (mean).
 Thanks.


The functions `forward_fill` and `backward_fill` in scikits.timeseries
should do what you want. They also work on MaskedArray objects,
meaning that you don't need to have actual time series.
The catch is that you need to install scikits.timeseries, of course.
More info here: http://pytseries.sourceforge.net/
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Filling gaps

2009-02-12 Thread Keith Goodman
On Thu, Feb 12, 2009 at 5:22 PM, A B python6...@gmail.com wrote:
 Are there any routines to fill in the gaps in an array. The simplest
 would be by carrying the last known observation forward.
 0,0,10,8,0,0,7,0
 0,0,10,8,8,8,7,7

Here's an obvious hack for 1d arrays:

def fill_forward(x, miss=0):
    y = x.copy()
    for i in range(x.shape[0]):
        if y[i] == miss:
            y[i] = y[i-1]
    return y

Seems to work:

>>> x
array([ 0,  0, 10,  8,  0,  0,  7,  0])
>>> fill_forward(x)
array([ 0,  0, 10,  8,  8,  8,  7,  7])
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Filling gaps

2009-02-12 Thread Keith Goodman
On Thu, Feb 12, 2009 at 5:52 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Thu, Feb 12, 2009 at 5:22 PM, A B python6...@gmail.com wrote:
 Are there any routines to fill in the gaps in an array. The simplest
 would be by carrying the last known observation forward.
 0,0,10,8,0,0,7,0
 0,0,10,8,8,8,7,7

 Here's an obvious hack for 1d arrays:

 def fill_forward(x, miss=0):
     y = x.copy()
     for i in range(x.shape[0]):
         if y[i] == miss:
             y[i] = y[i-1]
     return y

 Seems to work:

 >>> x
 array([ 0,  0, 10,  8,  0,  0,  7,  0])
 >>> fill_forward(x)
 array([ 0,  0, 10,  8,  8,  8,  7,  7])

I guess that should be

for i in range(1, x.shape[0]):

instead of

for i in range(x.shape[0]):

to avoid replacing the first element of the array, if it is missing,
with the last.
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Filling gaps

2009-02-12 Thread Keith Goodman
On Thu, Feb 12, 2009 at 6:04 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Thu, Feb 12, 2009 at 5:52 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Thu, Feb 12, 2009 at 5:22 PM, A B python6...@gmail.com wrote:
 Are there any routines to fill in the gaps in an array. The simplest
 would be by carrying the last known observation forward.
 0,0,10,8,0,0,7,0
 0,0,10,8,8,8,7,7

 Here's an obvious hack for 1d arrays:

 def fill_forward(x, miss=0):
     y = x.copy()
     for i in range(x.shape[0]):
         if y[i] == miss:
             y[i] = y[i-1]
     return y

 Seems to work:

 >>> x
 array([ 0,  0, 10,  8,  0,  0,  7,  0])
 >>> fill_forward(x)
 array([ 0,  0, 10,  8,  8,  8,  7,  7])

 I guess that should be

for i in range(1, x.shape[0]):

 instead of

for i in range(x.shape[0]):

 to avoid replacing the first element of the array, if it is missing,
 with the last.

For large 1d x arrays, this might be faster:

def fill_forward2(x, miss=0):
    y = x.copy()
    while np.any(y == miss):
        idx = np.where(y == miss)[0]
        y[idx] = y[idx-1]
    return y

But it does replace the first element of the array, if it is missing,
with the last.

We could speed it up by doing (y == miss) only once per loop. (But I
bet the np.where is the bottleneck.)
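
A fully vectorized variant (untested sketch; like fill_forward it leaves
a leading missing value as-is) could use the index trick:

import numpy as np

def fill_forward_vec(x, miss=0):
    # index of the most recent non-missing entry at each position
    idx = np.where(x != miss, np.arange(x.shape[0]), 0)
    np.maximum.accumulate(idx, out=idx)
    return x[idx]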
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Bilateral filter: bug corrected

2009-02-12 Thread Nadav Horesh

Attached here is code for a bilateral filter:

1. The code is composed of a cython back end (bilateral_base.pyx) and a python
front end (bilateral.py).
2. I do not have the tools to make windows binaries (I run it on gentoo linux).
3. It is not hard to strip the cython code to get a pure (and slow) python
implementation.
4. If someone finds the licensing inadequate, I have no problem re-licensing.


Any comments are (more than) welcome.

  Nadav




bilateral_base.pyx
Description: bilateral_base.pyx
#   Copyright 2008, Nadav Horesh
#   nadavh at visionsense dot com
#
#   The software is licenced under BSD licence.
'''
Front-end functions for bilateral filtering.
The output of the functions
    bilateral
    bilateral_slow
    bilateral_fast
should be roughly the same, but there should be a considerable speed difference.
'''


from numpy.numarray.nd_image import generic_filter
import bilateral_base as _BB

def bilateral(mat, xy_sig, z_sig, filter_size=None, mode='reflect'):
    '''
    Bilateral filter a 2D array.
      mat:         A 2D array
      xy_sig:      The sigma of the spatial Gaussian filter.
      z_sig:       The sigma of the gray-levels Gaussian filter.
      filter_size: Size of the spatial filter kernel: None or values < 2 --- auto select.
      mode:        See numpy.numarray.nd_image.generic_filter documentation

    output: A 2D array of the same dimensions as mat and a float64 dtype

    Remarks:
      1. The spatial filter kernel size is ~4*xy_sig, unless specified otherwise
      2. The algorithm is slow but has a minimal memory footprint
    '''
    filter_fcn = _BB.Bilat_fcn(xy_sig, z_sig, filter_size)
    size = filter_fcn.xy_size
    return generic_filter(mat, filter_fcn.cfilter, size=size, mode=mode)

def bilateral_slow(mat, xy_sig, z_sig, filter_size=None, mode='reflect'):
    'A pure python implementation of the bilateral filter; for details see the bilateral doc.'
    filter_fcn = _BB.Bilat_fcn(xy_sig, z_sig, filter_size)
    size = filter_fcn.xy_size
    return generic_filter(mat, filter_fcn, size=size, mode=mode)

def bilateral_fast(mat, xy_sig, z_sig, filter_size=None, mode='reflect'):
    'A fast implementation of the bilateral filter; for details see the bilateral doc.'
    filter_fcn = _BB.Bilat_fcn(xy_sig, z_sig, filter_size)
    size = filter_fcn.xy_size
    return generic_filter(mat, filter_fcn.fc_filter, size=size, mode=mode)
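
A minimal usage sketch (assuming the cython back end bilateral_base has
been compiled and the front end above is saved as bilateral.py):

import numpy as np
import bilateral

img = np.random.rand(128, 128)
smoothed = bilateral.bilateral(img, xy_sig=3.0, z_sig=0.1)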
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion