Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages
Fernando Perez wrote: On Wed, Feb 11, 2009 at 6:17 PM, David Cournapeau courn...@gmail.com wrote: Unfortunately, it does require some work, because hardy uses g77 instead of gfortran, so the source package has to be different (once hardy is done, all the ones below would be easy, though). I am not sure how to do that with PPA (the doc is not great). OK, thanks for the info. This is already very useful.

What exactly is the expected problem and how would I verify that I'm not getting hit by it? The context is that I have built your packages on Hardy and got no FTBFS errors in a clean sbuild machine and was about to upload this to my lab's repo, but saw your email. (Also, I have been using scipy 0.7.0rc2-ish, apparently descended from the same Debian source package as yours, for a couple of weeks and have noted no problems of late.)

nosetests scipy on an amd64 with lots of installed packages shows:

Ran 3211 tests in 356.572s
FAILED (SKIP=13, errors=9)

where the errors are all known failures. The same test on the i386 sbuild shows:

Ran 3211 tests in 312.464s
FAILED (SKIP=16, errors=204)

Most of the failures appear to be from weave/scxx, and I suspect they have to do with some package not being installed -- if that's true, it's probably a packaging bug and not a scipy bug.
Re: [Numpy-discussion] Fast threading solution thoughts
On Thu, Feb 12, 2009 at 12:42:37AM -0600, Robert Kern wrote: It is implemented using threads, with Windows native threads on Windows. I think Gaël really just meant threads there.

I guess so :). Once you reformulate my remark in proper terms, this is indeed what comes out. I guess all it means is that OpenMP is implemented using native OS threads, and is nothing more than an abstraction on top of them.

Gaël
Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages
Andrew Straw wrote: Fernando Perez wrote: On Wed, Feb 11, 2009 at 6:17 PM, David Cournapeau courn...@gmail.com wrote: Unfortunately, it does require some work, because hardy uses g77 instead of gfortran, so the source package has to be different (once hardy is done, all the ones below would be easy, though). I am not sure how to do that with PPA (the doc is not great). OK, thanks for the info. This is already very useful. What exactly is the expected problem and how would I verify that I'm not getting hit by it?

I want to follow as closely as possible the official debian/ubuntu packages. Ideally, any package produced on the PPA superseded by an official package (from 1.2.1-1~ppaN to 1.2.1-1) should be identical to the superseding package. Hardy's default fortran ABI is g77, not gfortran, so I have to use g77 for hardy - and the ppa package, limited to intrepid, uses gfortran (since intrepid's ABI is gfortran's). In your case, you built a package which uses the gfortran ABI on Hardy - it works, but is not really acceptable for an official package - and thus, when an upgrade from ppa to official happens, you won't get the same package.

In the rpm world, you can use conditionals on distribution version/type in the spec file (which is the control + changelog + rules in one file), but AFAIK, you can't do that with Debian, or at least I have not found the relevant doc.

cheers,

David
Re: [Numpy-discussion] PyArray_SETITEM with object arrays in Cython
Wes McKinney wrote: The general problem here is an indexed array (by dates or strings, for example) that you want to conform to a new index. The arrays most of the time contain floats but occasionally PyObjects. For some reason the access and assignment is slow (this function can be faster by a factor of 50 with C API macros, so clearly something is awry) -- let me know if you see anything obviously wrong with this.

Thanks for the example; I don't have time this week, but I recorded it on the Cython trac and will hopefully get back to it soon. http://trac.cython.org/cython_trac/ticket/209

Dag Sverre
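For reference, a minimal sketch of the usual workaround (hypothetical function and variable names, not the code from the ticket): typing both arrays with Cython's buffer syntax generally avoids the generic getitem/setitem overhead, even for object dtype.

    cimport numpy as np
    import numpy as np

    def take_objects(np.ndarray[object, ndim=1] values,
                     np.ndarray[np.int_t, ndim=1] indexer):
        # typed buffer access compiles to direct element access
        cdef Py_ssize_t i, n = indexer.shape[0]
        cdef np.ndarray[object, ndim=1] out = np.empty(n, dtype=object)
        for i in range(n):
            out[i] = values[indexer[i]]
        return out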
Re: [Numpy-discussion] Fast threading solution thoughts
Hi Brian,

A Thursday 12 February 2009, Brian Granger escrigué: Hi, This is relevant for anyone who would like to speed up array based codes using threads. I have a simple loop that I have implemented using Cython:

def backstep(np.ndarray opti, np.ndarray optf, int istart, int iend,
             double p, double q):
    cdef int j
    cdef double *pi
    cdef double *pf
    pi = <double*>opti.data
    pf = <double*>optf.data
    with nogil:
        for j in range(istart, iend):
            pf[j] = (p*pi[j+1] + q*pi[j])

I need to call this function *many* times and each call cannot be performed until the previous one is complete, as there are data dependencies. But, I still want to parallelize a single call to this function across multiple cores (notice that I am releasing the GIL before I do the heavy lifting). I want to break my loop range(istart, iend) into pieces and have a thread do each piece. The arrays have sizes 10^3 to 10^5. Things I have tried: [clip]

If your problem is evaluating vector expressions just like the above (i.e. without using transcendental functions like sin, exp, etc...), usually the bottleneck is memory access, so using several threads is simply not going to help you achieve better performance, but rather the contrary (you have to deal with the additional thread overhead). So, frankly, I would not waste more time trying to parallelize that.

As an example, in the recent support for VML in numexpr, we have disabled the use of VML (as well as the OpenMP threading support that comes with it) in cases like yours, where only additions and multiplications are performed (these operations are very fast in modern processors, and the sole bottleneck for this case is the memory bandwidth, as I've said). However, in case of expressions containing operations like division or transcendental functions, VML activates automatically, and you can make use of several cores if you want. So, if you are in this case, and you have access to Intel MKL (the library that contains VML), you may want to give numexpr a try.

HTH,

-- Francesc Alted
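For concreteness, a minimal sketch of what the numexpr route would look like for Brian's loop (illustrative variable names; assumes numexpr is installed):

    import numpy as np
    import numexpr as ne

    pi = np.random.rand(100001)
    p, q = 0.6, 0.4
    a, b = pi[1:], pi[:-1]           # shifted views, no copies
    pf = ne.evaluate("p*a + q*b")    # evaluated blockwise, no large temporaries

numexpr picks up the variables by name from the calling frame and evaluates the expression in cache-sized blocks, which is exactly the memory-bandwidth-bound case Francesc describes.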
Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages
David Cournapeau wrote: Andrew Straw wrote: Fernando Perez wrote: On Wed, Feb 11, 2009 at 6:17 PM, David Cournapeau courn...@gmail.com wrote: Unfortunately, it does require some work, because hardy uses g77 instead of gfortran, so the source package has to be different (once hardy is done, all the ones below would be easy, though). I am not sure how to do that with PPA (the doc is not great). OK, thanks for the info. This is already very useful. What exactly is the expected problem and how would I verify that I'm not getting hit by it? I want to follow as closely as possible the official debian/ubuntu packages. Ideally, any package produced on the PPA superseded by an official package (from 1.2.1-1~ppaN to 1.2.1-1) should be identical to the superseding package. Hardy's default fortran ABI is g77, not gfortran, so I have to use g77 for hardy - and the ppa package, limited to intrepid, uses gfortran (since intrepid's ABI is gfortran's).

(Warning: this email is a little over-detailed on the packaging front. Believe it or not, I'm not discussing the details of Debian packaging for fun; rather, my questions have practical importance to me -- I don't want to break all my lab's scipy installations. :)

This doesn't make sense to me. I built the .deb on a clean, minimal sbuild (a chroot with only a very few basic packages installed, somewhat mimicking Ubuntu's PPA builder). It built from your unmodified .dsc, which auto-downloads the declared dependencies (and nothing else). It passes the tests. To be very explicit -- I didn't specify to use g77 at any point. (As implied by my previous statement of using your unmodified .dsc, I used only the debian/rules and debian/control in your package.)

To understand your statement about identical, I will operationally define identical for .debs to mean that they were built from the same .dsc. Of course, in the case you describe above, it can't be _exactly_ the same .dsc, because the version numbers in debian/changelog must change and consequently so must the checksums and GPG signature in the .dsc file, and presumably a different person will sign it. Also, there will be timestamp differences and such for two .debs built from the exact same .dsc, but we can ignore those. In this case, I don't see why an official package, which meets this operational definition of identical, wouldn't work on Hardy, as it would be built from a nearly identical .dsc in a nearly identical clean build environment. (Of course, there will never be an official package of this for Hardy, but that's not the point.)

In your case, you built a package which uses the gfortran ABI on Hardy - it works, but is not really acceptable for an official package - and thus, when an upgrade from ppa to official happens, you won't get the same package. Why is it not really acceptable? As long as it builds and works and doesn't break anything, why would Ubuntu maintainers care if it uses the gfortran ABI? Also, in practical terms, what upgrade? A) Hardy will not upgrade python-scipy. It's against policy for a released distribution to upgrade software without a security reason. B) Imagining for a moment that there would be an upgrade, why do you think it would break?

In the rpm world, you can use conditionals on distribution version/type in the spec file (which is the control + changelog + rules in one file), but AFAIK, you can't do that with Debian, or at least have not found the relevant doc. I don't understand what you're saying.
My understanding is that at the beginning of each distribution (Hardy, Intrepid, Lenny), the maintainers decide on a C++ (and I guess Fortran, but I'm not sure) ABI and fix the toolchain to build with this ABI. From then on, everything is built with this ABI.

And the point of a signed .dsc file and a clean sbuild/pbuilder is that any .deb that gets built will be contingent on the files in debian/*, because those are cryptographically signed in the .dsc file. So, if you trust the archive master and his computer (by trusting his keys in your apt keyring), you trust that the .deb was built from the .dsc. And the .dsc is signed by the maintainer. So there's a cryptographic chain of trust to those control + changelog + rules files.

I'm still not sure I'm not missing your worry, though...

Thanks, Andrew
Re: [Numpy-discussion] Fast threading solution thoughts
Brian Granger schrieb: I am curious: would you know what would be different in numpy's case compared to matlab's array model concerning locks? Matlab, up to recently, only spread BLAS/LAPACK on multi-cores, but since matlab 7.3 (or 7.4), it also uses multicore for mathematical functions (cos, etc...). So at least in matlab's model, it looks like it can be useful. Good point. Is it possible to tell what array size it switches over to using multiple threads? Also, do you happen to know how Matlab is doing this?

Recent Matlab versions use Intel's Math Kernel Library, which performs automatic multi-threading - also for mathematical functions like sin etc., but not for addition, multiplication etc. It seems to me Matlab itself does not take care of multi-threading. On http://www.intel.com/software/products/mkl/data/vml/functions/_listfunc.html you can have a look at the performance data of the MKL vectorized math functions. Around a vector length between 100-1000, depending on the function, precision and cpu architecture, they switch to multi-threading.

Gregor
Re: [Numpy-discussion] Fast threading solution thoughts
A Thursday 12 February 2009, Dag Sverre Seljebotn escrigué: A quick digression: It would be interesting to see how a spec would look for integrating OpenMP natively into Cython for these kinds of purposes. Cython is still flexible as a language after all. That would be really nice indeed. Avoiding language bloat is also important, but it is difficult to know what kind of balance can be struck before some kind of spec is worked out. Has anyone managed to use OpenMP with Cython code in a nice way already? I couldn't do any work on it right now, but it could sit in the pipeline for the year to come. Also, I have a strong feeling that the spec and language design issues would take more time than the actual implementation anyway.

I tend to agree with you. As a matter of fact, doing (efficient) parallelism is a very hairy thing, and understanding all (or just most) of its issues may be the real problem for doing the port (unless one can find some direct way to translate OpenMP directives from Cython code to the C code, which would be wonderful).

-- Francesc Alted
Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages
Andrew Straw wrote: (Warning: this email is a little over-detailed on the packaging front. Believe it or not, I'm not discussing the details of Debian packaging for fun; rather, my questions have practical importance to me -- I don't want to break all my lab's scipy installations. :)

Well, the devil is in the details, especially for packaging :)

This doesn't make sense to me. I built the .deb on a clean, minimal sbuild (a chroot with only a very few basic packages installed, somewhat mimicking Ubuntu's PPA builder). It built from your unmodified .dsc, which auto-downloads the declared dependencies (and nothing else). It passes the tests. To be very explicit -- I didn't specify to use g77 at any point. (As implied by my previous statement of using your unmodified .dsc, I used only the debian/rules and debian/control in your package.)

Yes: your package will be almost the same as mine. It will use gfortran, and not g77. There is nothing wrong with that per se, but I don't think it is a good practice in general. Here is an example which can break things, to show that it's not just a red herring:

- install numpy from the PPA (gfortran ABI), pulling in gfortran as well
- later, g77 is installed (as a dependency, whatever)
- now, the user builds his own python extension with fortran code, or for example scipy - it uses g77 and does not work anymore.

My main rationale for providing the PPA is to avoid the never-ending queue of emails about missing symbols, etc... because I am tired of it, because it leaves a bad taste for users, and because I would like to deal with more interesting issues.

To understand your statement about identical, I will operationally define identical for .debs to mean that they were built from the same .dsc. I have an even broader definition of identical: same control/rules/patches is identical for me.

In this case, I don't see why an official package, which meets this operational definition of identical, wouldn't work on Hardy, as it would be built from a nearly identical .dsc in a nearly identical clean build environment. (Of course, there will never be an official package of this for Hardy, but that's not the point.) Because a package has to be well integrated with the rest of the system - I think my example above shows the problem of using a fortran compiler which has a different ABI than the default one.

Why is it not really acceptable? As long as it builds and works and doesn't break anything, why would Ubuntu maintainers care if it uses the gfortran ABI? The problem is that it is impossible to avoid breaking anything without using the g77 ABI on hardy. I don't understand how it can be ok not to follow the ABI as long as it does not break anything: if there were no need to follow an ABI to avoid breaking anything, there would be no point to an ABI in the first place :)

Also, in practical terms, what upgrade? A) Hardy will not upgrade python-scipy. It's against policy for a released distribution to upgrade software without a security reason. First, concerning the upgrade, I am just following the Ubuntu guide. To quote it: Ubuntu package names are suffixed by the version number of the package. This allows Ubuntu to distinguish newer packages from older ones and so remain up to date. If you're creating an alternative version of a package already available in Ubuntu's repositories, you should ensure that: * your package supersedes the official Ubuntu version * future Ubuntu versions will supersede your package. As an example, I can think of backports.
B) Imagining for a moment that there would be an upgrade, why do you think it would break? Well, if the upgraded package uses the g77 ABI and the old one the gfortran ABI, this is quite likely to break some extensions built against numpy/scipy, I believe.

In the rpm world, you can use conditionals on distribution version/type in the spec file (which is the control + changelog + rules in one file), but AFAIK, you can't do that with Debian, or at least have not found the relevant doc. I don't understand what you're saying. One solution would be to be able to say in the control file: for hardy, do this; for intrepid, do that. For example, with rpm packaging, I can do the following:

%if 0%{?fedora_version} || 0%{?rhel_version} || 0%{?centos_version}
BuildRequires: gcc-gfortran python python-devel lapack3-devel >= 3.1 refblas3-devel
Requires: gcc-gfortran lapack3 >= 3.1 refblas3
%endif
%if 0%{?suse_version}
BuildRequires: gcc-fortran python python-devel lapack3-devel >= 3.1 refblas3-devel
Requires: gcc-fortran lapack3 >= 3.1 refblas3
%endif

Notice the different name for the dependency (not that this is a problem with Ubuntu/Debian, but I would still need different packages).

And the point of a signed .dsc file and a clean sbuild/pbuilder is that any .deb that gets built will be contingent on the files in debian/*, because those are cryptographically signed in the .dsc file.
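For what it's worth, a hypothetical debian/rules fragment (not from this thread) shows one way to branch on the target distribution at build time, roughly mirroring the rpm %if blocks above; it assumes lsb_release is available in the build chroot:

    # debian/rules is a Makefile, so we can shell out to lsb_release
    DIST := $(shell lsb_release --codename --short)

    ifeq ($(DIST),hardy)
        FORTRAN_COMPILER = g77
    else
        FORTRAN_COMPILER = gfortran
    endif

The build dependencies declared in debian/control cannot be switched this way, though, which is presumably the harder part of David's problem.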
Re: [Numpy-discussion] Fast threading solution thoughts
Francesc Alted wrote: A Thursday 12 February 2009, Dag Sverre Seljebotn escrigué: A quick digression: It would be interesting to see how a spec would look for integrating OpenMP natively into Cython for these kinds of purposes. Cython is still flexible as a language after all. That would be really nice indeed. Avoiding language bloat is also important, but it is difficult to know what kind of balance can be struck before some kind of spec is worked out. Has anyone managed to use OpenMP with Cython code in a nice way already? I couldn't do any work on it right now, but it could sit in the pipeline for the year to come. Also, I have a strong feeling that the spec and language design issues would take more time than the actual implementation anyway. I tend to agree with you. As a matter of fact, doing (efficient) parallelism is a very hairy thing, and understanding all (or just most) of its issues may be the real problem for doing the port (unless one can find some direct way to translate OpenMP directives from Cython code to the C code, which would be wonderful).

This is what I'm thinking. Example: This in Cython:

with cython.openmp.parallel(locals=(a, b)):  # implies nogil?
    for i in cython.openmp.range(0, 10):
        ...

could simply translate into emitting the right #pragmas in the generated C at the right locations.

FYI, I am one of the core Cython developers and can make such modifications in Cython itself as long as there's consensus on how it should look on the Cython mailing list. My problem is that I don't really know OpenMP and have little experience with it, so I'm not the best person for creating a draft of how such high-level OpenMP constructs should look in Cython.

Dag Sverre
Re: [Numpy-discussion] Fast threading solution thoughts
Gregor Thalhammer wrote: Recent Matlab versions use Intel's Math Kernel Library, which performs automatic multi-threading - also for mathematical functions like sin etc., but not for addition, multiplication etc.

It does if you have access to the parallel toolbox I mentioned earlier in this thread (again, no experience with it, but I think it is especially popular on clusters; in that case, though, it is not limited to a thread-based implementation).

David
Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages
OK, I think you're concerned about compatibility of Python extensions using fortran. We don't use any (that I know of), so I'm going to stop worrying about this and upload .debs from your .dsc (or very close) to my repository...

...except for one last question: If Hardy uses the g77 ABI but I'm building scipy with gfortran, shouldn't there be an ABI issue with my ATLAS? Shouldn't I get lots of test failures with scipy? I don't.

David Cournapeau wrote: My main rationale for providing the PPA is to avoid the never-ending queue of emails about missing symbols, etc... because I am tired of it, because it leaves a bad taste for users, and because I would like to deal with more interesting issues.

Well, I appreciate you doing that. Packaging is a thankless job... and when everything works on your own computer, it's hard to work up the motivation to make it work on others'.

To understand your statement about identical, I will operationally define identical for .debs to mean that they were built from the same .dsc. I have an even broader definition of identical: same control/rules/patches is identical for me.

This is the only other point I want to make: if you're building from the same .dsc, it means you're using the same control/rules/patches. (That's why I brought up the checksums and the signatures.)
Re: [Numpy-discussion] [SciPy-user] Numpy 1.2.1 and Scipy 0.7.0; Ubuntu packages
Andrew Straw wrote: OK, I think you're concerned about compatibility of Python extensions using fortran. We don't use any (that I know of), so I'm going to stop worrying about this and upload .debs from your .dsc (or very close) to my repository... ...except for one last question: If Hardy uses the g77 ABI but I'm building scipy with gfortran, shouldn't there be an ABI issue with my ATLAS? Shouldn't I get lots of test failures with scipy? I don't.

Not for hardy, because Hardy has the transitional packages (e.g. it has both gfortran and g77 ABI packages: libatlas3gf* vs atlas3*). But this wouldn't work for earlier distributions.

Well, I appreciate you doing that. Packaging is a thankless job... Reducing the number of support emails is enough of an incentive for me :)

This is the only other point I want to make: if you're building from the same .dsc, it means you're using the same control/rules/patches. (That's why I brought up the checksums and the signatures.) Yes, the problem is not how to avoid different packages from the same .dsc, but how to conditionally define some sections in the control and rules files.

David
[Numpy-discussion] numexpr and numpy windows binaries built with MKL
Hi all, as Francesc announced, the latest release of Numexpr 1.2 can be built with Intel's Math Kernel Library, which gives a big increase in performance. Now the questions: Could somebody provide binaries for Windows of Numexpr, linked with Intel's MKL? I know, there is the license problem. The same question has been discussed for numpy. Is it still true that Enthought is considering providing binaries based on MKL, see https://svn.enthought.com/epd/ticket/216 ?

Gregor
Re: [Numpy-discussion] Fast threading solution thoughts
On 2/12/2009 7:15 AM, David Cournapeau wrote: Since openmp also exists on windows, I doubt that it is required that openmp uses pthread :)

On Windows, MSVC uses Win32 threads and GCC (Cygwin and MinGW) uses pthreads. If you use OpenMP with MinGW, the executable becomes dependent on pthreadGC2.dll (an LGPL library from Red Hat).

S.M.
Re: [Numpy-discussion] Fast threading solution thoughts
On 2/12/2009 11:30 AM, Dag Sverre Seljebotn wrote: It would be interesting to see how a spec would look for integrating OpenMP natively into Cython for these kinds of purposes. Cython is still flexible as a language after all. Avoiding language bloat is also important, but it is difficult to know what kind of balance can be struck before some kind of spec is worked out. Has anyone managed to use OpenMP with Cython code in a nice way already?

Here is an example of SciPy's ckdtree.pyx modified to use OpenMP. It is basically twice as fast on my dual-core laptop. But as Cython mangles the names in the C code, I had to put the OpenMP part in a separate C file.

OpenMP does not need to be a part of the Cython language. It can be special comments in the code, as in Fortran. After all, #pragma omp parallel is a comment in Cython.

Sturla Molden

/*
 * Parallel query for faster kd-tree searches on SMP computers.
 * This function will release the GIL around the lengthy query loop and
 * use OpenMP for multithreading.
 *
 * OpenMP is supported on GCC (since version 4.2) and MS Visual C++ 2005
 * and later. Mingw builds of GCC support OpenMP through a pthread
 * layer (pthreadGC2.dll).
 *
 * Compile with -fopenmp and link -lgomp -lpthread on GCC.
 *
 * Written by Sturla Molden, Univ. Oslo, Jan. 24 2009
 *
 * This code is released to public domain.
 */

#include <Python.h>
#include <omp.h>

typedef void (*queryfun_t)(void *self, double *result_distances,
    int *result_indices, double *x, int k, double eps, double p,
    double distance_upper_bound);

void parallel_query(queryfun_t query, int nthreads, int niter, void *self,
    double *result_distances, int *result_indices, double *x,
    int k, int m, double eps, double p, double distance_upper_bound)
{
    int c;
    /* allow the number of threads to be specified */
    if (nthreads == 0) {
        for (c=0; c<niter; c++) {
            (*query)(self, result_distances + c*k, result_indices + c*k,
                x + c*m, k, eps, p, distance_upper_bound);
        }
    } else {
        /* parallel queries with GIL released */
        Py_BEGIN_ALLOW_THREADS
        {
            #pragma omp parallel for private(c) schedule(guided) num_threads(nthreads)
            for (c=0; c<niter; c++) {
                (*query)(self, result_distances + c*k, result_indices + c*k,
                    x + c*m, k, eps, p, distance_upper_bound);
            }
        }
        Py_END_ALLOW_THREADS
    }
}

(attachment: ckdtree.pyx)
Re: [Numpy-discussion] Fast threading solution thoughts
A Thursday 12 February 2009, Dag Sverre Seljebotn escrigué: FYI, I am one of the core Cython developers and can make such modifications in Cython itself as long as there's consensus on how it should look on the Cython mailing list. My problem is that I don't really know OpenMP and have little experience with it, so I'm not the best person for creating a draft of how such high-level OpenMP constructs should look in Cython.

I don't know OpenMP well enough either, but I'd say that on this list there could be some people who could help. At any rate, I really like the OpenMP approach and would much prefer to have support for it in Cython rather than threading, MPI or whatever. But the thing is: is OpenMP stable, mature enough to allow using it on most common platforms? I think that recent GCC compilers support the latest incarnation (3.0) of the standard, but IIRC, MSVC 2008 only supports the OpenMP 2.5 specification. I'm not sure how this would affect a possible implementation in Cython, though.

Cheers,

-- Francesc Alted
Re: [Numpy-discussion] Fast threading solution thoughts
A Thursday 12 February 2009, Sturla Molden escrigué: OpenMP does not need to be a part of the Cython language. It can be special comments in the code, as in Fortran. After all, #pragma omp parallel is a comment in Cython.

Hey! That's very nice to know. We already have OpenMP support in Cython for free (or so it apparently seems :-)

-- Francesc Alted
Re: [Numpy-discussion] Fast threading solution thoughts
On 2/12/2009 12:20 PM, David Cournapeau wrote: It does if you have access to the parallel toolbox I mentioned earlier in this thread (again, no experience with it, but I think it is especially popular on clusters; in that case, though, it is not limited to a thread-based implementation).

As has been mentioned, Matlab is a safer language for parallel computing as arrays are immutable. There is almost no need for synchronization of any sort, except barriers. Maybe it is time to implement an immutable ndarray subclass?

With immutable arrays we can also avoid making temporary arrays in expressions like y = a*b + c. y just gets an expression and three immutable buffers. And then numexpr (or something like it) can take care of the rest.

As for Matlab, I have noticed that they are experimenting with CUDA now, to use nvidia's processors for hardware acceleration. As even modest GPUs can yield hundreds of gigaflops, that is going to be hard to match (unless we make an ndarray that uses the GPU). But again, as the performance of GPUs comes from massive multithreading, immutability may be the key here as well.

Sturla Molden
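A minimal sketch of what such a subclass could look like (illustrative only: it merely clears the writeable flag, so a view whose base array is writeable could still re-enable writing; a real implementation would have to guard against that too):

    import numpy as np

    class ImmutableArray(np.ndarray):
        def __array_finalize__(self, obj):
            # runs for every new instance, including views and slices
            self.flags.writeable = False

    a = np.arange(5).view(ImmutableArray)
    # a[0] = 10  -> raises, since the array is marked read-only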
Re: [Numpy-discussion] Fast threading solution thoughts
Sturla Molden wrote: On 2/12/2009 12:20 PM, David Cournapeau wrote: It does if you have access to the parallel toolbox I mentioned earlier in this thread (again, no experience with it, but I think it is especially popular on clusters; in that case, though, it is not limited to a thread-based implementation). As has been mentioned, Matlab is a safer language for parallel computing as arrays are immutable. There is almost no need for synchronization of any sort, except barriers. Maybe it is time to implement an immutable ndarray subclass?

Is it possible at all? I mean an immutable array as a subclass of a mutable array? I too would like some immutable arrays, but this sounds like an enormous task (as much as I like numpy, I find numpy sometimes harder to use than matlab for some problems because of the mutability - may be my own limitation, though).

As for Matlab, I have noticed that they are experimenting with CUDA now, to use nvidia's processors for hardware acceleration. Labview too; there was a small booth from NI at the last ICASSP, and they were demoing some GPU-based operations (mostly blas/lapack/fft from what I understood, the easy stuff).

cheers,

David
Re: [Numpy-discussion] Fast threading solution thoughts
Francesc Alted wrote: I don't know OpenMP well enough either, but I'd say that on this list there could be some people who could help. At any rate, I really like the OpenMP approach and would much prefer to have support for it in Cython rather than threading, MPI or whatever. But the thing is: is OpenMP stable, mature enough to allow using it on most common platforms? I think that recent GCC compilers support the latest incarnation (3.0) of the standard

The yet unreleased gcc 4.4 will support 3.0, and gcc 4.3 supports a subset of 3.0, I believe: http://openmp.org/wp/2008/11/openmp-30-status/

cheers,

David
Re: [Numpy-discussion] Fast threading solution thoughts
Sturla Molden wrote: On 2/12/2009 12:20 PM, David Cournapeau wrote: It does if you have access to the parallel toolbox I mentioned earlier in this thread (again, no experience with it, but I think it is especially popular on clusters; in that case, though, it is not limited to a thread-based implementation). As has been mentioned, Matlab is a safer language for parallel computing as arrays are immutable. There is almost no need for synchronization of any sort, except barriers. Maybe it is time to implement an immutable ndarray subclass? With immutable arrays we can also avoid making temporary arrays in expressions like y = a*b + c. y just gets an expression and three immutable buffers. And then numexpr (or something like it) can take care of the rest. As for Matlab, I have noticed that they are experimenting with CUDA now, to use nvidia's processors for hardware acceleration. As even modest GPUs can yield hundreds of gigaflops,

Hi,

Not even close. The current generation peaks at around 1.2 TFlops single precision, 280 GFlops double precision for ATI's hardware. The main problem with those numbers is that the memory on the graphics card cannot feed the data fast enough into the GPU to achieve the theoretical peak. So those hundreds of GFlops are pure marketing :) In reality you might get anywhere from 20% to 60% of peak (if you are lucky) locally, before accounting for transfers from main memory to GPU memory and so on.

Given that recent Intel CPUs give you about 7 to 11 GFlops double precision per core, and libraries like ATLAS give you that performance today without the need to jump through hoops, these numbers start to look a lot less impressive. And Nvidia's numbers are lower than ATI's. NVidia's programming solution is much more advanced and rounded out compared to ATI's, which is largely in closed beta. OpenCL is mostly vaporware at this point.

that is going to be hard to match (unless we make an ndarray that uses the GPU). But again, as the performance of GPUs comes from massive multithreading, immutability may be the key here as well.

I have a K10 system with two Tesla C1060 GPUs to play with and have thought about adding CUDABlas support to Numpy/Scipy, but it hasn't been a priority for me. My main interest here is finite field arithmetic, by making FFPack via LinBox use CUDABlas. If anyone wants an account to make numpy/scipy optionally use CUDABlas, feel free to ping me off list and I can set you up.

Cheers, Michael
Re: [Numpy-discussion] Fast threading solution thoughts
Sturla Molden wrote: On 2/12/2009 1:50 PM, Francesc Alted wrote: Hey! That's very nice to know. We already have OpenMP support in Cython for free (or so it apparently seems :-) No we don't, as variable names are different in C and Cython. But adding support for OpenMP would not bloat the Cython language.

What I meant was bloat the Cython compiler, i.e. Cython has to know about OpenMP and there must be OpenMP-specific code in the compiler. The grammar etc. wouldn't change. But I think I'll be able to lobby it in, given a good spec and some time to implement it (about a day of work, so I think this is likely to happen within a year).

As Cython must understand what is going on anyway, having things like with cython.openmp.parallel is actually a lot easier to implement in Cython than using comments/pragmas, and looks nicer too IMO. (It means that only code generation, and not the parser, must be modified.)

Dag Sverre
Re: [Numpy-discussion] Fast threading solution thoughts
I am curious: would you know what would be different in numpy's case compared to matlab's array model concerning locks? Matlab, up to recently, only spread BLAS/LAPACK on multi-cores, but since matlab 7.3 (or 7.4), it also uses multicore for mathematical functions (cos, etc...). So at least in matlab's model, it looks like it can be useful. I understand that numpy's model is more flexible (I don't think strided arrays can overlap in matlab, for example, at least not from what you can see from the public API). cheers,

And the 'more flexible' is one of the biggest drawbacks. It's one of the reasons Fortran is so fast compared to C: Fortran can't have pointer aliases, so C will be slower. With Numpy, it is the same.

Matthieu

-- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher
Re: [Numpy-discussion] Fast threading solution thoughts
Yes, it is. You have to link against pthread (at least on Linux ;)) You have to write a single parallel region if you don't want this overhead (which is not possible with Python).

Matthieu

2009/2/12 Gael Varoquaux gael.varoqu...@normalesup.org: On Wed, Feb 11, 2009 at 11:52:40PM -0600, Robert Kern wrote: These seem like pretty heavy solutions though. From a programmer's perspective, it seems to me like OpenMP is a much lighter weight solution than pthreads. From a programmer's perspective, because, IMHO, openmp is implemented using pthreads. I do have difficulties verifying this on the web, but the documents I find hint at that. Gaël

-- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher
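To illustrate Matthieu's point about a single parallel region, a small C sketch (not from the thread): with one enclosing parallel region the thread team is created once and reused across several worksharing loops, instead of being spawned per loop.

    #include <omp.h>

    void two_steps(double *x, double *y, int n)
    {
        int i;
        #pragma omp parallel
        {
            /* the loop variable of an omp for is made private automatically */
            #pragma omp for
            for (i = 0; i < n; i++)
                x[i] *= 2.0;
            /* implicit barrier here; the threads stay alive */
            #pragma omp for
            for (i = 0; i < n; i++)
                y[i] += x[i];
        }
    }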
Re: [Numpy-discussion] Fast threading solution thoughts
2009/2/12 Sturla Molden stu...@molden.no: On 2/12/2009 1:50 PM, Francesc Alted wrote: Hey! That's very nice to know. We already have OpenMP support in Cython for free (or so it apparently seems :-) No we don't, as variable names are different in C and Cython. But adding support for OpenMP would not bloat the Cython language. Cython must exchange the variable names and leave the comment in the C as a pragma to the C compiler. IMHO, OpenMP is the easiest way to use parallel computing in scientific projects. It is much easier than manually fiddling with threads, processes, MPI, etc. Just write code as you normally would, debug and verify. Then you spend five minutes inserting pragmas for the compiler, et voilà, you have a parallel program. The same code will then compile and run correctly for parallel or sequential execution, depending on a compiler switch (-fopenmp). You get load balancing for free, as that is built into OpenMP. OpenMP's API is so small that it just takes 10 minutes to learn. OpenMP currently works on SMPs (e.g. multicore CPUs), but there is work going on to port it to clusters as well.

In this last matter, I think OpenMP will never be used. Besides, OpenMP doesn't even support an unsigned int as a parallel loop variable.

Matthieu

-- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher
Re: [Numpy-discussion] Fast threading solution thoughts
Matthieu Brucher wrote: No - I have never seen a deep explanation of the matlab model. The C api is so small that it is hard to deduce anything from it (except that the memory handling is not ref-counting-based; I don't know if it matters for our discussion of speeding up ufuncs). I would guess that since two arrays cannot share data (COW-based), lock handling may be easier to deal with? I am not really familiar with multi-thread programming (my only limited experience is soft real-time programming for audio processing, where the issues are totally different, since latency matters as much as, if not more than, throughput). It's not even a matter of multithreaded programming; in mono-core programming, the same issue can arise.

Which issue?

David
Re: [Numpy-discussion] Outer join ?
You might consider groupby from the itertools module. Do you have two keys only? I would prefer grouping on the first column. For groupby you then need to sort the array by the first column.

from itertools import groupby

a.sort(order='col1')
# target array: first col are unique dates, second col values for key1,
# third col values for key2
data = numpy.zeros(len(unique(a['col1'])),
                   dtype=dict(names=['dates', 'key1', 'key2'],
                              formats=[long, float, float]))
for i, (date, items) in enumerate(groupby(a, lambda item: item['col1'])):
    data[i]['dates'] = date
    for col1, col2, col3 in items:
        data[i][col2] = col3

Hope this works!

Bernhard

On Feb 12, 6:24 am, A B python6...@gmail.com wrote: Hi, I have the following data structure:

col1 | col2 | col3
20080101|key1|4
20080201|key1|6
20080301|key1|5
20080301|key2|3.4
20080601|key2|5.6

For each key in the second column, I would like to create an array where for all unique values in the first column, there will be either a value or zero if there is no data available. Like so:

# 20080101, 20080201, 20080301, 20080601
key1 - 4, 6, 5, 0
key2 - 0, 0, 3.4, 5.6

Ideally, the results would end up in a 2d array. What's the most efficient way to accomplish this? Currently, I am getting a list of unique col1 and col2 values into separate variables, then looping through each unique value in col2:

a = loadtxt(...)
dates = unique(a[:]['col1'])
keys = unique(a[:]['col2'])
for key in keys:
    b = a[where(a[:]['col2'] == key)]
    ???

Thanks in advance.
Re: [Numpy-discussion] Fast threading solution thoughts
On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote: FYI, I am one of the core Cython developers and can make such modifications in Cython itself as long as there's consensus on how it should look on the Cython mailing list. My problem is that I don't really know OpenMP and have little experience with it, so I'm not the best person for creating a draft of how such high-level OpenMP constructs should look in Cython.

I don't know the Cython internals, but I do know OpenMP. I mostly use it with Fortran. The question is: Should OpenMP be comments in the Cython code (as they are in C and Fortran), or should OpenMP be special objects?

As for the GIL: No, I don't think nogil should be implied. But Python objects should only be allowed as shared variables. Synchronization will then be as usual for shared variables in OpenMP (#pragma omp critical).

Here is my suggestion for syntax. If you just follow a consistent translation scheme, you don't need to know OpenMP in detail:

with openmp('parallel for', argument=iterable, ...):  -- insert pragma directly above the for loop
with openmp(directive, argument=iterable, ...):  -- insert pragma and brackets
with openmp('atomic'):  -- insert pragma directly
openmp('barrier')  -- insert pragma directly

This, by the way, covers all of OpenMP. This is how it should translate:

with openmp('parallel for', private=(i,), shared=(n,), schedule='dynamic'):
    for i in range(n):
        pass

compiles to:

#pragma omp parallel for \
    private(i) \
    shared(n) \
    schedule(dynamic)
for (i=0; i<n; i++) {
    /* whatever */
}

with openmp('parallel sections', reduction=('+', k), private=(i, j)):
    with openmp('section'):
        i = foobar()
    with openmp('section'):
        j = foobar()
    k = i + j

compiles to:

#pragma omp parallel sections \
    reduction(+:k) \
    private(i,j)
{
    #pragma omp section
    {
        i = foobar();
    }
    #pragma omp section
    {
        j = foobar();
    }
    k = i+j;
}

With Python objects, the programmer must synchronize access:

with openmp('parallel for', shared=(pyobj, n), private=(i,)):
    for i in range(n):
        with openmp('critical'):
            pyobj += i

#pragma omp parallel for \
    shared(pyobj,n) \
    private(i)
for (i=0; i<n; i++) {
    #pragma omp critical
    {
        pyobj += i;
    }
}

Atomic and barriers:

with openmp('atomic'):
    i += j

#pragma omp atomic
i += j;

with openmp('parallel for', default='private', shared=(n,)):
    for i in range(n):
        openmp('barrier')

#pragma omp parallel for \
    default(private) \
    shared(n)
for (i=0; i<n; i++) {
    #pragma omp barrier
}

That is my suggestion. Easy to implement, as you don't need to learn OpenMP first (not that it is difficult).

Sturla Molden
Re: [Numpy-discussion] Fast threading solution thoughts
Matthieu Brucher wrote: Sorry, I was referring to my last mail, but I sent so many in 5 minutes ;) In C, if you have two arrays (two pointers), the compiler can't make aggressive optimizations because they may intersect. With Fortran, this is not possible. In this matter, Numpy behaves like C (everyone has heard about the a[indices] += a[other_intersecting_indices] issues); Matlab is more like Fortran.

I think it is hard to know exactly, because we don't know how matlab is implemented. It is possible to handle special cases for non-overlapping arrays in numpy, once you are in C; I believe there are many code paths which we could optimize aggressively, using OpenMP, SSE, etc... It is just a lot of work to do so manually, so the real problem is how this can be handled as generally as possible with a small core.

David
Re: [Numpy-discussion] Fast threading solution thoughts
On 2/12/2009 1:44 PM, Sturla Molden wrote: Here is an example of SciPy's ckdtree.pyx modified to use OpenMP.

It seems I managed to post an erroneous C file. :(

S.M.

/*
 * Parallel query for faster kd-tree searches on SMP computers.
 * This function will release the GIL around the lengthy query loop and
 * use OpenMP for multithreading.
 *
 * OpenMP is supported on GCC (since version 4.2) and MS Visual C++ 2005
 * and later. Mingw builds of GCC support OpenMP through a pthread
 * layer (pthreadGC2.dll).
 *
 * Compile with -fopenmp and link -lgomp -lpthread on GCC.
 *
 * Written by Sturla Molden, Univ. Oslo, Jan. 24 2009
 *
 * This code is released to public domain.
 */

#include <Python.h>
#include <omp.h>

typedef void (*queryfun_t)(void *self, double *result_distances,
    int *result_indices, double *x, int k, double eps, double p,
    double distance_upper_bound);

void parallel_query(queryfun_t query, int nthreads, int niter, void *self,
    double *result_distances, int *result_indices, double *x,
    int k, int m, double eps, double p, double distance_upper_bound)
{
    int c;
    /* allow the number of threads to be specified */
    if (nthreads == 0) {
        Py_BEGIN_ALLOW_THREADS
        {
            #pragma omp parallel for private(c) schedule(guided)
            for (c=0; c<niter; c++) {
                (*query)(self, result_distances + c*k, result_indices + c*k,
                    x + c*m, k, eps, p, distance_upper_bound);
            }
        }
        Py_END_ALLOW_THREADS
    } else {
        /* parallel queries with GIL released */
        Py_BEGIN_ALLOW_THREADS
        {
            #pragma omp parallel for private(c) schedule(guided) num_threads(nthreads)
            for (c=0; c<niter; c++) {
                (*query)(self, result_distances + c*k, result_indices + c*k,
                    x + c*m, k, eps, p, distance_upper_bound);
            }
        }
        Py_END_ALLOW_THREADS
    }
}
Re: [Numpy-discussion] Fast threading solution thoughts
David Cournapeau wrote: Matthieu Brucher wrote: For BLAS level 3, the MKL is parallelized (so matrix multiplication is). Same for ATLAS: thread support is one focus of the 3.9 series, currently in development.

Hi David,

ATLAS has had thread support for a long, long time. The 3.9 series just improves it substantially by using affinity when available and removes some long-standing issues with allocation performance that one previously had to work around by setting some defines at compile time.

I have never used it, I don't know how it compares to the MKL, It does compare quite well and is more or less on par with the latest MKL releases in the 3.9 series. 3.8.2 is maybe 10% to 15% slower than the MKL on i7 as well as Core2 cores. One big advantage of ATLAS is that it tends to just work when using it with numpy/scipy, unlike the Intel MKL, where one has to work around a bunch of oddities and jump through hoops to get it to work. It seems that Intel must rename at least one library in each release of the MKL to keep build system maintainers occupied :)

The big disadvantage of ATLAS is that Windows support is currently limited to 32 bits, but 3.9.5 and higher have SFU/SUA support, so 64-bit support is possible. Clint told me the main issue here was lack of access, and it isn't too high on his priority list.

Cheers, Michael
Re: [Numpy-discussion] Fast threading solution thoughts
2009/2/12 David Cournapeau da...@ar.media.kyoto-u.ac.jp: Matthieu Brucher wrote: No - I have never seen a deep explanation of the matlab model. The C api is so small that it is hard to deduce anything from it (except that the memory handling is not ref-counting-based; I don't know if it matters for our discussion of speeding up ufuncs). I would guess that since two arrays cannot share data (COW-based), lock handling may be easier to deal with? I am not really familiar with multi-thread programming (my only limited experience is soft real-time programming for audio processing, where the issues are totally different, since latency matters as much as, if not more than, throughput). It's not even a matter of multithreaded programming; in mono-core programming, the same issue can arise. Which issue?

Sorry, I was referring to my last mail, but I sent so many in 5 minutes ;) In C, if you have two arrays (two pointers), the compiler can't make aggressive optimizations because they may intersect. With Fortran, this is not possible. In this matter, Numpy behaves like C (everyone has heard about the a[indices] += a[other_intersecting_indices] issues); Matlab is more like Fortran.

Matthieu

-- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher
Re: [Numpy-discussion] Fast threading solution thoughts
2009/2/12 David Cournapeau da...@ar.media.kyoto-u.ac.jp: Matthieu Brucher wrote: Sorry, I was referring to my last mail, but I sent so many in 5 minutes ;) In C, if you have two arrays (two pointers), the compiler can't make aggressive optimizations because they may intersect. With Fortran, this is not possible. In this matter, Numpy behaves like C (everyone has heard about the a[indices] += a[other_intersecting_indices] issues); Matlab is more like Fortran. I think it is hard to know exactly, because we don't know how matlab is implemented. It is possible to handle special cases for non-overlapping arrays in numpy, once you are in C; I believe there are many code paths which we could optimize aggressively, using OpenMP, SSE, etc... It is just a lot of work to do so manually, so the real problem is how this can be handled as generally as possible with a small core.

Indeed, Matlab may not benefit from this. At least, COW makes it possible to optimize aggressively. Then, it's a matter of implementing the loops in C or Fortran (or using C99 extensions and relying on the compiler). In C89, you will have absolutely no benefit (because there is no way to tell the compiler that there is no aliasing); in Fortran, it will be optimized correctly. The problem with optimizing code paths is that you have to detect them. On the other hand, if you implement a numpy-compliant array that explicitly forbids aliasing (you could add some additional tests to ensure that the parents are different, or stuff like that), you can make the compiler optimize the code better.

Matthieu

-- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher
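For concreteness, a small C99 sketch (an assumed example, not from the thread): the restrict qualifier is the C99 extension Matthieu alludes to; it promises the compiler that the pointers do not alias, which is the guarantee Fortran gives by default.

    /* C99: restrict promises x and y do not overlap, so the
       compiler is free to vectorize and reorder the loop. */
    void axpy(int n, double a, const double *restrict x, double *restrict y)
    {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }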
Re: [Numpy-discussion] Outer join ?
This is probably more than I need, but I will definitely keep it as a reference. Thank you.

On 2/12/09, bernhard.vo...@gmail.com bernhard.vo...@gmail.com wrote: You might consider groupby from the itertools module. Do you have two keys only? I would prefer grouping on the first column. For groupby you then need to sort the array by the first column.

from itertools import groupby

a.sort(order='col1')
# target array: first col are unique dates, second col values for key1,
# third col values for key2
data = numpy.zeros(len(unique(a['col1'])),
                   dtype=dict(names=['dates', 'key1', 'key2'],
                              formats=[long, float, float]))
for i, (date, items) in enumerate(groupby(a, lambda item: item['col1'])):
    data[i]['dates'] = date
    for col1, col2, col3 in items:
        data[i][col2] = col3

Hope this works!

Bernhard

On Feb 12, 6:24 am, A B python6...@gmail.com wrote: Hi, I have the following data structure:

col1 | col2 | col3
20080101|key1|4
20080201|key1|6
20080301|key1|5
20080301|key2|3.4
20080601|key2|5.6

For each key in the second column, I would like to create an array where for all unique values in the first column, there will be either a value or zero if there is no data available. Like so:

# 20080101, 20080201, 20080301, 20080601
key1 - 4, 6, 5, 0
key2 - 0, 0, 3.4, 5.6

Ideally, the results would end up in a 2d array. What's the most efficient way to accomplish this? Currently, I am getting a list of unique col1 and col2 values into separate variables, then looping through each unique value in col2:

a = loadtxt(...)
dates = unique(a[:]['col1'])
keys = unique(a[:]['col2'])
for key in keys:
    b = a[where(a[:]['col2'] == key)]
    ???

Thanks in advance.
Re: [Numpy-discussion] Outer join ?
On 2/11/09, Robert Kern robert.k...@gmail.com wrote: On Wed, Feb 11, 2009 at 23:24, A B python6...@gmail.com wrote: Hi, I have the following data structure:

col1 | col2 | col3
20080101|key1|4
20080201|key1|6
20080301|key1|5
20080301|key2|3.4
20080601|key2|5.6

For each key in the second column, I would like to create an array where for all unique values in the first column, there will be either a value or zero if there is no data available. Like so:

# 20080101, 20080201, 20080301, 20080601
key1 - 4, 6, 5, 0
key2 - 0, 0, 3.4, 5.6

Ideally, the results would end up in a 2d array. What's the most efficient way to accomplish this? Currently, I am getting a list of unique col1 and col2 values into separate variables, then looping through each unique value in col2:

a = loadtxt(...)
dates = unique(a[:]['col1'])
keys = unique(a[:]['col2'])
for key in keys:
    b = a[where(a[:]['col2'] == key)]
    ???

Take a look at setmember1d(). -- Robert Kern

Thanks. That's exactly what I need, but I'm not sure about the next step after I do setmember1d(dates, b['date']) and have the bool arr/mask ... How can I grow b to have 0 values for the missing keys?
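One way to finish the job, as a rough sketch using the thread's variable names (it assumes dates and b are both sorted by date, and that the date field is 'col1' as in the original post):

    import numpy as np

    row = np.zeros(len(dates))
    mask = np.setmember1d(dates, b['col1'])  # True where the date exists in b
    row[mask] = b['col3']                    # scatter known values; the rest stay 0

Stacking one such row per key then gives the desired 2d array.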
Re: [Numpy-discussion] Fast threading solution thoughts
On Thu, Feb 12, 2009 at 03:27:51PM +0100, Sturla Molden wrote: The question is: Should OpenMP be comments in the Cython code (as they are in C and Fortran), or should OpenMP be special objects? My two cents: go for cython objects/statements. Not only does code in comments look weird and like a hack, it also means you have to hack the parser. Finally, cython can be used with valid Python syntax (at least part of it) thanks to decorators giving the types; Ondrej Certik implemented this for sympy. Having pragmas in comments would make it hard to use the vanilla Python parser to read this code, or to do metaprogramming on the pragmas. The logic of Python is pretty much that everything is accessible to the programmer, to be modified, e.g. for testing. Putting logic in comments breaks this. I like the use of the with statement for this purpose. Gaël ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] genloadtxt: dtype=None and unpack=True
On Wed, Feb 11, 2009 at 10:47 PM, Pierre GM pgmdevl...@gmail.com wrote: On Feb 11, 2009, at 11:38 PM, Ryan May wrote: Pierre, I noticed that using dtype=None with a heterogeneous set of data, trying to use unpack=True to get the columns into separate arrays (instead of a structured array) doesn't work. I've attached a patch that, in the case of dtype=None, unpacks the fields in the final array into a list of separate arrays. Does this seem like a good idea to you? Nope, as it breaks consistency: depending on some input parameters, you would get either an array or a list. I think it's better to leave it as it is, maybe adding an extra line to the doc noting that unpack=True doesn't do anything for structured arrays. Ah, I hadn't thought of that. I was only thinking in terms of the behavior of unpacking on return, not in terms of the actual returned object. You're right, it's a bad idea. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
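For reference, a small sketch of the behavior under discussion, assuming the np.genfromtxt front end that genloadtxt grew into (the sample data is made up):

    import numpy as np
    from StringIO import StringIO  # Python 2 idiom of the era; io.StringIO today

    data = StringIO("20080101 key1 4\n20080201 key1 6")
    # dtype=None infers a type per column, so the result is a structured
    # array; unpack=True cannot split that into separate plain arrays
    arr = np.genfromtxt(data, dtype=None)
    # pull the columns out by field name instead
    columns = [arr[name] for name in arr.dtype.names]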
Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?
On Fri, 6 Feb 2009 18:18:44 -0500 Michael S. Gilbert wrote: BTW, there is a 64-bit version of the reference mersenne twister implementation available [1]. I did some testing with this 64-bit implementation (mt19937-64). I've found that it is actually slower than the 32-bit reference (mt19937ar) on 64-bit systems (2.15s vs 2.25s to generate 1 ints). This is likely because it generates 64-bit long long ints instead of 32-bit long ints. However, it should be possible to break up each 64-bit int into two 32-bit ints; then the runtime would appear to be almost twice as fast. One other consideration to keep in mind is that the 64-bit version is not stream-compatible with the 32-bit implementation (you will get different sequences for the same input seed). Would it be worth it to implement this in numpy in order to get an almost 2x speedup on 64-bit machines? ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
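The proposed split is plain bit masking; a sketch in numpy terms (the 64-bit value here is a stand-in, not an actual mt19937-64 draw):

    import numpy as np

    x64 = np.uint64(0x123456789ABCDEF0)          # pretend this is one 64-bit draw
    lo = np.uint32(x64 & np.uint64(0xFFFFFFFF))  # low 32 bits
    hi = np.uint32(x64 >> np.uint64(32))         # high 32 bits
    # each 64-bit draw thus yields two 32-bit outputs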
Re: [Numpy-discussion] Fast threading solution thoughts
On 2/12/2009 5:24 PM, Gael Varoquaux wrote: My two cents: go for cython objects/statements. Not only does code in comments look weird and like a hack, it also means you have to hack the parser. I agree with this, particularly because Cython uses indentation as syntax. With comments you would have to use 'end' tags like in Fortran:

    !$omp parallel do private(i) shared(n)
    do i = 1,n
       !$omp parallel do private(j) shared(i,n)
       do j = i, n
          ! whatever
       end do
       !$omp end parallel do
    end do
    !$omp end parallel do

But this is Fortran. It is meant to look bad. :) Finally cython can be used with valid Python syntax (at least part of it) thanks to decorators giving the types. I don't think OpenMP will ever be used from pure Python. But please, add an -openmp compiler switch to Cython. If it is not there, all OpenMP statements should be ignored and translated to nothing. S.M. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Integer cast problems
Hi there, I have a little problem here with array indexing, hope you see the problem. I use the following loop to calculate some integrals:

    import numpy as N
    from scipy.integrate import quad

    T = 1
    dt = 0.005
    L = 3
    n = 2
    ints = N.zeros([T/dt])
    for t in N.arange(0, T, dt):
        a = quad(lambda x: -1*(1 - 4*(t**4))*N.exp(-t**4)*N.exp(-x**2)*N.cos(n*N.pi*(x - L)/(2*L)), -L, L)[0]
        ints[int(t/dt)] = a
        print t, N.int32(t/dt), t/dt, a, ints[int(t/dt)]

The output from the print statement looks like:

    0.14 28 28.0 2.52124867251e-16 2.52124867251e-16
    0.145 28 29.0 2.03015199575e-16 2.03015199575e-16
    0.15 30 30.0 2.40857836418e-16 2.40857836418e-16
    0.155 31 31.0 2.52191011339e-16 2.52191011339e-16

The same happens on the ipython prompt: 0.145 * 0.005 = 28.996, N.int32(0.145 * 0.005) = 28. Any ideas how to deal with this? Cheers, Ralph ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integer cast problems
On Thu, Feb 12, 2009 at 11:21 AM, Ralph Kube ralphk...@googlemail.com wrote: Hi there, I have a little problem here with array indexing, hope you see the problem. I use the following loop to calculate some integrals:

    import numpy as N
    from scipy.integrate import quad

    T = 1
    dt = 0.005
    L = 3
    n = 2
    ints = N.zeros([T/dt])
    for t in N.arange(0, T, dt):
        a = quad(lambda x: -1*(1 - 4*(t**4))*N.exp(-t**4)*N.exp(-x**2)*N.cos(n*N.pi*(x - L)/(2*L)), -L, L)[0]
        ints[int(t/dt)] = a
        print t, N.int32(t/dt), t/dt, a, ints[int(t/dt)]

The output from the print statement looks like:

    0.14 28 28.0 2.52124867251e-16 2.52124867251e-16
    0.145 28 29.0 2.03015199575e-16 2.03015199575e-16
    0.15 30 30.0 2.40857836418e-16 2.40857836418e-16
    0.155 31 31.0 2.52191011339e-16 2.52191011339e-16

The same happens on the ipython prompt: 0.145 * 0.005 = 28.996, N.int32(0.145 * 0.005) = 28. Any ideas how to deal with this? I'm assuming you mean 0.145 / 0.005 = 28.996. When you cast to an integer, it *truncates* the fractional part, and life with floating point says that what should be an exact result won't necessarily be exact. Try using N.around. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
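A quick sketch of that fix for the index computation (values taken from the thread):

    import numpy as N

    t, dt = 0.145, 0.005
    print(int(t/dt))            # 28 -- truncation bites
    print(int(N.around(t/dt)))  # 29 -- round first, then cast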
Re: [Numpy-discussion] Integer cast problems
Ralph Kube wrote: Hi there, I have a little problem here with array indexing, hope you see the problem. I use the following loop to calculate some integrals ... 0.145 * 0.005 = 28.996, N.int32(0.145 * 0.005) = 28. Conversion to int truncates, it doesn't round. Try N.int32(0.145 * 0.005 + 0.5) instead. Gregor ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integer cast problems
On Thu, Feb 12, 2009 at 9:21 AM, Ralph Kube ralphk...@googlemail.com wrote: The same happens on the ipython prompt: 0.145 * 0.005 = 28.996, N.int32(0.145 * 0.005) = 28. Any ideas how to deal with this? Do you want the answer to be 29? N.int32 truncates. If you want to round instead, you could use that standard trick of adding 0.5:

    >>> np.int32(0.5 + 0.145 / 0.005)
    29

or

    >>> np.round(0.145 / 0.005)
    29.0

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?
On Thu, Feb 12, 2009 at 10:58, Michael S. Gilbert michael.s.gilb...@gmail.com wrote: On Fri, 6 Feb 2009 18:18:44 -0500 Michael S. Gilbert wrote: BTW, there is a 64-bit version of the reference mersenne twister implementation available [1]. I did some testing with this 64-bit implementation (mt19937-64). I've found that it is actually slower than the 32-bit reference (mt19937ar) on 64-bit systems (2.15s vs 2.25s to generate 1 ints). This is likely because it generates 64-bit long long ints instead of 32-bit long ints. However, it should be possible to break up each 64-bit int into two 32-bit ints, then the runtime would appear to be almost twice as fast. Why do you think that? One other consideration to keep in mind is that the 64-bit version is not stream-compatible with the 32-bit implementation (you will get different sequences for the same input seed). Would it be worth it to implement this in numpy in order to get an almost 2x speedup on 64-bit machines? The incompatibility is a showstopper to replacing the PRNG on any platform. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fast threading solution thoughts
Sturla Molden wrote: On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote: FYI, I am one of the core Cython developers and can make such modifications in Cython itself as long as there's consensus on how it should look on the Cython mailing list. My problem is that I don't really know OpenMP and have little experience with it, so I'm not the best person for creating a draft of how such high-level OpenMP constructs should look in Cython. I don't know the Cython internals, but I do know OpenMP; I mostly use it with Fortran. The question is: Should OpenMP be comments in the Cython code (as they are in C and Fortran), or should OpenMP be special objects?

Statements more or less like C and Fortran, but disguised as Python syntax rather than pragmas/comments.

As for the GIL: No, I don't think nogil should be implied. But Python objects should only be allowed as shared variables. Synchronization will then be as usual for shared variables in OpenMP (#pragma omp critical).

Good point.

Here is my suggestion for the syntax. If you just follow a consistent translation scheme, you don't need to know OpenMP in detail:

    with openmp('parallel for', argument=iterable, ...):  # insert pragma directly above the for loop
    with openmp(directive, argument=iterable, ...):       # insert pragma and brackets
    with openmp('atomic'):                                # insert pragma directly
    openmp('barrier')                                     # insert pragma directly

This, by the way, covers all of OpenMP. This is how it should translate:

    with openmp('parallel for', private=(i,), shared=(n,), schedule='dynamic'):
        for i in range(n):
            pass

IMO there's a problem with using literal variable names here, because Python syntax implies that the value is passed. One shouldn't make a syntax where private=(i,) is legal but private=(f(),) isn't. So I'd go for strings.

Compiles to:

    #pragma omp parallel for \
        private(i) \
        shared(n) \
        schedule(dynamic)
    for (i = 0; i < n; i++) {
        /* whatever */
    }

Hmm... yes. Care would need to be taken, though, because Cython might in the future very well generate a while loop instead for such a statement under some circumstances, and that won't work with OpenMP. One should be careful about assuming what the C result of Cython code will be. That's why I proposed using an alternative construct which both does the OpenMP stuff and contains the for loop.

With Python objects, the programmer must synchronize access:

    with openmp('parallel for', shared=(pyobj, n), private=(i,)):
        for i in range(n):
            with openmp('critical'):
                pyobj += i

We already have the nogil stuff, so with e.g. "with openmp.critical:" instead we would leave the way open for checking correctness in this area. Anyway, thanks for the input. I've made it a ticket and might get back to it when I've got more time through a discussion on the Cython list: http://trac.cython.org/cython_trac/ticket/211 -- Dag Sverre ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fast threading solution thoughts
Dag Sverre Seljebotn wrote: Hmm... yes. Care would need to be taken though because Cython might in the future very well generate a while loop instead for such a statement under some circumstances, and that won't work with OpenMP. One should be careful with assuming what the C result will be of Cython code. That's why I proposed using an alternative construct which both does the OpenMP stuff and contains the for loop. As a matter of fact, the next Cython release might prepend all for-in-range-loops with an if-test under some circumstances, in order to better match Python looping semantics (if the loop isn't executed, the loop counter should never be written to -- in contrast with C). So this is already happening. OpenMP might need different loop semantics and so calls for a different construct. -- Dag Sverre ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fast threading solution thoughts
If your problem is evaluating vector expressions just like the above (i.e. without using transcendental functions like sin, exp, etc...), usually the bottleneck is memory access, so using several threads is simply not going to help you achieve better performance, but rather the contrary (you have to deal with the additional thread overhead). So, frankly, I would not waste more time trying to parallelize that. I had a feeling this would be the case, I just haven't been sure at what point this comes into play. I really need to do some tests to understand exactly how CPU load and memory bandwidth interplay in these situations. I have worked with GPUs before and often the reason the GPU is faster than the CPU is simply the higher memory bandwidth. As an example, in the recent support of VML in numexpr we have disabled the use of VML (as well as the OpenMP threading support that comes with it) in cases like yours, where only additions and multiplications are performed (these operations are very fast in modern processors, and the sole bottleneck in this case is memory bandwidth, as I've said). However, for expressions containing operations like division or transcendental functions, VML activates automatically, and you can make use of several cores if you want. So, if you are in this case, and you have access to Intel MKL (the library that contains VML), you may want to give numexpr a try. OK, this is very interesting indeed. I didn't know that numexpr has support for VML, which has OpenMP support. I will definitely have a look at this. Thanks! Brian HTH, -- Francesc Alted ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
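A small sketch of the distinction Francesc describes (array sizes are arbitrary, and whether VML actually kicks in depends on how numexpr was built):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)

    # memory-bound: plain arithmetic; threading/VML buys little here
    c = ne.evaluate("2*a + 3*b")

    # compute-bound: transcendental functions, where VML threads can help
    d = ne.evaluate("exp(-a**2) * cos(b)")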
Re: [Numpy-discussion] Fast threading solution thoughts
Recent Matlab versions use Intel's Math Kernel Library, which performs automatic multi-threading - also for mathematical functions like sin etc., but not for addition, multiplication etc. It seems to me Matlab itself does not take care of multi-threading. On http://www.intel.com/software/products/mkl/data/vml/functions/_listfunc.html you can have a look at the performance data of the MKL vectorized math functions. At a vector length somewhere between 100 and 1000, depending on the function, precision, and CPU architecture, they switch to multi-threading. Fantastic, thanks for this pointer! Brian ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fast threading solution thoughts
At any rate, I really like the OpenMP approach and would much prefer support for it in Cython over threading, MPI or whatever. But the thing is: is OpenMP stable and mature enough to allow using it on most common platforms? I think that recent GCC compilers support the latest incarnation (3.0) of the standard, but IIRC, MSVC 2008 only supports the OpenMP 2.5 specification. I'm not sure how this would affect a possible implementation in Cython, though. OpenMP has been around for a long time. But in other ways, I don't consider it to be completely stable and mature - mainly because free compilers (like gcc) haven't supported OpenMP for very long. For example, I am running OS X Leopard, which is back at gcc 4.0.1 - well before OpenMP support. But this will hopefully change in the next few years as everyone moves to more recent versions of gcc. And Cython could simply test the compiler to see if it supports OpenMP and what version it supports. Brian ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?
On Thu, 12 Feb 2009 13:18:26 -0600 Robert Kern wrote: I did some testing with this 64-bit implementation (mt19937-64). I've found that it is actually slower than the 32-bit reference (mt19937ar) on 64-bit systems (2.15s vs 2.25s to generate 1 ints). This is likely because it generates 64-bit long long ints instead of 32-bit long ints. However, it should be possible to break up each 64-bit int into two 32-bit ints; then the runtime would appear to be almost twice as fast. Why do you think that? You could also think of it the other way (in terms of generating 64-bit ints). Instead of generating two 32-bit ints and concatenating them into a 64-bit int, you can just directly generate the 64-bit int. Since the 64-bit int requires only slightly more time to generate than either of the 32-bit ints individually, an almost 2x speedup is achieved. One other consideration to keep in mind is that the 64-bit version is not stream-compatible with the 32-bit implementation (you will get different sequences for the same input seed). Would it be worth it to implement this in numpy in order to get an almost 2x speedup on 64-bit machines? The incompatibility is a showstopper to replacing the PRNG on any platform. Why is stream-compatibility such a stringent requirement? It seems like this constraint severely limits your ability to adopt new/better technologies and reduces your flexibility to make changes to your code as needed. What end-use applications require stream-compatibility? The only thing I can think of is verification/regression testing for numpy, but those test cases could be updated to account for the break in compatibility (specifically, by using the reference implementation's expected output). Wouldn't documenting the change in behavior be sufficient? Regards, Mike ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fast threading solution thoughts
Wow, interesting thread. Thanks everyone for the ideas. A few more comments: GPUs/CUDA: Even though there is a bottleneck between main memory and GPU memory, as Nathan mentioned, the much larger memory bandwidth on a GPU often makes GPUs great for memory-bound computations... as long as you can leave your data on the GPU for most of the computation. In my case I can do this, and it is something I am pursuing as well. OpenMP: I don't really like OpenMP (pragma?!?), but it would be very nice if Cython had optional support for OpenMP that didn't use comments. Other ideas: What I would really like is a nice, super fast *library* built on top of pthreads that made it possible to do OpenMP-like things in Cython, but without depending on having an OpenMP compiler. Basically a fancy, fast thread pool implementation in Cython. And a question: With the new Numpy support in Cython, does Cython release the GIL if it can when running through loops over numpy arrays? Does Cython call into the C API during these sections? Cheers, Brian ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fast threading solution thoughts
Brian Granger wrote: And a question: With the new Numpy support in Cython, does Cython release the GIL if it can when running through loops over numpy arrays? Does Cython call into the C API during these sections? You know, I thought of the exact same thing when reading your post. No, you need the GIL currently, but that's something I'd like to fix. Ideally, it would be something like this:

    cdef int i, s = 0, n = ...
    cdef np.ndarray[int] arr = ...  # will require the GIL

    with nogil:
        for i in range(n):
            s += arr[i]  # does not require the GIL

The only Python operation needed on the last line is throwing an exception if one is out of bounds; but I can always insert code to reacquire the GIL in that exceptional case. (And in most cases like this the user will turn off bounds-checking anyway.) http://trac.cython.org/cython_trac/ticket/210 will contain progress on this. Dag Sverre ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?
On Thu, Feb 12, 2009 at 14:17, Michael S. Gilbert michael.s.gilb...@gmail.com wrote: On Thu, 12 Feb 2009 13:18:26 -0600 Robert Kern wrote: I did some testing with this 64-bit implementation (mt19937-64). I've found that it is actually slower than the 32-bit reference (mt19937ar) on 64-bit systems (2.15s vs 2.25s to generate 1 ints). This is likely because it generates 64-bit long long ints instead of 32-bit long ints. However, it should be possible to break up each 64-bit int into two 32-bit ints; then the runtime would appear to be almost twice as fast. Why do you think that? You could also think of it the other way (in terms of generating 64-bit ints). Instead of generating two 32-bit ints and concatenating them into a 64-bit int, you can just directly generate the 64-bit int. Since the 64-bit int requires only slightly more time to generate than either of the 32-bit ints individually, an almost 2x speedup is achieved. *shrug* I'll believe it when I see it. One other consideration to keep in mind is that the 64-bit version is not stream-compatible with the 32-bit implementation (you will get different sequences for the same input seed). Would it be worth it to implement this in numpy in order to get an almost 2x speedup on 64-bit machines? The incompatibility is a showstopper to replacing the PRNG on any platform. Why is stream-compatibility such a stringent requirement? It seems like this constraint severely limits your ability to adopt new/better technologies and reduces your flexibility to make changes to your code as needed. What end-use applications require stream-compatibility? The only thing I can think of is verification/regression testing for numpy, but those test cases could be updated to account for the break in compatibility (specifically, by using the reference implementation's expected output). Wouldn't documenting the change in behavior be sufficient? Some people don't think so. People have asked for more stringent compatibility than we can already provide (i.e. replicability even in the face of bug fixes). People use these as inputs to their scientific simulations. I'm not going to intentionally make their lives harder than that. Bruce Southey was working on exposing the innards a bit so that you could make use of a different core PRNG while reusing the numpy-specific stuff in RandomState. That would be the approach for applying different technologies. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Purpose for bit-wise and'ing the initial mersenne twister key?
On Thu, 12 Feb 2009 14:32:02 -0600 Robert Kern wrote: You could also think of it the other way (in terms of generating 64-bit ints). Instead of generating two 32-bit ints and concatenating them into a 64-bit int, you can just directly generate the 64-bit int. Since the 64-bit int requires only slightly more time to generate than either of the 32-bit ints individually, an almost 2x speedup is achieved. *shrug* I'll believe it when I see it. I'll put together a proof of concept when I have the time. Some people don't think so. People have asked for more stringent compatibility than we can already provide (i.e. replicability even in the face of bug fixes). People use these as inputs to their scientific simulations. I'm not going to intentionally make their lives harder than that. Bruce Southey was working on exposing the innards a bit so that you could make use of a different core PRNG while reusing the numpy-specific stuff in RandomState. That would be the approach for applying different technologies. This would be very useful. A backward-compatibility flag (set to a particular numpy version number) could be offered to fall back to the matching implementation, so that you can push forward and still offer compatibility as needed. Regards, Mike ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fast threading solution thoughts
You know, I thought of the exact same thing when reading your post. No, you need the GIL currently, but that's something I'd like to fix. Ideally, it would be something like this:

    cdef int i, s = 0, n = ...
    cdef np.ndarray[int] arr = ...  # will require the GIL

    with nogil:
        for i in range(n):
            s += arr[i]  # does not require the GIL

Yep, that would be fantastic!!! While it doesn't make the code multithreaded, having this sure would make it easy to get code ready for multithreading. The only Python operation needed on the last line is throwing an exception if one is out of bounds; but I can always insert code to reacquire the GIL in that exceptional case. (And in most cases like this the user will turn off bounds-checking anyway.) http://trac.cython.org/cython_trac/ticket/210 will contain progress on this. Great, I will watch this. I am more than willing to test this as well, as it is something I would use often. Cheers, Brian ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] umath build failures @ trunk on some platforms?
Hi, The Buildbot (up once again) is showing build failures on some platforms: http://buildbot.scipy.org/builders/Windows_XP_x86_64_MSVC/builds/875/steps/shell/logs/stdio http://buildbot.scipy.org/builders/Linux_SPARC_64_Debian/builds/423/steps/shell/logs/stdio Are these a sign of some bug (in the new math config system?), or a glitch in the buildslaves only? -- Pauli Virtanen ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] umath build failures @ trunk on some platforms?
On Thu, Feb 12, 2009 at 2:23 PM, Pauli Virtanen p...@iki.fi wrote: Hi, The Buildbot (up once again) is showing build failures on some platforms: http://buildbot.scipy.org/builders/Windows_XP_x86_64_MSVC/builds/875/steps/shell/logs/stdio http://buildbot.scipy.org/builders/Linux_SPARC_64_Debian/builds/423/steps/shell/logs/stdio Are these a sign of some bug (in the new math config system?), or a glitch in the buildslaves only? They are old problems. The Debian problem comes from missing prototypes, the Windows problem from a missing function. I had workarounds for these, but David removed them and wants to solve them differently. The Debian problem probably won't be fixed at all because it is a distro problem; you might try an upgrade. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fast threading solution thoughts
Dag Sverre Seljebotn wrote: IMO there's a problem with using literal variable names here, because Python syntax implies that the value is passed. One shouldn't make a syntax where private=(i,) is legal but private=(f(),) isn't. The latter would be illegal in OpenMP as well: OpenMP pragmas only take variable names, not function calls. S.M. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Filling gaps
Hi, Are there any routines to fill in the gaps in an array? The simplest would be by carrying the last known observation forward:

    0,0,10,8,0,0,7,0
    0,0,10,8,8,8,7,7

Or by somehow interpolating the missing values based on the previous and next known observations (mean). Thanks. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
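For the interpolation variant, a minimal sketch with np.interp (assuming, as in the example, that zero marks a missing value and is never a legitimate observation):

    import numpy as np

    def fill_interp(x, miss=0):
        # linearly interpolate across the missing entries
        idx = np.arange(len(x))
        good = x != miss
        return np.interp(idx, idx[good], x[good])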
Re: [Numpy-discussion] Filling gaps
On Feb 12, 2009, at 8:22 PM, A B wrote: Hi, Are there any routines to fill in the gaps in an array? The simplest would be by carrying the last known observation forward:

    0,0,10,8,0,0,7,0
    0,0,10,8,8,8,7,7

Or by somehow interpolating the missing values based on the previous and next known observations (mean). Thanks. The functions `forward_fill` and `backward_fill` in scikits.timeseries should do what you want. They also work on MaskedArray objects, meaning that you don't need to have actual time series. The catch is that you need to install scikits.timeseries, of course. More info here: http://pytseries.sourceforge.net/ ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Filling gaps
On Thu, Feb 12, 2009 at 5:22 PM, A B python6...@gmail.com wrote: Are there any routines to fill in the gaps in an array? The simplest would be by carrying the last known observation forward:

    0,0,10,8,0,0,7,0
    0,0,10,8,8,8,7,7

Here's an obvious hack for 1d arrays:

    def fill_forward(x, miss=0):
        y = x.copy()
        for i in range(x.shape[0]):
            if y[i] == miss:
                y[i] = y[i-1]
        return y

Seems to work:

    >>> x
    array([ 0,  0, 10,  8,  0,  0,  7,  0])
    >>> fill_forward(x)
    array([ 0,  0, 10,  8,  8,  8,  7,  7])

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Filling gaps
On Thu, Feb 12, 2009 at 5:52 PM, Keith Goodman kwgood...@gmail.com wrote: On Thu, Feb 12, 2009 at 5:22 PM, A B python6...@gmail.com wrote: Are there any routines to fill in the gaps in an array? The simplest would be by carrying the last known observation forward:

    0,0,10,8,0,0,7,0
    0,0,10,8,8,8,7,7

Here's an obvious hack for 1d arrays:

    def fill_forward(x, miss=0):
        y = x.copy()
        for i in range(x.shape[0]):
            if y[i] == miss:
                y[i] = y[i-1]
        return y

Seems to work:

    >>> x
    array([ 0,  0, 10,  8,  0,  0,  7,  0])
    >>> fill_forward(x)
    array([ 0,  0, 10,  8,  8,  8,  7,  7])

I guess that should be

    for i in range(1, x.shape[0]):

instead of

    for i in range(x.shape[0]):

to avoid replacing the first element of the array, if it is missing, with the last. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Filling gaps
On Thu, Feb 12, 2009 at 6:04 PM, Keith Goodman kwgood...@gmail.com wrote: On Thu, Feb 12, 2009 at 5:52 PM, Keith Goodman kwgood...@gmail.com wrote: On Thu, Feb 12, 2009 at 5:22 PM, A B python6...@gmail.com wrote: Are there any routines to fill in the gaps in an array? The simplest would be by carrying the last known observation forward:

    0,0,10,8,0,0,7,0
    0,0,10,8,8,8,7,7

Here's an obvious hack for 1d arrays:

    def fill_forward(x, miss=0):
        y = x.copy()
        for i in range(x.shape[0]):
            if y[i] == miss:
                y[i] = y[i-1]
        return y

I guess that should be for i in range(1, x.shape[0]): instead of for i in range(x.shape[0]): to avoid replacing the first element of the array, if it is missing, with the last. For large 1d x arrays, this might be faster:

    def fill_forward2(x, miss=0):
        y = x.copy()
        while np.any(y == miss):
            idx = np.where(y == miss)[0]
            y[idx] = y[idx-1]
        return y

But it does replace the first element of the array, if it is missing, with the last. We could speed it up by computing (y == miss) only once per loop. (But I bet the np.where is the bottleneck.) ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
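Another way to avoid the Python loop entirely is the index-accumulation trick; a sketch (the helper name is made up, and like the loop versions it assumes zero marks missing data):

    import numpy as np

    def fill_forward_vec(x, miss=0):
        # position of the most recent non-missing entry at each index
        idx = np.where(x != miss, np.arange(len(x)), 0)
        idx = np.maximum.accumulate(idx)
        # note: a leading run of missing values still maps to x[0]
        return x[idx]

For x = [0, 0, 10, 8, 0, 0, 7, 0] this gives [0, 0, 10, 8, 8, 8, 7, 7], matching the loop version, with no per-element Python overhead.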
[Numpy-discussion] Bilateral filter: bug corrected
Attached is code for a bilateral filter: 1. The code is composed of a cython back end (bilateral_base.pyx) and a python front end (bilateral.py). 2. I do not have the tools to make windows binaries (I run it on gentoo linux). 3. It is not hard to strip the cython code to get a pure (and slow) python implementation. 4. If someone finds the licensing inadequate, I have no problem re-licensing. Any comments are (more than) welcome. Nadav [attachment: bilateral_base.pyx]

    # Copyright 2008, Nadav Horesh
    # nadavh at visionsense dot com
    #
    # The software is licensed under the BSD licence.

    '''
    Front-end functions for bilateral filtering.

    The output of the functions

        bilateral
        bilateral_slow
        bilateral_fast

    should be roughly the same, but with a considerable speed difference.
    '''

    from numpy.numarray.nd_image import generic_filter
    import bilateral_base as _BB

    def bilateral(mat, xy_sig, z_sig, filter_size=None, mode='reflect'):
        '''
        Bilateral filter of a 2D array.

        mat:         A 2D array
        xy_sig:      The sigma of the spatial Gaussian filter.
        z_sig:       The sigma of the gray-levels Gaussian filter.
        filter_size: Size of the spatial filter kernel: None or values < 2 --- auto select.
        mode:        See numpy.numarray.nd_image.generic_filter documentation.

        output: A 2D array of the same dimensions as mat and a float64 dtype.

        Remarks:
          1. The spatial filter kernel size is ~4*xy_sig, unless specified otherwise.
          2. The algorithm is slow but has a minimal memory footprint.
        '''
        filter_fcn = _BB.Bilat_fcn(xy_sig, z_sig, filter_size)
        size = filter_fcn.xy_size
        return generic_filter(mat, filter_fcn.cfilter, size=size, mode=mode)

    def bilateral_slow(mat, xy_sig, z_sig, filter_size=None, mode='reflect'):
        'A pure python implementation of the bilateral filter; for details see the bilateral doc.'
        filter_fcn = _BB.Bilat_fcn(xy_sig, z_sig, filter_size)
        size = filter_fcn.xy_size
        return generic_filter(mat, filter_fcn, size=size, mode=mode)

    def bilateral_fast(mat, xy_sig, z_sig, filter_size=None, mode='reflect'):
        'A fast implementation of the bilateral filter; for details see the bilateral doc.'
        filter_fcn = _BB.Bilat_fcn(xy_sig, z_sig, filter_size)
        size = filter_fcn.xy_size
        return generic_filter(mat, filter_fcn.fc_filter, size=size, mode=mode)

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
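A usage sketch based on the signatures above (the image here is just random data, and the compiled bilateral_base module must be importable):

    import numpy as np
    import bilateral  # the attached front end

    img = np.random.rand(64, 64)
    # spatial sigma of 2 pixels, gray-level sigma of 0.1
    smoothed = bilateral.bilateral(img, xy_sig=2.0, z_sig=0.1)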