### Re: [Numpy-discussion] PyData Barcelona this May

On 20 March 2017 at 19:58, Jaime Fernández del Río wrote:
> Taking NumPy In Stride
>
> This workshop is aimed at users already familiar with NumPy. We will dissect
> the NumPy memory model with the help of a very powerful abstraction: strides.
>
> Participants will learn how to create different views out of the same data,
> including multidimensional ones, get a new angle on how and why broadcasting
> works, and explore related techniques to write faster, more efficient code.

I think I only understand this abstract because I know what views are. Maybe you could add a line explaining what they are? (I cannot think of one myself.)

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
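Since the thread is looking for a one-line explanation of views, a hedged attempt: a view is a new array object that reinterprets the same memory buffer with a different shape or strides, so no data is copied. A minimal sketch (names are illustrative):

```python
import numpy as np

a = np.arange(10)
v = a[::2]          # a view: same buffer, doubled stride, no copy made
v[0] = 99           # writing through the view...
assert a[0] == 99   # ...changes the original array
assert v.base is a  # v borrows a's memory
```

The workshop's stride tricks build on exactly this: changing shape and strides re-labels the same bytes.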

### Re: [Numpy-discussion] Move scipy.org docs to Github?

What about readthedocs? I haven't seen any explicit limit on traffic.

On 15 March 2017 at 16:33, Matthew Brett wrote:
> On Wed, Mar 15, 2017 at 2:47 AM, Ralf Gommers wrote:
>> On Wed, Mar 15, 2017 at 3:21 PM, Matthew Brett wrote:
>>> Hi,
>>>
>>> The scipy.org site is down at the moment, and has been for more than 36
>>> hours:
>>>
>>> https://github.com/numpy/numpy/issues/8779#issue-213781439
>>>
>>> This has happened before:
>>>
>>> https://github.com/scipy/scipy.org/issues/187#issue-186426408
>>>
>>> I think it was down for about 24 hours that time.
>>>
>>> From the number of people opening issues or commenting on the
>>> scipy.org website this time, it seems to be causing quite a bit of
>>> disruption.
>>>
>>> It seems to me that we would have a much better chance of avoiding
>>> significant down-time if we switched to hosting the docs on GitHub
>>> Pages.
>>>
>>> What do y'all think?
>>
>> Once the site is back up we should look at migrating to a better (hosted)
>> infrastructure. I suspect that GitHub Pages won't work; we'll exceed or be
>> close to exceeding both the 1 GB site size limit and the 100 GB/month
>> bandwidth limit [1].
>>
>> Rough bandwidth estimate (using page size from
>> http://scipy.github.io/devdocs/ and Alexa stats): 2 million visits per
>> month, 2.5 page views per visit, 5 kB/page = 25 GB/month (HTML). Add to
>> that the PDF docs, which are ~20 MB in size: if only a small fraction of
>> visitors download those, we'll be at >100 GB.
>
> Maybe we could host the PDF docs somewhere else? I wonder if GitHub
> would consider allowing us to go a bit over if necessary?
>
> Cheers,
>
> Matthew
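For what it's worth, Ralf's back-of-the-envelope bandwidth figure checks out; a quick sketch of the arithmetic, using the numbers from his message:

```python
visits_per_month = 2_000_000   # Alexa estimate quoted in the thread
pages_per_visit = 2.5
kb_per_page = 5

# 1 GB = 1_000_000 kB (decimal units)
gb_per_month = visits_per_month * pages_per_visit * kb_per_page / 1_000_000
print(gb_per_month)  # 25.0 GB/month for the HTML alone
```

So the HTML alone is a quarter of the 100 GB/month limit, and occasional ~20 MB PDF downloads close the remaining gap quickly.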

### Re: [Numpy-discussion] ANN: NumExpr3 Alpha

This is very nice indeed!

On 17 February 2017 at 12:15, Robert McLeod wrote:
> * bytes and unicode support
> * reductions (mean, sum, prod, std)

I use both a lot; maybe I can help you get them working.

Also, regarding "Vectorization hasn't been done yet with cmath functions for real numbers (such as sqrt(), exp(), etc.), only for complex functions": what is the bottleneck? Is it in GCC, or does someone just have to sit down and adapt it?

### Re: [Numpy-discussion] array comprehension

On 4 November 2016 at 16:04, Stephan Hoyer wrote:
> But, we also don't have an unstack function. This would mostly be syntactic
> sugar, but I think it would be a nice addition. Such a function actually
> exists in TensorFlow:
> https://g3doc.corp.google.com/third_party/tensorflow/g3doc/api_docs/python/array_ops.md?cl=head#unstack

That link is behind a login wall. This is the public version:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/array_ops.md

### Re: [Numpy-discussion] Multi-distribution Linux wheels - please test

On 8 February 2016 at 20:25, Matthew Brett wrote:
> I used the latest release, v0.2.15:
>
> https://github.com/matthew-brett/manylinux-builds/blob/master/build_openblas.sh#L5
>
> Is there a later version that we should try?
>
> Cheers,

That is the version in the Fedora repos that is working for me. How are you compiling it? Mine is compiled with GCC 5 with the options seen in the source RPM:
http://koji.fedoraproject.org/koji/packageinfo?packageID=15277

### Re: [Numpy-discussion] resizeable arrays using shared memory?

On 6 February 2016 at 23:56, Elliot Hallmark wrote:
> Now, I would like to have these arrays shared between processes spawned via
> multiprocessing (for fast interprocess communication purposes, not for
> parallelizing work on an array). I don't care about mapping to a file on
> disk, and I don't want disk I/O happening. I don't care (really) about data
> being copied in memory on resize. I *do* want the array to be resized "in
> place", so that the child processes can still access the arrays from the
> object they were initialized with.

If you are only reading in parallel, and you can afford the extra dependency, one alternative would be to use an expandable array from HDF5:
http://www.pytables.org/usersguide/libref/homogenous_storage.html#earrayclassdescr

To avoid I/O, the file can live in RAM:
http://www.pytables.org/cookbook/inmemory_hdf5_files.html

### Re: [Numpy-discussion] Linking other libm-Implementation

On 8 February 2016 at 18:36, Nathaniel Smith wrote:
> I would be highly suspicious that this speed comes at the expense of
> accuracy... My impression is that there's a lot of room to make
> speed/accuracy tradeoffs in these functions, and modern glibc's libm has
> seen a fair amount of scrutiny by people who have access to the same code
> that openlibm is based off of. But then again, maybe not :-).

I did some digging, and I found this:
http://julia-programming-language.2336112.n4.nabble.com/Is-the-accuracy-of-Julia-s-elementary-functions-exp-sin-known-td32736.html

In short: according to their devs, most openlibm functions are accurate to within 1 ulp, while GNU libm is correctly rounded to the closest float.

/David.

### Re: [Numpy-discussion] Multi-distribution Linux wheels - please test

On 6 February 2016 at 21:26, Matthew Brett wrote:
> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy
> python -c 'import numpy; numpy.test()'
> python -c 'import scipy; scipy.test()'

All the tests pass on my Fedora 23 with Python 2.7, but it seems to be linking to the system OpenBLAS:

>>> numpy.show_config()
lapack_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_lapack_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_mkl_info:
    NOT AVAILABLE

I can also reproduce Ogrisel's segfault.

### Re: [Numpy-discussion] Downstream integration testing

On 31 Jan 2016 13:08, "Pauli Virtanen" wrote:
> For example, an automated test rig that does the following:
>
> - run tests of a given downstream project version against the
>   previous numpy version, record output
>
> - run tests of a given downstream project version against
>   numpy master, record output
>
> - determine which failures were added by the new numpy version
>
> - make this happen with just a single command, e.g. "python run.py",
>   and implement it for several downstream packages and versions.
>   (Probably good to steal ideas from the travis-ci dependency matrix etc.)

A simpler idea: build the master branch of a series of projects and run their tests. In case of failure, we can compare with Travis's logs from the project when they use the released numpy. In most cases the master branch is clean, so an error will likely be a change in behaviour.

This can be run automatically once a week, so as not to hog too much of Travis; counting the cost in hours of work, it is very cheap to set up and free to maintain.

/David

### Re: [Numpy-discussion] array_equal too strict?

On 17 December 2015 at 14:43, Nico Schlömer wrote:
> I'm not sure where I'm going wrong here. Any hints?

You are dancing around the boundary between close floating point numbers, and when you are dealing with ULPs, the number of decimal places is a bad measure.

Working with plain numbers instead of arrays (just so that the numbers are printed in full detail):

a1 = np.float64(3.1415926535897930)
a2 = np.float64(3.1415926535897939)

They are numerically different:

>>> a2 - a1
8.8817841970012523e-16

In epsilons (defined as the smallest number such that (1 + eps) - 1 > 0):

>>> (a2 - a1) / np.finfo(np.float64).eps
4.0

In fact, there is one representable number in between, two epsilons away from each one:

>>> np.nextafter(a1, a2)
3.1415926535897936
>>> np.nextafter(np.nextafter(a1, 10), 10) - a2
0.0

The next number on the other side:

>>> np.nextafter(a1, 0)
3.1415926535897927

For more information:

>>> print np.finfo(np.float64)
Machine parameters for float64
---------------------------------------------------------------
precision= 15   resolution= 1.0000000000000001e-15
machep=   -52   eps=        2.2204460492503131e-16
negep =   -53   epsneg=     1.1102230246251565e-16
minexp= -1022   tiny=       2.2250738585072014e-308
maxexp=  1024   max=        1.7976931348623157e+308
nexp  =    11   min=        -max
---------------------------------------------------------------

/David

### Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

On 16 December 2015 at 18:59, Francesc Alted wrote:
> Probably MATLAB is shipping with Intel MKL enabled, which probably is the
> fastest LAPACK implementation out there. NumPy supports linking with MKL,
> and actually Anaconda does that by default, so switching to Anaconda would
> be a good option for you.

A free alternative is OpenBLAS. I am getting 20 s on an i7 Haswell with 8 cores.
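For anyone wanting to reproduce the comparison locally, a minimal benchmark sketch (the size here is arbitrary and much smaller than whatever produced the 20 s figure above):

```python
import time

import numpy as np

n = 1000                      # illustrative size only
A = np.random.rand(n, n)
b = np.random.rand(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)     # dispatches to the linked LAPACK (MKL, OpenBLAS, ...)
elapsed = time.perf_counter() - t0

assert np.allclose(A @ x, b)  # sanity check on the solution
print(f"solve {n}x{n}: {elapsed:.3f} s")
```

The wall time is dominated by the LAPACK build NumPy is linked against, which is the whole point of the thread.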

### Re: [Numpy-discussion] Build broken

On 15 December 2015 at 07:25, Jaime Fernández del Río wrote:
> Can anyone who actually knows what Travis is doing take a look at the log:
>
> https://travis-ci.org/numpy/numpy/builds/96836128

I don't claim to understand what is happening there, but I believe the function setup_chroot is missing the actual installation of NumPy, or perhaps a call to setup_base:
https://github.com/numpy/numpy/blob/master/tools/travis-test.sh

What is the CHROOT install anyway?

/David.

### Re: [Numpy-discussion] loadtxt and usecols

On 10 November 2015 at 16:07, Irvin Probst wrote:
> I know this new behavior might break a lot of existing code as
> usecol=(42,) used to return a 1-D array, but usecol=42, also
> returns a 1-D array so the current behavior is not consistent imho.

42, is exactly the same as (42,): the trailing comma, not the parentheses, makes the tuple. If you want a tuple of tuples, you have to write ((42,),), but then it raises:

TypeError: list indices must be integers, not tuple

What numpy cares about is that whatever object you give it is iterable and that its entries are ints, so usecols={0: 'a', 5: 'b'} is perfectly valid.

I think loadtxt should be a tool to read text files in the least surprising fashion, and a text file is a 1- or 2-D container, so it shouldn't return any other shapes. Any fancy stuff one may want to do with the output should be done with the typical indexing tricks. If I ask for a single column, I would be very surprised to get a 2-D array (I was bitten by this design in MATLAB many, many times). For the rare cases where I do want a "fake" 2-D array, I can make it explicit by expanding it with arr[:, np.newaxis]; then I know that the shape will be (N, 1) and not (1, N).

Thus, usecols should be an int or a sequence of ints, and the result 1- or 2-D. In your example:

a = [[[2,], [], [],], [], [], []]
foo = np.loadtxt("CONCARNEAU_2010.txt", usecols=a)

what would the shape of foo be?

/David.
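To make the trailing-comma point concrete (a quick sketch):

```python
import numpy as np

x = 42,
assert x == (42,)            # the comma makes the tuple, not the parentheses

col = np.arange(5)           # a 1-D result, as loadtxt would return for one column
col2d = col[:, np.newaxis]   # explicit expansion when an (N, 1) array is wanted
assert col2d.shape == (5, 1)
```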

### Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

On 30 September 2015 at 18:20, Nathaniel Smith wrote:
> - parallel code in general is not very composable. If someone is calling a
> numpy operation from one thread, great, transparently using multiple
> threads internally is a win. If they're exploiting some higher-level
> structure in their problem to break it into pieces and process each in
> parallel, and then using numpy on each piece, then numpy spawning threads
> internally will probably destroy performance. And numpy is too low-level to
> know which case it's in. This problem exists to some extent already with
> multi-threaded BLAS, so people use various BLAS-specific knobs to manage it
> in ad hoc ways, but this doesn't scale.

One idea: what about creating a "parallel numpy"? There are a few algorithms that can benefit from parallelisation. This library would mimic NumPy's signatures, and the user would be responsible for choosing the single-threaded or the parallel one by just changing np.function(x, y) to pnp.function(x, y).

If that were deemed a good idea, what would be the best parallelisation scheme? OpenMP? Threads?
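As a toy illustration of the proposal (all names hypothetical), a pnp-style wrapper could split the work along the first axis and run the pieces in threads:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def pnp_apply(func, x, y, workers=4):
    """Hypothetical parallel counterpart of func(x, y) for binary ufuncs."""
    xs = np.array_split(x, workers)
    ys = np.array_split(y, workers)
    # NumPy ufuncs release the GIL, so threads can genuinely overlap here.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return np.concatenate(list(ex.map(func, xs, ys)))

a = np.arange(1_000_000, dtype=float)
b = np.ones_like(a)
assert np.array_equal(pnp_apply(np.add, a, b), a + b)
```

Whether threads, OpenMP, or something else should sit underneath is exactly the open question; this sketch only shows the user-facing np/pnp split.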

### Re: [Numpy-discussion] Numpy 1.10.0 release

I don't get any failures on Fedora 22. I have installed it with pip, setting my CFLAGS to "-march=core-avx-i -O2 -pipe -mtune=native" and linking against OpenBLAS.

With the new NumPy, the full SciPy test suite shows two errors; I am sorry I didn't think of running it in the RC phase:

======================================================================
FAIL: test_weighting (test_stats.TestHistogram)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/scipy/stats/tests/test_stats.py", line 892, in test_weighting
    decimal=2)
  File "/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/numpy/testing/utils.py", line 886, in assert_array_almost_equal
    precision=decimal)
  File "/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/numpy/testing/utils.py", line 708, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 2 decimals

(mismatch 40.0%)
 x: array([   4. ,    0. ,    4.5,   -0.9,    0. ,    0.3,  110.2,    0. ,
          0. ,   42. ])
 y: array([   4. ,    0. ,    4.5,   -0.9,    0.3,    0. ,    7. ,  103.2,
          0. ,   42. ])

======================================================================
FAIL: test_nanmedian_all_axis (test_stats.TestNanFunc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/scipy/stats/tests/test_stats.py", line 226, in test_nanmedian_all_axis
    assert_equal(len(w), 4)
  File "/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/numpy/testing/utils.py", line 354, in assert_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
 ACTUAL: 1
 DESIRED: 4

I am almost sure these errors weren't there before.

On 6 October 2015 at 13:53, Neal Becker wrote:
> 1 test failure:
>
> FAIL: test_blasdot.test_blasdot_used
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
>     self.test(*self.arg)
>   File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/testing/decorators.py", line 146, in skipper_func
>     return f(*args, **kwargs)
>   File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/core/tests/test_blasdot.py", line 31, in test_blasdot_used
>     assert_(dot is _dotblas.dot)
>   File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/testing/utils.py", line 53, in assert_
>     raise AssertionError(smsg)
> AssertionError

### Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

On 2 October 2015 at 11:58, Juha Jeronen wrote:
> First version done and uploaded:
>
> https://yousource.it.jyu.fi/jjrandom2/miniprojects/trees/master/misc/polysolve_for_numpy

Small comment: right now you are checking whether the input is a scalar or an ndarray, but it should also accept any array-like. If I pass a list, I expect it to work, internally converting it into an array.

### Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

On 1 October 2015 at 09:05, Nathaniel Smith wrote:
>>> - gcc + OpenMP on linux still breaks multiprocessing. There's a patch to
>>> fix this but they still haven't applied it; alternatively there's a
>>> workaround you can use in multiprocessing (not using fork mode), but this
>>> requires every user update their code and the workaround has other
>>> limitations. We're unlikely to use OpenMP while this is the case.
>>
>> Any idea when this is going to be released?
>
> Which? The gcc patch? I spent 2 full release cycles nagging them and
> they still can't be bothered to make a decision either way, so :-(. If
> anyone has some ideas for how to get traction in gcc-land then I'm
> happy to pass on details...

:( Have you tried asking python-dev for help with this? Hopefully they would have some weight there.

### Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

On 30 September 2015 at 18:20, Nathaniel Smith <n...@pobox.com> wrote:
> On Sep 30, 2015 2:28 AM, "Daπid" <davidmen...@gmail.com> wrote:
> [...]
>> Is there a nice way to ship both versions? After all, most
>> implementations of BLAS and friends do spawn OpenMP threads, so I don't
>> think it would be outrageous to take advantage of it in more places;
>> provided there is a nice way to fall back to a serial version when it is
>> not available.
>
> This is incorrect -- the only common implementation of BLAS that uses
> *OpenMP* threads is OpenBLAS, and even then it's not the default -- it only
> happens if you run it in a special non-default configuration.

Right, sorry. I meant to say they spawn parallel threads. What do you mean by a non-default configuration? Setting OMP_NUM_THREADS?

> The challenges to providing transparent multithreading in numpy generally
> are:
>
> - gcc + OpenMP on linux still breaks multiprocessing. There's a patch to
> fix this but they still haven't applied it; alternatively there's a
> workaround you can use in multiprocessing (not using fork mode), but this
> requires every user update their code and the workaround has other
> limitations. We're unlikely to use OpenMP while this is the case.

Any idea when this is going to be released? As I understand it, OpenBLAS doesn't have this problem, am I right?

> - parallel code in general is not very composable. If someone is calling a
> numpy operation from one thread, great, transparently using multiple
> threads internally is a win. If they're exploiting some higher-level
> structure in their problem to break it into pieces and process each in
> parallel, and then using numpy on each piece, then numpy spawning threads
> internally will probably destroy performance. And numpy is too low-level to
> know which case it's in. This problem exists to some extent already with
> multi-threaded BLAS, so people use various BLAS-specific knobs to manage it
> in ad hoc ways, but this doesn't scale.
>
> (Ironically OpenMP is more composable than most approaches to threading,
> but only if everyone is using it and, as per above, not everyone is and we
> currently can't.)

That is what I meant by also providing a single-threaded version: the user can choose the parallel or the serial one, depending on the case.

### Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

On 30 September 2015 at 10:20, Juha Jeronen wrote:
>> Are we putting Cython in NumPy now? I lost track.
>
> I thought so? But then again, I'm just a user :)

As of this moment, there are three Cython modules in NumPy, so yes. Releases ship the generated C files, so Cython is not a dependency:

./numpy/distutils/tests/pyrex_ext/primes.pyx
./numpy/random/mtrand/mtrand.pyx
./tools/allocation_tracking/alloc_hook.pyx

In any case, we should always include a single-threaded version, because sometimes the computations are already parallelised at a higher level. Is there a nice way to ship both versions? After all, most implementations of BLAS and friends do spawn OpenMP threads, so I don't think it would be outrageous to take advantage of it in more places, provided there is a nice way to fall back to a serial version when it is not available.

/David.

### Re: [Numpy-discussion] ANN: Numpy 1.10.0rc1 released.

On 24 September 2015 at 09:17, Jens Jørgen Mortensen wrote:
> >>> np.vdot(a[:, :, 0], b[:, :, 0]).real
> 84.0
> >>> np.__version__
> '1.10.0rc1'
>
> The result should be 42. This is on Ubuntu 15.04.

I can reproduce on Fedora with NumPy linked against OpenBLAS. NumPy 1.9 in the same configuration works. The full test suite raises one error, but only due to lack of space when saving a large array.

### Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On 24 May 2015 at 22:30, Sturla Molden sturla.mol...@gmail.com wrote:
> Personally I think we should only make guarantees about the data types,
> array shapes, and things like that, but not about the values. Those who
> need a particular version of NumPy for exact reproducibility should
> install the version of Python and NumPy they need. That is why virtual
> environments exist.

But there is a lot of legacy code out there that doesn't specify the version required, and in most cases the original author cannot even be asked.

Tests are a particularly annoying case. For example, when testing an algorithm, it is usually good practice to record the number of iterations as well as the result; consider it an early warning that we have changed something we possibly didn't mean to, even if the result is correct. If we want to support several NumPy versions, and the algorithm has any randomness, the tests would have to be duplicated, or we would have to find a seed that gives exactly the same results. Thus, keeping the different versions lets us compare the results against the old API without needing to duplicate the tests. A lot fewer people will get annoyed.

/David.
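To illustrate the kind of guarantee those legacy tests lean on (a minimal sketch): the same seed on the same NumPy version yields a bit-identical stream, which silently breaks if the underlying algorithms change between versions.

```python
import numpy as np

rs1 = np.random.RandomState(1234)
rs2 = np.random.RandomState(1234)

# Same seed, same version: the streams are bit-for-bit identical.
assert np.array_equal(rs1.rand(5), rs2.rand(5))
```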

### Re: [Numpy-discussion] np.diag(np.dot(A, B))

On 22 May 2015 at 12:15, Mathieu Blondel math...@mblondel.org wrote:
> Right now I am using np.sum(A * B.T, axis=1) for dense data and I have
> implemented a Cython routine for sparse data. I haven't benched
> np.sum(A * B.T, axis=1) vs. np.einsum("ij,ji->i", A, B) yet since I am
> mostly interested in the sparse case right now.

On my system, einsum seems to be faster.

In [3]: N = 256
In [4]: A = np.random.random((N, N))
In [5]: B = np.random.random((N, N))
In [6]: %timeit np.sum(A * B.T, axis=1)
1000 loops, best of 3: 260 µs per loop
In [7]: %timeit np.einsum("ij,ji->i", A, B)
1 loops, best of 3: 147 µs per loop

In [9]: N = 1023
In [10]: A = np.random.random((N, N))
In [11]: B = np.random.random((N, N))
In [12]: %timeit np.sum(A * B.T, axis=1)
100 loops, best of 3: 14 ms per loop
In [13]: %timeit np.einsum("ij,ji->i", A, B)
100 loops, best of 3: 10.7 ms per loop

I have ATLAS installed from the Fedora repos, so not tuned; but einsum is only using one thread anyway, so probably it is not using it (definitely not computing the full dot, because that already takes 200 ms).

If B is in FORTRAN order, it is much faster (for N = 5000):

In [25]: Bf = B.copy(order='F')
In [26]: %timeit np.einsum("ij,ji->i", A, Bf)
10 loops, best of 3: 25.7 ms per loop
In [27]: %timeit np.einsum("ij,ji->i", A, B)
1 loops, best of 3: 404 ms per loop
In [29]: %timeit np.sum(A * Bf.T, axis=1)
10 loops, best of 3: 118 ms per loop
In [30]: %timeit np.sum(A * B.T, axis=1)
1 loops, best of 3: 517 ms per loop

But the copy is not worth it:

In [31]: %timeit Bf = B.copy(order='F')
1 loops, best of 3: 463 ms per loop

/David.

### Re: [Numpy-discussion] Introductory mail and GSoc Project Vector math library integration

On 11 March 2015 at 16:51, Dp Docs sdpa...@gmail.com wrote:
> On Wed, Mar 11, 2015 at 7:52 PM, Sturla Molden sturla.mol...@gmail.com wrote:
>> There are several vector math libraries NumPy could use, e.g. MKL/VML,
>> Apple Accelerate (vecLib), ACML, and probably others.
>
> Are these libraries fast enough in comparison to C maths libraries?

These are the fastest beasts out there, written in C, assembly, and arcane incantations.

>> There are at least two ways to proceed here. One is to only use vector
>> math when strides and alignment allow it.
>
> I didn't get it. Can you explain in detail?

One example: you can create a NumPy 2D array using only the odd columns of a matrix:

odd_matrix = full_matrix[::2, ::2]

This is just a view of the original data, so you save the time and the memory of making a copy. The drawback is that you trash memory locality, as the elements are not contiguous in memory. If the memory is guaranteed to be contiguous, a compiler can apply extra optimisations, and this is what vector libraries usually assume. What I think Sturla is suggesting with "when strides and alignment allow it" is to use the fast version if the array is contiguous, and fall back to the present implementation otherwise. The other way would be to make an optimally aligned copy, but that could eat up whatever time we save from using the faster library, among other problems. The difficulty with NumPy's strides is that they allow so many ways of manipulating the data... (alternating elements, transpositions, different precisions...).

> I think the actual problem is not choosing which library to integrate, it
> is how to integrate these libraries. As I have seen the code base and been
> told, the current implementation uses the C math library. Can we just keep
> the current implementation and, wherever it calls C maths functions,
> replace those with the fast library functions? Then we have to modify the
> NumPy library (which usually gets imported for maths operations) with some
> if-else conditions: first try the faster one, and if it is not available
> fall back to the default one.

At the moment, we are linking to whichever LAPACK is available at compile time, so there is no need for a runtime check. I guess it could (should?) be the same here.

> Moreover, I have another doubt: are we supposed to integrate just one fast
> library, or more than one, so that if the first is not available we look
> for the second, and if the second is not available we either go to the
> default or look for a third one? Are we supposed to think like this: say
> exp is faster in the SLEEF library, so integrate SLEEF for that operation,
> and say sin is faster in some other library, so integrate that library for
> the sin operation? It may be possible that different operations are faster
> in different libraries, so should the implementation be operation-oriented,
> or should we just integrate one complete library? Thanks

Which one is faster depends on the hardware, the version of the library, and even the size of the problem:
http://s3.postimg.org/wz0eis1o3/single.png

I don't think you can reliably decide ahead of time which one should be used for each operation. But, on the other hand, whichever one you go for will probably be fast enough for anyone using Python.

Most of the work here is adapting NumPy's machinery to dispatch a call to the vector library; once that is ready, adding another one will hopefully be easier. At least, at the moment NumPy can use one of several linear algebra packages (MKL, ATLAS, CBLAS...) and they are added, I think, without too much pain (but maybe I am just far away from the screams of whoever did it).

/David.
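The "use vector math only when strides and alignment allow it" idea can be sketched with NumPy's flags (illustrative only, not how an actual dispatch layer is written):

```python
import numpy as np

full_matrix = np.arange(36, dtype=float).reshape(6, 6)
odd_matrix = full_matrix[::2, ::2]           # strided view: no copy made

assert not odd_matrix.flags['C_CONTIGUOUS']  # elements not adjacent in memory

# A dispatch layer could take the fast vectorized path only for contiguous
# input, or pay for an explicit contiguous copy first:
buf = np.ascontiguousarray(odd_matrix)
assert buf.flags['C_CONTIGUOUS']
assert np.array_equal(buf, odd_matrix)
```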

### Re: [Numpy-discussion] dot and MKL memory

On 27 February 2015 at 17:50, Charles R Harris charlesr.har...@gmail.com wrote:
> No, but I don't know why that is happening with MKL. Can anyone else
> reproduce this?
>
> Chuck

I can't reproduce it. I have checked on my system Python (ATLAS) and Conda with MKL, running in parallel with vmstat. The difference between them is under 1 MB.

I am on Fedora 20; the system Python uses GCC 4.8 and ATLAS 3.8.4. Conda is compiled with GCC 4.4 and linked to MKL 11.1.

/David.

### Re: [Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

On 11 February 2015 at 08:06, Kartik Kumar Perisetla kartik.p...@gmail.com wrote:
> Thanks David. But do I need to install virtualenv on every node in the
> hadoop cluster? Actually I am not very sure whether the same namenodes are
> assigned for my every hadoop job. So how shall I proceed in such a
> scenario?

I have never used hadoop, but in the clusters I have used, you have a home folder on the central node, and each and every computing node has access to it. You can then install Python in your home folder and make every node run that, or pull a local copy. Probably the cluster support staff can clear this up further and adapt it to your particular case.

/David.

### Re: [Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

On 11 February 2015 at 03:38, Kartik Kumar Perisetla kartik.p...@gmail.com wrote:
> Also, I don't have root access, thus I can't install numpy or any other
> package on the cluster.

You can create a virtualenv and install packages in it without needing root access. To minimize trouble, you can ensure it uses the system packages when available. Here are instructions on how to install it:

https://stackoverflow.com/questions/9348869/how-to-install-virtualenv-without-using-sudo
http://opensourcehacker.com/2012/09/16/recommended-way-for-sudo-free-installation-of-python-software-with-virtualenv/

This does not require root access, but it is probably good to check with the sysadmins to make sure they are fine with it.

/David.
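For reference, a sketch of the sudo-free setup using the standard-library venv module (the modern counterpart of the virtualenv tool linked above; the path and POSIX bin/ layout are illustrative, and on a real cluster the environment would live in your home folder):

```python
import os
import subprocess
import tempfile
import venv

# Illustrative location; on a cluster this would be somewhere under $HOME.
env_dir = os.path.join(tempfile.mkdtemp(), "py-env")

# system_site_packages=True reuses whatever the admins already installed;
# nothing in this process requires root access.
venv.EnvBuilder(system_site_packages=True, with_pip=False).create(env_dir)

py = os.path.join(env_dir, "bin", "python")  # POSIX layout assumed
out = subprocess.check_output([py, "-c", "import sys; print(sys.prefix)"])
print(out.decode().strip())  # the environment's own prefix
```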

### Re: [Numpy-discussion] Any interest in a 'heaviside' ufunc?

On 4 February 2015 at 11:05, Sturla Molden sturla.mol...@gmail.com wrote: On 04/02/15 06:18, Warren Weckesser wrote: By discrete form, do you mean discrete time (i.e. a function defined on the integers)? Then I agree, the discrete time unit step function is defined as It is the cumulative integral of the delta function, and thus it can never obtain the value 0.5. The delta function is defined to have an integral of 0 or 1. Sturla There are several definitions. Abramowitz and Stegun (http://people.math.sfu.ca/~cbm/aands/page_1020.htm) assign the value 0.5 at x=0. It can also be defined as: H(x) = 1/2 * (1 + sign(x)), where sign(0) = 0, and therefore H(0) = 1/2. Actually, the Heaviside function is better seen as a distribution than as a function, and then there is no problem with the value at 0, as long as it is finite. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
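The sign-based definition above is a one-liner (note that NumPy versions newer than this thread also ship np.heaviside, where the value at zero is an explicit argument):

```python
import numpy as np

# H(x) = (1 + sign(x)) / 2; since sign(0) = 0, this gives H(0) = 0.5,
# the Abramowitz & Stegun convention discussed above.
def heaviside(x):
    return 0.5 * (1.0 + np.sign(x))

print(heaviside(np.array([-2.0, 0.0, 3.0])))  # [0.  0.5 1. ]
```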

### Re: [Numpy-discussion] Sorting refactor

On 16 January 2015 at 13:15, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Perhaps an algorithm can be made faster, but often these multicore algorithms are also less efficient, and a less data-dependent way of putting my cores to good use would have been preferable. Potentially, other code could slow down due to cache thrashing if too many tasks run in parallel. Id rather be in charge of such matters myself; but I imagine adding a keyword arg for these matters would not be much trouble? As I understand it, that is where the strength of Cilk+ lies. It does not force parallelisation, it just suggests it; the decision to actually run in parallel is made at runtime, depending on the load of the other cores. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] Characteristic of a Matrix.

On 5 January 2015 at 20:40, Colin J. Williams cjwilliam...@gmail.com wrote: This illustrates a failure, which is reported later in the calculation: A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) Here 2 - 6 is treated as an expression. There should be a comma between 2 and -6. The rectangularity is checked, and in this case, it is not fulfilled. As such, NumPy creates a square matrix of size 1x1 of dtype object. If you want to make sure what you have manually input is correct, you should include a couple of assertions afterwards. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
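A couple of assertions of the kind suggested above might look like this (a sketch; with the missing comma, older NumPy builds a ragged object array so the checks fail, while recent versions refuse the input outright):

```python
import numpy as np

# The intended, correctly comma-separated matrix.
A2 = np.array([[1, 2, -2], [-3, -1, 4], [4, 2, -6]])

# Sanity checks that catch a "2 -6" style typo:
assert A2.shape == (3, 3)    # rectangular, with the expected size
assert A2.dtype != object    # numeric dtype, not an object array
```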

### Re: [Numpy-discussion] median filtering a masked array

On 5 November 2014 19:11, Moroney, Catherine M (398E) catherine.m.moro...@jpl.nasa.gov wrote: What is the recommended way of doing a fast median filter on an array where only certain elements of the array need to be calculated? I'm trying to avoid a nested loop over all (I,J) elements. Since you are using FORTRAN, I believe the simplest way is to do this double loop in Cython. While you are at it, if you want to squeeze cycles, you can implement the median in Cython too. Here is some code: http://numpy-discussion.10968.n7.nabble.com/A-faster-median-Wirth-s-method-td14267.html /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions)

On 29 October 2014 10:48, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My point isn't about speed; its about the scope of numpy. typing np.fft.fft isn't more or less convenient than using some other symbol from the scientific python stack. The problem is in distribution. For many users, installing a new library is not easy (computing clusters, company regulations...). And this is assuming the alternative library is held to the same quality standards as NumPy. One could argue that this should live only in SciPy, which is, after all, quite standard; but SciPy requires a FORTRAN compiler, which is quite a big dependency. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] higher accuracy in diagonialzation

On 27 October 2014 09:37, Sunghwan Choi sunghwancho...@gmail.com wrote: One of them is inaccurate or both two of them are inaccurate within that range. Which one is more accurate? You can check it yourself using the eigenvectors. The cosine distance between v and M.dot(v) will give you the error in the eigenvector, and the difference between ||lambda*v|| and ||M.dot(v)|| gives the error in the eigenvalue. I would also check the condition number; maybe your matrix is just not well conditioned, in which case you would have to look at preconditioners. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
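The check above can be sketched like this (with a random symmetric matrix standing in for the real one):

```python
import numpy as np

# Build a small symmetric test matrix, so the spectrum is real.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
M = M + M.T

w, V = np.linalg.eigh(M)
v = V[:, 0]

# Residual of the eigenpair: ~0 when the pair is accurate.
residual = np.linalg.norm(M @ v - w[0] * v)
# A large condition number warns that less accuracy is to be expected.
condition = np.linalg.cond(M)
print(residual, condition)
```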

### Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

On 26 October 2014 12:54, Jeff Reback jeffreb...@gmail.com wrote: you should have a read here: http://wesmckinney.com/blog/?p=543 going below the 2x memory usage on read in is non trivial and costly in terms of performance If you know in advance the number of rows (because it is in the header, counted with wc -l, or any other prior information) you can preallocate the array and fill in the numbers as you read, with virtually no overhead. If the number of rows is unknown, an alternative is to use a chunked data container like bcolz [1] (formerly carray) instead of Python structures. It may be used as such, or copied back to an ndarray if we want the memory to be aligned. Including a bit of compression, we can get the memory overhead to somewhere under 2x (depending on the dataset) at the cost of not much CPU time, and this could be very useful for large data and slow filesystems. /David. [1] http://bcolz.blosc.org/ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
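A minimal sketch of the preallocation idea, with a StringIO standing in for the real file:

```python
import numpy as np
from io import StringIO

# Pretend file whose row count is known in advance (e.g. via `wc -l`).
text = StringIO("1.0 2.0\n3.0 4.0\n5.0 6.0\n")
nrows, ncols = 3, 2

# Preallocate once, then fill row by row: no 2x temporary overhead.
out = np.empty((nrows, ncols))
for i, line in enumerate(text):
    out[i] = [float(tok) for tok in line.split()]

print(out.sum())  # 21.0
```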

### Re: [Numpy-discussion] numpy log and exceptions

On 22 October 2014 15:43, Nathaniel Smith n...@pobox.com wrote: I guess we could make this more consistent by hand if we wanted - isnan is pretty cheap? Can it be done without storing the full bool array? The 1/8 memory overhead (one byte per float64 element) can be problematic for large arrays. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] [SciPy-Dev] Hamming etc. windows are wrong

On 26 September 2014 10:41, Jerry lancebo...@qwest.net wrote: I am just learning Python-NumPy-SciPy but it appears as though the SciPy situation is that the documentation page above *** mentions the flag but the flag has not been implemented into SciPy itself. I would be glad to stand corrected. Regarding this, you can look at the source:

odd = M % 2
if not sym and not odd:
    w = w[:-1]

so it is implemented in SciPy, explicitly limited to even windows. It is not in NumPy, though (but that can be fixed). I think a reference is worth adding to the documentation, to make the intent and application of the sym flag clear. I am sure a PR will be most appreciated. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
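The periodic (sym=False) behaviour can be sketched with NumPy alone: compute a symmetric window one sample longer and drop the repeated endpoint, which is exactly what the w = w[:-1] branch quoted above does.

```python
import numpy as np

def periodic_hamming(M):
    # Symmetric window of length M + 1, minus the repeated last sample.
    return np.hamming(M + 1)[:-1]

w = periodic_hamming(8)
print(len(w))  # 8
```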

### Re: [Numpy-discussion] numpy ignores OPT/FOPT under Python3

On 9 September 2014 17:23, Thomas Unterthiner thomas_unterthi...@web.de wrote: I want to use the OPT/FOPT environment variables to set compiler flags when compiling numpy. However it seems that they get ignored under python3. Using Ubuntu 14.04 and numpy 1.9.0, I did the following:

export OPT=-march=native
export FOPT = -march=native
python setup.py build  # python executes python2.7
[...snip...]
C compiler: x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -march=native -fPIC
^C

Running the same on my computer (Fedora 20, python 2.7) doesn't seem to process the flags:

C compiler: -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC

Double-checking:

$ echo $OPT
-march=native

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] Does a `mergesorted` function make sense?

On 27 August 2014 19:02, Jaime Fernández del Río jaime.f...@gmail.com wrote: Since there is at least one other person out there that likes it, is there any more interest in such a function? If yes, any comments on what the proper interface for extra output should be? Although perhaps the best is to leave that out for starters and see what use people make of it, if any. I think a perhaps more useful thing would be to implement timsort. I understand it is capable of taking full advantage of partially sorted arrays, with the extra safety of not making the assumption that the individual arrays are sorted. This would also be useful for other real-world cases where the data is already partially sorted. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

On 28 July 2014 14:46, Sebastian Berg sebast...@sipsolutions.net wrote: To rephrase my most pressing question: may np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) produce different results with the implementation in the current master? If so, I think that would be very much regrettable; and if this is a minority opinion, I do hope that at least this gets documented in a most explicit manner. This can always give you different results. In master the difference is more likely to be large, since pairwise summation makes one of them (often the second one) less likely to run into big numerical issues. An example using float16 on NumPy 1.8.1 (I haven't seen differences with float32):

for N in np.logspace(2, 6):
    print N, (np.ones((N,2), dtype=np.float16).mean(0),
              np.ones((2,N), dtype=np.float16).mean(1))

The first one gives correct results up to 2049, from where the values start to fall. The second one, on the other hand, gives correct results up to 65519, where it blows up to infinity. Interestingly, in the second case there are fluctuations. For example, for N = 65424, the mean is 0.99951172, but 1 for the next and previous numbers. But I think they are just an effect of the rounding, as:

In [33]: np.ones(N+1, dtype=np.float16).sum() - N
Out[33]: 16.0
In [35]: np.ones(N+1, dtype=np.float16).sum() - (N +1)
Out[35]: 15.0
In [36]: np.ones(N-1, dtype=np.float16).sum() - (N -1)
Out[36]: -15.0
In [37]: N = 65519 - 20
In [38]: np.ones(N, dtype=np.float16).sum() - N
Out[38]: 5.0
In [39]: np.ones(N+1, dtype=np.float16).sum() - (N +1)
Out[39]: 4.0
In [40]: np.ones(N-1, dtype=np.float16).sum() - (N -1)
Out[40]: 6.0

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
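The saturation point reported above matches what float16 can represent: integers are exact only up to 2048, so a naive running float16 sum silently stops growing. A small sketch:

```python
import numpy as np

# 2049 is not representable in float16: adding 1 to 2048 is lost to rounding.
acc = np.float16(2048)
assert acc + np.float16(1) == acc

# Accumulating in a wider dtype sidesteps the problem entirely.
n = 100000
mean = np.ones(n, dtype=np.float16).mean(dtype=np.float64)
print(mean)  # 1.0
```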

### Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

On 11 July 2014 22:30, Daniel da Silva var.mail.dan...@gmail.com wrote: I think the idea at hand is not that it would be used everyday, but it would be there when needed. What people do every day involves *real* data, and they use functions to load it. But sometimes we have to hard-code a few values, and it is true that making a list (or nested list) is quite verbose; one example is unit tests. Having a MATLAB-style array creation would be convenient for those cases. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

On 7 July 2014 08:48, Jacco Hoekstra - LR j.m.hoeks...@tudelft.nl wrote: How about using the old name np.mat() for this type of array creation? How about a new one? np.matarray, for MATLAB array. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] Cython requirement?

On 6 July 2014 20:40, Benjamin Root ben.r...@ou.edu wrote: When did Cython become a build requirement? I remember discussing the use of Cython a while back, and IIRC the agreement was that both the cython code and the generated C files would be included in version control so that cython wouldn't be a build requirement, only a developer requirement when modifying those files. The policy was changed to not include them in VC, but to include them in the releases, so as not to pollute the repository and to avoid having C files not matching the pyx, IIRC. The change was fairly recent; I was only able to dig up this email mentioning it: http://numpy-discussion.10968.n7.nabble.com/numpy-git-master-requiring-cython-for-build-td37250.html /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] Switch to using ATLAS for OSX binary wheels

On 13 June 2014 18:00, Ralf Gommers ralf.gomm...@gmail.com wrote: pip install numpy[mkl] ? I think that's possible. MKL is fairly famous, but perhaps it would be legally safer to use [mkl-nonfree] (or something of the sort) to signal the licence. But maybe I am bikeshedding here. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] fftw supported?

On 4 June 2014 23:34, Alexander Eberspächer alex.eberspaec...@gmail.com wrote: If you feel pyfftw bothers you with too many FFTW details, you may try something like https://github.com/aeberspaecher/transparent_pyfftw (be careful, it's a hack that has seen only little testing). pyFFTW provides a drop-in replacement for NumPy's and SciPy's FFT modules: https://hgomersall.github.io/pyFFTW/pyfftw/interfaces/interfaces.html You can still set the number of threads and other advanced parameters, but you can also just ignore them and treat it as a very simple library. Does your wrapper set these to reasonable values? If not, I am completely missing the point. I am starting to use pyFFTW, and maybe I can help you test tfftw. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] numpy quad and maple

On 10 May 2014 01:04, Ihab Riad ifr...@gmail.com wrote: Hi, I did the following integral with numpy quad quad(lambda h: np.exp(-.5*2334.0090702455204-1936.9610182100055*5*log10(h)-(12.5*2132.5498892927189)*log10(h)*log10(h)),0,inf) and I get the following values (1.8368139214123403e-126, 3.3631976081491865e-126). The first value is the integral, the second is the error. You have (1.8 +- 3.3)*10^-126. That is, your integral is almost zero. When I do the integral with maple and mathematica I get 2.643019766*10^(-127) That is inside the bracket SciPy gave you. And I wouldn't even trust them to be correct without doing some checking: numbers so small are numerically untrustworthy. Your function is: E^(A - B*log(h) - C*log^2(h)) = E^A * h^B' * h^(C'*log(h)), where B' and C' depend on your numbers. E^A is already 10^-507 (are you sure these numbers are correct?), so if you strip it out of your integral, you have a simpler expression with more reasonable values. This is what is killing your integral and sending it to 0. Now, you have to normalise the values. So, changing variables h -> s*x, dh = s*dx: (s*x)^B' * (s*x)^[C'*(log(s) + log(x))] Expand and simplify, and find the scale s that gives values of a reasonable order (order one). Last thing: giving quad an infinite limit can make it behave crazily. If you look at your second term, which is roughly x^(-log(x)), it decays very quickly for large values of x (10^-5 for x=20), so you can, without affecting your integral much, set a cut at some reasonable h (plot it to be sure). You can even make an estimate by comparing with the analytical result [1] of that particular piece. Bottom line: don't use a lambda; use a function and expand your parameters. That will make the expression simpler.

def f(h):
    log10 = np.log10(h)
    B = 1936.9610182100055*5
    ...
    return ...

/David.
[1] http://www.wolframalpha.com/input/?i=Integrate[x ^-log%28x%29%2C+{x%2C+k%2C+infinity}] ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] index partition

On 14 April 2014 18:17, Alan G Isaac alan.is...@gmail.com wrote: I find it rather more convenient to use boolean arrays, but I wonder if arrays of indexes might have other advantages (which would suggest using the set operations instead). In particular, might a[boolean_array] be slower than a[indexes]? (I'm just asking, not suggesting.) Indexing with an array of indices is generally faster, but converting from boolean to indices is more expensive:

In [2]: arr = np.random.random(1000)
In [3]: mask = arr > 0.7
In [4]: mask.sum()
Out[4]: 290
In [5]: %timeit arr[mask]
10 loops, best of 3: 4.01 µs per loop
In [6]: %%timeit
   ...: wh = np.where(mask)
   ...: arr[wh]
   ...:
10 loops, best of 3: 6.47 µs per loop
In [8]: wh = np.where(mask)
In [9]: %timeit arr[wh]
10 loops, best of 3: 2.57 µs per loop
In [10]: %timeit np.where(mask)
10 loops, best of 3: 3.89 µs per loop
In [14]: np.all(arr[wh] == arr[mask])
Out[14]: True

If you want to apply the same mask to several arrays, it is then worth it (performance-wise) to convert once and reuse the indices. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
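A sketch of that reuse pattern:

```python
import numpy as np

rng = np.random.default_rng(1)
arr = rng.random(1000)

mask = arr > 0.7
idx = np.where(mask)[0]      # pay the conversion cost once...

# ...both forms select exactly the same elements,
assert np.array_equal(arr[mask], arr[idx])

# ...and idx can then be reused on any array of the same shape.
other = rng.random(1000)
assert np.array_equal(other[mask], other[idx])
```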

### Re: [Numpy-discussion] match RNG numbers with R?

On Apr 7, 2014 3:59 AM, Yaroslav Halchenko li...@onerussian.com wrote: so I would assume that the devil is indeed in R post-processing and would look into it (if/when get a chance). The devil here is the pigeonhole problem. Mersenne Twister generates random integers in a certain range. The output is guaranteed to be deterministic, uniform, and reproducible. But when you want to cast those m possible inputs into n possible outputs, you need to do magic (or maths) to keep the new distribution truly uniform. Simply getting random bytes and viewing them as ints will give you low-quality random numbers. The algorithm that casts MT output to a random integer is probably what differs, and perhaps you could find it documented somewhere. As a side note, C++11 includes in the standard a MT RNG guaranteed to be the same across implementations, but there is no such promise about the distributions - not even across versions of the same implementation. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
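One standard piece of that magic is rejection sampling, which keeps the output uniform where a plain modulo would not; a sketch using Python's own MT-backed getrandbits:

```python
import random

def randbelow(n):
    """Uniform integer in [0, n) by rejection, avoiding modulo bias."""
    k = n.bit_length()
    r = random.getrandbits(k)
    while r >= n:              # out-of-range draws are thrown away, not folded
        r = random.getrandbits(k)
    return r

samples = [randbelow(6) for _ in range(1000)]
print(min(samples), max(samples))
```

Two libraries can share the identical MT core and still disagree here, which is exactly the kind of post-processing difference suspected above.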

### Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

On 2 April 2014 16:06, Sturla Molden sturla.mol...@gmail.com wrote: josef.p...@gmail.com wrote: pandas came later and thought ddof=1 is worth more than consistency. Pandas is a data analysis package. NumPy is a numerical array package. I think ddof=1 is justified for Pandas, for consistency with statistical software (SPSS et al.) For NumPy, there are many computational tasks where the Bessel correction is not wanted, so providing an uncorrected result is the correct thing to do. NumPy should be a low-level array library that does very little magic. All this discussion reminds me of the book Numerical Recipes: if the difference between N and N - 1 ever matters to you, then you are probably up to no good anyway - e.g., trying to substantiate a questionable hypothesis with marginal data. For any reasonably sized data set, it is a correction in the second significant figure. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
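The distinction is just the divisor; a quick sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
d = x - x.mean()

# ddof=0 (NumPy's default): divide by N. ddof=1 (Bessel): divide by N - 1.
assert np.isclose(np.var(x, ddof=0), (d**2).sum() / len(x))
assert np.isclose(np.var(x, ddof=1), (d**2).sum() / (len(x) - 1))

# For a reasonably sized data set the two differ well past the first figure.
y = np.arange(1000.0)
print(np.std(y, ddof=0), np.std(y, ddof=1))
```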

### Re: [Numpy-discussion] Where to put versionadded

On Apr 4, 2014 8:54 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Apr 4, 2014 at 12:48 PM, josef.p...@gmail.com wrote: On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, Currently there are several placements of the '.. versionadded::' directive and I'd like to settle on a proper style for consistency. There are two occasions on which it is used, first, when a new function or class is added and second, when a new keyword is added to an existing function or method. The options are as follows.

New Function

1) Originally, the directive was added in the Notes section:

Notes
-----
.. versionadded:: 1.5.0

2) Alternatively, it is placed after the extended summary:

blah, blah

.. versionadded:: 1.5.0

Between these two, I vote for 2) because the version is easily found when reading the documentation either in a terminal or rendered into HTML.

New Parameter

1) It is placed before the parameter description:

newoption : int, optional
    .. versionadded:: 1.5.0
    blah.

2) It is placed after the parameter description:

newoption : int, optional
    blah.

    .. versionadded:: 1.5.0

Both of these render correctly, but the first is more compact while the second puts the version after the description where it doesn't interrupt the reading. I'm tending towards 1) on account of its compactness. Thoughts? I'm in favor of putting them only in the Notes section. Most of the time they are not crucial information and it's distracting. I usually only look for them when I'm working explicitly across several numpy versions. like in python: versionadded 2.1 is only interesting for historians. I find the opposite to be true. Because numpy needs to maintain compatibility with a number of python versions, I often check the python documentation to see in which version a function was added.
Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion My user perspective: I am developing a tool whose main use is to run on my computer, so I prefer to run the newest and sweetest version of the libraries, and I thus report those as the minimum versions. But it would be good if I could grep my code, see which numpy functions are being used, and infer a probable minimum required version. If other libraries follow similar conventions, and one does not do metaprogramming or other fancy things, this is relatively easy to get automatically. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] argsort speed

On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version:

for kind in ['quicksort', 'mergesort', 'heapsort']:
    print kind
    %timeit np.sort(data, kind=kind)
    %timeit np.argsort(data, kind=kind)

quicksort
1 loops, best of 3: 3.55 s per loop
1 loops, best of 3: 10.3 s per loop
mergesort
1 loops, best of 3: 4.84 s per loop
1 loops, best of 3: 9.49 s per loop
heapsort
1 loops, best of 3: 12.1 s per loop
1 loops, best of 3: 39.3 s per loop

It looks like quicksort is quicker at sorting, but mergesort is marginally faster at sorting the args. The difference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort the comparisons are done accessing the array elements directly, while in argsort you have to index the array, introducing some overhead. I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep for it trying all possible combinations of "def ndarray", "self.sort", etc. Where is it? /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] argsort speed

On 17 February 2014 00:12, Daπid davidmen...@gmail.com wrote: I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? Nevermind, it is in core/src/multiarray/methods.c ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] Geometrically defined masking arrays; how to optimize?

Here is an example:

import numpy as np
import pylab as plt

N = 100
M = 200
x, y = np.meshgrid(np.arange(N), np.arange(M))

# Center at 40, 70, radius 20
x -= 40
y -= 70
out_circle = (x * x + y * y < 20**2)
out_ring = out_circle ^ (x * x + y * y < 10**2)

plt.imshow(out_circle)
plt.figure()
plt.imshow(out_ring)
plt.show()

Note that I have avoided taking the costly square root of each element by just comparing against the square of the radius. It can also be generalised to ellipses or rectangles, if you need it. Also, don't use 0 as a False value, and don't force it to be a 0. Instead, use if not ring: /David On 11 February 2014 21:56, Wolfgang Draxinger wolfgang.draxin...@physik.uni-muenchen.de wrote: Hi, I implemented the following helper function to create masking arrays:

def gen_mask(self, ring, sector):
    in_segment = None
    if ring == 0:
        radius = self.scales()[0]
        def in_center_circle(xy):
            dx = xy[0] - self.xy[0]
            dy = xy[1] - self.xy[1]
            r = math.sqrt(dx**2 + dy**2)
            return r < radius
        in_segment = in_center_circle
    else:
        angle = (self.a_sector(sector, ring),
                 self.a_sector((sector+1) % self.n_sectors(), ring))
        radii = self.scales()[ring:ring+1]
        def in_segment_ring(xy):
            dx = xy[0] - self.xy[0]
            dy = xy[1] - self.xy[1]
            r = math.sqrt(dx**2 + dy**2)
            a = math.atan2(dy, dx)
            return r >= radii[0] and r < radii[1] \
                and a >= angle[0] and a < angle[1]
        in_segment = in_segment_ring
    width, height = self.arr.shape
    mask = numpy.zeros(shape=(width, height), dtype=numpy.bool)
    for y in range(height):
        for x in range(width):
            mask[x][y] = in_segment((x, y))
    return mask

self.scales() generates a list of radius scaling factors. self.a_sector() gives the dividing angle between sectors sector and sector+1 on the given ring. The function works, it generates the masks as I need it. The problem is - of course - that it's quite slow due to the inner loops that perform the geometrical test of whether the given element of the array self.arr is within the bounds of the ring-sector for which the mask is generated.
I wonder if you guys have some ideas on how I could accelerate this. This can be perfectly well constructed by boolean combination of elementary geometrical objects. For example, a ring would be:

ring(p, r0, r1): disk(p, r1) xor disk(p, r0)  # where r0 < r1

The sector would then be:

ring_sector(p, r, s): ring(p, r, r + ...) and sector(p, s, s+1)

I'd like to avoid doing this using some C helper routine. I'm looking for something that is the most efficient when it comes to speedup/time invested to develop this. Cheers, Wolfgang ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] Geometrically defined masking arrays; how to optimize?

A small improvement:

# Dimensions
N = 100
M = 200
# Coordinates of the centre
x0 = 40
y0 = 70

x, y = np.meshgrid(np.arange(N) - x0, np.arange(M) - y0, sparse=True)

# Center at 40, 70, radius 20
out_circle = (x * x + y * y < 20**2)
out_ring = out_circle ^ (x * x + y * y < 10**2)

plt.imshow(out_circle)
plt.figure()
plt.imshow(out_ring)
plt.show()

Using sparse=True you can avoid the repetition of the arrays, getting the same functionality. Also, if the image is big, you can delegate the computations of out_circle and out_ring to numexpr for speed. /David. On 11 February 2014 22:16, Daπid davidmen...@gmail.com wrote: [earlier message quoted in full - snipped] ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] Fast decrementation of indices

On 2 February 2014 20:58, Mads Ipsen mads.ip...@gmail.com wrote: ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In practical examples, the list can be much larger (N > 100,000 connections). Perhaps you should consider an alternative approach: you could treat it as a graph, and use NetworkX or SciPy to work with it (provided that actually fits the rest of your problem). In the case of SciPy, the graph is described by its adjacency matrix, and you just want to delete a row and a column. But, in any case, not knowing at all what your overall project is, renumbering nodes is not something one usually has to do when working with graphs, except for final results. The labels are just that, labels, with no further meaning. /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
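If the adjacency-matrix route fits, dropping an atom is a single fancy-indexing step; a dense sketch with a hypothetical bond list (a scipy.sparse matrix supports the same kind of row/column slicing for large N):

```python
import numpy as np

# Hypothetical bond list: bond 0 connects atoms 0 and 5, bond 1 atoms 0 and 6, ...
bonds = np.array([[0, 5], [0, 6], [1, 5]])
n = 7

A = np.zeros((n, n), dtype=bool)
A[bonds[:, 0], bonds[:, 1]] = True

# Delete atom 5: drop its row and its column in one step.
keep = np.arange(n) != 5
A2 = A[np.ix_(keep, keep)]

assert A2.shape == (6, 6)
assert A2[0, 5]   # old atom 6 is now index 5 and is still bonded to atom 0
```

The remaining atoms are renumbered implicitly by their position in the reduced matrix, so no explicit index-decrement pass is needed.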

### Re: [Numpy-discussion] Catching out-of-memory error before it happens

On 24 January 2014 23:09, Dinesh Vadhia dineshbvad...@hotmail.com wrote: Francesc: Thanks. I looked at numexpr a few years back but it didn't support array slicing/indexing. Has that changed? No, but you can do it yourself:

big_array = np.empty(2)
piece = big_array[30:-50]
ne.evaluate('sqrt(piece)')

Here, creating piece does not increase memory use, as slicing shares the original data (well, actually, it adds a mere 80 bytes, the overhead of an array). ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

### Re: [Numpy-discussion] using loadtxt to load a text file in to a numpy array

On 15 January 2014 11:12, Hedieh Ebrahimi hedieh.ebrah...@amphos21.com wrote: I try to print my fileContent array after I read it and it looks like this:

    [b'C:\\Users\\Documents\\Project\\mytextfile1.txt'
     b'C:\\Users\\Documents\\Project\\mytextfile2.txt'
     b'C:\\Users\\Documents\\Project\\mytextfile3.txt']

Why is this happening and how can I prevent it? Also, if I have a line that starts like this in my file, python will crash on me. How can I fix this?

What is wrong with this case? If you are concerned about the multiple backslashes, they are there because they are special symbols, and so they have to be escaped (you actually want a backslash, not whatever else they could mean). Depending on what else is in the file, you may be better off reading it in pure Python. Assuming there is nothing else, something like this would work:

    [line.strip() for line in open(filePath, 'r').readlines()]

/David.

### Re: [Numpy-discussion] adding fused multiply and add to numpy

On 8 January 2014 22:39, Julian Taylor jtaylor.deb...@googlemail.com wrote: As you can see, even without real hardware support it is about 30% faster than inplace unblocked numpy due to better use of memory bandwidth. It's even more than two times faster than unoptimized numpy.

I have an i5, and AVX crashes, even though it is supported by my CPU. Here are my timings:

SSE2:

    In [24]: %timeit npfma.fma(a, b, c)
    10 loops, best of 3: 15 us per loop
    In [28]: %timeit npfma.fma(a, b, c)
    100 loops, best of 3: 2.36 ms per loop
    In [29]: %timeit npfma.fms(a, b, c)
    100 loops, best of 3: 2.36 ms per loop
    In [31]: %timeit pure_numpy_fma(a, b, c)
    100 loops, best of 3: 7.5 ms per loop
    In [33]: %timeit pure_numpy_fma2(a, b, c)
    100 loops, best of 3: 4.41 ms per loop

The model supports all the way to sse4_2.

libc:

    In [24]: %timeit npfma.fma(a, b, c)
    1000 loops, best of 3: 883 us per loop
    In [28]: %timeit npfma.fma(a, b, c)
    10 loops, best of 3: 88.7 ms per loop
    In [29]: %timeit npfma.fms(a, b, c)
    10 loops, best of 3: 87.4 ms per loop
    In [31]: %timeit pure_numpy_fma(a, b, c)
    100 loops, best of 3: 7.94 ms per loop
    In [33]: %timeit pure_numpy_fma2(a, b, c)
    100 loops, best of 3: 3.03 ms per loop

If you have a machine capable of fma instructions, give it a spin to see if you get similar or better results. Please verify the assembly (objdump -d fma-suffix.o) to check if the compiler properly used the machine fma.

Following the instructions in the readme, there is only one compiled file, npfma.so, but no .o.

/David.

### Re: [Numpy-discussion] nasty bug in 1.8.0??

I get:

    In [4]: x.strides
    Out[4]: (8,)

Same architecture and OS, Numpy installed via pip on Python 2.7.5.

On 2 December 2013 20:08, Neal Becker ndbeck...@gmail.com wrote: This is np 1.8.0 on fedora x86_64:

    In [5]: x = np.array((1,))
    In [6]: x.shape
    Out[6]: (1,)
    In [7]: x.strides
    Out[7]: (9223372036854775807,)

### Re: [Numpy-discussion] numpy vbench-marking, compiler comparison

Have you tried it on an Intel CPU? I have both an i5 quad core and an i7 octo core where I could run it over the weekend. One may expect some compiler magic taking advantage of the advanced features, especially on the i7.

/David

On Nov 25, 2013 8:16 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On 25.11.2013 02:32, Yaroslav Halchenko wrote: On Tue, 15 Oct 2013, Nathaniel Smith wrote: What do you have to lose? btw -- fresh results are here http://yarikoptic.github.io/numpy-vbench/ . I have tuned the benchmarking so that it now reflects the best performance across multiple executions of the whole battery, thus eliminating spurious variance when the estimate comes from a single point in time. Eventually I expect many of those curves to become even cleaner.

On another note, what do you think of moving the vbench benchmarks into the main numpy tree? We already require everyone who submits a bug fix to add a test; there are a bunch of speed enhancements coming in these days and it would be nice if we had some way to ask people to submit a benchmark along with each one, so that we know that the enhancement stays enhanced...

On this positive note (it is boring to start a new thread, isn't it?) -- would you be interested in me transferring numpy-vbench over to github.com/numpy? As of today, the plots on http://yarikoptic.github.io/numpy-vbench should be updating 24x7 (just a loop, thus no time guarantee after you submit new changes). Besides benchmarking new benchmarks (your PRs would still be very welcome; so far it was just me and Julian T) and revisions, that process also goes through a random sample of previously benchmarked revisions and re-runs the benchmarks, thus improving upon the ultimate 'min' timing performance. So you can see already that many plots became much 'cleaner', although now there might be a bit of bias in the estimates for recent revisions, since they haven't yet accumulated as many 'independent runs' as older revisions.
using the vbench I created a comparison of gcc and clang with different options. Cliff notes:

* gcc -O2 performs 5-10% better than -O3 in most benchmarks, except in a few select cases where the vectorizer does its magic
* gcc and clang are very close in performance, but in the cases where a compiler wins by a large margin it's mostly gcc that wins

I have collected some interesting plots in this notebook: http://nbviewer.ipython.org/7646615

### [Numpy-discussion] RuntimeWarning raised in the wrong line

I have the following code, operating on some arrays:

    pen = 1 - (real_sum[int(i1 + t)]) / (absolute[int(i1 + t)])
    if np.isnan(pen):
        pen = 0.0

I know that, sometimes, real_sum and absolute are 0 at the designated point, so I should get a RuntimeWarning. But the warning is puzzling:

    RuntimeWarning: invalid value encountered in double_scalars
      pen = 0.0

Is the warning actually being raised at that point? Or can I not really trust the line reported by the warning? I have tried to replicate it in an interactive terminal, but without luck; there, I always get it in the previous line. I am using Python 2.7 and Numpy 1.8.0.

/David.
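One way to pin down where the warning really originates is to promote it to an exception with np.errstate, so the traceback points at the offending statement. A sketch reproducing the 0/0 case on arrays (the zero-filled arrays are stand-ins for the real data):

```python
import numpy as np

real_sum = np.zeros(1)   # stand-in data: 0/0 at this point
absolute = np.zeros(1)

with np.errstate(invalid='raise'):
    try:
        pen = 1 - real_sum / absolute   # 0/0 raises FloatingPointError here
    except FloatingPointError:
        pen = np.zeros_like(real_sum)
```

Because the exception is raised synchronously at the faulty operation, its traceback is trustworthy in a way the warning's reported line may not be.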

### Re: [Numpy-discussion] savetxt fmt argument fails when using a unicode string

On 14 November 2013 09:25, Pierre Haessig pierre.haes...@crans.org wrote: On 13/11/2013 17:54, Daπid wrote: np.savetxt('a.csv', [1], fmt=str('%.3f')) Thanks, that's what I did too. I'm just still wondering whether there is a cleaner solution...

I have a fix, I am submitting a PR. The only thing I am missing is where the tests for IO are. https://github.com/numpy/numpy/pull/4053

### Re: [Numpy-discussion] Precision/value change moving from C to Python

On 13 November 2013 02:40, Bart Baker bart...@gmail.com wrote: That is the order of the machine epsilon for double, that looks like roundoff errors to me. I'm trying to get my head around this. So does that mean that neither of them is right, that it is just the result of doing the same calculation two different ways using different computational libraries?

Essentially, yes. I am tempted to say that, depending on the compiler flags, the C version *could* be more accurate, as the compiler can reorganise the operations and reduce the number of steps. But also, if it is optimised for speed, it could be using faster and less accurate functions and techniques. In any case, if that 10^-16 matters to you, I'd say you are either doing something wrong or using the wrong dtype; and without knowing the specifics, I would bet on the first one. If you really need that precision, you would have to use more bits, and make sure your library supports that dtype. I believe the following proves that (my) numpy.cos can deal with 128 bits without converting to float:

    a = np.array([1.2584568431575694895413875135786543], dtype=np.float128)
    np.cos(a) - np.cos(a.astype(np.float64))
    array([ 7.2099444e-18], dtype=float128)

The bottom line is: don't believe nor trust the least significant bits of your floating point numbers.

/David.
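The 10^-16 mentioned above is exactly the double-precision machine epsilon; it can be checked directly (sticking to float64, since np.float128 availability is platform-dependent):

```python
import numpy as np

eps = np.finfo(np.float64).eps   # 2**-52, about 2.2e-16
# adding half an eps to 1.0 is lost to rounding, a full eps survives:
half_eps_lost = (1.0 + eps / 2 == 1.0)
full_eps_kept = (1.0 + eps != 1.0)
```

So two computations of the same quantity that take different numerical routes can legitimately disagree at that scale.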

### Re: [Numpy-discussion] savetxt fmt argument fails when using a unicode string

The following works on Numpy 1.8.0:

    from __future__ import unicode_literals
    import numpy as np
    np.savetxt('a.csv', [1], fmt=str('%.3f'))

Without the str, I get a clearer error:

    Traceback (most recent call last):
      File "a.py", line 4, in <module>
        np.savetxt('a.csv', [1], fmt='%.3f')
      File ".virtualenv/local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1049, in savetxt
        raise ValueError('invalid fmt: %r' % (fmt,))
    ValueError: invalid fmt: u'%.3f'

/David.

On 13 November 2013 17:28, Pierre Haessig pierre.haes...@crans.org wrote: Hi, I just noticed (with numpy 1.7.1) that the following code

    import numpy as np
    np.savetxt('a.csv', [1], fmt=u'%.3f')

fails with:

    1045             else:
    1046                 for row in X:
    -> 1047                     fh.write(asbytes(format % tuple(row) + newline))
    1048         if len(footer) > 0:
    1049             footer = footer.replace('\n', '\n' + comments)
    UnboundLocalError: local variable 'format' referenced before assignment

(of course, the simple solution is to remove the u prefix, but in my real code I have a from __future__ import unicode_literals) Since parts of the savetxt function were changed since then, I don't know if this bug is still applicable or not. I'm guessing that since Warren Weckesser's commit https://github.com/numpy/numpy/commit/0d284764a855cae11699228ff1b81e6d18f38ed2 the problem would be caught earlier with a much nicer ValueError. However, I still see isinstance(fmt, str) (on line 1035: https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1035 ). How can I make the fmt argument work with unicode_literals?

best, Pierre

### Re: [Numpy-discussion] Precision/value change moving from C to Python

On 12 November 2013 12:01, Bart Baker bart...@gmail.com wrote: The issue is that there are some minor (10^-16) differences in the values when I do the calculation in C vs Python.

That is the order of the machine epsilon for double; that looks like roundoff errors to me. I found similar results cythonising some code: everything was the same until I changed some numpy functions for libc functions (exp, sin, cos...). After some operations in float32, the results were about 1 part in 10^5 apart (the epsilon is 10^-6). I blame that on implementation differences between numpy's functions and my system's libc ones. To check equality, use np.allclose; it lets you define the relative and absolute error.

/David.
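np.allclose in action — two results that differ only at the roundoff level are unequal bitwise but equal under a sensible tolerance:

```python
import numpy as np

a = np.float64(0.1) + np.float64(0.2)   # 0.30000000000000004
b = np.float64(0.3)

bitwise_equal = (a == b)      # False: they differ in the last bits
close = np.allclose(a, b)     # True under the default rtol/atol
```

The `rtol` and `atol` keywords let you tighten or loosen the comparison to match the precision your calculation actually warrants.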

### Re: [Numpy-discussion] memmory management question

On 28 October 2013 03:13, Georgios Exarchakis gexarc...@gmail.com wrote: If yes, then how do you release memory by slicing away parts of an array?

An array is a single Python object. In your example, there is always one reference pointing to the array (either the whole array or only a view), so the memory cannot be released. In your case, a = a[:1].copy() releases the memory by creating a new object and pointing a to this new one, so the big array gets unreferenced and thus garbage collected. The price to pay is that, for some time, both the original array and your new one live in memory: if they were very big, you could run out of memory. You also have to perform a memcopy (again, slow if you have lots of data).

/David.
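The view/copy distinction behind this can be made explicit with the `.base` attribute; a small sketch:

```python
import numpy as np

a = np.arange(10**6)

view = a[:1]         # tiny view, but it keeps the whole buffer alive
copy = a[:1].copy()  # owns its own 1-element buffer

view_keeps_big_array = view.base is a   # True: a cannot be freed
copy_is_independent = copy.base is None # True: copy owns its data
```

Rebinding `a` to the copy drops the last reference to the big buffer, which is exactly why `a = a[:1].copy()` releases the memory while `a = a[:1]` does not.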

### Re: [Numpy-discussion] 1.8.0rc1

On 30 September 2013 17:17, Charles R Harris charlesr.har...@gmail.com wrote: NumPy 1.8.0rc1 is up now on SourceForge (http://sourceforge.net/projects/numpy/files/NumPy/1.8.0rc1/). The binary builds are included except for Python 3.3 on windows, which will arrive later. Many thanks to Ralf for the binaries, and to those who found and fixed the bugs in the last beta. Any remaining bugs are all my fault ;) I hope this will be the last release before final, so please test it thoroughly.

I installed it with

    # python setup.py install

But something is wrong there:

    >>> import numpy as np
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.7/site-packages/numpy/__init__.py", line 137, in <module>
        import add_newdocs
      File "/usr/lib64/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
        from numpy.lib import add_newdoc
      File "/usr/lib64/python2.7/site-packages/numpy/lib/__init__.py", line 4, in <module>
        from type_check import *
      File "/usr/lib64/python2.7/site-packages/numpy/lib/type_check.py", line 8, in <module>
        import numpy.core.numeric as _nx
      File "/usr/lib64/python2.7/site-packages/numpy/core/__init__.py", line 45, in <module>
        from numpy.testing import Tester
      File "/usr/lib64/python2.7/site-packages/numpy/testing/__init__.py", line 10, in <module>
        import decorators as dec
      File "/usr/lib64/python2.7/site-packages/numpy/testing/decorators.py", line 19, in <module>
        from numpy.testing.utils import \
      File "/usr/lib64/python2.7/site-packages/numpy/testing/utils.py", line 12, in <module>
        from .nosetester import import_nose
      File "/usr/lib64/python2.7/site-packages/numpy/testing/nosetester.py", line 12, in <module>
        from numpy.compat import basestring
    ImportError: cannot import name basestring

I am using Python 2.7 on Fedora 19.

    $ gcc --version
    gcc (GCC) 4.8.1 20130603 (Red Hat 4.8.1-1)

### Re: [Numpy-discussion] 1.8.0rc1

Disregard that, I had not cleaned the previous installation properly. Sorry for the noise.

On 1 October 2013 12:11, Daπid davidmen...@gmail.com wrote: [...]

### Re: [Numpy-discussion] Valid algorithm for generating a 3D Wiener Process?

On 26 September 2013 10:02, Daπid davidmen...@gmail.com wrote: The simplest way is to do it in cartesian coordinates: take x, y, and z independently from N(0,1). If you want to generate only one normal number per step, consider the jacobian in the angles.

Actually, this is wrong, as it would allow displacements (at 1 sigma) of 1 along an axis, but up to sqrt(3) along diagonals. What you actually want is a multivariate normal distribution with covariance proportional to the identity (no correlation between axes, and isotropy). See the formulae here: http://en.wikipedia.org/wiki/Multivariate_normal_distribution

### Re: [Numpy-discussion] Valid algorithm for generating a 3D Wiener Process?

On 25 September 2013 19:41, David Goldsmith d.l.goldsm...@gmail.com wrote: the 'angles' that describe the position undergo a random walk [actually, it would seem that they don't, since they too fail the varying-as-white-noise test], so the particle tends to move in the same direction over short intervals--is this just another way of saying that, since I was varying the angles by -1, 0, or 1 unit each time, the simulation is susceptible to unnaturally long strings of -1, 0, or 1 increments?

In the 1D case, the white noise has a gaussian probability distribution of being positive or negative. Translated to the Wiener process, it means you would have to sum normally distributed values. When you go 3D, you can do the same thing, taking a random displacement from N(0,1) and two random angles. The issue here is that the polar angles cannot be taken uniformly; instead, they have to be distributed proportionally to the jacobian. As you have it now, your particle will tend to move towards the poles. If you want to visualize it: take a sphere and imagine dots spaced evenly in angles (at the intersections of meridians and parallels, for example): they are much denser at the poles.

The simplest way is to do it in cartesian coordinates: take x, y, and z independently from N(0,1). If you want to generate only one normal number per step, consider the jacobian in the angles.

David.
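The Cartesian recipe can be sketched in a few lines: each step takes x, y, z independently from N(0, 1), scaled by sqrt(dt), and the trajectory is the cumulative sum (the step count and dt here are arbitrary illustrative choices):

```python
import numpy as np

np.random.seed(0)
n_steps, dt = 100000, 0.01

# each row is one 3D increment: independent N(0, 1) per axis, scaled
steps = np.sqrt(dt) * np.random.normal(size=(n_steps, 3))
path = np.cumsum(steps, axis=0)   # 3D Wiener trajectory
```

Because the three components are i.i.d. normal, the step distribution is isotropic, with no preferred direction and no pole artifacts.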

### Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)

On 6 September 2013 21:21, Yaroslav Halchenko li...@onerussian.com wrote: some old ones are still there, some might be specific to my CPU here

How long does one run take? Maybe I can run it on my machine (Intel i5) for comparison.

### Re: [Numpy-discussion] Stick intersection path algorithm

On 5 September 2013 13:14, Josè Luis Mietta joseluismie...@yahoo.com.ar wrote: 2. Using networkx or another tool: how can I obtain the 'cluster size' distribution (that is: the number of clusters of size 'D', for all cluster sizes)?

This is best asked on their mailing list. A possible way is:

    np.bincount([len(i) for i in nx.connected_components(G)])

For example:

    np.bincount([len(i) for i in nx.connected_components(nx.erdos_renyi_graph(100, 0.01))])
    array([ 0, 39,  7,  3,  1,  1,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  1])

### Re: [Numpy-discussion] Stick intersection path algorithm

On 5 September 2013 17:03, Josè Luis Mietta joseluismie...@yahoo.com.ar wrote: Does the array ([ 0, 39, 7, 3, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]) mean that in the system (graph) there are: 4 clusters of size 1, one cluster of size 3, one cluster of size 7 and one cluster of size 39?

No, it means there are 39 clusters of size 1, 7 of size 2, and so on.

David.
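np.bincount counts occurrences by value, so index i of the result holds the number of components of size i; a tiny illustration with made-up component sizes:

```python
import numpy as np

sizes = [1, 1, 1, 2, 2, 5]     # hypothetical cluster sizes
counts = np.bincount(sizes)
# counts[1] == 3 clusters of size 1, counts[2] == 2 of size 2,
# counts[5] == 1 of size 5; unused sizes show up as zeros
```

Reading the example array the same way: index 1 holds 39 (singletons), index 2 holds 7, and so on.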

### Re: [Numpy-discussion] ANN: Numpy 1.8.0 beta 1 release

On 1 September 2013 18:54, Charles R Harris charlesr.har...@gmail.com wrote: I'm happy to announce the first beta release of Numpy 1.8.0. Please try this beta and report any issues on the numpy-dev mailing list.

In an old thread there was discussion about adding minmax and sincos; is there a plan for implementing these?

### Re: [Numpy-discussion] Stick intersection path algorithm

On 2 September 2013 02:50, Josè Luis Mietta joseluismie...@yahoo.com.ar wrote: I wanna see if two line segments are connected by a path formed by intersecting line segments.

You could build a network where each segment is a node and each intersection is a link. Then, you could use NetworkX's connected_components to get the groups of segments that hang together.
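If one prefers to avoid the NetworkX dependency, the same grouping can be sketched with a plain union-find over pairwise intersection tests. The segment representation and the strict crossing test below are my own assumptions (collinear or endpoint-touching segments are not counted as intersecting):

```python
def _orient(a, b, c):
    # sign of the cross product (b - a) x (c - a)
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def intersects(s, t):
    # strict "proper crossing" test for two segments ((x1,y1),(x2,y2))
    p1, p2 = s
    p3, p4 = t
    return (_orient(p1, p2, p3) != _orient(p1, p2, p4)
            and _orient(p3, p4, p1) != _orient(p3, p4, p2))

def stick_clusters(segments):
    # union-find: segments are nodes, intersections merge their sets
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if intersects(segments[i], segments[j]):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(segments)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Two segments are then path-connected exactly when they end up in the same cluster.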

### Re: [Numpy-discussion] RAM problem during code execution - Numpya arrays

On 23 August 2013 16:59, Benjamin Root ben.r...@ou.edu wrote: A lot of the code you have here can be greatly simplified. I would start with just trying to get rid of appends as much as possible and use preallocated arrays with np.empty() or np.ones() or the likes.

Also, if you don't know the final size of the array beforehand (I find it difficult to follow the flow of the program, it is quite lengthy), you can use a list as a temporary container:

    results = []
    while SomeCondition:
        do_something()
        results.append(res)
    results = np.array(results)

Also, it may help to encapsulate pieces of code inside functions. There are several reasons for this: it will make the code more readable and easier to test, and you can run pieces of code independently to find where your memory is growing.

I hope it is of any help.

David.

### Re: [Numpy-discussion] Reading footer lines

On 13 August 2013 14:20, Resmi l.re...@gmail.com wrote: As a workaround, I've tried using os.system along with grep. And I get the following output:

    >>> os.system("grep -e 'tx' 'data.dat'")
    ## tx =2023.06
    0

Why is there a 0 in the output? The file has no blank lines.

That 0 corresponds to the exit status of os.system: as there were no errors, it returns 0.

My idea would be to read the file and store it in a StringIO. That can be fed to np.loadtxt to extract the numbers, and then searched for your value. My (untested) attempt:

    import StringIO
    raw_data = StringIO.StringIO()
    with open(datafile) as f:
        raw_data.write(f.read())
    raw_data.seek(0)
    data = np.loadtxt(raw_data)
    raw_data.seek(0)
    numbers = [float(line.split('=')[1].strip())
               for line in raw_data.readlines() if line[0] == '#']

If there is only one number and it comes after all the data, you could skip the data lines. From the shape of data you know there are N lines of data:

    for _ in range(N):
        raw_data.readline()
    while True:
        line = raw_data.readline()
        if line[0] == '#':
            number = float(line.split('=')[1].strip())
            break

Alternatively, you could use seek to put the pointer a certain distance from the end of the file and start from there, but this could cause problems on Windows.

David.
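A runnable version of that idea in modern Python (io.StringIO stands in for the data file; the file contents are made up to match the `# tx =2023.06` footer shown above):

```python
import io
import numpy as np

raw = io.StringIO("1.0 2.0\n3.0 4.0\n# tx =2023.06\n")

data = np.loadtxt(raw)      # '#' lines are skipped as comments
raw.seek(0)                 # rewind to scan for the footer

footer = {}
for line in raw:
    if line.startswith('#') and '=' in line:
        key, value = line.lstrip('#').split('=')
        footer[key.strip()] = float(value.strip())
```

The numeric table and the footer value are thus extracted in a single pass each over the same buffer.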

### Re: [Numpy-discussion] add .H attribute?

An idea: if .H is ideally going to be a view, and we want to keep it that way, we could have a .h() method with the present implementation. This would preserve the name .H for the conjugate view --when someone finds the way to do it. This way we would increase readability, simplify some matrix algebra code, and keep the API consistent.

On 24 July 2013 13:08, Dave Hirschfeld dave.hirschf...@gmail.com wrote: Nathaniel Smith njs at pobox.com writes: As soon as you talk about attributes returning things you've already broken Python's mental model... attributes are things that sit there, not things that execute arbitrary code. Of course this is not how the actual implementation works; attribute access *can* in fact execute arbitrary code, but the mental model is important, so we should preserve it wherever we can. Just mentioning an attribute should not cause unbounded memory allocations.

Yep, sorry - sloppy use of terminology, which I agree is important in helping understand what's happening. -Dave

### Re: [Numpy-discussion] numpy.filled, again

On 12 June 2013 17:29, josef.p...@gmail.com wrote: +2 on this. I like a good name like `filled` for a function that I expect to use regularly. It's short and descriptive.

FWIW, I am for this one too. It is not only clear, but a name that one may try even before checking the docs. Empty-filled sounds too much like an oxymoron to me.

### Re: [Numpy-discussion] Avoiding messy code in Python

On 12 June 2013 18:27, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: without documentation and testing.

This one is easy: add documentation and testing! This lecture may help you. It uses unittest, but the idea is applicable to whatever system you use: https://python.g-node.org/python-summerschool-2012/software_carpentry And for a brief introduction to best practices, you could see this: https://python.g-node.org/python-summerschool-2012/best_practices

David.

### Re: [Numpy-discussion] About speed vs math...

On 3 June 2013 08:33, David Cournapeau courn...@gmail.com wrote: (around 50 on my own machine, but that's platform specific).

On my machine, it is around 10. You also have to be aware of the data container: it is not the same to iterate over lists as over arrays.

    In [6]: a = np.arange(50)
    In [7]: %timeit np.sin(a)
    10 loops, best of 3: 3.95 us per loop
    In [8]: %timeit np.array([math.sin(i) for i in a])  # working with arrays, keep the array output
    1 loops, best of 3: 32.9 us per loop
    In [9]: %timeit [math.sin(i) for i in a]  # keep the list output
    1 loops, best of 3: 21.3 us per loop

But, if you stick to lists:

    In [12]: b = range(50)
    In [13]: %timeit [math.sin(i) for i in b]
    10 loops, best of 3: 12.4 us per loop

In any case, switching between iterators and numpy functions is, in my opinion, error prone. I would use math only for scalars, and numpy for the rest, except if optimisation is needed (as a last step). For the record, another interesting example:

    In [18]: a = np.arange(10)
    In [19]: b = range(10)
    In [20]: %timeit a + b  ## mixed container
    10 loops, best of 3: 10.4 us per loop
    In [21]: %timeit [i + j for i, j in zip(a, b)]
    10 loops, best of 3: 12 us per loop
    In [22]: %timeit a + a  ## pure array
    100 loops, best of 3: 1.23 us per loop
    In [23]: %timeit [i + j for i, j in zip(a, a)]
    10 loops, best of 3: 7.21 us per loop
    In [24]: %timeit b + b  ## pure list
    100 loops, best of 3: 197 ns per loop
    In [25]: %timeit [i + j for i, j in zip(b, b)]
    10 loops, best of 3: 1.68 us per loop

Here, b + b is just concatenating the lists, not adding them elementwise. In a program, that may be an error due to a forgotten conversion.

Out of curiosity: VPython, a visual package oriented to beginners, has its own functions that redirect to math if applied to scalars and to numpy otherwise.

David.
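The VPython-style dispatch mentioned at the end can be sketched in a few lines (smart_sin is a made-up name for illustration, not VPython's actual implementation):

```python
import math
import numbers
import numpy as np

def smart_sin(x):
    # math for plain scalars (fast path), numpy for anything array-like
    if isinstance(x, numbers.Real):
        return math.sin(x)
    return np.sin(np.asarray(x))
```

This keeps the cheap per-call path for scalars while still accepting lists and arrays, at the cost of a small isinstance check on every call.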

### Re: [Numpy-discussion] Name change of the ptp() function

On May 10, 2013 3:18 PM, Robert Kern robert.k...@gmail.com wrote: Sure, it's probably more readable

I am not sure of that. I would have to check the docs to see what it means. The mathematical term is range, but that already has a meaning in Python, so it is not a good way to go; valuerange could perhaps be the compromise, though it is not really clear by itself either. In some areas, nevertheless, ptp is the standard notation, as it is in electronics - and maybe that is why it made its way into Numeric.

### Re: [Numpy-discussion] printing array in tabular form

On 9 May 2013 10:06, Sudheer Joseph sudheer.jos...@yahoo.com wrote: However, writing formatted output looks to be a bit tricky with python relative to other programming languages.

If performance is not an issue, you can do it by hand, as you would in any programming language:

    savefile = open('data.txt', 'w')
    N = len(IL)
    for start in xrange(0, N, 5):
        end = min(start + 5, N)
        print >> savefile, IL[start:end]

But this is actually more verbose, and once you get into the NumPy workflow, the NumPy way is actually simpler.

### Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)

On 7 May 2013 13:47, Sebastian Berg sebast...@sipsolutions.net wrote: Indexing/assignment was the first thing I thought of too (also because fancy indexing/assignment really could use some speedups...). Other than that, maybe some timings for small arrays/scalar math, but that might be nice for that GSoC project.

Why not go bigger? Ufunc operations on big arrays, CPU and memory bound. Also, what about interfacing with other packages? It may increase the compiling overhead, but I would like to see Cython in action (say, only the last version; maybe it can be fixed).

### Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8

On 1 May 2013 03:36, Benjamin Root ben.r...@ou.edu wrote: Are there any other functions that others feel are missing from numpy and would like to see for v1.8? Let's discuss them here.

I would like to have sincos, to compute the sin and cos of the same number faster. According to some benchmarks, it is barely slower than computing just one of them.

On 1 May 2013 07:13, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: np.minmax() was discussed before. My thinking is that it would return a 2xN array. How about a tuple: (min, max)?

Consider the case of np.minmax(matrix, axis=1): you would end up with a tuple of two arrays. In that scenario, you probably want to do computations with both numbers, so having them in an array seems more convenient. If there is enough reason, we could always add an unpack=True flag and then return a tuple.

### Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8

On 1 May 2013 17:13, Todd toddr...@gmail.com wrote: Speaking of which, I think there should be a function to construct a complex array out of two identically-shaped floating-point arrays, as well as perhaps an np.i class that converts a real array to an imaginary one (using __mul__ and such). np.i would be exactly the same as array * 1j, or am I missing something? The same goes for constructing a complex array: real + imag * 1j.
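The claimed equivalence is easy to check; assembling a complex array from two real ones is just the plain expression from the thread:

```python
import numpy as np

re = np.array([1.0, 2.0])
im = np.array([3.0, 4.0])

z = re + im * 1j     # complex128 array built from two float64 arrays
```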

### Re: [Numpy-discussion] GSoC proposal -- Numpy SciPy

On 1 May 2013 20:12, Blake Griffith blake.a.griff...@gmail.com wrote: However, it would still be useful to have ufuncs working well with the sparse package. How are you planning to deal with ufunc(0) != 0? cos(sparse) is actually dense.
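The ufunc(0) != 0 problem in plain NumPy terms: cos maps the implicit zeros of a sparse matrix to ones, so there are no zeros left to store sparsely. A dense array standing in for a sparse one, just for illustration:

```python
import numpy as np

# A mostly-zero matrix standing in for a sparse one.
a = np.zeros((4, 4))
a[0, 0] = np.pi

c = np.cos(a)    # every implicit zero becomes cos(0) == 1: fully dense
```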

### Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8

On 1 May 2013 03:36, Benjamin Root ben.r...@ou.edu wrote: There is one other non-trivial function that has been discussed before: np.minmax(). My thinking is that it would return a 2xN array (where N is whatever size of the result that would be returned if just np.min() were used). This would allow one to do min, max = np.minmax(X). I had been looking for this function in the past; I think it is a good and necessary addition. It should also come with its companion, np.argminmax. David.
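The argminmax companion could be sketched the same way (the name is hypothetical; like minmax, a real version would find both indices in a single pass):

```python
import numpy as np

def argminmax(a, axis=None):
    # Hypothetical companion to np.minmax: stacked indices of the
    # minimum and the maximum.
    return np.stack([np.argmin(a, axis=axis), np.argmax(a, axis=axis)])

idx = argminmax(np.array([3, 1, 4, 1, 5]))   # indices of min and max
```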

### [Numpy-discussion] Pruned / sparse FFT 2D

Hi, I have a sparse cloud of ~10^4 points randomly scattered around a triangular domain [1], from which I want to take the Fourier transform. I have been looking for algorithms and found one library [2], but it only appears to cover the 1-D case (and there seems to be no documentation). In [3] there is C code for FFTW, but it seems the code is itself pruned (pun intended). Does anyone have a clue about how to perform it? Speed is not a big issue, but accuracy is quite important. Thanks, David. [1] http://www.asaaf.org/~davidmh/coverage.png [2] https://pypi.python.org/pypi/pynfftls/1.0 [3] http://www.fftw.org/pruned.html

### Re: [Numpy-discussion] Sources more confusing in Python

On 7 April 2013 16:53, Happyman bahtiyor_zohi...@mail.ru wrote: I completed my registration, but still have no idea how to use it; it is, maybe, lack of experience with Python. You don't need registration. The easiest way to install it is via pip or easy_install. If you are on Windows, download and execute the easy_install installer; if you are on Linux, browse your repository for either of them. Once installed, you can do:

```
$ easy_install pip   # to install pip, the easiest way to get it on Windows
$ pip install numpy  # to install the numpy package
```

All this in the terminal; to get one on Windows, run cmd.

### Re: [Numpy-discussion] timezones and datetime64

On 4 April 2013 11:03, Daniele Nicolodi dani...@grinta.net wrote: I think that generally the issue is not relevant for any practical use of a timebase: there are not many applications requiring sub-second accuracy over periods of many years. I agree. I think the basic datetime object should ignore this issue (properly documented), leaving room for someone to write a datetime_leapsecond object that would be aware of them. By avoiding this issue we can achieve better performance and a simpler code base, with enough functionality for most practical purposes. Another point that should be noted: as stated earlier, leap seconds cannot be predicted, so they require frequent updates, making replicability difficult: the same code on two machines with the same NumPy version, but updated at different times, can produce different results.
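The leap-second-unaware behaviour can be seen directly: datetime64 counts exactly 86400 seconds per day, so the leap second inserted at the end of 2012-06-30 simply does not exist in its timeline (a sketch of the current behaviour, not of the proposed datetime_leapsecond object):

```python
import numpy as np

# A leap second was inserted at 2012-06-30T23:59:60 UTC, but a
# leap-second-unaware clock counts 86400 s for that day anyway.
delta = (np.datetime64('2012-07-01T00:00:00')
         - np.datetime64('2012-06-30T00:00:00'))
seconds = delta / np.timedelta64(1, 's')   # 86400.0, not 86401.0
```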

### Re: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy

I am a bit worried about the differences in results. Just to be sure you are comparing apples with apples, it may be a good idea to set the seed at the beginning: np.random.seed(SEED), where SEED is an int. This way, you will always be inverting the same matrix, regardless of the Python version. I think that, even if the timing is different, the results should be the same. http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html#numpy.random.seed David.

On 23 March 2013 15:39, Colin J. Williams cjwilliam...@gmail.com wrote: On 23/03/2013 7:21 AM, Ralf Gommers wrote: On Fri, Mar 22, 2013 at 10:39 PM, Colin J. Williams cjwilliam...@gmail.com wrote: On 20/03/2013 11:12 AM, Frédéric Bastien wrote: On Wed, Mar 20, 2013 at 11:01 AM, Colin J. Williams cjwilliam...@gmail.com wrote: On 20/03/2013 10:30 AM, Frédéric Bastien wrote: Hi, win32 does not mean it is 32-bit Windows: sys.platform always returns win32 on both 32-bit and 64-bit Windows, even for 64-bit Python. But that is a good question: is your Python 32 or 64 bits? 32 bits. That explains why you have memory problems but other people with 64-bit versions do not. So if you want to work with bigger input, change to a 64-bit Python. Fred

Thanks to the people who responded to my report that numpy with Python 3.2 was significantly slower than with Python 2.7. I have updated to numpy 1.7.0 for each of the Pythons 2.7.3, 3.2.3 and 3.3.0. The Pythons came from python.org and the numpys from PyPI. The SciPy site still points to SourceForge; I gathered from the responses that SourceForge is no longer recommended for downloads. That's not the case. The official binaries for NumPy and SciPy are on SourceForge. The Windows installers on PyPI are there to make easy_install work, but they're likely slower than the SF installers (no SSE2/SSE3 instructions). Ralf

Thanks, I'll read over Robert Kern's comments. PyPI is the simpler process, but, if the result is unoptimized code, then easy_install is not the way to go. The code is available here (http://web.ncf.ca/cjw/testFPSpeed.py) and the most recent test results are here (http://web.ncf.ca/cjw/FP%2023-Mar-13%20Test%20Summary.txt). These are using PyPI; I'll look into SourceForge. Colin W.
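The seeding advice above in minimal form: with a fixed seed the "random" matrix is bit-for-bit reproducible, so any remaining timing or accuracy difference must come from the linear-algebra path, not from the input data:

```python
import numpy as np

np.random.seed(12345)          # fix the generator state
a = np.random.random((4, 4))

np.random.seed(12345)          # same seed -> identical draws
b = np.random.random((4, 4))
```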

### Re: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy

Without much detailed knowledge of the topic, I would expect both versions to give very similar timings, as it is essentially a call to an ATLAS function; not much is done in Python. Given this, maybe the difference is in ATLAS itself. How have you installed it? When you compile ATLAS, it does some machine-specific optimisation, but if you have installed a binary, chances are that your version is optimised for a machine quite different from yours. So, two different installations could have been compiled on different machines, and one is more suited to your machine. If you want to be sure, I would try to compile ATLAS (this may be difficult) or check the same on a very different machine (like an AMD processor, a different architecture...). Just for reference, on Linux, a 64-bit Python 2.7 can deal with these matrices easily:

```
%timeit mat = np.random.random((6143, 6143)); matinv = np.linalg.inv(mat); res = np.dot(mat, matinv); diff = res - np.eye(6143); print np.sum(np.abs(diff))
2.41799631031e-05
1.13955868701e-05
3.64338191541e-05
1.13484781021e-05
1 loops, best of 3: 156 s per loop
```

Intel i5, 4 GB of RAM and SSD. ATLAS installed from the Fedora repository (I don't run heavy stuff on this computer). On 20 March 2013 14:46, Colin J. Williams c...@ncf.ca wrote: I have a small program which builds random matrices for increasing matrix orders, inverts the matrix and checks the precision of the product. At some point, one would expect operations to fail when the memory capacity is exceeded. In both Python 2.7 and 3.2, matrices of order 3,071 are handled, but not 6,143. Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. The profiler indicates a problem in the solver. Done on a Pentium with a 2.7 GHz processor, 2 GB of RAM and 221 GB of free disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. The results are shown below. Colin W.
```
2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
order=    2  measure of imprecision=  0.097  Time elapsed (seconds)=   0.004143
order=    5  measure of imprecision=  2.207  Time elapsed (seconds)=   0.001514
order=   11  measure of imprecision=  2.372  Time elapsed (seconds)=   0.001455
order=   23  measure of imprecision=  3.318  Time elapsed (seconds)=   0.001608
order=   47  measure of imprecision=  4.257  Time elapsed (seconds)=   0.002339
order=   95  measure of imprecision=  4.986  Time elapsed (seconds)=   0.005747
order=  191  measure of imprecision=  5.788  Time elapsed (seconds)=   0.029974
order=  383  measure of imprecision=  6.765  Time elapsed (seconds)=   0.145339
order=  767  measure of imprecision=  7.909  Time elapsed (seconds)=   0.841142
order= 1535  measure of imprecision=  8.532  Time elapsed (seconds)=   5.793630
order= 3071  measure of imprecision=  9.774  Time elapsed (seconds)=  39.559540
order= 6143  Process terminated by a MemoryError
```

Above: Python 2.7.3. Below: Python 3.2.3.

```
3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
order=    2  measure of imprecision=  0.000  Time elapsed (seconds)=   0.113930
order=    5  measure of imprecision=  1.807  Time elapsed (seconds)=   0.001373
order=   11  measure of imprecision=  2.395  Time elapsed (seconds)=   0.001468
order=   23  measure of imprecision=  3.073  Time elapsed (seconds)=   0.001609
order=   47  measure of imprecision=  5.642  Time elapsed (seconds)=   0.002687
order=   95  measure of imprecision=  5.745  Time elapsed (seconds)=   0.013510
order=  191  measure of imprecision=  5.866  Time elapsed (seconds)=   0.061560
order=  383  measure of imprecision=  7.129  Time elapsed (seconds)=   0.418490
order=  767  measure of imprecision=  8.240  Time elapsed (seconds)=   3.815713
order= 1535  measure of imprecision=  8.735  Time elapsed (seconds)=  27.877270
order= 3071  measure of imprecision=  9.996  Time elapsed (seconds)= 212.545610
order= 6143  Process terminated by a MemoryError
```

### Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks

On 13 March 2013 16:54, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote: Since I'm in the process of buying new hardware too, a slight OT (but definitely related): does an SSD provide a substantial improvement in these cases? It should help. Nevertheless, when talking about performance it is difficult to predict, mainly because in a computer there are many things going on and many layers involved. I have a couple of computers equipped with SSDs; if you send me some benchmarks, I can run them and see if I get any speedup.

### Re: [Numpy-discussion] numpy.test('full') errors and failures

On 12 February 2013 10:55, Dominic Steinitz domi...@steinitz.org wrote: Running unit tests for numpy NumPy version 1.8.0.dev-4600b2f I can see this is not the stable version; try 1.7, which has just been released.

### Re: [Numpy-discussion] number of tests

On 12 February 2013 04:54, Davide Del Vento ddve...@ucar.edu wrote: Ran 3587 tests in 22.211s FAILED (KNOWNFAIL=5, SKIP=11, failures=2) Whereas in a remote batch node (with a script) I get: Ran 3229 tests in 15.642s OK (KNOWNFAIL=5, SKIP=19) On my machine (Linux, 64 bits):

```
In [3]: np.test('full')
Running unit tests for numpy
NumPy version 1.7.0
NumPy is installed in /usr/lib64/python2.7/site-packages/numpy
Python version 2.7.3 (default, Aug 9 2012, 17:23:57) [GCC 4.7.1 20120720 (Red Hat 4.7.1-5)]
nose version 1.2.1
----------------------------------------------------------------------
Ran 4836 tests in 33.016s

OK (KNOWNFAIL=5, SKIP=1)
Out[3]: <nose.result.TextTestResult run=4836 errors=0 failures=0>
```

### [Numpy-discussion] pip install numpy throwing a lot of output.

I have just upgraded numpy with pip on Linux 64 bits with Python 2.7, and I got *a lot* of output, so much that it doesn't fit in the terminal. Most of it is gcc commands, but there are many different errors thrown by the compiler. Is this expected? I am not too worried, as the test suite passes, but pip is supposed to give only meaningful output (or at least, this is what its creators intended). David.

### Re: [Numpy-discussion] pip install numpy throwing a lot of output.

On 12 February 2013 14:58, Francesc Alted franc...@continuum.io wrote: Yes, I think that's expected. Just to make sure, can you send some excerpts of the errors that you are getting? Actually the errors are at the beginning of the process, so they are out of the reach of my terminal right now. It seems pip doesn't keep a log in case of success. The ones I can see are mostly warnings about unused variables and functions; maybe this is the expected behaviour for a library? These errors come from a complete reinstall instead of the original upgrade (the cat closed the terminal, worst excuse ever!):

```
compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c'
gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/quicksort.c
In file included from /usr/include/python2.7/pyconfig.h:6:0,
                 from /usr/include/python2.7/Python.h:8,
                 from numpy/core/src/private/npy_sort.h:5,
                 from numpy/core/src/npysort/quicksort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1170:0: warning: _POSIX_C_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
                 from numpy/core/src/npysort/quicksort.c.src:31:
/usr/include/features.h:168:0: note: this is the location of the previous definition
In file included from /usr/include/python2.7/pyconfig.h:6:0,
                 from /usr/include/python2.7/Python.h:8,
                 from numpy/core/src/private/npy_sort.h:5,
                 from numpy/core/src/npysort/quicksort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1192:0: warning: _XOPEN_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
                 from numpy/core/src/npysort/quicksort.c.src:31:
/usr/include/features.h:170:0: note: this is the location of the previous definition
gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/mergesort.c
In file included from /usr/include/python2.7/pyconfig.h:6:0,
                 from /usr/include/python2.7/Python.h:8,
                 from numpy/core/src/private/npy_sort.h:5,
                 from numpy/core/src/npysort/mergesort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1170:0: warning: _POSIX_C_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
                 from numpy/core/src/npysort/mergesort.c.src:31:
/usr/include/features.h:168:0: note: this is the location of the previous definition
In file included from /usr/include/python2.7/pyconfig.h:6:0,
                 from /usr/include/python2.7/Python.h:8,
                 from numpy/core/src/private/npy_sort.h:5,
                 from numpy/core/src/npysort/mergesort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1192:0: warning: _XOPEN_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
                 from numpy/core/src/npysort/mergesort.c.src:31:
/usr/include/features.h:170:0: note: this is the location of the previous definition
gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/heapsort.c
In file included from /usr/include/python2.7/pyconfig.h:6:0,
                 from /usr/include/python2.7/Python.h:8,
                 from numpy/core/src/private/npy_sort.h:5,
                 from numpy/core/src/npysort/heapsort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1170:0: warning: _POSIX_C_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
                 from numpy/core/src/npysort/heapsort.c.src:31:
/usr/include/features.h:168:0: note: this is the location of the previous definition
In file included from /usr/include/python2.7/pyconfig.h:6:0,
                 from /usr/include/python2.7/Python.h:8,
                 from numpy/core/src/private/npy_sort.h:5,
                 from numpy/core/src/npysort/heapsort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1192:0: warning: _XOPEN_SOURCE redefined [enabled by default]
compile options: '-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c'
gcc: numpy/core/src/umath/umathmodule_onefile.c
In file included from
```

### Re: [Numpy-discussion] Issues to fix for 1.7.0rc2.

On 6 February 2013 08:41, Charles R Harris charlesr.har...@gmail.com wrote: More Python craziness:

```
In [6]: print None or 0
0

In [7]: print 0 or None
None
```

Just to clarify this behaviour: `or` returns its first operand when it is truthy and the second one otherwise, so print is only showing the value of the expression:

```
In [1]: print None or 0
0

In [2]: print 0 or None
None

In [3]: val = 0 or None

In [4]: print val
None
```

### Re: [Numpy-discussion] Insights / lessons learned from NumPy design

On Jan 9, 2013 11:35 AM, Mike Anderson mike.r.anderson...@gmail.com wrote: But I'm curious: what is the main use case for the alternative data types in NumPy? Is it for columns of data of heterogeneous types? Or something else? In my case, I have used 32-bit (or lower) arrays due to memory limitations, and for some significant speedups in certain situations. This was particularly useful when I was preprocessing numerous arrays down to, especially, Boolean data; it saved a lot of disk space and I/O. I have used 128 bits when precision was critical, as I was dealing with very small differences. It is also nice to be able to repeat your computation with different precision in order to spot possible numerical instabilities, even if the performance is not great. David.
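The memory argument is easy to quantify: per-element storage is fixed by the dtype, so float32 halves the footprint of float64 and a Boolean array stores one byte per element:

```python
import numpy as np

n = 1000
a64 = np.zeros(n, dtype=np.float64)   # 8 bytes per element
a32 = np.zeros(n, dtype=np.float32)   # 4 bytes per element
ab = np.zeros(n, dtype=np.bool_)      # 1 byte per element

sizes = (a64.nbytes, a32.nbytes, ab.nbytes)   # (8000, 4000, 1000)
```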

### Re: [Numpy-discussion] Efficient way of binning points and applying functions to these groups

This looks like the perfect job for Cython: it is great at optimizing loops. Another option is the new Numba, an automatic compiler. David. On 26/12/2012 10:09, Eric Emsellem eric.emsel...@eso.org wrote: Hi! I am looking for an efficient way of doing some simple binning of points and then applying some functions to the points within each bin. I have tried several ways, including crude looping over the indices, or using digitize (see below), but I cannot manage to get it as efficient as I need it to be. I have a comparison with a similar (although more complex) code in IDL, and thought I would ask the forum. In IDL there is a way to invert a histogram and get a reverse set of indices (via the histogram function) which seems to do the trick (or maybe it is faster for another reason). Below I provide a (dummy) example of what I wish to achieve. Any hint on how to do this EFFICIENTLY using numpy is most welcome. I need to speed things up quite a bit (at the moment, the version I have, see below, is 10 times slower than the more complex IDL routine... I must be doing something wrong!). Thanks!! Eric

```python
import numpy as np

# I have a random set of data points in 2D with coordinates x, y:
x = np.random.random(1000)
y = np.random.random(1000)

# I now have a 2D grid, given by, let's say, 10x20 grid cells:
nx = 11
ny = 21
lx = np.linspace(0, 1, nx)
ly = np.linspace(0, 1, ny)
gx, gy = np.meshgrid(lx, ly)

# So my set of 2D bins is (not needed in the solution I present,
# but just for clarity):
bins = np.dstack((gx.ravel(), gy.ravel()))[0]

# Now I want the list of points in each bin; if the number of points
# in a bin is larger than 10, apply (dummy) function func1 (below);
# if less than 10, apply (dummy) function func2; if 0, do nothing.
def func1(x):
    return x.mean()

def func2(x):
    return x.std()

# One solution would be to use digitize in 1D and histogram in 2D
# (gx, gy are not needed for this one):
h = np.histogram2d(x, y, bins=[lx, ly])[0]
digitX = np.digitize(x, lx)
digitY = np.digitize(y, ly)

# Create the output array, with -999 values to make sure I can see
# which ones are not filled in:
result = np.zeros_like(h) - 999
for i in range(nx - 1):
    for j in range(ny - 1):
        selectionofpoints = (digitX == i + 1) & (digitY == j + 1)
        if h[i, j] > 10:
            result[i, j] = func1(x[selectionofpoints])
        elif h[i, j] > 0:
            result[i, j] = func2(x[selectionofpoints])
```
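The IDL reverse-indices trick has a NumPy analogue: compute one flat bin index per point, argsort it once, and recover each bin's members as a contiguous slice with searchsorted. A sketch under the same setup as in the question (it replaces the per-bin boolean scans with one global sort; a sketch, not a benchmarked drop-in replacement):

```python
import numpy as np

x = np.random.random(1000)
y = np.random.random(1000)
nx, ny = 11, 21
lx = np.linspace(0, 1, nx)
ly = np.linspace(0, 1, ny)

# One flat bin index per point, in 0 .. (nx-1)*(ny-1) - 1.
ix = np.digitize(x, lx) - 1
iy = np.digitize(y, ly) - 1
flat = ix * (ny - 1) + iy

order = np.argsort(flat, kind='mergesort')   # one global sort
sorted_flat = flat[order]

# Slice boundaries of every bin in the sorted order (the analogue
# of IDL's REVERSE_INDICES).
nbins = (nx - 1) * (ny - 1)
starts = np.searchsorted(sorted_flat, np.arange(nbins), side='left')
stops = np.searchsorted(sorted_flat, np.arange(nbins), side='right')

result = np.full((nx - 1, ny - 1), -999.0)
for b in range(nbins):
    pts = x[order[starts[b]:stops[b]]]   # all points in bin b
    if pts.size > 10:
        result.flat[b] = pts.mean()
    elif pts.size > 0:
        result.flat[b] = pts.std()
```

The remaining loop only touches non-trivial slices; the expensive membership tests are gone.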

### Re: [Numpy-discussion] numpy where function on different sized arrays

A pure Python approach could be:

```python
idx = []
for i, x in enumerate(a):
    for j, y in enumerate(x):
        if y in b:
            idx.append((i, j))
```

Of course, it is slow if the arrays are large, but it is very readable, and probably very fast if cythonised. David. On Sat, Nov 24, 2012 at 10:19 PM, David Warde-Farley d.warde.far...@gmail.com wrote: M = A[..., np.newaxis] == B will give you a 40x60x20 boolean 3-d array, where M[..., i] gives you a boolean mask for all the occurrences of B[i] in A. If you wanted all the (i, j) pairs for each value in B, you could do something like:

```python
import numpy as np
from itertools import izip, groupby
from operator import itemgetter

id1, id2, id3 = np.where(A[..., np.newaxis] == B)
order = np.argsort(id3)
triples_iter = izip(id3[order], id1[order], id2[order])
grouped = groupby(triples_iter, itemgetter(0))
d = dict((b_value, [idx[1:] for idx in indices])
         for b_value, indices in grouped)
```

Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, and the keys of d are every value in B. On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi sgo...@staffmail.ed.ac.uk wrote: Hi all, This must have been answered in the past, but my Google search capabilities are not the best. Given an array A, say of dimension 40x60, and another array/vector B of dimension 20 (the values in B occur only once): what I would like to do is the following, which of course does not work (by the way, it doesn't work in IDL either): indx = where(A == B). I understand A and B are of different dimensions. So my question: what would be the fastest or proper way to accomplish this? (I found a solution, but I think it is rather awkward and not very scipy/numpy-tonic, though.) Thanks -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
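A fully vectorized answer to the original question, without the Python loops and without the 40x60x20 intermediate, uses np.isin (np.in1d in the NumPy of that era):

```python
import numpy as np

A = np.random.randint(0, 100, size=(40, 60))
B = np.arange(20) * 5          # 20 distinct values, as in the question

mask = np.isin(A, B)           # True where A[i, j] is one of B's values
i_idx, j_idx = np.where(mask)  # all (i, j) with A[i, j] in B
```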