Re: [Numpy-discussion] Fast decrementation of indices
I think you'll find the algorithm below to be a lot faster, especially if the arrays are big. Checking each array index against the list of included or excluded elements is much slower than simply creating a secondary boolean array and looking up whether the elements are included or not.

    b = np.array([
        [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11],
        [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5]
        ])
    a = [1, 2, 3, 7, 8]
    keepdata = np.ones(12, dtype=np.bool)
    keepdata[a] = False
    w = np.where(keepdata[b[0]] & keepdata[b[1]])
    newindex = keepdata.cumsum() - 1
    c = newindex[b[:, w[0]]]

Cheers,
Rick

On 2 February 2014 20:58, Mads Ipsen mads.ip...@gmail.com wrote:

Hi,

I have run into a potential 'for loop' bottleneck. Let me outline:

The following array describes bonds (connections) in a benzene molecule

    b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11],
         [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5]]

i.e. bond 0 connects atoms 0 and 5, bond 1 connects atoms 0 and 6, etc. In practical examples, the list can be much larger (N > 100.000 connections).

Suppose atoms with indices a = [1,2,3,7,8] are deleted; then all bonds connecting those atoms must be deleted. I achieve this doing

    i_0 = numpy.in1d(b[0], a)
    i_1 = numpy.in1d(b[1], a)
    b_i = numpy.where(i_0 | i_1)[0]
    b = b[:, ~(i_0 | i_1)]

If you find this approach lacking, feel free to comment. This results in the following updated bond list

    b = [[0, 0, 4, 4, 5, 5, 5, 6, 10, 11],
         [5, 6, 10, 5, 4, 11, 0, 0, 4, 5]]

This list is, however, not correct: since atoms [1,2,3,7,8] have been deleted, the remaining atoms with indices larger than the deleted atoms must be decremented. I do this as follows:

    for i in a:
        bonds = numpy.where(b > i, bonds - 1, bonds)    (*)

yielding the correct result

    b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
         [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]

The Python for loop in (*) may easily contain 50.000 iterations.
Is there a smart way to utilize numpy functionality to avoid this?

Thanks and best regards,

Mads

--
Mads Ipsen
Gåsebæksvej 7, 4. tv, DK-2500 Valby, Denmark
phone: +45-29716388 | email: mads.ip...@gmail.com | map: www.tinyurl.com/ns52fpa

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
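A self-contained sketch of Rick's secondary-array approach, using the bond list and deletions from Mads's post (note the modern spelling `bool` is used in place of the old `np.bool` alias):

```python
import numpy as np

# Bonds for a benzene molecule: column k connects atoms b[0, k] and b[1, k].
b = np.array([[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11],
              [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5]])
a = [1, 2, 3, 7, 8]              # atoms to delete

# Boolean lookup table: True where the atom survives.
keepdata = np.ones(12, dtype=bool)
keepdata[a] = False

# Keep only bonds whose two endpoints both survive.
mask = keepdata[b[0]] & keepdata[b[1]]

# New index of each surviving atom = number of kept atoms at or before it, minus one.
# This replaces the per-deleted-atom decrement loop with one cumsum.
newindex = keepdata.cumsum() - 1

c = newindex[b[:, mask]]
print(c)
```

The result matches the corrected bond list from Mads's post: `[[0, 0, 1, 1, 2, 2, 2, 3, 5, 6], [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]`.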
Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 38, Issue 11
The difference between IDL and numpy is that IDL uses single-precision floats by default, while numpy uses doubles. If you try it with doubles in IDL, you will see that it also returns false.

As David Cournapeau said, you should not expect different floating point arithmetic operations to give exactly the same result. It's just luck if they do. E.g., in IDL you'll find that 0.1+0.6 is not equal to 0.7. That's because 0.1 and 0.6 are not exactly representable as floats.

On Nov 4, 2009, Jean-Luc Menut wrote:

Hello all,

If I define 2 variables a and b by doing the following:

    In [5]: a
    Out[5]: array([ 1.7])

    In [6]: b = array([0.8]) + array([0.9])

    In [7]: b
    Out[7]: array([ 1.7])

if I test the equality of a and b, instead of obtaining True, I have:

    In [8]: a == b
    Out[8]: array([False], dtype=bool)

I know it is not related only to numpy but also to python. How do you tackle this problem? (I also tried it in IDL and it works well: I guess it truncates the number.)

Thanks,
Jean-Luc
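A small sketch of the point, and the usual way to tackle it (a tolerance-based comparison rather than `==`):

```python
import numpy as np

# In double precision (numpy's default), 0.8 + 0.9 rounds to
# 1.7000000000000002, while the literal 1.7 rounds to a slightly
# smaller double, so exact equality fails.
b = 0.8 + 0.9
print(b == 1.7)            # False

# The robust idiom is a comparison within a tolerance:
print(np.isclose(b, 1.7))  # True

# The same works elementwise on arrays:
print(np.allclose(np.array([0.8]) + np.array([0.9]), np.array([1.7])))  # True
```

`np.isclose`/`np.allclose` use a combined relative and absolute tolerance, which is almost always what "equal" should mean for floating-point results.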
Re: [Numpy-discussion] exec: bad practice?
You're not supposed to write to the locals() dictionary. Sometimes it works, but sometimes it doesn't. From the Python library docs:

    locals()
    Update and return a dictionary representing the current local symbol table.

    Note: The contents of this dictionary should not be modified; changes may not affect the values of local variables used by the interpreter.

I think the only way to create a variable with a program-specified name in the local namespace is to use exec (but I'd be happy to be corrected).

Cheers,
Rick

On Sep 15, 2009 Sebastien Binet wrote:

hi John,

    I have a bit of code where I create arrays with meaningful names via:

        meat = ['beef', 'lamb', 'pork']
        cut = ['ribs', 'cutlets']

        for m in meat:
            for c in cut:
                exec("consumed_%s_%s = np.zeros((numxgrid, numygrid, nummeasured))" % (m, c))

    Is this 'pythonic'? Or is it bad practice (and potentially slow) to use the 'exec' statement?

usage of the exec statement is usually frowned upon and can be side stepped. e.g.:

    for m in meat:
        for c in cut:
            locals()['consumed_%s_%s' % (m, c)] = some_array

hth,
sebastien.
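A third option, which avoids both exec and writing to locals(), is to store the arrays in a plain dictionary keyed by the generated names. A minimal sketch (the grid sizes here are placeholders, not values from the original post):

```python
import numpy as np

numxgrid, numygrid, nummeasured = 4, 5, 6   # illustrative sizes only
meat = ['beef', 'lamb', 'pork']
cut = ['ribs', 'cutlets']

# Program-generated names become dictionary keys; no exec, no locals().
consumed = {}
for m in meat:
    for c in cut:
        consumed['%s_%s' % (m, c)] = np.zeros((numxgrid, numygrid, nummeasured))

print(sorted(consumed))
print(consumed['lamb_ribs'].shape)   # (4, 5, 6)
```

A dictionary also makes it trivial to iterate over all the arrays later, which separately named local variables do not.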
Re: [Numpy-discussion] puzzle: generate index with many ranges
Here's a technique that works:

    Python 2.4.2 (#5, Nov 21 2005, 23:08:11)
    [GCC 4.0.0 20041026 (Apple Computer, Inc. build 4061)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy as np
    >>> a = np.array([0, 4, 0, 11])
    >>> b = np.array([-1, 11, 4, 15])
    >>> rangelen = b - a + 1
    >>> cumlen = rangelen.cumsum()
    >>> c = np.arange(cumlen[-1], dtype=np.int32)
    >>> c += np.repeat(a[1:] - c[cumlen[0:-1]], rangelen[1:])
    >>> print c
    [ 4  5  6  7  8  9 10 11  0  1  2  3  4 11 12 13 14 15]

The basic idea is that the difference of your desired output from a simple range is an array with a bunch of constant values appended together, and that is what repeat() does. I'm assuming that you'll never have b < a. Notice the slight ugliness of prepending the elements at the beginning so that the cumsum starts with zero. (Maybe there is a cleaner way to do that.)

This does create a second array (via the repeat) that is the same length as the result. If that uses too much memory, you could break up the repeat and update of c into segments using a loop. (You wouldn't need a loop for every a,b element -- do a bunch in each iteration.)

-- Rick

Raik Gruenberg wrote:

Hi there,

perhaps someone has a bright idea for this one:

I want to concatenate ranges of numbers into a single array (for indexing). So I have generated an array a with starting positions, for example:

    a = [4, 0, 11]

I have an array b with stop positions:

    b = [11, 4, 15]

and I would like to generate an index array that takes 4..11, then 0..4, then 11..15. In reality, a and b have 1+ elements and the arrays to be sliced are very large, so I want to avoid any for loops etc. Any idea how this could be done? I thought some combination of *repeat* and adding of *arange* should do the trick but just cannot nail it down.

Thanks in advance for any hints!

Greetings, Raik
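Rick's session can be wrapped into a reusable function (a sketch; the function name `concat_ranges` is mine, and the prepending of the empty range is done inside so the caller passes the original a and b):

```python
import numpy as np

def concat_ranges(a, b):
    """Concatenate the inclusive ranges a[i]..b[i] into one index array."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Prepend an empty range (start 0, stop -1, length 0) so that the
    # cumulative lengths effectively start at zero.
    a = np.concatenate([[0], a])
    b = np.concatenate([[-1], b])
    rangelen = b - a + 1
    cumlen = rangelen.cumsum()
    c = np.arange(cumlen[-1], dtype=np.intp)
    # Each output segment is a plain arange shifted by a per-segment
    # constant; repeat() builds the array of shifts in one shot.
    c += np.repeat(a[1:] - c[cumlen[:-1]], rangelen[1:])
    return c

print(concat_ranges([4, 0, 11], [11, 4, 15]))
# [ 4  5  6  7  8  9 10 11  0  1  2  3  4 11 12 13 14 15]
```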
Re: [Numpy-discussion] Histograms of extremely large data sets
On Dec 14, 2006, at 2:56 AM, Cameron Walsh wrote:

    At some point I might try and test different cache sizes for different data-set sizes and see what the effect is. For now, 65536 seems a good number and I would be happy to see this replace the current numpy.histogram.

I experimented a little on my machine and found that 64k was a good size, but it is fairly insensitive to the size over a wide range (16000 to 1e6). I'd be interested to hear how this scales on other machines -- I'm pretty sure that the ideal size will keep the piece of the array being sorted smaller than the on-chip cache.

Just so we don't get too smug about the speed: if I do this in IDL on the same machine it is 10 times faster (0.28 seconds instead of 4 seconds). I'm sure the IDL version uses the much faster approach of just sweeping through the array once, incrementing counts in the appropriate bins. It only handles equal-sized bins, so it is not as general as the numpy version -- but equal-sized bins is a very common case. I'd still like to see a C version of histogram (which I guess would need to be a ufunc) go into the core numpy.

Rick
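The single-sweep, equal-width-bin counting that Rick attributes to IDL can be sketched in numpy itself with bincount (this is my illustration, not the IDL implementation; the name `fast_histogram` and the clamping of out-of-range values into the edge bins are assumptions):

```python
import numpy as np

def fast_histogram(a, bins, lo, hi):
    """Equal-width-bin histogram via one pass: bin index, then bincount."""
    a = np.asarray(a).ravel()
    # Map each value to its bin index in one vectorized sweep.
    idx = ((a - lo) * (bins / (hi - lo))).astype(np.intp)
    # Clamp values below lo / at or above hi into the edge bins.
    idx = np.clip(idx, 0, bins - 1)
    return np.bincount(idx, minlength=bins)

rng = np.random.default_rng(0)
data = rng.random(100000)                 # uniform in [0, 1)
counts = fast_histogram(data, 10, 0.0, 1.0)
print(counts.sum())                       # 100000: every element lands in some bin
```

Because bincount is a single C-level pass over the index array, this avoids the sort entirely, at the cost of supporting only equal-sized bins.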
Re: [Numpy-discussion] Histograms of extremely large data sets
On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote:

    I'm trying to generate histograms of extremely large datasets. I've tried a few methods, listed below, all with their own shortcomings. Mailing-list archive and google searches have not revealed any solutions.

The numpy.histogram function can be modified to use memory much more efficiently when the input array is large, and the modification turns out to be faster even for smallish arrays (in my tests, anyway). Below is a modified version of the histogram function from function_base.py. It is almost identical, but it avoids sorting the entire input array at once by dividing it into blocks. (It would be even better to avoid the call to ravel too.) The only other messy detail is that the builtin range function is shadowed by the 'range' parameter.

In my timing tests this is about the same speed for arrays about the same size as the block size, and is faster than the current version by 30-40% for large arrays. The speed difference increases as the array size increases. I haven't compared this to Eric's weave function, but this has the advantages of being pure Python and of being much simpler. On my machine (MacBook Pro) it takes about 4 seconds for an array with 100 million elements. The time increases perfectly linearly with array size for arrays larger than a million elements.
Rick

    from numpy import *

    lrange = range   # keep the builtin; it is shadowed by the 'range' argument below

    def histogram(a, bins=10, range=None, normed=False):
        a = asarray(a).ravel()
        if not iterable(bins):
            if range is None:
                range = (a.min(), a.max())
            mn, mx = [mi + 0.0 for mi in range]
            if mn == mx:
                mn -= 0.5
                mx += 0.5
            bins = linspace(mn, mx, bins, endpoint=False)
        # best block size probably depends on processor cache size
        block = 65536
        n = sort(a[:block]).searchsorted(bins)
        for i in lrange(block, len(a), block):
            n += sort(a[i:i + block]).searchsorted(bins)
        n = concatenate([n, [len(a)]])
        n = n[1:] - n[:-1]
        if normed:
            db = bins[1] - bins[0]
            return 1.0 / (a.size * db) * n, bins
        else:
            return n, bins
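A quick sanity check of the blocking idea (mine, not part of the original post): because searchsorted counts, per bin edge, how many elements are smaller, the per-block counts sum to exactly the one-shot counts, so the blocked version must match a full sort+searchsorted bin for bin.

```python
import numpy as np

def blocked_counts(a, bins, block=65536):
    """Per-bin counts via sort+searchsorted over fixed-size blocks."""
    n = np.sort(a[:block]).searchsorted(bins)
    for i in range(block, len(a), block):
        n += np.sort(a[i:i + block]).searchsorted(bins)
    n = np.concatenate([n, [len(a)]])
    return n[1:] - n[:-1]

rng = np.random.default_rng(1)
a = rng.random(300000)
bins = np.linspace(0.0, 1.0, 10, endpoint=False)

# One-shot reference: sort everything, then take differences of edge counts.
direct = np.sort(a).searchsorted(bins)
direct = np.concatenate([direct, [len(a)]])
direct = direct[1:] - direct[:-1]

print((blocked_counts(a, bins) == direct).all())   # True
```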