Re: [Numpy-discussion] python array
The difference appears to be that the boolean selection pulls out all data values <= 0.5 whether or not they are masked, and then carries the corresponding mask entries over to the new array. So r2010 and bt contain identical unmasked values but different numbers of masked values.

Because the initial fill value for your masked values was a large negative number, those masked values are carried over in r2010. In bt, you've taken the absolute value of the data array, so those fill values are now positive and are no longer carried over into the indexed array.

Because the final arrays are still masked, you see no difference in the statistical properties of the arrays, only in their sizes: one contains many more masked values than the other. I don't think this should be a problem for your computations. If you're concerned, you could always explicitly demask them before your computations. See the example below.

~Brett

In [61]: import numpy as np

In [62]: import numpy.ma as ma

In [65]: a = np.arange(-8, 8).reshape((4, 4))

In [66]: a
Out[66]:
array([[-8, -7, -6, -5],
       [-4, -3, -2, -1],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])

In [68]: b = ma.masked_array(a, mask=a < 0)

In [69]: b
Out[69]:
masked_array(data =
 [[-- -- -- --]
 [-- -- -- --]
 [0 1 2 3]
 [4 5 6 7]],
             mask =
 [[ True  True  True  True]
 [ True  True  True  True]
 [False False False False]
 [False False False False]],
       fill_value = 99)

In [70]: b.data
Out[70]:
array([[-8, -7, -6, -5],
       [-4, -3, -2, -1],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])

In [71]: c = abs(b)

In [72]: c[c <= 4].shape
Out[72]: (9L,)

In [73]: b[b <= 4].shape
Out[73]: (13L,)

In [74]: b[b <= 4]
Out[74]:
masked_array(data = [-- -- -- -- -- -- -- -- 0 1 2 3 4],
             mask = [ True  True  True  True  True  True  True  True False False False False False],
       fill_value = 99)

In [75]: c[c <= 4]
Out[75]:
masked_array(data = [-- -- -- -- 0 1 2 3 4],
             mask = [ True  True  True  True False False False False False],
       fill_value = 99)

On Thu, Mar 13, 2014 at 8:14 PM, Sudheer Joseph
sudheer.jos...@yahoo.com wrote:

Sorry, the solution below that I thought was working was not; it was just giving the array size.

On Fri, 14/3/14, Sudheer Joseph sudheer.jos...@yahoo.com wrote:

Subject: Re: [Numpy-discussion] python array
To: Discussion of Numerical Python numpy-discussion@scipy.org
Date: Friday, 14 March, 2014, 1:09 AM

Thank you very much Nicolas and Chris,

The hint was helpful, and from it I tried the steps below (a crude way, I would say) and am now getting the same result. I have been using the abs available by default, and it is the same with numpy.absolute (I checked).

nr= ((r2010r2010.min()) (r2010r2010.max()))
nr[nr.5].shape
Out[25]: (33868,)
anr=numpy.absolute(nr)
anr[anr.5].shape
Out[27]: (33868,)

The approach I used may have problems when the mask has values that can affect the min/max operation. So I would like to know if there is a standard, formal (Python/NumPy) way to handle masked arrays when they need to be subjected to boolean operations.

with best regards,
Sudheer

***
Sudheer Joseph
Indian National Centre for Ocean Information Services
Ministry of Earth Sciences, Govt. of India
POST BOX NO: 21, IDA Jeedeemetla P.O.
Via Pragathi Nagar, Kukatpally, Hyderabad; Pin:5000 55
Tel:+91-40-23886047(O), Fax:+91-40-23895011(O),
Tel:+91-40-23044600(R), Tel:+91-40-9440832534(Mobile)
E-mail:sjo.in...@gmail.com;sudheer.jos...@yahoo.com
Web- http://oppamthadathil.tripod.com
***

On Thu, 13/3/14, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote:

Subject: Re: [Numpy-discussion] python array
To: Discussion of Numerical Python numpy-discussion@scipy.org
Date: Thursday, 13 March, 2014, 11:53 PM

On Mar 13, 2014, at 9:39 AM, Nicolas Rougier nicolas.roug...@inria.fr wrote:

Seems to be related to the masked values:

Good hint -- a masked array keeps the junk values in the main array. What abs are you using -- it may not be mask-aware. (You want a numpy abs anyway.)

Also -- I'm not sure I know what happens with Boolean operators on masked arrays when you use them to index. I'd investigate that. (Sorry, not at a machine I can play with now.)

Chris

print r2010[:3,:3]
[[-- -- --]
 [-- -- --]
 [-- -- --]]
print abs(r2010)[:3,:3]
[[-- -- --]
 [-- -- --]
 [-- -- --]]
print r2010[ r2010[:3,:3] < 0 ]
[-- -- -- -- -- -- -- -- --]
print r2010[ abs(r2010)[:3,:3] < 0]
[]

Nicolas

On 13 Mar 2014, at 16:52, Sudheer Joseph sudheer.jos...@yahoo.com wrote:
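For the question of a "standard" way to combine masked arrays with boolean selection, here is a minimal sketch rather than an official recipe. It assumes the goal is to keep only unmasked values that satisfy the condition; the names keep, selected, plain and small are illustrative. How the hidden data behind a mask enters a comparison can differ between NumPy versions, so the sketch avoids relying on it.

```python
import numpy as np
import numpy.ma as ma

a = np.arange(-8, 8).reshape((4, 4))
b = ma.masked_array(a, mask=a < 0)

# Option 1: treat masked entries as "condition not met", so the data
# hidden behind the mask never leaks into the selection.
keep = ma.filled(b <= 4, False)
selected = b[keep]

# Option 2: drop the masked entries first and work on plain data.
plain = b.compressed()
small = plain[plain <= 4]

print(selected.shape, small)
```

Either way the size of the result no longer depends on what values happen to sit underneath the mask.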
Re: [Numpy-discussion] Robust Sorting of Points
Here's some code implementing the "replace similar values with an arbitrarily chosen one" idea (in this case the smallest of the similar values). I didn't see any way to do this cleverly with strides, so I just did a simple loop. It's about 100 times slower in pure Python, or a bit under 10 times slower if you're willing to use a bit of Cython. Not sure if this is good enough for your purposes. I imagine you could go a bit faster if you were willing to do the lexical integration by hand (since you've already done the separate sorting of each subarray for value replacement purposes) instead of passing that off to np.lexsort.

Note that this approach will only work if your points are not only well-separated in space but also either well-separated or identical in each dimension as well. It's OK to have points with the same, say, x value, but if you have points that have close x values before the noise is added, then the noise can move intermediate points around in the sort order. It works well with the gridded data I used as a sample, but if you're, say, generating random points, this could be a problem:

point 1 is (1, 0, 1e-12)
point 2 is (0, 1, 0)

These are well separated. The algorithm will pool those z values and report 1 as coming before 2. Unless you get jitter like this:

point 1: (1, 0, 1.5e-12)
point 2: (0, 1, -0.5e-12)

Now they won't be pooled any more and we'll get 2 as coming before 1.
Anyway, here's the code:

In [1]: import numpy as np

In [2]:
def gen_grid(n, d):
    #Generate a bunch of grid points, n in each dimension of spacing d
    vals = np.linspace(0, (n-1)*d, n)
    x, y, z = np.meshgrid(vals, vals, vals)
    grid = np.empty((n**3, 3))
    grid[:,0] = x.flatten()
    grid[:,1] = y.flatten()
    grid[:,2] = z.flatten()
    return grid

def jitter(array, epsilon=1e-12):
    #Add random jitter from a uniform distribution of width epsilon
    return array + np.random.random(array.shape) * epsilon - epsilon / 2

In [3]:
grid = gen_grid(4, 0.1)
print np.lexsort(grid.T)
print np.lexsort(jitter(grid.T))

[ 0  4  8 12 16 20 24 28 32 36 40 44 48 52 56 60
  1  5  9 13 17 21 25 29 33 37 41 45 49 53 57 61
  2  6 10 14 18 22 26 30 34 38 42 46 50 54 58 62
  3  7 11 15 19 23 27 31 35 39 43 47 51 55 59 63]
[60  4 48 32 40 12 36 28 44 56 16  8 24  0 52 20
 45 25 49  1 53 29  9 33  5 61 41 37 17 13 21 57
 22 50 18 10  2 62 58 54  6 34 26 42 38 46 14 30
  3 11 55 63 27 15 35 43 31 39  7 59 47 23 51 19]

In [4]:
def pool_values(A, epsilon=1e-12):
    idx = np.argsort(A)
    for i in range(1, len(A)):
        if A[idx[i]] - A[idx[i-1]] < epsilon:
            A[idx[i]] = A[idx[i-1]]
    return A

def stable_sort(grid):
    return np.lexsort((pool_values(grid[:,0]),
                       pool_values(grid[:,1]),
                       pool_values(grid[:,2])))

In [5]:
print stable_sort(grid)
print stable_sort(jitter(grid))

[ 0  4  8 12 16 20 24 28 32 36 40 44 48 52 56 60
  1  5  9 13 17 21 25 29 33 37 41 45 49 53 57 61
  2  6 10 14 18 22 26 30 34 38 42 46 50 54 58 62
  3  7 11 15 19 23 27 31 35 39 43 47 51 55 59 63]
[ 0  4  8 12 16 20 24 28 32 36 40 44 48 52 56 60
  1  5  9 13 17 21 25 29 33 37 41 45 49 53 57 61
  2  6 10 14 18 22 26 30 34 38 42 46 50 54 58 62
  3  7 11 15 19 23 27 31 35 39 43 47 51 55 59 63]

In [6]: %timeit np.lexsort(jitter(grid.T))
10 loops, best of 3: 10.4 µs per loop

In [7]: %timeit stable_sort(jitter(grid))
1000 loops, best of 3: 1.39 ms per loop

In [8]: %load_ext cythonmagic

In [12]: %%cython
import numpy as np
cimport numpy as np

cdef fast_pool_values(double[:] A, double epsilon=1e-12):
    cdef long[:] idx = np.argsort(A)
    cdef int i
    for i in range(1, len(A)):
        if A[idx[i]] - A[idx[i-1]] < epsilon:
            A[idx[i]] = A[idx[i-1]]
    return A

def fast_stable_sort(grid):
    return np.lexsort((fast_pool_values(grid[:,0]),
                       fast_pool_values(grid[:,1]),
                       fast_pool_values(grid[:,2])))

In [10]: %timeit np.lexsort(jitter(grid.T))
1 loops, best of 3: 38.5 µs per loop

In [13]: %timeit fast_stable_sort(jitter(grid))
1000 loops, best of 3: 309 µs per loop

On Sun, Oct 27, 2013 at 5:41 PM, Freddie Witherden fred...@witherden.org wrote:

On 27/10/13 21:05, Jonathan March wrote:

If an "almost always works" solution is good enough, then sort on the distance to some fixed random point that is in the vicinity of your N points.

I had considered this. Unfortunately I need a solution which really does always work. The only pure-Python solution I can envision -- at the moment anyway -- is to do some cleverness with the output of np.unique to identify similar values and replace them with an arbitrarily chosen one. This should permit the output to be passed to np.lexsort without issue.

Regards, Freddie.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
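The pooling idea from this thread can be sketched for Python 3 as follows. This is an illustrative variant, not the code as posted: it works on a copy instead of modifying the input columns, and the seeded generator rng and the names grid and noisy are assumptions made for the demo.

```python
import numpy as np

def pool_values(values, epsilon=1e-12):
    """Return a copy in which every value closer than epsilon to its
    sorted predecessor is replaced by that predecessor."""
    pooled = np.array(values, dtype=float)
    order = np.argsort(pooled, kind="stable")
    for prev, cur in zip(order[:-1], order[1:]):
        if pooled[cur] - pooled[prev] < epsilon:
            pooled[cur] = pooled[prev]
    return pooled

def stable_sort(points, epsilon=1e-12):
    """Lexsort points after pooling near-identical coordinates per axis."""
    keys = tuple(pool_values(points[:, k], epsilon)
                 for k in range(points.shape[1]))
    return np.lexsort(keys)

# Demo: a 4x4x4 grid with spacing 0.1, with and without tiny jitter.
rng = np.random.default_rng(0)
vals = np.linspace(0.0, 0.3, 4)
x, y, z = np.meshgrid(vals, vals, vals)
grid = np.column_stack([x.ravel(), y.ravel(), z.ravel()])
noisy = grid + rng.uniform(-0.5e-12, 0.5e-12, grid.shape)

print(np.array_equal(stable_sort(grid), stable_sort(noisy)))
```

The same caveat applies as in the message above: coordinates must be either effectively identical or separated by much more than epsilon along every axis.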
Re: [Numpy-discussion] Stick (line segments) percolation algorithm - graph theory?
I can see a couple of opportunities for improvement in your algorithm. Running your code on a single experiment takes about 2.9 seconds. I get this down to about 1.0 seconds by (1) exploiting the symmetry of the M matrix and (2) avoiding the costly inner loop over k in favor of array operations:

def check_segments(j, others, data):
    x1, y1, x2, y2 = data

    x_A1B1 = x2[j]-x1[j]
    y_A1B1 = y2[j]-y1[j]

    x_A1A2 = x1[others]-x1[j]
    y_A1A2 = y1[others]-y1[j]

    x_A2A1 = -1*x_A1A2
    y_A2A1 = -1*y_A1A2

    x_A2B2 = x2[others]-x1[others]
    y_A2B2 = y2[others]-y1[others]

    x_A1B2 = x2[others]-x1[j]
    y_A1B2 = y2[others]-y1[j]

    x_A2B1 = x2[j]-x1[others]
    y_A2B1 = y2[j]-y1[others]

    p1 = x_A1B1*y_A1A2 - y_A1B1*x_A1A2
    p2 = x_A1B1*y_A1B2 - y_A1B1*x_A1B2
    p3 = x_A2B2*y_A2B1 - y_A2B2*x_A2B1
    p4 = x_A2B2*y_A2A1 - y_A2B2*x_A2A1

    condition_1=p1*p2
    condition_2=p3*p4

    return (p1 * p2 <= 0) & (p3 * p4 <= 0)

for j in xrange(1, N):
    valid = check_segments(j, range(j), (x1, y1, x2, y2))
    M[j,0:j] = valid
    M[0:j,j] = valid

I don't see any other particularly simple ways to improve this. You could probably add an interval check to ensure that the x and y intervals for the segments of interest overlap before doing the full check, but how much that would help would depend on the implementations.

~Brett

On Fri, Aug 23, 2013 at 5:09 PM, Josè Luis Mietta joseluismie...@yahoo.com.ar wrote:

I wrote an algorithm to study stick percolation (i.e., networks of line segments that intersect one another). In my algorithm, N sticks (line segments) are created inside a rectangular box of sides 'b' and 'h' and then, one by one, the algorithm explores the intersections between all line segments. This is a Monte Carlo simulation, so the 'experiment' is executed many times (no fewer than 100 times). Written like that, very much RAM is consumed: Here, the element Mij=1 if stick i intersects stick j and Mij=0 if not. How can I optimize my algorithm? Is graph theory useful in this case?
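A compact Python 3 sketch of the same orientation test, filling the symmetric matrix one row at a time. The helper names cross, intersects and intersection_matrix are illustrative, not from the thread. One caveat that applies to the <= 0 form of the test in general: collinear segments that do not overlap make all four products zero and are therefore reported as intersecting.

```python
import numpy as np

def cross(ax, ay, bx, by):
    """z-component of the 2-D cross product (a x b)."""
    return ax * by - ay * bx

def intersects(j, others, x1, y1, x2, y2):
    """Boolean array: does stick j intersect each stick in `others`?"""
    dxj, dyj = x2[j] - x1[j], y2[j] - y1[j]
    # Orientation of the other sticks' endpoints relative to stick j.
    p1 = cross(dxj, dyj, x1[others] - x1[j], y1[others] - y1[j])
    p2 = cross(dxj, dyj, x2[others] - x1[j], y2[others] - y1[j])
    dxo, dyo = x2[others] - x1[others], y2[others] - y1[others]
    # Orientation of stick j's endpoints relative to the other sticks.
    p3 = cross(dxo, dyo, x2[j] - x1[others], y2[j] - y1[others])
    p4 = cross(dxo, dyo, x1[j] - x1[others], y1[j] - y1[others])
    return (p1 * p2 <= 0) & (p3 * p4 <= 0)

def intersection_matrix(x1, y1, x2, y2):
    n = len(x1)
    M = np.zeros((n, n), dtype=bool)
    for j in range(1, n):
        hit = intersects(j, np.arange(j), x1, y1, x2, y2)
        M[j, :j] = hit   # lower triangle
        M[:j, j] = hit   # mirror into the upper triangle
    return M

# Three sticks: 0 and 1 cross at (1, 1); stick 2 is far away.
x1 = np.array([0.0, 0.0, 5.0]); y1 = np.array([0.0, 2.0, 5.0])
x2 = np.array([2.0, 2.0, 6.0]); y2 = np.array([2.0, 0.0, 5.0])
M = intersection_matrix(x1, y1, x2, y2)
print(M.astype(int))
```

Storing M as a boolean array rather than floats or Python ints also cuts the memory footprint, which was part of the original complaint.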
Re: [Numpy-discussion] Optimize removing nan-values of dataset
The example data/method you've provided doesn't do what you describe. E.g., in your example data you have several 2x2 blocks of NaNs. According to your description, these should not be replaced (as they all have a neighbor that is also a NaN). Your example method, however, replaces them - in fact, it replaces any NaN values that are not in the first or last row or contiguous with NaNs in the first or last row.

Here's a replacement method that does do what you've described:

def nan_to_mean(data):
    data[1:-1][np.isnan(data[1:-1])] = ((data[:-2] + data[2:]) / 2)[np.isnan(data[1:-1])]
    return data

~Brett

On Tue, Aug 13, 2013 at 1:50 AM, Thomas Goebel thomas.goe...@th-nuernberg.de wrote:

Hi,

I am trying to remove nan-values from an array of shape(40, 6). These nan-values at point data[x] should be replaced by the mean of data[x-1] and data[x+1] if both values at x-1 and x+1 are not nan.

The function nan_to_mean (see below) is working, but I wonder if I could optimize the code. I thought about something like

1. Find all nan values in array: nans = np.isnan(dataarray)
2. Check if values before, after nan indice are not nan
3. Calculate mean

While using this script for my original dataset of shape(63856, 6) it takes 139.343 seconds to run it. And some datasets are even bigger. I attached the example_dataset.txt and the example.py script.

Thanks for any help,
Tom

def nan_to_mean(arr):
    for cnt, value in enumerate(arr):
        # Check if first value is nan, if so continue
        if cnt == 0 and np.isnan(value):
            continue
        # Check if last value is nan:
        # If x-1 value is nan dont do anything!
        # If x-1 is float, last value will be value of x-1
        elif cnt == (len(arr)-1):
            if np.isnan(value) and not np.isnan(arr[cnt-1]):
                arr[cnt] = arr[cnt-1]
        # If the first values of file are nan ignore them all
        elif np.isnan(value) and np.isnan(arr[cnt-1]):
            continue
        # Found nan value and x-1 value is of type float
        elif np.isnan(value) and not np.isnan(arr[cnt-1]):
            # Check if x+1 value is not nan
            if not np.isnan(arr[cnt+1]):
                arr[cnt] = '%.1f' % np.mean((
                    arr[cnt-1],arr[cnt+1]))
            # If x+1 value is nan, go to next value
            else:
                for N in xrange(2, 30):
                    if cnt+N == (len(arr)):
                        break
                    elif not np.isnan(arr[cnt+N]):
                        arr[cnt] = '%.1f' % np.mean(
                            (arr[cnt-1], arr[cnt+N]))
    return arr
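The one-line replacement above can be unpacked into a few named steps. This is a minimal sketch under two assumptions not stated in the thread: the input should be left untouched (so a copy is made), and the neighbors are taken along the first axis. The sample column is made up for illustration.

```python
import numpy as np

def nan_to_mean(data):
    """Replace each interior NaN by the mean of its two neighbors along
    axis 0, but only when both neighbors are themselves not NaN."""
    out = np.array(data, dtype=float)            # work on a copy
    interior = out[1:-1]                         # view into `out`
    neighbor_mean = (out[:-2] + out[2:]) / 2.0   # NaN if either side is NaN
    fill = np.isnan(interior) & ~np.isnan(neighbor_mean)
    interior[fill] = neighbor_mean[fill]
    return out

col = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
print(nan_to_mean(col))
```

Because the mean of a NaN and anything else is NaN, runs of consecutive NaNs are left alone, which matches the description in the original question.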
Re: [Numpy-discussion] Smart way to do this?
a = np.ones(30)
idx = np.array([2, 3, 2])
a += 2 * np.bincount(idx, minlength=len(a))
a
array([ 1., 1., 5., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

As for speed:

def loop(a, idx):
    for i in idx:
        a[i] += 2

def count(a, idx):
    a += 2 * np.bincount(idx, minlength=len(a))

%timeit loop(np.ones(30), np.array([2, 3, 2]))
1 loops, best of 3: 19.9 us per loop

%timeit count(np.ones(30), np.array([2, 3, 2]))
10 loops, best of 3: 19.2 us per loop

So no big difference here. But go to larger systems and you'll see a huge difference:

%timeit loop(np.ones(1), np.random.randint(1, size=10))
1 loops, best of 3: 260 ms per loop

%timeit count(np.ones(1), np.random.randint(1, size=10))
100 loops, best of 3: 3.03 ms per loop

~Brett

On Fri, Feb 22, 2013 at 8:38 PM, santhu kumar mesan...@gmail.com wrote:

Sorry, typo:

a = np.ones(30)
idx = np.array([2,3,2]) # there is a duplicate index of 2
a[idx] += 2

On Fri, Feb 22, 2013 at 8:35 PM, santhu kumar mesan...@gmail.com wrote:

Hi all,

I don't want to run a loop for this, but it should be possible using numpy "smart" ways.

a = np.ones(30)
idx = np.array([2,3,2]) # there is a duplicate index of 2
a += 2
a
array([ 1., 1., 3., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

But if we do this:

for i in range(idx.shape[0]):
    a[idx[i]] += 2
a
array([ 1., 1., 5., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

How to achieve the second result without looping?

Thanks
Santhosh
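For comparison, here is a small sketch of two loop-free ways to accumulate over repeated indices. np.add.at is not mentioned in the thread; it is the unbuffered in-place form of the add ufunc and is offered here only as an alternative to the bincount approach.

```python
import numpy as np

idx = np.array([2, 3, 2])   # index 2 appears twice

# Unbuffered in-place add: every occurrence of an index is applied.
a = np.ones(30)
np.add.at(a, idx, 2)

# The bincount approach from the message above.
b = np.ones(30)
b += 2 * np.bincount(idx, minlength=len(b))

print(a[:5], b[:5])
```

By contrast, a[idx] += 2 applies the increment only once per distinct index, which is why the original attempt produced 3 rather than 5 at position 2.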
Re: [Numpy-discussion] Is there a more efficient way to do this?
On Wed, Aug 8, 2012 at 9:19 AM, Laszlo Nagy gand...@shopzeus.com wrote:

Is there a more efficient way to calculate the slices array below? I do not want to make copies of DATA, because it can be huge. The argsort is fast enough. I just need to create slices for different dimensions. The above code works, but it does a linear time search, implemented in pure Python code. For every iteration, Python code is executed. For 1 million rows, this is very slow. Is there a way to produce slices with numpy code? I could write C code for this, but I would prefer to do it with mass numpy operations.

Thanks,
Laszlo

#Code
import numpy as np

#rows between 100 to 1M
rows = 1000
data = np.random.random_integers(0, 100, rows)

def get_slices_slow(data):
    o = np.argsort(data)
    slices = []
    prev_val = None
    sidx = -1
    for oidx, rowidx in enumerate(o):
        val = data[rowidx]
        if not val == prev_val:
            if prev_val is None:
                prev_val = val
                sidx = oidx
            else:
                slices.append((prev_val, sidx, oidx))
                sidx = oidx
                prev_val = val
    if (sidx >= 0) and (sidx < rows):
        slices.append((val, sidx, rows))
    slices = np.array(slices, dtype=np.int64)
    return slices

def get_slices_fast(data):
    nums = np.unique(data)
    slices = np.zeros((len(nums), 3), dtype=np.int64)
    slices[:,0] = nums
    count = 0
    for i, num in enumerate(nums):
        count += (data == num).sum()
        slices[i,2] = count
    slices[1:,1] = slices[:-1,2]
    return slices

def get_slices_faster(data):
    nums = np.unique(data)
    slices = np.zeros((len(nums), 3), dtype=np.int64)
    slices[:,0] = nums
    count = np.bincount(data)
    slices[:,2] = count.cumsum()
    slices[1:,1] = slices[:-1,2]
    return slices

#Testing in ipython
In [2]: (get_slices_slow(data) == get_slices_fast(data)).all()
Out[2]: True

In [3]: (get_slices_slow(data) == get_slices_faster(data)).all()
Out[3]: True

In [4]: timeit get_slices_slow(data)
100 loops, best of 3: 3.51 ms per loop

In [5]: timeit get_slices_fast(data)
1000 loops, best of 3: 1.76 ms per loop

In [6]: timeit get_slices_faster(data)
1 loops, best of 3: 116 us per loop

So using the fast bincount and array indexing methods gets you about a factor of 30 improvement. Even just doing the counting in a loop with good indexing will get you a factor of 2.

~Brett
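One caveat with get_slices_faster as posted: np.bincount returns a count for every integer from 0 up to the maximum, including values that never occur, so its length only matches np.unique(data) when no value in that range is missing. A hedged sketch that sidesteps this by asking np.unique for the counts directly (the name get_slices is illustrative):

```python
import numpy as np

def get_slices(data):
    """Rows of (value, start, stop) into the argsorted order of `data`."""
    values, counts = np.unique(data, return_counts=True)
    stops = np.cumsum(counts)     # one past the last position of each value
    starts = stops - counts       # first position of each value
    return np.column_stack((values, starts, stops)).astype(np.int64)

data = np.array([3, 1, 3, 7, 1, 3])   # 0, 2, 4, 5, 6 never occur
print(get_slices(data))
```

This stays fully vectorized and also works for data that is not a dense range of small non-negative integers.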
Re: [Numpy-discussion] numpy array in networkx graph?
This seems to work:

import networkx as nx
import pylab
import numpy as N

M = N.random.random((10, 10))
G = nx.Graph(M)

node_colors = []
for i in xrange(len(M)):
    if M[i,0] < 0.5:
        node_colors.append('white')
    else:
        node_colors.append('blue')

nx.draw(G, node_color=node_colors)
pylab.show()

~Brett

On Tue, Jun 12, 2012 at 1:49 PM, bob tnur bobtnu...@gmail.com wrote:

Can anyone give me a hint on the following code?

import network as nx
import pylab as plt

G=nx.Graph(M) # M is numpy matrix, i.e.: type(M)=numpy.ndarray
for i in xrange(len(M)):
    tt=P[i,:].sum()
    if tt==1:
        G.add_node(i,color='blue')
    elif tt==2:
        G.add_node(i,color='red')
    elif tt==3:
        G.add_node(i,color='white')
    else:
        tt==4
        G.add_node(i,color='green')
G.nodes(data=True)
T=nx.draw(G)
plt.axis('off')
plt.savefig(test.png)

I didn't get a color change; the default color is still used. Did I miss something?

My aim is to obtain something like: find the total number of w-red-red-z paths, the number of w-red-red-red-z paths, and the number of w-red-red-red-red-z paths, where w (left side of some cyclic polygon (can also be a conjugated ring)) and z (right side of it) are any of the colors except red.

Any comment is appreciated.

Thanks
Bob
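The colour list in the reply can also be built without an explicit loop. A small NumPy-only sketch, with a made-up 2x2 matrix M; the resulting list is what would be handed to the node_color argument of nx.draw in the reply above.

```python
import numpy as np

M = np.array([[0.2, 0.9],
              [0.7, 0.1]])

# One colour per node, chosen from the first column of M.
node_colors = np.where(M[:, 0] < 0.5, 'white', 'blue').tolist()
print(node_colors)
```

The point the reply makes still holds: the colours have to be passed to the drawing call as a list, since a 'color' attribute stored on the nodes is not picked up automatically by nx.draw.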
Re: [Numpy-discussion] all elements equal
Another issue to watch out for is the case where the array is empty. Technically speaking, "all elements equal" should be True for an empty array, but some of the solutions offered so far would fail in this case. Similarly, NaNs or Infs could cause problems: they should signal as False, but several of the solutions would return True.

~Brett
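A minimal sketch of an all-equal check that makes these edge cases explicit. The function name all_equal is illustrative, and the handling of infinities is an assumption: with plain ==, infinities of the same sign compare equal, which may or may not be the behaviour wanted here.

```python
import numpy as np

def all_equal(a):
    """True if every element equals the first one; True for empty input.

    NaN never compares equal to anything (including itself), so any
    array containing a NaN returns False unless it is empty.
    """
    a = np.asarray(a).ravel()
    if a.size == 0:
        return True
    return bool(np.all(a == a[0]))

print(all_equal([]), all_equal([2, 2, 2]), all_equal([np.nan, np.nan]))
```

Approaches based on a.max() - a.min() == 0 behave differently: they raise on empty input, and inf - inf gives NaN, so it is worth testing whichever variant is chosen against these inputs.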
Re: [Numpy-discussion] Forbidden charcter in the names argument of genfromtxt?
On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes hugad...@gwmail.gwu.edu wrote:

Hey everyone,

I have timeseries data in which the column label is simply a filename from which the original data was taken. Here's some sample data:

name1.txt name2.txt name3.txt
32 34 953
32 03 402

I've noticed that the standard genfromtxt() method works great; however, the names aren't written correctly. That is, if I use the command:

print data['name1.txt']

Nothing happens. However, when I remove the file extension, e.g.:

name1 name2 name3
32 34 953
32 03 402

Then print data['name1'] returns (32, 32) as expected. It seems that the period in the name isn't compatible with the genfromtxt() names attribute. Is there a workaround, or do I need to restructure my program to get the extension removed? I'd rather not do this if possible, for reasons that aren't important for the discussion at hand.

It looks like the period is just getting stripped out of the names:

In [1]: import numpy as N

In [2]: N.genfromtxt('sample.txt', names=True)
Out[2]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1txt', 'f8'), ('name2txt', 'f8'), ('name3txt', 'f8')])

Interestingly, this still happens if you supply the names manually:

In [17]: def reader(filename):
   ....:     infile = open(filename, 'r')
   ....:     names = infile.readline().split()
   ....:     data = N.genfromtxt(infile, names=names)
   ....:     infile.close()
   ....:     return data
   ....:

In [20]: data = reader('sample.txt')

In [21]: data
Out[21]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1txt', 'f8'), ('name2txt', 'f8'), ('name3txt', 'f8')])

What you can do is reset the names after genfromtxt is through with it, though:

In [34]: def reader(filename):
   ....:     infile = open(filename, 'r')
   ....:     names = infile.readline().split()
   ....:     infile.close()
   ....:     data = N.genfromtxt(filename, names=True)
   ....:     data.dtype.names = names
   ....:     return data
   ....:

In [35]: data = reader('sample.txt')

In [36]: data
Out[36]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1.txt', 'f8'), ('name2.txt', 'f8'), ('name3.txt', 'f8')])

Be warned, I don't know why the period is getting stripped; there may be a good reason, and adding it in might cause problems.

~Brett
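A Python 3 sketch of the same "rename afterwards" idea, using an in-memory file so it runs without sample.txt. It swaps in numpy.lib.recfunctions.rename_fields, which returns a renamed view, in place of assigning to data.dtype.names as in the reply; that substitution is a choice made for this sketch, not something from the thread.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

text = "name1.txt name2.txt name3.txt\n32 34 953\n32 03 402\n"

# Read the header ourselves so the original names are preserved.
header = text.splitlines()[0].split()

# genfromtxt sanitizes the field names while parsing.
data = np.genfromtxt(io.StringIO(text), names=True)

# Map whatever names genfromtxt produced back to the original header.
renamed = rfn.rename_fields(data, dict(zip(data.dtype.names, header)))

print(renamed.dtype.names)
print(renamed['name1.txt'])
```

The warning in the reply still applies: field names containing a period cannot be used for attribute-style access on record arrays, so dictionary-style indexing as shown is the safe way to reach them.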
Re: [Numpy-discussion] (no subject)
The namespace is different. If you want to use numpy.sin(), for example, you would use:

import numpy as np
np.sin(angle)

or

from numpy import *
sin(angle)

I generally prefer the first option because then I don't need to worry about multiple imports writing on top of each other (i.e., having test functions in several modules, and then accidentally using the wrong one).

~Brett

On Mon, Feb 6, 2012 at 1:21 PM, Debashish Saha silid...@gmail.com wrote:

basic difference between the commands:

import numpy as np
from numpy import *
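A short sketch of the "imports writing on top of each other" problem. It uses explicit from-imports rather than a star import so the overwrite is easy to see; the variable names are illustrative.

```python
import math
import numpy as np

angles = np.array([0.0, np.pi / 2])

# With qualified names it is always clear which sin is being called.
sines = np.sin(angles)    # works elementwise on arrays
single = math.sin(0.0)    # scalar-only version from the standard library

# With unqualified imports, the later import silently replaces the earlier one.
from math import sin
from numpy import sin
same = sin is np.sin

print(sines, single, same)
```

A star import does the same thing for every public name in the module at once, which is why the qualified np. form is usually the safer habit.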
Re: [Numpy-discussion] Addressing arrays
On Mon, Jan 30, 2012 at 10:57 AM, Ted To rainexpec...@theo.to wrote:

Sure thing. To keep it simple, suppose I have just a two dimensional array (time, output):

[(1,2),(2,3),(3,4)]

I would like to look at all values of output for which, for example, time==2. My actual application has a six dimensional array and I'd like to look at the contents using one or more of the first three dimensions.

Many thanks,
Ted

Couldn't you just do something like this with boolean indexing:

In [1]: import numpy as np

In [2]: a = np.array([(1,2),(2,3),(3,4)])

In [3]: a
Out[3]:
array([[1, 2],
       [2, 3],
       [3, 4]])

In [4]: mask = a[:,0] == 2

In [5]: mask
Out[5]: array([False,  True, False], dtype=bool)

In [6]: a[mask,1]
Out[6]: array([3])

~Brett
Re: [Numpy-discussion] Addressing arrays
On Mon, Jan 30, 2012 at 11:31 AM, Ted To rainexpec...@theo.to wrote:

On 01/30/2012 12:13 PM, Brett Olsen wrote:

On Mon, Jan 30, 2012 at 10:57 AM, Ted To rainexpec...@theo.to wrote:

Sure thing. To keep it simple, suppose I have just a two dimensional array (time, output):

[(1,2),(2,3),(3,4)]

I would like to look at all values of output for which, for example, time==2. My actual application has a six dimensional array and I'd like to look at the contents using one or more of the first three dimensions.

Many thanks,
Ted

Couldn't you just do something like this with boolean indexing:

In [1]: import numpy as np

In [2]: a = np.array([(1,2),(2,3),(3,4)])

In [3]: a
Out[3]:
array([[1, 2],
       [2, 3],
       [3, 4]])

In [4]: mask = a[:,0] == 2

In [5]: mask
Out[5]: array([False,  True, False], dtype=bool)

In [6]: a[mask,1]
Out[6]: array([3])

~Brett

Thanks! That works great if I only want to search over one index, but I can't quite figure out what to do with more than a single index. So suppose I have a labeled, multidimensional array with labels 'month', 'year' and 'quantity'. a[['month','year']] gives me an array of indices, but a[['month','year']]==(1,1960) produces False. I'm sure I simply don't know the proper syntax and I apologize for that -- I'm kind of new to numpy.

Ted

You'd want to update your mask appropriately to get everything you want to select, one criterion at a time, e.g.:

mask = a[:,0] == 1
mask &= a[:,1] == 1960

Alternatively:

mask = (a[:,0] == 1) & (a[:,1] == 1960)

but be careful with the parens: & and | are normally high-priority bitwise operators, and if you leave the parens out, it will try to bitwise-and 1 and a[:,1] and throw an error.

If you've got a ton of parameters, you can combine these more aesthetically with:

mask = (a[:,[0,1]] == [1, 1960]).all(axis=1)

~Brett
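The three ways of combining criteria can be checked against each other on a small made-up table. The column layout (month, year, quantity) follows the question; the numbers are invented for illustration.

```python
import numpy as np

# Columns: month, year, quantity (illustrative values).
a = np.array([[1, 1960, 10],
              [1, 1961, 20],
              [2, 1960, 30],
              [1, 1960, 40]])

# One criterion at a time.
mask_step = a[:, 0] == 1
mask_step &= a[:, 1] == 1960

# Both at once; the parentheses are required because & binds tighter than ==.
mask_once = (a[:, 0] == 1) & (a[:, 1] == 1960)

# Many columns compared against a list of target values in one go.
mask_all = (a[:, [0, 1]] == [1, 1960]).all(axis=1)

print(a[mask_once, 2])
```

All three masks select the same rows, so the choice is mostly about readability when the number of criteria grows.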
Re: [Numpy-discussion] How to output array with indexes to a text file?
On Thu, Aug 25, 2011 at 2:10 PM, Paul Menzel paulepan...@users.sourceforge.net wrote:

Is there an easy way to also save the indexes of an array (columns, rows or both) when outputting it to a text file? For saving an array to a file I only found `savetxt()` [1], which does not seem to have such an option. Adding indexes manually is doable, but I would like to avoid that. Is there a way to accomplish that task without reserving the 0th row or column to store the indexes?

I want to process these text files to produce graphs, and MetaPost’s [2] graph package needs these indexes. (I know about Matplotlib [3], but I would like to use MetaPost.)

Thanks,
Paul

Why don't you just write a wrapper for numpy.savetxt that adds the indices? E.g.:

In [1]: import numpy as N

In [2]: a = N.arange(6,12).reshape((2,3))

In [3]: a
Out[3]:
array([[ 6,  7,  8],
       [ 9, 10, 11]])

In [4]: def save_with_indices(filename, output):
   ...:     (rows, cols) = output.shape
   ...:     tmp = N.hstack((N.arange(1,rows+1).reshape((rows,1)), output))
   ...:     tmp = N.vstack((N.arange(cols+1).reshape((1,cols+1)), tmp))
   ...:     N.savetxt(filename, tmp, fmt='%8i')
   ...:

In [5]: N.savetxt('noidx.txt', a, fmt='%8i')

In [6]: save_with_indices('idx.txt', a)

'noidx.txt' looks like:

       6        7        8
       9       10       11

'idx.txt' looks like:

       0        1        2        3
       1        6        7        8
       2        9       10       11

~Brett
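A Python 3 sketch of the same wrapper, writing to an in-memory buffer so the result can be inspected without touching the disk. The layout (a 0 in the corner, 1-based row and column labels) follows the reply above; the buffer and the variable names are assumptions for the demo.

```python
import io
import numpy as np

def save_with_indices(target, array, fmt="%8i"):
    """Write a 2-D integer array with a leading index row and column."""
    rows, cols = array.shape
    table = np.zeros((rows + 1, cols + 1), dtype=array.dtype)
    table[0, 1:] = np.arange(1, cols + 1)   # column labels
    table[1:, 0] = np.arange(1, rows + 1)   # row labels
    table[1:, 1:] = array
    np.savetxt(target, table, fmt=fmt)

a = np.arange(6, 12).reshape((2, 3))
buf = io.StringIO()
save_with_indices(buf, a)
print(buf.getvalue())
```

Passing a filename instead of the buffer writes the same text to disk, since np.savetxt accepts either.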
Re: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice
On Tue, Aug 2, 2011 at 9:44 AM, Jeremy Conlin jlcon...@gmail.com wrote:

I am trying to create a numpy array from some text I'm reading from a file. Ideally, I'd like to create a structured array with the first element as an int and the remaining as floats. I'm currently unsuccessful in my attempts. I've copied a simple script below that shows what I've done and the wrong output. Can someone please show me what is happening?

I'm using numpy version 1.5.1 under Python 2.7.1 on a Mac running Snow Leopard.

Thanks,
Jeremy

I'd use numpy.loadtxt:

In [1]: import numpy, StringIO

In [2]: l = ' 32000 7.89131E-01 8.05999E-03 3.88222E+03'

In [3]: tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])

In [4]: input = StringIO.StringIO(l)

In [5]: numpy.loadtxt(input, dtype=tfc_dtype)
Out[5]:
array((32000L, 0.789131003, 0.00805998995, 3882.21998),
      dtype=[('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])

In [6]: input.close()

In [7]: input = StringIO.StringIO(l)

In [8]: numpy.loadtxt(input)
Out[8]: array([ 3.2000e+04, 7.89131000e-01, 8.05999000e-03, 3.88222000e+03])

In [9]: input.close()

If you're reading from a file you can replace the StringIO objects with file objects.

~Brett
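The same approach in Python 3, where the StringIO class lives in the io module. This is a sketch using the line and dtype from the reply; the variable names line and record are illustrative.

```python
import io
import numpy as np

line = ' 32000 7.89131E-01 8.05999E-03 3.88222E+03'
tfc_dtype = np.dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])

# One text row parsed into one structured record: int first, floats after.
record = np.loadtxt(io.StringIO(line), dtype=tfc_dtype)

print(record['nps'], record['t'], record['e'], record['fom'])
```

With a single input row the result is a zero-dimensional structured array, so fields are reached by name as shown rather than by row index.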
Re: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array.
This method is probably simpler:

In [1]: import numpy as N

In [2]: A = N.random.random_integers(-10, 10, 25).reshape((5, 5))

In [3]: A
Out[3]:
array([[ -5,   9,   1,   9,  -2],
       [ -8,   0,   9,   7, -10],
       [  2,  -3,  -1,   5,  -7],
       [  0,  -2,  -2,   9,   1],
       [ -7,  -9,  -4,  -1,   6]])

In [4]: A[A < 0] = 0

In [5]: A
Out[5]:
array([[0, 9, 1, 9, 0],
       [0, 0, 9, 7, 0],
       [2, 0, 0, 5, 0],
       [0, 0, 0, 9, 1],
       [0, 0, 0, 0, 6]])

~Brett

On Mon, Aug 1, 2011 at 4:31 AM, dileep kunjaai dileepkunj...@gmail.com wrote:

Dear sir,

How can we replace the numbers in an array that satisfy a certain condition with another number?

Example:

A=[[[ 9.42233087e-42 -4.71116544e-42 0.e+00 ..., 1.48303127e+01 1.31524124e+01 1.14745111e+01]
 [ 3.91788793e+00 1.95894396e+00 0.e+00 ..., 1.78252487e+01 1.28667984e+01 7.90834856e+00]
 [ 7.83592510e+00 -3.91796255e+00 0.e+00 ..., 2.08202991e+01 1.25811749e+01 4.34205008e+00]
 ...,
 [ -8.51249974e-03 7.00901222e+00 -1.40095119e+01 ..., 0.e+00 0.e+00 0.e+00]
 [ 4.26390441e-03 3.51080871e+00 -7.01735353e+00 ..., 0.e+00 0.e+00 0.e+00]
 [ 0.e+00 0.e+00 0.e+00 ..., 0.e+00 0.e+00 0.e+00]]

 [[ 9.42233087e-42 -4.71116544e-42 0.e+00 ..., 8.48242474e+00 7.97146845e+00 7.46051216e+00]
 [ 5.16325808e+00 2.58162904e+00 0.e+00 ..., 8.47719383e+00 8.28024673e+00 8.08330059e+00]
 [ 1.03267126e+01 5.16335630e+00 0.e+00 ..., 8.47196198e+00 8.58903694e+00 8.70611191e+00]
 ...,
 [ 0.e+00 2.74500012e-01 5.4925e-01 ..., 0.e+00 0.e+00 0.e+00]
 [ 0.e+00 1.37496844e-01 -2.74993688e-01 ..., 0.e+00 0.e+00 0.e+00]
 [ 0.e+00 0.e+00 0.e+00 ..., 0.e+00 0.e+00 0.e+00]]

 [[ 9.42233087e-42 4.71116544e-42 0.e+00 ..., 1.18437748e+01 9.72778034e+00 7.61178637e+00]
 [ 2.96431869e-01 1.48215935e-01 0.e+00 ..., 1.64031239e+01 1.32768812e+01 1.01506386e+01]
 [ 5.92875004e-01 2.96437502e-01 0.e+00 ..., 2.09626484e+01 1.68261185e+01 1.26895866e+01]
 ...,
 [ 1.78188753e+00 -8.90943766e-01 0.e+00 ..., 0.e+00 1.2755e-03 2.5509e-03]
 [ 9.34620261e-01 -4.67310131e-01 0.e+00 ..., 0.e+00 6.38646539e-04 1.27729308e-03]
 [ 8.4339e-02 4.21500020e-02 0.e+00 ..., 0.e+00 0.e+00 0.e+00]]]

A contains some negative values; I want to change the negative numbers to '0'. I used the 'masked_where' command, but I failed. Please help me.

--
DILEEPKUMAR. R
J R F, IIT DELHI
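A small sketch comparing the in-place assignment from the reply with two alternatives that return a new array instead. np.where and np.clip are not mentioned in the thread; they are offered only as equivalent options, and the sample values are made up.

```python
import numpy as np

A = np.array([[-5.0, 9.0, 1.0],
              [2.0, -3.0, 0.0]])

# New arrays; A itself is left untouched by these two.
zeroed = np.where(A < 0, 0, A)
clipped = np.clip(A, 0, None)

# In-place version, as in the reply: modifies A directly.
A[A < 0] = 0

print(A)
```

The boolean-mask assignment works unchanged on arrays of any dimensionality, including the 3-D array in the question.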
Re: [Numpy-discussion] Alternative to boolean array
On Tue, Jul 19, 2011 at 11:08 AM, Robert Kern robert.k...@gmail.com wrote:

On Tue, Jul 19, 2011 at 07:38, Andrea Cimatoribus g.plantagen...@gmail.com wrote:

Dear all, I would like to avoid the use of a boolean array (mask) in the following statement:

    mask = (A != 0.)
    B = A[mask]

in order to be able to move this bit of code into a Cython script (boolean arrays are not yet implemented there, and they slow down execution a lot as they can't be defined explicitly). Any idea of an efficient alternative?

You will have to count the number of True values, create the B array with the right size, then run a simple loop to assign into it where A != 0. This makes you do the comparisons twice. Or you can allocate a B array the same size as A, run your loop to assign into it when A != 0 and incrementing the index into B, then slice out or memcpy out the portion that you assigned.

According to my calculations, the last method is the fastest, though the savings aren't considerable. In Cython, defining some test mask functions (saved as cython_mask.pyx):

    import numpy as N
    cimport numpy as N

    # Boolean indexing, as in the original question.
    def mask1(N.ndarray[N.int32_t, ndim=1] A):
        cdef N.ndarray[N.int32_t, ndim=1] B
        B = A[A != 0]
        return B

    # Two passes: count the nonzero entries, then fill an exactly sized array.
    def mask2(N.ndarray[N.int32_t, ndim=1] A):
        cdef N.ndarray[N.int32_t, ndim=1] B
        cdef int i
        cdef int count = 0
        for i in range(len(A)):
            if A[i] == 0: continue
            count += 1
        # dtype must match the int32 buffer declaration; plain int is 64-bit on many platforms
        B = N.empty(count, dtype=N.int32)
        count = 0
        for i in range(len(A)):
            if A[i] == 0: continue
            B[count] = A[i]
            count += 1
        return B

    # One pass: fill an array as large as A, then slice off the used part.
    def mask3(N.ndarray[N.int32_t, ndim=1] A):
        cdef N.ndarray[N.int32_t, ndim=1] B = N.empty(len(A), dtype=N.int32)
        cdef int i
        cdef int count = 0
        for i in range(len(A)):
            if A[i] == 0: continue
            B[count] = A[i]
            count += 1
        return B[:count]

    In [1]: import numpy as N
    In [2]: import timeit
    In [3]: from cython_mask import *
    In [4]: A = N.random.randint(0, 2, 1)
    In [5]: def mask4(A):
       ...:     return A[A != 0]
       ...:
    In [6]: %timeit mask1(A)
    1 loops, best of 3: 195 us per loop
    In [7]: %timeit mask2(A)
    1 loops, best of 3: 136 us per loop
    In [8]: %timeit mask3(A)
    1 loops, best of 3: 117 us per loop
    In [9]: %timeit mask4(A)
    1 loops, best of 3: 193 us per loop

~Brett
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
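For readers without a Cython toolchain, the two loop strategies described in this message can be sketched in plain Python. This only illustrates the logic and is not the timed code: the function names compact_two_pass and compact_overallocate are hypothetical, and interpreted loops like these will be much slower than A[A != 0] or the compiled versions.

```python
import numpy as np

def compact_two_pass(A):
    """Count the nonzero entries first, then fill an exactly sized output."""
    count = 0
    for x in A:
        if x != 0:
            count += 1
    B = np.empty(count, dtype=A.dtype)
    j = 0
    for x in A:
        if x != 0:
            B[j] = x
            j += 1
    return B

def compact_overallocate(A):
    """Fill a buffer as large as A in a single pass, then slice off the used part."""
    B = np.empty(len(A), dtype=A.dtype)
    j = 0
    for x in A:
        if x != 0:
            B[j] = x
            j += 1
    return B[:j]  # a view onto the first j elements of the larger buffer

A = np.array([0, 3, 0, 0, 7, 1, 0, 2], dtype=np.int32)
expected = A[A != 0]          # boolean indexing, for comparison
two_pass = compact_two_pass(A)
overalloc = compact_overallocate(A)
```

The over-allocating version does each comparison once but returns a view that keeps the full-size buffer alive; calling .copy() on the result would release it at the cost of one more copy.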
Re: [Numpy-discussion] Beginner's question
On Sat, Apr 16, 2011 at 2:08 PM, Laszlo Nagy gand...@shopzeus.com wrote:

    import numpy as np
    import numpy.random as rnd

    def dim_weight(X):
        weights = X[0]
        volumes = X[1]*X[2]*X[3]
        res = np.empty(len(volumes), dtype=np.double)
        for i,v in enumerate(volumes):
            if v > 5184:
                res[i] = v/194.0
            else:
                res[i] = weights[i]
        return res

    N = 10
    X = rnd.randint(1, 25, (4, N))
    print(dim_weight(X))

Laszlo

This works:

    def dim_weight2(X):
        w = X[0]
        v = X[1]*X[2]*X[3]
        res = np.empty(len(v), dtype=np.double)
        # start from the plain weights, then overwrite the oversized entries
        res[:] = w[:]
        res[v > 5184] = v[v > 5184]/194.0
        return res

~Brett
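The same selection can probably be written in one step with np.where. This is a sketch rather than part of the original reply: it assumes the comparison in the code above is v > 5184, and the function name dim_weight_where and the sample X are made up for illustration.

```python
import numpy as np

def dim_weight_where(X):
    """Use volume/194 where the volume exceeds 5184, otherwise the plain weight."""
    w = X[0]
    v = X[1] * X[2] * X[3]
    # np.where evaluates both branches for every element and picks per element
    return np.where(v > 5184, v / 194.0, w)

# Columns are items; row 0 is the weight, rows 1-3 are the three dimensions.
X = np.array([[10, 50, 3],
              [20, 10, 2],
              [20, 10, 2],
              [20, 10, 2]])
result = dim_weight_where(X)   # volumes are 8000, 1000 and 8
```

Only the first item exceeds the threshold, so only its entry is replaced by volume/194.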
Re: [Numpy-discussion] slicing / indexing question
On Tue, Sep 21, 2010 at 6:20 PM, Timothy W. Hilton hil...@meteo.psu.edu wrote:

Hello, I have an indexing problem which I suspect has a simple solution, but I've not been able to piece together the various threads I've read on this list to solve it. I have an 80x1200x1200 ndarray of floats, this_par. I have a 1200x1200 boolean array idx, and an 80-element float array pars. For each element of idx that is True, I wish to replace the corresponding 80x1x1 slice of this_par with the elements of pars. I've tried lots of variations on the theme of

    this_par[idx[np.newaxis, ...]] = pars[:, np.newaxis, np.newaxis]

but so far, no dice. Any help greatly appreciated! Thanks, Tim

This works (with numpy imported as N), although I imagine it could be streamlined.

    In [1]: this_par = N.ones((2,4,4))
    In [2]: idx = N.random.random((4,4)) > 0.5
    In [3]: pars = N.arange(2) - 10
    In [4]: this_par[:,idx] = N.tile(pars, (idx.sum(), 1)).transpose()
    In [5]: idx
    Out[5]:
    array([[ True, False,  True, False],
           [False, False,  True,  True],
           [False, False, False, False],
           [False, False, False, False]], dtype=bool)
    In [6]: this_par
    Out[6]:
    array([[[-10.,   1., -10.,   1.],
            [  1.,   1., -10., -10.],
            [  1.,   1.,   1.,   1.],
            [  1.,   1.,   1.,   1.]],
           [[ -9.,   1.,  -9.,   1.],
            [  1.,   1.,  -9.,  -9.],
            [  1.,   1.,   1.,   1.],
            [  1.,   1.,   1.,   1.]]])

Brett
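One possible streamlining, offered as a sketch and not as part of the original reply, is to let broadcasting do the work of N.tile. Indexing with this_par[:, idx] selects an array of shape (len(pars), idx.sum()), and pars[:, np.newaxis] has shape (len(pars), 1), so it broadcasts across the selected columns. The small shapes and the fixed idx below are chosen only for illustration.

```python
import numpy as np

this_par = np.ones((2, 4, 4))
idx = np.zeros((4, 4), dtype=bool)
idx[0, 0] = idx[0, 2] = idx[1, 2] = idx[1, 3] = True   # fixed mask instead of a random one
pars = np.arange(2) - 10.0                              # array([-10., -9.])

# Reference result using the tile-and-transpose approach from the message.
tiled = this_par.copy()
tiled[:, idx] = np.tile(pars, (idx.sum(), 1)).transpose()

# Broadcasting version: (2, 1) is stretched to (2, idx.sum()) on assignment.
this_par[:, idx] = pars[:, np.newaxis]
```

This avoids building the intermediate tiled array, which may matter at the 80x1200x1200 size mentioned in the question.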
Re: [Numpy-discussion] Two questions on indexing
On Wed, Sep 15, 2010 at 4:38 PM, Mark Fenner mfen...@gmail.com wrote:

A separate question. Suppose I have a slice for indexing that looks like:

    [:, :, 2, :, 5]

How can I get an indexing slice for all OTHER dimension values besides those specified? Conceptually, something like:

    [:, :, all but 2, :, all but 5]

Incidentally, the goal is to construct a new array with all those other spots filled in with zero and the specified spots with their original values. Would it be easier to construct a 0-1 indicator array with 1s in the [:,:,2,:,5] positions and multiply it out? Hmm, I may have just answered my own question. For argument's sake, how would you do it with indexing/slicing? I suppose one develops some intuition as one gains experience with numpy with regard to when to (1) use clever matrix ops, when to (2) use clever slicing, and when to (3) use a combination of both.

This works, although I'm not sure how efficient it is compared to other methods:

    In [19]: a = N.arange(16).reshape(4,4)
    In [20]: a
    Out[20]:
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15]])
    In [24]: a[:,N.arange(4) != 2]
    Out[24]:
    array([[ 0,  1,  3],
           [ 4,  5,  7],
           [ 8,  9, 11],
           [12, 13, 15]])

Brett
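The stated goal (zeros everywhere except the [:, :, 2, :, 5] positions) can be sketched directly, alongside the indicator-array idea from the question and the "all but" selection from the reply. The array shape here is arbitrary and chosen only so that index 2 on the third axis and index 5 on the fifth axis exist.

```python
import numpy as np

a = np.arange(2 * 2 * 4 * 2 * 6).reshape(2, 2, 4, 2, 6)

# Keep only the [:, :, 2, :, 5] positions; everything else stays zero.
kept = np.zeros_like(a)
kept[:, :, 2, :, 5] = a[:, :, 2, :, 5]

# Same result with a 0-1 indicator array multiplied through.
indicator = np.zeros(a.shape, dtype=a.dtype)
indicator[:, :, 2, :, 5] = 1
kept_by_multiply = a * indicator

# "All but 2" on the third axis and "all but 5" on the fifth, applied one axis
# at a time. Note this is the sub-block excluding both indices, which is smaller
# than the full complement of the [:, :, 2, :, 5] positions.
others = a[:, :, np.arange(a.shape[2]) != 2][..., np.arange(a.shape[4]) != 5]
```

Assigning into a zero array avoids the multiplication entirely, so it is likely the cheaper of the two when only the masked copy is needed.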
Re: [Numpy-discussion] scan array to extract min-max values (with if condition)
On Sat, Sep 11, 2010 at 7:45 AM, Massimo Di Stefano massimodisa...@gmail.com wrote:

Hello All, I need to extract data from an array that are inside a rectangular area defined as:

    N, S, E, W = 234560.94503118, 234482.56929822, 921336.53116178, 921185.3779625

The data are in a CSV (comma-delimited text file, with 3 columns X,Y,Z):

    #X,Y,Z
    3020081.5500,76.3100,0.0300
    3020086.2000,769991.6500,0.4600
    3020099.6600,769996.2700,0.9000
    ... ...

I read it using numpy.loadtxt.

data: http://www.geofemengineering.it/data/csv.txt 5.3 MB (158735 rows)

To extract the data that are inside the bounding-box area (N, S, E, W) I'm using a loop inside a function like:

    import numpy as np

    def getMinMaxBB(data, N, S, E, W):
        mydata = data * 0.3048006096012
        for i in range(len(mydata)):
            if mydata[i,0] < E or mydata[i,0] > W or mydata[i,1] < N or mydata[i,1] > S:
                if i == 0:
                    newdata = np.array((mydata[i,0], mydata[i,1], mydata[i,2]), float)
                else:
                    newdata = np.vstack((newdata, (mydata[i,0], mydata[i,1], mydata[i,2])))
        results = {}
        results['Max_Z'] = newdata.max(0)[2]
        results['Min_Z'] = newdata.min(0)[2]
        results['Num_P'] = len(newdata)
        return results

    N, S, E, W = 234560.94503118, 234482.56929822, 921336.53116178, 921185.3779625
    data = '/Users/sasha/csv.txt'
    mydata = np.loadtxt(data, comments='#', delimiter=',')
    out = getMinMaxBB(mydata, N, S, E, W)
    print(out)

Use boolean arrays to index the parts of your array that you want to look at:

    def newGetMinMax(data, N, S, E, W):
        mydata = data * 0.3048006096012
        # start with nothing selected, then OR in each condition
        mask = np.zeros(mydata.shape[0], dtype=bool)
        mask |= mydata[:,0] < E
        mask |= mydata[:,0] > W
        mask |= mydata[:,1] < N
        mask |= mydata[:,1] > S
        results = {}
        results['Max_Z'] = mydata[mask,2].max()
        results['Min_Z'] = mydata[mask,2].min()
        results['Num_P'] = mask.sum()
        return results

This runs about 5000 times faster on my machine.

Brett
Re: [Numpy-discussion] scan array to extract min-max values (with if condition)
On Sat, Sep 11, 2010 at 4:46 PM, Massimo Di Stefano massimodisa...@gmail.com wrote:

Thanks Pierre, I tried it and all works fine and fast. My apologies :-( I used a wrong if statement to represent my needs:

    if mydata[i,0] < E or mydata[i,0] > W or mydata[i,1] < N or mydata[i,1] > S :
    ^^ totally wrong for my needs ^^

This if instead:

    if W < mydata[i,0] < E and S < mydata[i,1] < N:

should reflect your example:

    yselect = (data[:,1] <= N) & (data[:,1] >= S)
    xselect = (data[:,0] <= E) & (data[:,0] >= W)
    selected_data = data[xselect & yselect]

A question: how do I code a mask array, as in Brett's code, to reflect the new (right) if statement?

Just replace the lines

    mask |= mydata[:,0] < E
    mask |= mydata[:,0] > W
    mask |= mydata[:,1] < N
    mask |= mydata[:,1] > S

with

    mask = mydata[:,0] < E
    mask &= mydata[:,0] > W
    mask &= mydata[:,1] < N
    mask &= mydata[:,1] > S

Sorry, I wasn't paying attention to what you were actually trying to do and just duplicated the function of the code you supplied. There's a good primer on how to index with boolean arrays at http://www.scipy.org/Tentative_NumPy_Tutorial#head-d55e594d46b4f347c20efe1b4c65c92779f06268 that will explain why this works.

Brett
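Putting the corrected condition together, a self-contained version might look like the sketch below. It assumes the intended test is W < X < E and S < Y < N, as in the corrected if statement; the function name bbox_stats, the empty-selection handling, and the tiny sample array are illustrative additions, and the unit-conversion factor from the original function is left out for brevity.

```python
import numpy as np

def bbox_stats(data, N, S, E, W):
    """Min/max of Z and point count for rows with W < X < E and S < Y < N."""
    x, y, z = data[:, 0], data[:, 1], data[:, 2]
    # & combines the element-wise comparisons; the parentheses are required
    # because & binds more tightly than < and >
    inside = (x > W) & (x < E) & (y > S) & (y < N)
    if not inside.any():
        # max()/min() raise on an empty selection, so handle it explicitly
        return {'Max_Z': None, 'Min_Z': None, 'Num_P': 0}
    return {'Max_Z': z[inside].max(),
            'Min_Z': z[inside].min(),
            'Num_P': int(inside.sum())}

data = np.array([
    [1.0,  1.0,  0.5],   # inside the box
    [2.0,  3.0,  0.9],   # inside the box
    [5.0,  1.0,  9.9],   # X is east of E
    [1.0, -1.0, -7.0],   # Y is south of S
])
stats = bbox_stats(data, N=4.0, S=0.0, E=4.0, W=0.0)
```

With E > W and N > S, an "or" of the same four comparisons is true for every row, which is why the OR-based mask selected the whole dataset.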
[Numpy-discussion] Boolean arrays
Hello, I have an array of non-numeric data, and I want to create a boolean array denoting whether each element in this array is a valid value or not. This is straightforward if there's only one possible valid value:

    >>> import numpy as N
    >>> ar = N.array(("a", "b", "c", "b", "b", "a", "d", "c", "a"))
    >>> ar == "a"
    array([ True, False, False, False, False,  True, False, False,  True], dtype=bool)

If there are multiple possible valid values, I've come up with a couple of possible methods, but they all seem to be inefficient or kludges:

    >>> valid = N.array(("a", "c"))
    >>> (ar == valid[0]) | (ar == valid[1])
    array([ True, False,  True, False, False,  True, False,  True,  True], dtype=bool)
    >>> N.array(map(lambda x: x in valid, ar))
    array([ True, False,  True, False, False,  True, False,  True,  True], dtype=bool)

Is there a numpy-appropriate way to do this?

Thanks,
Brett Olsen
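For what it's worth, current NumPy releases provide np.isin for exactly this membership test (older releases offer np.in1d for one-dimensional input). The sketch below assumes the array elements are the strings "a" to "d" and requires NumPy 1.13 or later.

```python
import numpy as np

ar = np.array(("a", "b", "c", "b", "b", "a", "d", "c", "a"))
valid = np.array(("a", "c"))

# Element-wise membership test: True where ar[i] is one of the valid values.
is_valid = np.isin(ar, valid)

# The chained-comparison approach from the question, for comparison.
by_hand = (ar == valid[0]) | (ar == valid[1])
```

Unlike the chained comparison, np.isin does not need to be rewritten when the list of valid values changes length.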