Re: [Numpy-discussion] Numpy interpolate: cut through 2D matrix
On 03/01/2012 12:35 PM, Pierre Barthelemy wrote:
Hello, for a data analysis tool I am programming, I need to plot a cut through a 2D graph. I have a 2D array, and the indices start=(start_x, start_y) and stop=(stop_x, stop_y) that are the positions of the starting point and stop point of the cut. The code I programmed is placed at the bottom. This code returns only values that exist in the original array: if the cut should pass between column index i and column index i+1, it nevertheless returns the value at column index i. Is it possible to use the numpy.interpolate library so that, when passing between i and i+1, the function returns an interpolation of the graph between the points [row, column] and [row, column+1]?

Pierre, if you have scipy next to numpy, have a look at scipy.ndimage.map_coordinates. In short, you define the coordinates you want your array to be interpolated at, and it will give you the interpolated values, using spline interpolation.

Best, Vincent.
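For concreteness, a minimal sketch of the map_coordinates approach (the array contents, cut endpoints and sample count below are made up):

    import numpy as np
    from scipy import ndimage

    a = np.random.rand(100, 100)
    start = (10.0, 5.0)    # (row, col) of the cut's start
    stop = (80.0, 90.0)    # (row, col) of the cut's end
    n = 200                # number of samples along the cut
    rows = np.linspace(start[0], stop[0], n)
    cols = np.linspace(start[1], stop[1], n)
    # coordinates has shape (ndim, n); order=1 gives bilinear interpolation
    profile = ndimage.map_coordinates(a, np.vstack([rows, cols]), order=1)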
Re: [Numpy-discussion] python geospatial package?
On 02/22/2012 10:45 PM, Chao YUE wrote:
Hi all, is anyone using a python geospatial package that can do jobs like intersection, etc.? The job is like automatically extracting a region on a global map. Thanks and cheers, Chao

Chao, shapely would do this, though I found it had a bit of a steep learning curve. Or you could go the gdal/ogr way, which uses the geos library under the hood (if present) to do geometrical operations like intersections etc. Cheers, Vincent.

--
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
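A tiny sketch of the shapely route (the polygons here are made-up toy squares, not real map regions):

    from shapely.geometry import Polygon

    region = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
    country = Polygon([(5, 5), (15, 5), (15, 15), (5, 15)])
    overlap = region.intersection(country)   # geometric intersection
    print(overlap.area)                       # 25.0 for these two squares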
Re: [Numpy-discussion] find location of maximum values
On Wed, 04 Jan 2012 12:29:36 +0100, Derek Homeier wrote:
On 04.01.2012, at 5:10AM, questions anon wrote: Thanks for your responses but I am still having difficulties with this problem. Using argmax gives me one very large value and I am not sure what it is.

It is the index into the flattened array. To translate this into a multidimensional index, use numpy.unravel_index(i, original_shape). Cheers, Vincent.
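A quick illustration (array contents are random stand-ins):

    import numpy as np

    a = np.random.rand(4, 5, 6)
    i = a.argmax()                         # index into the flattened array
    idx = np.unravel_index(i, a.shape)     # tuple of per-axis indices
    assert a[idx] == a.max()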
Re: [Numpy-discussion] local variable referenced before assignment
On 05/31/2011 11:04 AM, Edoardo Pasca wrote:
Dear all, sometimes I encounter the problem that, when calling a function many times, some local variables end up not defined and the procedure crashes. For example I have a function defined as

    def procedure(tt, Ctissue, WeinmannFit, bodyWeight):
        for maxFastSampling in range(1, len(tt)):
            if tt[maxFastSampling] - tt[maxFastSampling-1] > 10:
                break
        if numpy.equal(Ctissue[:maxFastSampling], 0).all():
            # do something

Now the inputs are all numpy.array except bodyWeight, which is a number. The program calls the function procedure many times but at some point fails at the evaluation of numpy.equal with

    UnboundLocalError: local variable 'maxFastSampling' referenced before assignment

I wonder whether you see any way that the maxFastSampling variable might not be set.

Edo, when len(tt) is 1, range(1, len(tt)) gives an empty list, the for loop will never execute and maxFastSampling will not be set. VS.

Python 2.6.4, numpy 1.5.0. Thanks, Edo
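A minimal guard (one possible fix, not code from the thread; the default value is a choice, here mirroring the loop-completes case):

    import numpy

    def procedure(tt, Ctissue, WeinmannFit, bodyWeight):
        maxFastSampling = len(tt) - 1      # default if the loop never runs or never breaks
        for i in range(1, len(tt)):
            if tt[i] - tt[i-1] > 10:
                maxFastSampling = i
                break
        if numpy.equal(Ctissue[:maxFastSampling], 0).all():
            pass  # do something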
Re: [Numpy-discussion] need a better way to fill a grid
On 01/24/2011 02:53 PM, John wrote:
Hello, I'm trying to cycle over some vectors (lat, lon, emissions) of irregularly spaced lat/lon spots and values. I need to sum the values contributing to each cell of a regular lat/lon grid. This is what I have presently, but it is too slow. Is there a more efficient way to do this? I would prefer not to create an external module (f2py, cython) unless there is really no way to make this more efficient... it's the looping through the grid, I guess, that takes so long.

Use np.histogram2d with weights=emissions, and lat and lon as your x and y to histogram. Choose the bins to match your grid, and it will effectively sum the emission values for each grid cell. Vincent.

Thanks, john

    def grid_emissions(lon, lat, emissions, grid):
        """sample the emissions into a grid to fold into model output"""
        dx = grid.dxout
        dy = grid.dyout
        # Generate a regular grid to fill with the sum of emissions
        xi = np.linspace(grid.outlon0, grid.outlon0+(grid.nxmax*grid.dx), grid.nxmax)
        yi = np.linspace(grid.outlat0, grid.outlat0+(grid.nymax*grid.dy), grid.nymax)
        X, Y = np.meshgrid(yi, xi)
        Z = np.zeros(X.shape)
        for i, x in enumerate(xi):
            for j, y in enumerate(yi):
                Z[i, j] = np.sum(emissions[
                    np.where(((lat > y-dy) & (lat < y+dy)) &
                             ((lon > x-dx) & (lon < x+dx)))[0]])
        return Z
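A sketch of the histogram2d replacement for the double loop (scattered samples and the 1-degree grid below are made up):

    import numpy as np

    lon = np.random.uniform(-180, 180, 10000)
    lat = np.random.uniform(-90, 90, 10000)
    emissions = np.random.rand(10000)

    lon_edges = np.linspace(-180, 180, 361)   # 1-degree cells
    lat_edges = np.linspace(-90, 90, 181)
    # with weights=emissions, each cell holds the sum of the values falling in it
    summed, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges],
                                  weights=emissions)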
Re: [Numpy-discussion] large float32 array issue
On 11/03/2010 03:04 PM, Bruce Southey wrote:
On 11/03/2010 06:52 AM, Pauli Virtanen wrote:
Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote:
[clip] Btw, should I file a bug on this?

One can argue that mean() and sum() should use a numerically stabler algorithm, so yes, a bug can be filed if there is not yet one already.

This is a 'user bug', not a numpy bug, because it is a well known numerical problem. I recall that we have had this type of discussion before, and that it resulted in these functions being left as they are. The numerical problem is mentioned better in the np.mean docstring than in the np.sum docstring. My understanding was that any new algorithm has to be better than the current algorithm, especially in speed and accuracy, across 'typical' numpy problems and across the different Python and OS versions, not just for numerically challenged cases. For example, I would not want to sacrifice speed if I can achieve the same accuracy, without losing as much speed, by just changing the dtype to float128 (as I use x86_64 Linux). Also, Warren's mean example is simply a 32-bit error, because it disappears when using 64-bit (numpy's default) - well, until we reach the extreme 64-bit values.

    np.ones((11334,16002)).mean()
    1.0
    np.ones((11334,16002), np.float32).mean()
    0.092504406598019437
    np.ones((11334,16002), np.float32).mean().dtype
    dtype('float64')

Note that there is probably a bug in np.mean, because a 64-bit dtype is returned for integers and for 32-bit or lower precision floats. So the upcast is apparently not being done on the accumulator. Bruce

Thanks for the info, all. I agree that this is a 'user bug'; however, mentioning this corner case someplace a user would look when hunting down errors like these might be an idea, as I have the feeling it will keep turning up on this list once and again otherwise. Maybe start a webpage 'numpy and some often encountered floating point issues'? For now, I've just ordered more memory. The cheapest and simplest solution, if you ask me :-) Vincent.
[Numpy-discussion] large float32 array issue
Hi, I'm running into this strange issue when using some pretty large float32 arrays. In the following code I create a large array filled with ones, and calculate mean and sum, first with a float64 version, then with a float32 version. Note the difference between the two. NB the float64 version is obviously right :-)

    In [2]: areaGrid = numpy.ones((11334, 16002))
    In [3]: print(areaGrid.dtype)
    float64
    In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), areaGrid.mean(), areaGrid.sum())
    ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0)

    In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32)
    In [6]: print(areaGrid.dtype)
    float32
    In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(), areaGrid.mean(), areaGrid.sum())
    ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0)

Can anybody confirm this? And better: explain it? Am I running into a hitherto (for me) hidden ieee float 'feature'? Or is it a bug somewhere? Btw I'd like to use float32 arrays, as precision is not really an issue in this case, but memory usage is... This is using python 2.7, numpy from git (yesterday's checkout), on arch linux 64bit. Best, Vincent.
Re: [Numpy-discussion] large float32 array issue
On 11/03/2010 12:31 PM, Warren Weckesser wrote:
On Wed, Nov 3, 2010 at 5:59 AM, Warren Weckesser warren.weckes...@enthought.com wrote:
On Wed, Nov 3, 2010 at 3:54 AM, Vincent Schut sc...@sarvision.nl wrote:
Hi, I'm running into this strange issue when using some pretty large float32 arrays. [snip]

The problem kicks in with an array of ones of size 2**24. Note that np.float32(2**24) + np.float32(1.0) equals np.float32(2**24):

    In [41]: b = np.ones(2**24, np.float32)
    In [42]: b.size, b.sum()
    Out[42]: (16777216, 16777216.0)
    In [43]: b = np.ones(2**24+1, np.float32)
    In [44]: b.size, b.sum()
    Out[44]: (16777217, 16777216.0)
    In [45]: np.spacing(np.float32(2**24))
    Out[45]: 2.0
    In [46]: np.float32(2**24) + np.float32(1)
    Out[46]: 16777216.0

By the way, you can override the dtype of the accumulator of the mean() function:

    In [61]: a = np.ones((11334,16002), np.float32)
    In [62]: a.mean()   # Not correct
    Out[62]: 0.092504406598019437
    In [63]: a.mean(dtype=np.float64)
    Out[63]: 1.0

Warren

Thanks for this. That at least gives me a temporary solution (I actually need sum() instead of mean(), but the trick works for sum too). Btw, should I file a bug on this? Vincent.
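The same accumulator override works for sum(); a quick check with the shape from the original post:

    import numpy as np

    a = np.ones((11334, 16002), np.float32)
    print(a.sum())                  # 16777216.0: float32 accumulator plateaus at 2**24
    print(a.sum(dtype=np.float64))  # 181366668.0: correct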
Re: [Numpy-discussion] summarizing blocks of an array using a moving window
On 07/22/2010 06:47 AM, Robin Kraft wrote:
Hello all, the short version: for a given NxN array, is there an efficient way to use a moving window to collect a summary statistic on a chunk of the array, and insert it into another array?

Hi Robin, I've been wrestling with similar stuff myself, though I have no obvious answer... A lot depends on your data and the stats you want. Some thoughts:

- though you want non-overlapping stats, you could take a look at scipy.ndimage. Lots of this is coded in C, and though it does overlapping moving window stats, its speed might outweigh the stuff you have to do otherwise to 'tile' your array. You would then just slice the result of the ndimage operation (e.g. [::2, ::2] or similar).

- another option would be some smart reshaping, which finally gives you a [y//2, x//2, 2, 2] array, which you could then reduce to calculate stats (mean, std, etc.) on the last two axes. I *think* you'd have to first reshape both x and y axes, and then reposition the axes. An example:

    a = gdal_array.BandReadAsArray(blabla)
    y, x = a.shape  # y and x need to be divisible by 2!
    b = a.reshape(y/2, 2, x/2, 2).transpose(0, 2, 1, 3).reshape(y/2, x/2, 4)
    bMean = b.mean(axis=-1)
    bMax = ...etc.

- a third option would be to create an index array, which has a unique value per 2x2 square, and then use histogram2d. This would, if you use its 'weight' functionality, at least enable you to get efficient counts and sums/means. Other stats might be hard, though.

Good luck! Vincent Schut.

The long version: I am trying to resample an image loaded with GDAL into an NxN array. Note that this is for statistical purposes, so image quality doesn't matter. For the curious, the image is derived from satellite imagery and displays a map of hotspots of tropical deforestation at 500m resolution. I need to assign a count of these deforestation 'hits' to each pixel in an existing array of 1000m pixels. Let's say the image looks like this:

    np.random.randint(0, 2, 16).reshape(4, 4)
    array([[0, 0, 0, 1],
           [0, 0, 1, 1],
           [1, 1, 0, 0],
           [0, 0, 0, 0]])

I want to use a square, non-overlapping moving window for resampling, so that I get a count of all the 1's in each 2x2 window:

    0, 0, 0, 1
    0, 0, 1, 1        0 3
    1, 1, 0, 0   =>   2 0
    0, 0, 0, 0

In another situation with similar data I'll need the average, or the maximum value, etc. My brute-force method is to loop through the rows and columns to get little chunks of data to process. But there must be a more elegant way to do this - it's probably obvious too... (inelegant way further down).

Another way to do it would be to use np.tile to create an array called arr filled with blocks of [[0,1],[2,3]]. I could then use something like this to get the stats I want:

    d0 = img[np.where(arr==0)]
    d1 = img[np.where(arr==1)]
    d2 = img[np.where(arr==2)]
    d3 = img[np.where(arr==3)]
    img_out = d0 + d1 + d2 + d3

This would be quite snappy if I could create arr efficiently. Unfortunately np.tile does something akin to np.hstack to create this array, so it isn't square and can't be reshaped appropriately (np.tile(np.arange(2**2).reshape(2, 2), 4)):

    array([[0, 1, 0, 1, 0, 1, 0, 1],
           [2, 3, 2, 3, 2, 3, 2, 3]])

Inefficient sample code below. Advice greatly appreciated! -Robin

    import numpy as np
    from math import sqrt
    from time import time

    def rs(img_size=16, window_size=2):
        w = window_size
        # making sure the numbers work
        if img_size % sqrt(img_size) > 0:
            print "Square root of image size must be an integer."
            print "Sqrt =", sqrt(img_size)
            return
        elif (img_size / sqrt(img_size)) % w > 0:
            print "Square root of image size must be evenly divisible by", w
            print "Sqrt =", sqrt(img_size)
            print sqrt(img_size), "/", w, "=", sqrt(img_size)/w
            return
        else:
            rows = int(sqrt(img_size))
            cols = int(sqrt(img_size))

        # create fake data: ones and zeros
        img = np.random.randint(0, 2, img_size).reshape(rows, cols)
        # create empty array for resampled data
        img_out = np.zeros((rows/w, cols/w), dtype=np.int)

        t = time()
        # retrieve blocks of pixels in img, insert summary into img_out
        for r in xrange(0, rows, w):      # skip rows based on window size
            for c in xrange(0, cols, w):  # skip columns based on window size
                # calculate the sum of the values in the moving window,
                # insert value into corresponding pixel in img_out
                img_out[r/w, c/w] = np.int(np.sum(img[r:r+w, c:c+w]))
        t1 = time()

        elapsed = t1 - t
        print "img shape:", img.shape
        print img, "\n"
        print "img_out shape:", img_out.shape
        print img_out
        print "%.7f seconds" % elapsed

    rs(img_size=16, window_size=2)
    rs(81, 3)
    rs
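A runnable version of the reshape trick suggested above (4x4 toy data, 2x2 blocks; generalizes by replacing w=2 with any window size that divides both shape dimensions):

    import numpy as np

    a = np.array([[0, 0, 0, 1],
                  [0, 0, 1, 1],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0]])
    y, x = a.shape
    w = 2
    # split each axis into (block index, position inside block), group the two
    # block axes together, then flatten each block onto one trailing axis
    b = a.reshape(y//w, w, x//w, w).transpose(0, 2, 1, 3).reshape(y//w, x//w, w*w)
    print(b.sum(axis=-1))    # [[0 3]
                             #  [2 0]]
    print(b.mean(axis=-1))   # per-block averages; b.max(axis=-1) works too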
Re: [Numpy-discussion] Crosstabulation
On 07/19/2010 10:14 PM, Friedrich Romstedt wrote:
2010/7/19 sandric ionut sandricio...@yahoo.com:
For land-use a class would be for example forest, another would be orchard, etc. For slope gradient I would have values which are < 3, and between 3 and 7, etc. So I will have 2 raster datasets with, let's say, 3 classes each: forest, orchards and built-up area; and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:

            forest  orchards  built-up
    0-3       10       20        15
    3-15       5       10        20
    15-35      5       15        15

where the numbers represent all the common cells, for example: 10 cells with forest correspond to 10 cells with the 0-3 slope gradient interval, and so on (by cells I mean the pixels from a raster dataset).

Okay, everything is clear now. I would suggest looping over the cells of the table. E.g. when landuse_catmap and slope_map are the respective maps, you can categorise the slope_map with

    slope_catmap = 0 * (0 <= slope_map) * (slope_map < 3) + \
                   1 * (3 <= slope_map) * (slope_map < 15) + \
                   2 * (15 <= slope_map)

to get categories 0, 1, and 2, where the zero case is included here only for illustration; since it's just zero it can be omitted. Then you are maybe supposed to do:

    table = numpy.zeros((N_slope_cats, N_landuse_cats))

and finally loop over its elements, where the looping over the map cells is done entirely by numpy:

    for slope_cat in xrange(0, N_slope_cats):
        for landuse_cat in xrange(0, N_landuse_cats):
            table[slope_cat, landuse_cat] = \
                ((slope_catmap == slope_cat) * \
                 (landuse_catmap == landuse_cat)).sum()

I believe there is some slightly faster but more obscure way of doing this table creation in one single large step by putting the maps into a hyperdimensional array, but I also believe this is beyond our scope here if this is already fast enough.

That 'slightly faster but more obscure way' would be histogram2d :-) Especially when you have many landuse and/or slope classes over which to loop, histogram2d will be more than just slightly faster. About obscure, well, I leave that to the user; it probably depends on your background. Personally I use it all the time. Loops are evil :-) Your 'more obscure' code would become something like this:

    slope_bin_edges = [0, 3, 15, 35]
    landuse_bin_edges = [0, 1, 2, 3]
    crosstab = numpy.histogram2d(landuse, slope,
                                 bins=(landuse_bin_edges, slope_bin_edges))

VS.

Friedrich
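A self-contained sketch of the histogram2d crosstab (random stand-in rasters; real maps would come from GDAL or similar — note histogram2d wants 1D inputs, hence the ravel):

    import numpy as np

    landuse = np.random.randint(0, 3, (100, 100))   # 0=forest, 1=orchard, 2=built-up
    slope = np.random.uniform(0, 35, (100, 100))

    table, _, _ = np.histogram2d(slope.ravel(), landuse.ravel(),
                                 bins=([0, 3, 15, 35], [0, 1, 2, 3]))
    # table[i, j] = number of cells in slope class i and land-use class j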
Re: [Numpy-discussion] Crosstabulation
On 07/19/2010 09:55 AM, sandric ionut wrote:
Hi Friedrich: for land-use a class would be for example forest, another would be orchard, etc. For slope gradient I would have values which are < 3, and between 3 and 7, etc. So I will have 2 raster datasets with, let's say, 3 classes each: forest, orchards and built-up area; and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:

            forest  orchards  built-up
    0-3       10       20        15
    3-15       5       10        20
    15-35      5       15        15

where the numbers represent all the common cells, for example: 10 cells with forest correspond to 10 cells with the 0-3 slope gradient interval, and so on (by cells I mean the pixels from a raster dataset). The analysis is better illustrated here: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=tabulate_area Ionut

Ha, we're mixing up lingos here :-) Need to switch to GIS (geographic information systems) dialect.

- DEM = digital elevation map, usually a 2d array ('raster') with elevation values (for a certain area on earth)
- slope gradient = the slope (literally, not as in math speak) of the surface depicted by the elevation map. Mostly defined as the maximum slope within a certain moving window; several competing methods to estimate/calculate slope exist.
- land use/cover class: raster (array) where each cell ('pixel') has an integer value, which maps to some well defined land use at that location (e.g. 0 means sea, 1 means forest, 2 means agriculture, etc.)
- crosstabulation usually means some kind of 2d histogram, where you count the raster cells with a certain value (e.g. depicting 'land use class') 'within' a range of values of another raster with the same shape (and matching locations). Like: how many cells of forest lie within a slope range of 0-10 degrees?

Right. On to the answers. I think you should look into numpy.histogram2d, where you can do exactly what you want. Your land use array is x, your slope gradient array is y; then you define the bins as your class numbers (for x) and your slope gradient ranges (for y), and you will get a pixel count for each bin combination. See: http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html

Regards, Vincent Schut.

From: Friedrich Romstedt friedrichromst...@gmail.com
To: Discussion of Numerical Python numpy-discussion@scipy.org
Sent: Sun, July 18, 2010 12:09:04 AM
Subject: Re: [Numpy-discussion] Crosstabulation

2010/7/17 Robert Kern robert.k...@gmail.com:
On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt friedrichromst...@gmail.com wrote:
2010/7/14 Ionut Sandric sandricio...@yahoo.com:

I'm afraid also Zach does not understand what you are talking about... So my first question (please bear with me) would be: what's a dem?

Digital Elevation Map.

(n/a in my dictionary) And sorry for the cross-talk on the other first post by you... And by slope gradient you mean second derivative?

No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map.

Thanks Robert, that clarifies a lot. But still I don't understand how the crosstabulation shall work. What are the classes?
Friedrich
Re: [Numpy-discussion] numpy for Python 3?
On 07/19/2010 02:56 PM, Richard D. Moores wrote:
On Mon, Jul 19, 2010 at 05:28, Vincent Schut sc...@sarvision.nl wrote:

Well, you might want to read up on some beginners guide for python? It's up to you, of course, but usually before starting with numpy (which extends python), it is advised to have at least some basic python understanding... Googling will give you plenty of good resources, if you'd like to.

I'm not a python beginner. Why did you assume I was?

I apologize, then. This, however, made me think you were: 'Now what? Try simple commands? Like Lemme outta here!?' Now English is not my native language, so some subtle humour might have escaped me and I may have understood that entirely wrong... :-)

Then, for the sake of helping you further anyway: you'll have to mind the significance of paths (= directories, or folders in windows speak, I think). The folder you're currently in will restrict what you find when typing commands. If you need to reference something from a different folder, you'll need to specify that explicitly.

Yes, I have that understanding.

Good. The fact that you were starting python from the Python31 folder, and then typed 'now what?', gave me the idea you did not... To build numpy, you'll need to be in the numpy source folder (the numpy you extracted from svn). But if you're there, simply typing 'python' or 'python.exe' will probably not work, because 'python.exe' is in a different folder (c:\Python31). You could go into that folder, but then you would not be able to find numpy's setup.py script. Best way to solve that: make sure you're in the numpy folder, and type something like: 'c:\Python31\python.exe setup.py build'. That should get you started at least.

However, if I'm allowed to give you some unasked-for advice: this might become lots easier if you make sure you're at least a bit comfortable with 1) the windows command prompt, 2) python, and 3) building python stuff from svn source checkouts. No offence meant. But you sound as if you feel a lot more comfortable with pre-built packages than with building from source on windows...

No, I fail your number 3.

Well, then you've come along quite far already, and are on the right list :-) Then, please post the output of your 'python setup.py build' command, which will give us some clues about *why* you fail... (or, if the output is long, try to find the relevant lines indicating where and what goes wrong). Good luck anyway! Vincent Schut.

Thanks, Vincent. And I am more comfortable with pre-built packages.

Most of us are ;-) But sometimes you just need to bite the bullet...

Dick
Re: [Numpy-discussion] numpy for Python 3?
On 07/19/2010 03:34 PM, Richard D. Moores wrote:
On Mon, Jul 19, 2010 at 06:15, Vincent Schut sc...@sarvision.nl wrote:
[snip: the beginner/humour exchange quoted from the previous message]

Oh, that's OK. But it's otherwise hard to believe English is not your native language.

Ha, thanks :-) Dutch it is; however, realizing that we're just a tiny speck on the world's stage, we get taught English from age ten...

[snip: the paths and build instructions quoted from the previous message]

I posted the output as an attached text file in my reply to Dave. Not sure that got to the list, as I'm not familiar with the list's rules about attachments.

Ah, my fault then, I didn't see that. I've read it now, though. I'm not a windows user, and am afraid that I can't help you any further with this, as your problems seem to be pretty much windows/msvc related... Others on this list will know more, probably.

Thanks, Vincent. And I am more comfortable with pre-built packages. Most of us are ;-) But sometimes you just need to bite the bullet... I'm biting, I'm biting (that's another kind of U.S. joking). Actually, several years ago I was using Ulipad, an IDE for Python. It was under active development and frequently updated via svn. So I had and used TortoiseSVN then, but on an old computer. So I'm starting over getting the details of how to use it back.

Ha, I've used ulipad too. Mostly because it's lean 'n mean and doesn't force you to create an entire 'project' just to create a new python script (like eclipse/pydev et al. do). I have newer hardware now, so the lean 'n mean argument is of less significance, but I still use it sometimes for some quick hacking...

Dick
Re: [Numpy-discussion] 2D binning
On 06/02/2010 04:52 AM, josef.p...@gmail.com wrote:
On Tue, Jun 1, 2010 at 9:57 PM, Zachary Pincus zachary.pin...@yale.edu wrote:

I guess it's as fast as I'm going to get. I don't really see any other way. BTW, the lat/lons are integers)

You could (in c or cython) try a brain-dead hashtable with no collision detection:

    for lat, long, data in dataset:
        bin = (lat ^ long) % num_bins
        hashtable[bin] = update_incremental_mean(hashtable[bin], data)

you'll of course want to do some experiments to see if your data are sufficiently sparse and/or you can afford a large enough hashtable array that you won't get spurious hash collisions. Adding error-checking to ensure that there are no collisions would be pretty trivial (just keep a table of the lat/long for each hash value, which you'll need anyway, and check that different lat/long pairs don't get assigned the same bin). Zach

-Mathew

On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus zachary.pin...@yale.edu wrote:

Hi, can anyone think of a clever (non-looping) solution to the following? I have a list of latitudes, a list of longitudes, and a list of data values. All lists are the same length. I want to compute an average of data values for each lat/lon pair. E.g. if lat[1001] == lat[2001] and lon[1001] == lon[2001], then data[1001] = (data[1001] + data[2001])/2. Looping is going to take way too long.

As a start, are the equal lat/lon pairs exactly equal (i.e. either not floating-point, or floats that will always compare equal, that is, the floating-point bit-patterns will be guaranteed to be identical) or approximately equal to float tolerance? If you're in the approx-equal case, then look at the KD-tree in scipy for doing near-neighbors queries. If you're in the exact-equal case, you could consider hashing the lat/lon pairs or something. At least then the looping is O(N) and not O(N^2):

    import collections
    grouped = collections.defaultdict(list)
    for lt, ln, da in zip(lat, lon, data):
        grouped[(lt, ln)].append(da)
    averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())

Is that fast enough?

If the lat lon can be converted to a 1d label as Wes suggested, then in a similar timing exercise ndimage was the fastest: http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html

And as you said your lats and lons are integers, you could simply do ll = lat*1000 + lon to get unique 'hashes' or '1d labels' for your latlon pairs, as a lat or lon will never exceed 360 (degrees). After that, either use the ndimage approach, or you could use histogramming with weighting by data values and divide by the histogram without weighting, or just loop. Vincent

(this was for python 2.4; also, later I found np.bincount, which requires that the labels are consecutive integers, but is as fast as ndimage) I don't know how it would compare to the new suggestions. Josef

Zach
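A sketch of the integer-label route with np.bincount (made-up data; the label trick is the one suggested above):

    import numpy as np

    lat = np.random.randint(0, 360, 10000)
    lon = np.random.randint(0, 360, 10000)
    data = np.random.rand(10000)

    ll = lat * 1000 + lon                  # unique integer label per lat/lon pair
    sums = np.bincount(ll, weights=data)   # per-label sums
    counts = np.bincount(ll)               # per-label counts
    valid = counts > 0
    means = sums[valid] / counts[valid]    # mean data value per occupied pair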
Re: [Numpy-discussion] Extracting values from one array corresponding to argmax elements in another array
On 04/06/2010 03:22 PM, Ken Basye wrote:
From: Vincent Schut sc...@sarvision.nl
On 04/05/2010 06:06 PM, Keith Goodman wrote:
On Mon, Apr 5, 2010 at 8:44 AM, Ken Basye kbas...@jhu.edu wrote:
[snip]

    b[a.argmax(axis=0), range(3)]
    array([0, 4, 5])

Which does not work anymore when your arrays become more-dimensional (like in my case: 4 or more) and the axis you want to select on is not the first/last one. If I recall correctly, I needed to construct the full index arrays for the other dimensions too (like with ogrid I think). So: create the ogrid, replace the one for the dimension you want the argmax selection to take place on with the argmax parameter, and use those index arrays to index your b array. I'd need to look up my source code to be more sure/precise. If anyone would like me to, please let me know. If anyone knows a less elaborate way, also please let us know! :-)

Hi Vincent, I'd like to see more about your solution. For my present purposes, Keith's solution was sufficient, but I'm still very interested in a solution that's independent of dimension and axis. Thanks (and thanks, Keith), Ken

I've tracked it down. I have created the following quick'n'dirty function:

    def selectFromDimension(data, selection, dimension):
        ogridStatement = "numpy.ogrid["
        slices = []
        for d in data.shape:
            slices.append("0:" + str(d))
        ogridStatement += ",".join(slices)
        ogridStatement += "]"
        grid = eval(ogridStatement)
        grid[dimension] = selection
        result = data[grid]
        return result

which you call with the array to select from, the result of argmin/max(axis=...), and the axis to select from (usually the same as the axis argument to the argmin/max). It uses eval to be able to create the ogrid independent of dimensionality. There's probably a much cleaner way, but this worked for me :-)

Now I think about it, probably another way could be to transpose your axes such that the axis to select on becomes the first axis. Then just index with the result of argmin/max, and transpose back if necessary. However, I have not tried this and I might be missing something obvious here...

Vincent.
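A variant without eval that also keeps the shapes lined up (a sketch; select_along_axis is a made-up name, and on recent numpy np.take_along_axis does the same job):

    import numpy as np

    def select_along_axis(data, selection, axis):
        # open grid of index arrays, one per axis, each shaped to broadcast
        grid = list(np.ogrid[tuple(slice(n) for n in data.shape)])
        # argmin/argmax output lacks the reduced axis; put it back so all
        # index arrays broadcast to a common shape
        grid[axis] = np.expand_dims(selection, axis)
        return data[tuple(grid)].squeeze(axis)

    a = np.random.rand(3, 4, 5)
    b = np.random.rand(3, 4, 5)
    sel = a.argmax(axis=1)
    b_best = select_along_axis(b, sel, 1)   # values of b where a is maximal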
Re: [Numpy-discussion] Extracting values from one array corresponding to argmax elements in another array
On 04/05/2010 06:06 PM, Keith Goodman wrote:
On Mon, Apr 5, 2010 at 8:44 AM, Ken Basye kbas...@jhu.edu wrote:

Hi folks, I have two arrays, A and B, with the same shape. I want to find the highest values in A along some axis, then extract the corresponding values from B. I can get the highest values in A with A.max(axis=0) and the indices of these highest values with A.argmax(axis=0). I'm trying to figure out a loop-free way to extract the corresponding elements from B using these indices. Here's code with a loop that will do what I want for two-dimensional arrays:

    a
    array([[ 100.,    0.,    0.],
           [   0.,  100.,  100.],
           [   0.,    0.,    0.]])
    a.max(axis=0)
    array([ 100.,  100.,  100.])
    sel = a.argmax(axis=0)
    sel
    array([0, 1, 1])
    b = np.arange(9).reshape((3,3))
    b
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    b_best = np.empty(3)
    for i in xrange(3):
        b_best[i] = b[sel[i], i]
    b_best
    array([ 0.,  4.,  5.])

Here's one way:

    b[a.argmax(axis=0), range(3)]
    array([0, 4, 5])

Which does not work anymore when your arrays become more-dimensional (like in my case: 4 or more) and the axis you want to select on is not the first/last one. If I recall correctly, I needed to construct the full index arrays for the other dimensions too (like with ogrid I think). So: create the ogrid, replace the one for the dimension you want the argmax selection to take place on with the argmax parameter, and use those index arrays to index your b array. I'd need to look up my source code to be more sure/precise. If anyone would like me to, please let me know. If anyone knows a less elaborate way, also please let us know! :-)
Re: [Numpy-discussion] Is this a bug in numpy.ma.reduce?
On 03/05/2010 11:51 AM, Pierre GM wrote:
On Mar 5, 2010, at 4:38 AM, David Goldsmith wrote:

Hi! Sorry for the cross-post, but my own investigation has led me to suspect that mine is actually a numpy problem, not a matplotlib problem. I'm getting the following traceback from a call to matplotlib.imshow: ... Based on examination of the code, the last self is an instance of ma._extrema_operation (or one of its subclasses) - is there a reason why this class is unable to deal with a zero-size array to ufunc.reduce without identity (i.e., was it thought that it would - or should - never get one), or was this merely an oversight? Either way, there are other instances on the lists of this error cropping up, so this circumstance should probably be handled more robustly. In the meantime, a workaround?

'm'fraid no. I'm gonna have to investigate that. Please open a ticket with a self-contained example that reproduces the issue. Thx in advance... P.

This might be completely wrong, but I seem to remember a similar issue, which I then traced down to having a masked array with a mask that was set to True or False, instead of being a full-fledged bool mask array. I was in a hurry then and completely forgot about it later, so filed no bug report whatsoever, for which I apologize. VS.
Re: [Numpy-discussion] Producing a Histogram When Bins Are Known
Wayne Watson wrote:
I have a list that already has the frequencies from 0 to 255. However, I'd like to make a histogram that has say 32 bins whose ranges are 0-7, 8-15, ..., 248-255. Is it possible?

Wayne, you might find the 'numpy example list with doc' webpage quite informative: http://www.scipy.org/Numpy_Example_List_With_Doc (give it some time to load, it's pretty large...). For new users (I was one once...) it usually takes some time to find the usual suspects in numpy/scipy help and docs... This one page has really become invaluable for me. It gives you the docstrings for numpy functions, often including some example code.

If you check out the histogram() function, you'll see it takes a 'bins=' argument:

    bins : int or sequence of scalars, optional
        If `bins` is an int, it defines the number of equal-width bins in the
        given range (10, by default). If `bins` is a sequence, it defines the
        bin edges, including the rightmost edge, allowing for non-uniform bin
        widths.

So, if your bins are known, you can pass them to numpy.histogram, either as number-of-bins (if equal width), if necessary combined with the 'range=' parameter to specify the range to divide into equal bins, or as bin edges (e.g. in your case: (0, 8, 16, ..., 256), or numpy.linspace(0, 256, 33), which will give you this range nicely). If you don't specify the 'range=' parameter, it will check the min and max from your input data and use those as lower and upper bounds.

Good luck learning numpy! :) Vincent.
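A sketch of both routes (the arrays are random stand-ins; the first variant works directly on the per-value frequencies Wayne already has):

    import numpy as np

    # already have per-value counts for 0..255: rebin by summing groups of 8
    freq = np.random.randint(0, 50, 256)
    rebinned = freq.reshape(32, 8).sum(axis=1)   # bins 0-7, 8-15, ..., 248-255

    # from raw values instead: let histogram do the binning
    data = np.random.randint(0, 256, 10000)
    hist, edges = np.histogram(data, bins=np.linspace(0, 256, 33))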
Re: [Numpy-discussion] Help making better use of numpy array functions
Hi mdekauwe, as long as your data size is small enough to fit an 8x array in memory and use it, why not just skip the running total and average 8 data arrays each 8-day period? And skip the x and y loops; these are real performance killers. As a bonus, your code gets a lot smaller and more readable... :) Please correct me if I'm wrong: the ultimate goal is to have an average of the valid pixels over 8 days of data, each 8 days, correct? I'd do it this way (more or less pythonic pseudocode, untested, but you should get the idea):

    # read filenames
    filenames = glob.glob(...)
    filenames.sort()  # to be sure they are in proper order
    filenamesChunked = [filenames[n:n+8] for n in range(0, len(filenames), 8)]
    # this will produce a list of lists, with each sublist containing the
    # next 8 filenames
    nodatavalue = -999
    for fchunk in filenamesChunked:
        # fill with nodata values, in case there are less than 8 days
        data8days = numpy.ones((8, ncols, nrows)) * nodatavalue
        for di in range(len(fchunk)):
            f = fchunk[di]
            data8days[di] = read f
        weights = data8days > nodatavalue  # this way all nodata values get a weight of 0
        avg8days = numpy.average(data8days, axis=0, weights=weights)
        save avg8days

uhm... well, that's it, really :) best regards, Vincent.

mdekauwe wrote:
Hi, I have written some code and I would appreciate any suggestions to make better use of the numpy array functions to make it a bit more efficient and less of a port from C. Any tricks or thoughts would be much appreciated. The code reads in a series of images, collects a running total if the value is valid, and averages everything every 8 images. Code below... Thanks in advance...

    #!/usr/bin/env python
    """
    Average the daily LST values to make an 8 day product...
    Martin De Kauwe, 18th November 2009
    """
    import sys, os, glob
    import numpy as np
    import matplotlib.pyplot as plt

    def averageEightDays(files, lst, count, numrows, numcols):
        day_count = 0
        # starting day - tag for output file
        doy = 122
        # loop over all the images and average all the information
        # every 8 days...
        for fname in glob.glob(os.path.join(path, files)):
            current_file = fname.split('/')[8]
            year = current_file.split('_')[2][0:4]
            try:
                f = open(fname, 'rb')
            except IOError:
                print "Cannot open outfile for read", fname
                sys.exit(1)
            data = np.fromfile(f, dtype=np.float32).reshape(numrows, numcols)
            f.close()
            # if it is day 1 then we need to fill up the holding array...
            # copy the first file into the array...
            if day_count == 0:
                lst = data
                for i in xrange(numrows):
                    for j in xrange(numcols):
                        # increase the pixel count if we got valid data (undefined = -999.0)
                        if lst[i,j] > -900.:
                            count[i,j] = 1
                day_count += 1
            # keep a running total of valid lst values in an 8 day sequence
            elif day_count > 0 and day_count <= 7:
                for i in xrange(numrows):
                    for j in xrange(numcols):
                        # lst valid pixel?
                        if data[i,j] > -900.:
                            # was the existing pixel valid?
                            if lst[i,j] < -900.:
                                lst[i,j] = data[i,j]
                                count[i,j] = 1
                            else:
                                lst[i,j] += data[i,j]
                                count[i,j] += 1
                day_count += 1
            # need to average previous 8 days and write to a file...
            if day_count == 8:
                for i in xrange(numrows):
                    for j in xrange(numcols):
                        if count[i,j] > 0:
                            lst[i,j] = lst[i,j] / count[i,j]
                        else:
                            lst[i,j] = -999.0
                day_count = 0
                doy += 8
                lst, count = write_outputs(lst, count, year, doy)
        # it is possible we didn't have enough slices to do the last 8day avg...
        # but if we have more than one day we shall use it
        # need to average previous 8 days and write to a file...
        if day_count > 1:
            for i in xrange(numrows):
                for j in xrange(numcols):
                    if count[i,j] > 0:
                        lst[i,j] = lst[i,j] / count[i,j]
                    else:
                        lst[i,j] = -999.0
            day_count = 0
            doy += 8
            lst, count = write_outputs(lst, count, year, doy)

    def write_outputs(lst, count, year, doy):
Re: [Numpy-discussion] Help making better use of numpy array functions
mdekauwe wrote:
Thanks... I have adopted that and as you said it is a lot neater! Though I need to keep the pixel count for a weighting in another piece of code, so I have amended your logic slightly.

Alternatively, you could simply take the sum over axis=0 of the weight array to get the pixel count (e.g. pixelcount = weights.sum(axis=0)). I don't know if performance is an issue, but I'd say that skipping these numpy.where's and increments of count and running total, and using numpy.average instead, should give you a speedup. Though the size of your arrays might be too small for it to become very noticeable. However, in your current version, it makes no sense to allocate an 8 x nx x ny array to read the data, as you anyway calculate a running total. The (8, nx, ny) array only makes sense to postpone calculations till you've read all 8 days, and only then calculate the average (and pixel count). Currently you preserve the previous days of data, but you don't use them anywhere anymore. Either way will work, though, I guess. Choose what feels most natural for you, as that will help you maintain and understand your code later on.

Oh, and a minor issue: creating an array of zeros and then multiplying with -999 still makes an array of zeros... I'd incorporated an array of *ones* multiplied with -999, because for the last chunk of days you could end up with an 8-day array only partly filled with real data. E.g. if you'd have only 3 days of data left in your last chunk, data8days[0:3] would be data, and to prevent the rest ([3:8]) from being incorporated into the average calculation, these need to be -999. Currently these pixels will be 0, which is > nodatavalue, and thus influencing your average (the pixelcount will be incremented for these 0 values).

Vincent.

    #!/usr/bin/env python

    import sys, os, glob
    import numpy as np

    def averageEightDays(filenamesList, numrows, numcols, year):
        doy = 122
        nodatavalue = -999.0
        for fchunk in filenamesList:
            # fill with nodata values, in case there are less than 8 days
            data8days = np.zeros((8, numrows, numcols), dtype=np.float32) * -999.0
            pixCount = np.zeros((numrows, numcols), dtype=np.int)
            avg8days = np.zeros((numrows, numcols), dtype=np.float32)
            for day in xrange(len(fchunk)):
                f = fchunk[day]
                data8days[day] = np.fromfile(f, dtype=np.float32).reshape(numrows, numcols)
                avg8days = np.where(data8days[day] > nodatavalue, avg8days + data8days[day], avg8days)
                pixCount = np.where(data8days[day] > nodatavalue, pixCount + 1, pixCount)
            avg8days = np.where(pixCount > 0, avg8days / np.float32(pixCount), -999.0)
            doy += 8
            print year, ':', doy
            outfile = "lst_8day1030am_" + str(year) + str(doy) + ".gra"
            write_outputs(outfile, avg8days)
            outfile = "pixelcount_8day1030am_" + str(year) + str(doy) + ".gra"
            write_outputs(outfile, pixCount)

    def write_outputs(outfile, data):
        opath = "/users/eow/mgdk/research/HOFF_plots/LST/tmp"
        try:
            of = open(os.path.join(opath, outfile), 'wb')
        except IOError:
            print "Cannot open outfile for write", outfile
            sys.exit(1)
        # empty stuff
        data.tofile(of)
        of.close()

    if __name__ == "__main__":
        numrows = 332
        numcols = 667
        path = "/users/eow/mgdk/research/HOFF_plots/LST/gridded_03/"

        files = 'lst_scr_2006*.gra'
        filenames = glob.glob(os.path.join(path, files))
        filenames.sort()
        filenamesList = [filenames[n:n+8] for n in xrange(0, len(filenames), 8)]
        averageEightDays(filenamesList, numrows, numcols, year=2006)

        files = 'lst_scr_2007*.gra'
        filenames = glob.glob(os.path.join(path, files))
        filenames.sort()
        filenamesList = [filenames[n:n+8] for n in xrange(0, len(filenames), 8)]
        averageEightDays(filenamesList, numrows, numcols, year=2007)
Re: [Numpy-discussion] Help making better use of numpy array functions
mdekauwe wrote:
Vincent Schut-2 wrote:
Oh, and a minor issue: creating an array of zeros and then multiplying with -999 still makes an array of zeros... [snip]

Ok, I hadn't thought about it in that way, but you are of course right! I have amended it.

Vincent Schut-2 wrote:
Alternatively, you could simply take the sum over axis=0 of the weight array to get the pixel count (e.g. pixelcount = weights.sum(axis=0)).

Ok, I see your point here as well. So I tried implementing your suggestion; as I understand it, weights = data8days > nodatavalue will make an (8, nrows, ncols) array containing True and False, and as you said I can get the pixel count I was after by using weights.sum(axis=0). However when I try to do the averaging step:

    avg8days = np.average(data8days, axis=0, weights=weights)

I get the error msg:

    in average raise ZeroDivisionError, "Weights sum to zero, can't be normalized"
    ZeroDivisionError: Weights sum to zero, can't be normalized

Which I guess (but don't know) comes from trying to weight by a pixel count of zero. So not sure what has happened here? Thanks

Ah... apparently numpy.average can't handle the situation when all weights are 0... didn't know that. Hmm... what would you want to have in your average array when for a certain pixel there are only nodata values? If you'd like to have -999 in your average result, then the solution is simple: in the weight array, for those pixels where the weight is 0 for all days, set one day's weight to 1. This way you'll get a nice -999 in your average result for those pixels. E.g.:

    weights[0] |= (weights.sum(axis=0) == 0)

will set (boolean OR-assign) weights[0] to True (== 1) for all pixels whose weight is 0 for all days. Hope this helps. Vincent.
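Put together, a minimal sketch of the fixed averaging step (made-up data, with one pixel deliberately all-nodata):

    import numpy as np

    nodata = -999.0
    data8days = np.random.rand(8, 4, 4).astype(np.float32)
    data8days[:, 0, 0] = nodata               # a pixel with no valid data at all

    weights = data8days > nodata              # nodata cells get weight 0
    weights[0] |= (weights.sum(axis=0) == 0)  # all-nodata pixels: weight day 0 in,
                                              # so the average there becomes -999
    avg8days = np.average(data8days, axis=0, weights=weights)
    print(avg8days[0, 0])                     # -999.0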
Re: [Numpy-discussion] histogram: sum up values in each bin
Tim Michelsen wrote:
Hello, I need some advice on histograms. If I interpret the documentation [1, 2] for numpy.histogram correctly, the result of the function is a count of the occurrences sorted into each bin:

    (n, bins) = numpy.histogram(v, bins=50, normed=1)

But how can I apply another function on these values stacked in each bin? Like summing them up or building averages? Thanks, Timmie

Hi Tim, if you just want to sum and/or average (= sum / count), you can shortcut this using the weights parameter of numpy.histogram, e.g. something like:

    data = numpy.random.random((100,))
    countsPerBin, binEdges = numpy.histogram(data)
    sumsPerBin, binEdges = numpy.histogram(data, weights=data)
    averagePerBin = sumsPerBin / countsPerBin

Regards, Vincent.

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html
[2] http://www.scipy.org/Tentative_NumPy_Tutorial#head-aa75ec76530ff51a2e98071adb7224a4b793519e
Re: [Numpy-discussion] Removing scipy.stsci was [Re: [SciPy-dev] Deprecate chararray [was Plea for help]]
Stéfan van der Walt wrote:
Hi Vincent, 2009/8/21 Vincent Schut sc...@sarvision.nl:

I know it probably will be a pretty involved task, as ndimage comes from numarray and seems to be largely implemented in C. But I really wanted to raise the issue now the image processing subject turns up once again, and hope some folks with more/better programming skills than me might like the idea...

What would you like the behaviour to be? For example, how should ndimage.zoom handle these missing values?

Good question :-) I see 2 possibilities, both of which can be useful in their own situations. Note that I am really not into splines mathematically, so my suggestions and terminology might not apply at all...

1. For any output cell that depends on a missing value in the input, return a missing/masked/NaN value, but (and I think this differs from the current implementation) for any output cell which could be calculated, return proper output. Currently any array that contains one or more NaNs will give an output array full of NaNs (except for order=0, which shows this exact behaviour already). But maybe that's inherent to spline interpolation? This would at least allow input arrays with missing values (or masked arrays) to be used; this behaviour could be extended to many of the ndimage functions, like the kernel based stuff. FFT based convolutions could be another story altogether...

2. In the case of zoom & co: only use the non-missing values to calculate the splines, thus effectively inter/extrapolating missing/masked values in the process. This probably raises a lot of new questions about the implementation. It would however be highly useful for me... I don't know if an irregular-grid-based spline interpolation implementation exists? What I currently do in a case like this (zooming an array with missing values) is first fill the missing values by using ndimage.generic_filter with a kernel function that averages the non-missing values in the moving window. This works as long as there are not too many missing values next to each other; however, it is very slow...

I think that, if an effort like this is to be made, a thorough discussion on the possible behaviours of ndimage functions with missing values should take place on one of the numpy/scipy related mailing lists. I'm sure I'm not the only one with ideas and/or use cases for this, and I'm certainly not someone with a lot of theoretical knowledge in this area.

Regards, Vincent.

Regards, Stéfan
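A sketch of that generic_filter fill step (slow, as noted; the data, the NaN placement and the 3x3 window size are assumptions):

    import numpy as np
    from scipy import ndimage

    def window_mean(values):
        # average the valid (non-NaN) values in the window; NaN if there are none
        valid = values[~np.isnan(values)]
        return valid.mean() if valid.size else np.nan

    data = np.random.rand(50, 50)
    data[10:12, 20:25] = np.nan     # some missing values
    filled = np.where(np.isnan(data),
                      ndimage.generic_filter(data, window_mean, size=3),
                      data)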
Re: [Numpy-discussion] Removing scipy.stsci was [Re: [SciPy-dev] Deprecate chararray [was Plea for help]]
Christopher Hanley wrote:
Hi Stefan, never mind. I just found the Sprint website and read the description. I'm sorry I hadn't found this sooner. I would have made plans to stay and help. My apologies. Sorry, Chris

Hi list, I just saw this too and would like to misuse this thread to suggest another enhancement related to image processing you might consider: make ndimage maskedarray aware. Unfortunately we don't have any money to spend, otherwise I'd love to support this financially too. But it's a thing I'm running into regularly (ndimage not working with missing data, that is), and for which it often is pretty hard to work out a workaround. E.g. any of the resampling (ndimage.zoom) or kernel filtering routines choke on arrays with NaNs, and don't recognize masked arrays. Alas, virtually all of the image data I process (satellite imagery) contains missing/bad data...

I know it probably will be a pretty involved task, as ndimage comes from numarray and seems to be largely implemented in C. But I really wanted to raise the issue now the image processing subject turns up once again, and hope some folks with more/better programming skills than me might like the idea...

Oh, and I know of course ndimage is scipy, and this list is numpy. But as the image processing subject emerged here, well...

Cheers, Vincent Schut.
Re: [Numpy-discussion] add axis to results of reduction (mean, min, ...)
Keith Goodman wrote:
On Thu, Aug 6, 2009 at 9:58 AM, Charles R Harris charlesr.har...@gmail.com wrote:
On Thu, Aug 6, 2009 at 9:55 AM, josef.p...@gmail.com wrote:

What's the best way of getting back the correct shape to be able to broadcast mean, min, ... to the original array, one that works for arbitrary dimension and axis? I thought I had seen some helper functions, but I can't find them anymore.

Adding a keyword to retain the number of dimensions has been mooted. It shouldn't be too difficult to implement and would allow things like:

    scaled = a/a.max(1, reduce=0)

I could do that for 1.4 if folks are interested.

I'd use that. It's better than what I usually do:

    scaled = a / a.max(1).reshape(-1,1)

To chime in after returning from holidays: I'd use that keyword a great deal. It would be more than welcome to me. I currently have loads of code numpy.newaxis-ing the results of min/max/mean operations...
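For reference, the np.newaxis workaround in its general form; later numpy releases grew exactly such a keyword (keepdims, numpy >= 1.7):

    import numpy as np

    a = np.random.rand(4, 3)
    scaled = a / a.max(axis=1)[:, np.newaxis]    # reinsert the reduced axis by hand
    scaled2 = a / a.max(axis=1, keepdims=True)   # the reduction keeps the axis itself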
[Numpy-discussion] bitwise view on numpy array
Hi, I'm gonna have large (e.g. 2400x2400) arrays of 16 and 32 bit bitfields. I've been searching in vain for an efficient and convenient way to represent these arrays' individual bits (or, even better, configurable bitfields of 1-4 bits each). Of course I know I can 'split' the array into its separate bitfields using bitwise operators and shifts, but this will greatly increase the memory usage because it'll create one byte array for each bitfield. So I was looking for a way to create a bitwise view on the original array's data. I've been looking at recarrays, but the smallest element these can use is bytes, correct? I've been looking at ctypes arrays of Structure subclasses, which can define bitfields. However, these will give me an object array of elements of the Structure subclass, and only allow me to access the bits per array element instead of for the entire array (or a subset), e.g. data[:].bit17-19 or something like that. After searching the net in vain for some hours, the list is my last resort :-) Anyone having ideas on how to get both memory-efficient and convenient access to single bits of a numpy array?

On a slightly related note, during my search I found some comments saying that numpy.bool arrays use an entire byte for each element. Could someone confirm (or, better, negate) that? Thanks, Vincent.
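No true bit-level view exists in numpy, but a sketch of the shift-and-mask route, computing each bitfield only on demand so only one temporary array lives at a time (bits() and the field layout are made up for illustration; and yes, numpy bool arrays do store one byte per element):

    import numpy as np

    def bits(arr, start, width):
        # extract `width` bits beginning at bit `start` (LSB = bit 0);
        # returns a new (temporary) array, not a view
        mask = (1 << width) - 1
        return (arr >> start) & arr.dtype.type(mask)

    flags = np.random.randint(0, 2**16, (2400, 2400)).astype(np.uint16)
    cloud = bits(flags, 0, 2)    # e.g. a 2-bit field in bits 0-1
    land = bits(flags, 2, 1)     # a 1-bit field in bit 2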
Re: [Numpy-discussion] is it a bug?
Travis E. Oliphant wrote: shuwj5...@163.com wrote: snipsnip Travis, thanks for the excellent explanation! It clears up something related that I've been wanting to ask on the ml for some time already. Now here's the case. I often have 4d arrays that are actually related sets of satellite imagery, and have the form [date, band, y, x]. These can get pretty large, so I like to prevent too much broadcasting or reshape-copying when indexing, to save some memory. However, I regularly have to apply some boolean index of [date, y, x] to each of the bands (think of the bool mask as a threshold based on just one of the bands). Currently I usually loop over all band indices, e.g. for b in range(data.shape[1]): data[:, b, :, :][mask] = 0 Would there be a way to do this in a more numpy-like fashion that is also memory-efficient? Thanks, Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
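One loop-free possibility (my sketch, not from the thread): swapaxes returns a view, and a boolean mask can index the trailing axes in one go, so the assignment writes straight into the original data without band-sized copies:

    import numpy as np

    data = np.zeros((10, 7, 64, 64))         # [date, band, y, x]
    mask = np.random.rand(10, 64, 64) > 0.5  # [date, y, x]

    bands_first = data.swapaxes(0, 1)  # a view, shape (band, date, y, x)
    bands_first[:, mask] = 0           # mask selects (date, y, x) for
                                       # every band at once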
Re: [Numpy-discussion] fancy view question
Gael Varoquaux wrote: On Tue, Feb 17, 2009 at 10:18:11AM -0600, Robert Kern wrote: On Tue, Feb 17, 2009 at 10:16, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Tue, Feb 17, 2009 at 09:09:38AM -0600, Robert Kern wrote: np.repeat(np.repeat(x, 2, axis=0), 2, axis=1) stride_tricks are fun, but this is already a solved problem in numpy. Wow. I still have a lot to learn! How about adding a see-also in as_strided. They're not really related. Yes, I now understand that there were no views involved, contrary to what I first thought, so indeed they are not related. Gaël Wow, did I start something! :-) Thanks to all who shared their knowledge and ideas. I learned a lot besides getting answers to my questions. Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] fancy view question
Hi list, would it be possible to create a view on an array, such that this view is twice as large (in some dimensions) and in fact does a nearest neighbour 'zoom' on the original array? E.g. using some fancy slicing/striding tricks? An example: a = [[1, 2], [3, 4]] then I'd like a view on a such that this view is: [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]] I know several ways to create this as a copy, but as memory does play a role in this case (yes, I know it's cheap :-))... If not, fancy ways to create this as a copy are appreciated. I currently either create an empty array and then loop-fill it (e.g. this would be 2x2 loops), or, if I'm lazy, I use ndimage.zoom, which is actually a bit overkill. Regards, Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
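The thread's answer was np.repeat (which copies); for completeness, a stride-tricks sketch of the copy-free view, with the caveat that flattening it back to 2-D necessarily copies:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    a = np.array([[1, 2], [3, 4]])
    sy, sx = a.strides
    # A (y, 2, x, 2) view: the zero-stride axes repeat each element
    # without copying any data.
    v = as_strided(a, shape=(a.shape[0], 2, a.shape[1], 2),
                   strides=(sy, 0, sx, 0))
    zoomed = v.reshape(a.shape[0] * 2, a.shape[1] * 2)  # this step copies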
Re: [Numpy-discussion] memory usage
Huang-Wen Chen wrote: Robert Kern wrote:

    from numpy import *
    for i in range(1000):
        a = random.randn(512**2)
        b = a.argsort(kind='quick')

Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0 on OS X, the memory usage is stable. I tried the code fragment on two platforms and the memory usage is also normal. 1. numpy 1.1.1, python 2.5.1 on Vista 32bit 2. numpy 1.2.0, python 2.6 on RedHat 64bit If I recall correctly, there were some major improvements in python's memory management/garbage collection from version 2.4 to 2.5. If you could try to upgrade your python to 2.5 (and possibly also your numpy to 1.2.0), you'd probably see some better behaviour. Regards, Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] xml-rpc with numpy arrays
Lisandro Dalcin wrote: I believe xmlrpclib is currently the simplest approach. Some day I'll have the time to implement something similar using MPI communication with mpi4py. However, I believe it can be done even better: local, client-side proxies should automatically provide access to all members/methods of remote, server-side instances. The registering step needed within xmlrpclib is a bit ugly ;-) Try pyro: http://pyro.sourceforge.net/ or rpyc: http://rpyc.wikispaces.com/, both of which, if I recall correctly, do implement this. VS. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] maskedarray: how to force mask to expand
Pierre, Thanks for your explanations. It still seems a little (too) complicated, but from a backwards-compatibility pov combined with your 'nomask is not False' implementation detail, I can mostly understand :-) I think the idea that when a.mask returns False, that actually means nomask instead of the False I'm used to, is what caused a major part of my confusion. It might actually be nice to give you some context of why I asked this: during my (satellite image) processing, I use maskedarrays by default for each step in the processing chain, and I need to save the result of each step to disk. That means saving the array and its mask. I save both of them as tiff files (because these can include all other info that is nice for satellite imagery, like coordinates and projection). When saving the mask, I'm creating a tiff file and pushing the .mask array into it. Therefore, I obviously need the .mask not to be nomask, but to be a full shaped array. That's the context you need to see my confusion in. Because speed and memory don't matter that much to me (well, they do matter, but I'm processing huge amounts of data anyways, and using parallel/clustered processing anyways, so I can take those masks...), I thought it might be easiest to make sure my data always has a full shaped mask. But, of course, performance-wise it would be best to be able to work with nomasks, and only expand these to full shaped masks when writing to disk. That's why I asked for a possible method on an ma to force expanding a mask, or e.g. an ma.mask.as_full_shaped_mask() method that returns either the mask, or (if nomask) a new array of Falses. I just supposed it existed and I could not find it, but now I understand it does not exist. But I could easily write something that checks for nomask and always returns an expanded mask. The 'trick' to create ma's with the mask=False keyword is neat, I had not thought about that. The same applies to masking values using ma[idx] = ma.masked. Just for completeness (in case someone else is reading this and wondering how to *unmask* values): just setting ma[idx] to some valid number will unset the mask for that index. No need to do ma[idx] = ma.unmask or whatever, just ma[idx] = v. OK, top posting is bad :) Further comments inline. Pierre GM wrote: Vincent, The argument of speed (having a default mask of nomask) applies only to the computations inside MaskedArray. Of course, it is still far faster not to use masks but only ndarrays. Just for clarity, to rephrase my question: how do I force ma to give me (always/by default/by some method on a maskedarray) a full shaped mask instead of 'False' or nomask? Because I am sure from the beginning that I'll need this mask in full shape, I want it, and I want to be able to treat it like any normal bool array :-) Easy:

    a = ma.array([1,2,3,4], mask=False)
    masked_array(data = [1 2 3 4], mask = [False False False False], fill_value=99)

Puzzling? Not so much. See, by default, the value of the mask parameter is `nomask`. nomask is in fact a 0-size boolean ndarray with value 0 (False). At the creation of the masked array, we check whether a value was given to the mask parameter. If no value is given, we default to `nomask`, and we end up with `a.mask is nomask`. If you force the mask parameter to the boolean False, you're not using `nomask`: in that case, the full mask is created. I understand that now. That won't work with ma.zeros or ma.ones. I could add an extra keyword to deal with that, but is it really needed when you can simply do a.mask = False?
No real need for that, then. It would just be convenient sugar on (especially my) cake. Note:

    a = ma.array([1,2,3,])
    a.mask is False
    False
    a.mask is ma.nomask
    True
    a.mask == False
    True

Btw, in future versions, would it be an idea to separate 'nomask' and 'False' a little more? I always assumed (correctly?) that in python, True and False are singletons (as far as that is possible in python), just like None. 'False is False' should always compare to True, then. In this case (a.mask is False) it at least *seems* to break that 'rule'... And is there also a complement, like ma.unmasked? I could not find it (very quick search, I admit)... Or can I use !ma.masked? I found out that you can just set elements to unmask them, so there is no need for ma.unmasked. No, there isn't and no you can't. ma.masked is actually a constant defined module-wide and independent of any array. That way, you can test whether an element is masked with `a[..] is masked`. Ah, now the magic starts... (normal user cap on head, beware):

    In [9]: am.mask
    Out[9]: False
    In [10]: am.mask = False
    In [11]: am.mask
    Out[11]: array([[False, False], [False, False]], dtype=bool)

while (with the same am as before [9], with am.mask == False):

    In [15]: am.mask = am.mask
    In [16]: am.mask
    Out[16]: False

Do you see (and agree with
Re: [Numpy-discussion] maskedarray: how to force mask to expand
to force the mask at creation to full shape, and there is no method on a maskedarray to change the mask to full shape. However, one can apply some magic and use a.mask = False directly after creation to force the mask to full shape. This of course only works when the mask already *was* False, otherwise you'll be effectively changing your mask. So we presume ma never by default returns a mask of 'True', and then this works. The obvious trick to work around this remote possibility of a mask of 'True' would be a.mask = a.mask, but that does not work. Hey, sorry about starting a discussion about this, while I meant to ask just a simple question (and really assumed I had overlooked something, it seemed so simple...). Again, no offence meant, and your work on ma is really appreciated. I hope this discussion will result in more intuitiveness in a future (C?) implementation of ma. Regards, Vincent. On Wednesday 24 September 2008 06:25:57 Vincent Schut wrote: Probably I'm just overlooking something obvious, but I'm having problems with maskedarrays (numpy.ma from svn: '1.3.0.dev5861'), the mask by default being a single bool value ('False') instead of a properly sized bool array. If I then try to mask one value by assigning values to certain mask positions (a.mask[0,0]=True) I get an error, logically. I know I can use masked_where, but I like the mask[...] idiom. And I have to expand the mask anyway, as I'm gonna write it to a file at the end. 1) Is there a way to have ma always use properly expanded masks (bool arrays instead of single bool values)? I tried the shrink=False keyword, but that does not do what I want, and is not available for numpy.ma.zeros, which I conveniently use a lot. 2) Is there a method/function to request the mask, be it a single bool value or an array, as a properly sized array? I found shrink_mask but no opposite method, and shrink_mask even seems to do something subtly different. Regards, Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] maskedarray: how to force mask to expand
Probably I'm just overlooking something obvious, but I'm having problems with maskedarrays (numpy.ma from svn: '1.3.0.dev5861'), the mask by default being a single bool value ('False') instead of a properly sized bool array. If I then try to mask one value by assigning values to certain mask positions (a.mask[0,0]=True) I get an error, logically. I know I can use masked_where, but I like the mask[...] idiom. And I have to expand the mask anyway, as I'm gonna write it to a file at the end. 1) Is there a way to have ma always use properly expanded masks (bool arrays instead of single bool values)? I tried the shrink=False keyword, but that does not do what I want, and is not available for numpy.ma.zeros, which I conveniently use a lot. 2) Is there a method/function to request the mask, be it a single bool value or an array, as a properly sized array? I found shrink_mask but no opposite method, and shrink_mask even seems to do something subtly different. Regards, Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
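For what it's worth, numpy.ma does ship a helper for question 2: as far as I know, ma.getmaskarray returns a full-shaped boolean array even when the mask is still nomask. A minimal sketch:

    import numpy.ma as ma

    a = ma.zeros((4, 4))       # freshly created: a.mask is ma.nomask
    full = ma.getmaskarray(a)  # always a full (4, 4) bool array,
                               # all False when there is no mask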
Re: [Numpy-discussion] 1.2 tasks
David Huard wrote: On Mon, Aug 4, 2008 at 1:45 PM, Jarrod Millman [EMAIL PROTECTED] wrote: snip Question: Should histogram raise a warning by default (new=True) to warn users that the behaviour has changed? Or warn only if new=False, to remind that the old behaviour will be deprecated in 1.3? I think that users will prefer being annoyed by warnings to being surprised by an unexpected change, but repeated warnings can become a nuisance. To minimize annoyance, we could also offer three possibilities:

    new=None (default): equivalent to True, print a warning about the change.
    new=True: don't print a warning.
    new=False: print a warning about the future deprecation.

So those who have already set new=True don't get warnings, and all others are warned. Feedback? As a regular user of histogram I say: please warn! Your proposal above seems OK to me. I have histogram in a lot of old (and sometimes long-running) code of mine, and I certainly would prefer to be warned. Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
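A minimal sketch of how that three-way scheme could look (the wrapper name and messages are hypothetical, not numpy's actual implementation):

    import warnings
    import numpy as np

    def histogram_compat(a, bins=10, new=None):
        if new is None:
            # new behaviour, but tell the user things have changed
            warnings.warn("histogram: default behaviour has changed; "
                          "pass new=True to silence this warning")
            new = True
        elif new is False:
            warnings.warn("histogram: the old behaviour is deprecated "
                          "and will be removed in 1.3", DeprecationWarning)
        return np.histogram(a, bins=bins)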
Re: [Numpy-discussion] first recarray steps
Anne Archibald wrote: 2008/5/21 Vincent Schut [EMAIL PROTECTED]: Christopher Barker wrote: Also, if your image data is rgb, usually, that's a (width, height, 3) array: rgbrgbrgbrgb... in memory. If you have a (3, width, height) array, then that's rrr... Some image libs may give you that, I'm not sure. My data is. In fact, this is a simplification of my situation; I'm processing satellite data, which usually has more (and other) bands than just rgb. But the data is definitely in shape (bands, y, x). You may find your life becomes easier if you transpose the data in memory. This can make a big difference to efficiency. Years ago I was working with enormous (by the standards of the day) MATLAB files on disk, storing complex data. The way (that version of) MATLAB represented complex data was the way you describe: matrix of real parts, matrix of imaginary parts. This meant that to draw a single pixel, the disk needed to seek twice... Depending on what sort of operations you're doing, transposing your data so that each pixel is all in one place may improve cache coherency as well as making the use of record arrays possible. Anne Anne, thanks for the thoughts. In most cases, you'll probably be right. In this case, however, it won't give me much (if any) speedup, maybe even a slowdown. Satellite images often are stored on disk in a band-sequential manner. The library I use for IO is GDAL, which is a highly optimized C library for reading/writing almost any kind of satellite data format. It also features an internal caching mechanism. And it gives me my data as (y, x, bands). I'm not reading single pixels anyway. The amounts of data I have to process (enormous, even by the standards of today ;-)) require me to do this in chunks, in parallel, even on different cores/cpus/computers. Every chunk usually is (chunkYSize, chunkXSize, allBands) with xsize and ysize being not so small (think from 64^2 to 1024^2), so that pretty much eliminates any performance issues regarding the data on disk. Furthermore, having to process on multiple computers forces me to have my data on networked storage. The latency and transfer rate of the network will probably eliminate any small speedup from my drive having to do fewer seeks... Now for the recarray part, that would indeed ease my life a bit :) However, having to transpose the data in memory on every read and write does not sound very attractive. It would waste cycles and memory, and be asking for bugs. I can live without recarrays, for sure. I only hoped they might make my life a bit easier and my code a bit more readable, without too much effort. Well, they won't, apparently... I'll just go on like I did before this little exercise. Thanks all for the inputs. Cheers, Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] distance_matrix: how to speed up?
Emanuele Olivetti wrote: snip This solution is super-fast, stable, and uses little memory. It is based on the fact that: (x-y)^2*w = x*x*w - 2*x*y*w + y*y*w For size1=size2=dimensions=1000 it requires ~0.6 sec to compute on my dual core duo. It is two orders of magnitude faster than my previous solution, but 1-2 orders of magnitude slower than using C with weave.inline. Definitely good enough for me. Emanuele Reading this thread, I remembered having tried scipy's sandbox.rbf (radial basis function) to interpolate a pretty large, multidimensional dataset, to fill in the missing data points. This, however, soon failed with out-of-memory errors, which, if I remember correctly, came from the pretty straightforward distance calculation between the different data points that is used in this package. Being no math wonder, I assumed that there simply was no simple way to calculate distances without using much memory, and ended my rbf experiments. To make a long story short: correct me if I am wrong, but might it be an idea to use the above solution in scipy.sandbox.rbf? Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
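A sketch of that identity in code (variable names are mine: x is (n1, d), y is (n2, d), w a length-d weight vector); the point is that no (n1, n2, d) intermediate array is ever built:

    import numpy as np

    def weighted_sqdist(x, y, w):
        # (x-y)^2*w = x*x*w - 2*x*y*w + y*y*w, summed over the d axis
        xw = x * w
        return ((xw * x).sum(1)[:, None]          # x*x*w term
                - 2.0 * np.dot(xw, y.T)           # cross term
                + (y * y * w).sum(1)[None, :])    # y*y*w term

    x = np.random.rand(1000, 1000)
    y = np.random.rand(1000, 1000)
    w = np.ones(1000)
    D = weighted_sqdist(x, y, w)  # (1000, 1000), modest memory use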
Re: [Numpy-discussion] distance_matrix: how to speed up?
Rob Hetland wrote: On May 22, 2008, at 9:45 AM, Vincent Schut wrote: snip Really, though, the rbf toolbox will not be limited by the memory of the distance matrix. Later on, you need to do a large linear algebra 'solve', like this:

    r = norm(x, x)   # the distances between all of the ND points to each other
    A = psi(r)       # where psi is some divergent function, often the
                     # multiquadratic function: sqrt((self.epsilon*r)**2 + 1)
    coefs = linalg.solve(A, data)  # where data is the length of x, one data
                                   # point for each spatial point
    # to find the interpolated data points at xi
    ri = norm(xi, x)
    Ai = psi(ri)
    di = dot(Ai, coefs)

All in all, it is the 'linalg.solve' that kills you. Ah, indeed, my memory was faulty, I'm afraid. It was in this phase that it halted, not in the distance calculations. Vincent. -Rob Rob Hetland, Associate Professor Dept. of Oceanography, Texas A&M University http://pong.tamu.edu/~rob phone: 979-458-0096, fax: 979-845-6331 ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
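For the curious, a self-contained sketch of that recipe (the function names and epsilon default are mine, not the sandbox API; psi is the multiquadric quoted above):

    import numpy as np

    def dist_matrix(a, b):
        # Pairwise Euclidean distances via the expansion trick above.
        d2 = ((a * a).sum(1)[:, None] - 2 * np.dot(a, b.T)
              + (b * b).sum(1)[None, :])
        return np.sqrt(np.maximum(d2, 0))

    def rbf_interpolate(x, data, xi, epsilon=1.0):
        psi = lambda r: np.sqrt((epsilon * r) ** 2 + 1)  # multiquadric
        A = psi(dist_matrix(x, x))
        coefs = np.linalg.solve(A, data)  # the step that dominates cost
        return np.dot(psi(dist_matrix(xi, x)), coefs)

    x = np.random.rand(200, 2)        # 200 sample points in 2-D
    data = np.sin(x[:, 0]) + x[:, 1]  # values at those points
    xi = np.random.rand(50, 2)        # where we want estimates
    di = rbf_interpolate(x, data, xi)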
Re: [Numpy-discussion] first recarray steps
Christopher Barker wrote: Vincent Schut wrote: Let's say I have an rgb image of arbitrary size, as a normal ndarray (that's what my image reading lib gives me). Thus shape is (3,ysize,xsize), dtype = int8. How would I convert/view this as a recarray of shape (ysize, xsize) with the first dimension split up into 'r', 'g', 'b' fields? No need for 'x' and 'y' fields. Take a look in this list for a thread entitled recarray fun about a month ago -- you'll find some more discussion of approaches. Well, actually that thread was my inspiration to take a closer look into recarrays... Also, if your image data is rgb, usually, that's a (width, height, 3) array: rgbrgbrgbrgb... in memory. If you have a (3, width, height) array, then that's rrr... Some image libs may give you that, I'm not sure. My data is. In fact, this is a simplification of my situation; I'm processing satellite data, which usually has more (and other) bands than just rgb. But the data is definitely in shape (bands, y, x). Also, you probably want a uint8 dtype, giving you 0-255 for each byte. Same story. In fact, in this case it's int16, but it can actually be any data type, even floats, even complex. But thanks for the thoughts :-) -Chris ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] first recarray steps
Robert Kern wrote: On Wed, May 21, 2008 at 1:48 AM, Vincent Schut [EMAIL PROTECTED] wrote: Christopher Barker wrote: Also, if your image data is rgb, usually, that's a (width, height, 3) array: rgbrgbrgbrgb... in memory. If you have a (3, width, height) array, then that's rrr... Some image libs may give you that, I'm not sure. My data is. In fact, this is a simplification of my situation; I'm processing satellite data, which usually has more (and other) bands than just rgb. But the data is definitely in shape (bands, y, x). I don't think record arrays will help you much, then. Individual records need to be contiguous (bar padding). You can't interleave them. Hmm, that was just what I was wondering about, when reading Stefan's reply. So in fact, recarrays aren't just another way to view some data, no matter in what shape it is. So his solution: x.T.reshape((-1,x.shape[0])).view(dt).reshape(x.shape[1:]).T won't work, then. Or, at least, it won't give me a view on my original data, but would give me a recarray with a copy of my data. I guess I was misled by this text on the recarray wiki page: We would like to represent a small colour image. The image is two pixels high and two pixels wide. Each pixel has a red, green and blue colour component, which is represented by a 32-bit floating point number between 0 and 1. Intuitively, we could represent the image as a 3x2x2 array, where the first dimension represents the color, and the last two the pixel positions, i.e. Note the 3x2x2, which suggested imho that this would work with an image with (bands,y,x) shape, not with (x,y,bands) shape. But I understand that it's not shape, but internal representation in memory (contiguous or not, C/Fortran, etc.) that matters? I know I can change the wiki text, but I'm afraid I still don't feel confident enough on this matter... ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] first recarray steps
Robert Kern wrote: On Wed, May 21, 2008 at 2:03 AM, Vincent Schut [EMAIL PROTECTED] wrote: Robert Kern wrote: On Wed, May 21, 2008 at 1:48 AM, Vincent Schut [EMAIL PROTECTED] wrote: Christopher Barker wrote: Also, if your image data is rgb, usually, that's a (width, height, 3) array: rgbrgbrgbrgb... in memory. If you have a (3, width, height) array, then that's rrr... Some image libs may give you that, I'm not sure. My data is. In fact, this is a simplification of my situation; I'm processing satellite data, which usually has more (and other) bands than just rgb. But the data is definitely in shape (bands, y, x). I don't think record arrays will help you much, then. Individual records need to be contiguous (bar padding). You can't interleave them. Hmm, that was just what I was wondering about, when reading Stefan's reply. So in fact, recarrays aren't just another way to view some data, no matter in what shape it is. So his solution: x.T.reshape((-1,x.shape[0])).view(dt).reshape(x.shape[1:]).T won't work, then. Or, at least, it won't give me a view on my original data, but would give me a recarray with a copy of my data. Right. I guess I was misled by this text on the recarray wiki page: We would like to represent a small colour image. The image is two pixels high and two pixels wide. Each pixel has a red, green and blue colour component, which is represented by a 32-bit floating point number between 0 and 1. Intuitively, we could represent the image as a 3x2x2 array, where the first dimension represents the color, and the last two the pixel positions, i.e. Note the 3x2x2, which suggested imho that this would work with an image with (bands,y,x) shape, not with (x,y,bands) shape. Yes, the tutorial goes on to use record arrays as a view onto an (x,y,bands) array and also make a (bands,x,y) view from that, too. That is, in fact, quite a confusing presentation of the subject. Now, there is a way to use record arrays here; it's a bit ugly but can be quite useful when parsing data formats. Each item in the record can also be an array. So let's pretend we have a (3,nx,ny) RGB array:

    nbands, nx, ny = a.shape
    dtype = numpy.dtype([
        ('r', a.dtype, [nx, ny]),
        ('g', a.dtype, [nx, ny]),
        ('b', a.dtype, [nx, ny]),
    ])
    # The flatten() is necessary to pre-empt numpy from
    # trying to do too much interpretation of a's shape.
    rec = a.flatten().view(dtype)
    print rec['r']
    print rec['g']
    print rec['b']

Ah, now that is clarifying! Thanks a lot. I'll do some experiments to see whether this way of viewing my data is useful to me (in a sense that making my code more readable is already very useful). Cheers, Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] first recarray steps
Hi, I'm trying to get into recarrays. Unfortunately the documentation is a bit on the short side... Let's say I have an rgb image of arbitrary size, as a normal ndarray (that's what my image reading lib gives me). Thus shape is (3,ysize,xsize), dtype = int8. How would I convert/view this as a recarray of shape (ysize, xsize) with the first dimension split up into 'r', 'g', 'b' fields? No need for 'x' and 'y' fields. I tried creating a numpy dtype {names: ('r','g','b'), formats: (numpy.int8,)*3}, but when I try raw_img.view(rgb_dtype) I get: ValueError: new type not compatible with array. Now this probably should not be too difficult, but I just don't see it... Thanks, Vincent. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
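As the follow-ups explain, a zero-copy view is impossible for (3, y, x) data because the fields are not interleaved in memory. If a copy is acceptable, one way (a sketch, with stand-in data) is np.rec.fromarrays:

    import numpy as np

    raw_img = np.zeros((3, 200, 300), dtype=np.int8)  # stand-in for real data
    # Builds an (ysize, xsize) record array; note this copies the data.
    rec_img = np.rec.fromarrays([raw_img[0], raw_img[1], raw_img[2]],
                                names='r,g,b')
    # rec_img.shape == (200, 300); rec_img['r'].dtype == int8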
Re: [Numpy-discussion] moving window in numpy
Matthew Perry wrote: Hi all, I'm not sure if my terminology is familiar but I'm trying to do a moving window analysis (i.e. a spatial filter or kernel) on a 2-D array representing elevation. For example, a 3x3 window centered on each cell is used to calculate the slope of that cell. Can this easily be implemented using numpy? If I understand you correctly, scipy.ndimage does exactly what you ask for. See http://www.scipy.org/SciPyPackages/Ndimage. Cheers, VS. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
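For instance, a 3x3 moving-window slope can be sketched with scipy.ndimage.generic_filter (the central-difference formula below is just for illustration, not a full Horn slope):

    import numpy as np
    from scipy import ndimage

    def cell_slope(window):
        # generic_filter hands us the 3x3 neighbourhood as a flat array
        w = window.reshape(3, 3)
        dzdx = (w[1, 2] - w[1, 0]) / 2.0
        dzdy = (w[2, 1] - w[0, 1]) / 2.0
        return np.hypot(dzdx, dzdy)

    elev = np.random.rand(100, 100)
    slope = ndimage.generic_filter(elev, cell_slope, size=3)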
Re: [Numpy-discussion] numpy : your experiences?
Rahul Garg wrote: a) Can you guys tell me briefly about the kind of problems you are tackling with numpy and scipy? Mainly processing of time series of remote sensing data ('satellite images'). No really fancy math, but huge (sometimes multiple gigabytes) multidimensional (date, bands, y, x; order of magnitude: [hundreds, some, tens of thousands or more, idem]) datasets. Some tasks are long-running, not because of the complex math involved, but because of the amount of data. b) Have you ever felt that numpy/scipy was slow and had to switch to C/C++/Fortran? Yes, but usually the overhead of programming the whole thing in C is not worth the speedup of processing, especially with modern fast computers and easy ways to parallelize stuff (ipython and/or parallelpython). c) Do you use any form of parallel processing? Multicores? SMPs? Clusters? If yes, how did you utilize them? See above. I just started to use parallelpython here, currently not yet for production work, but in a testing/prototyping phase. Using both SMP, multicore and multiple machines ('cluster'); PP makes no distinction between them. How? Just cut the data into suitable pieces and throw them as a pp job at the cluster :-) It's really that simple nowadays. And most of our processing is very parallel in nature. Cheers, Vincent Schut. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] ndim argsort question
Hi, I'm getting a bit lost in numpy's fancy indexing: using argsort on a slice of a 4d array, and then using the resulting indices to sort other slices of that same array, preferably in one go. I have a data[d, b, y, x] array (for the curious: date, band, y, x; a satellite image). Then I create a sorting index from band 0: idx = data[:, 0, :, :].argsort(axis=0) Now I want to sort all 7 bands according to this index, without looping, or at least avoiding looping as far as possible. Something like sortedData = data[idx], but then for each 'b'. However, I can't get multidim indexing using the argsort result to work. Any hints? Hard to explain. I hope someone understands it; otherwise please let me know and I'll try to rephrase :) Cheers, Vincent Schut. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
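One loop-free way (a sketch; the open-grid index arrays broadcast against the argsort result, so result[i, j, k, l] = data[idx[i, k, l], j, k, l]):

    import numpy as np

    d, b, y, x = 5, 7, 100, 100
    data = np.random.rand(d, b, y, x)

    idx = data[:, 0, :, :].argsort(axis=0)   # (d, y, x)
    bi = np.arange(b)[None, :, None, None]
    yi = np.arange(y)[None, None, :, None]
    xi = np.arange(x)[None, None, None, :]
    sorted_data = data[idx[:, None, :, :], bi, yi, xi]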
[Numpy-discussion] strange problem compiling against ndarrayobject.h
Hmm, it seems my original message did not come through? Not in gmane, at least... Well, here it is again: Hi numpy and/or gdal gurus, I'm suddenly getting into trouble compiling gdal's python extension, when it includes ndarrayobject.h from numpy. First it complains about my python not being unicode, though I am sure it is unicode, and then it bails out with all kinds of errors (see snippet below). Numpy people, please read on even though this might seem a gdal problem, because it clearly is numpy related, see the snippet... I've tried everything I could think of, but to no avail. I hope one of you can give me a hint, because I'm kind of clueless, and don't even know in what direction I should search... Here is a paste from the output from building (python setup.py build) in the gdal/swig/python dir. make[2]: Entering directory `/usr/local/src/gdal/swig/python' python setup.py build numpy include /usr/lib64/python2.5/site-packages/numpy/core/include running build running build_py creating build creating build/lib.linux-x86_64-2.5 copying gdal.py -> build/lib.linux-x86_64-2.5 copying ogr.py -> build/lib.linux-x86_64-2.5 copying osr.py -> build/lib.linux-x86_64-2.5 copying gdalconst.py -> build/lib.linux-x86_64-2.5 creating build/lib.linux-x86_64-2.5/osgeo copying osgeo/gdalnumeric.py -> build/lib.linux-x86_64-2.5/osgeo copying osgeo/gdalconst.py -> build/lib.linux-x86_64-2.5/osgeo copying osgeo/__init__.py -> build/lib.linux-x86_64-2.5/osgeo copying osgeo/osr.py -> build/lib.linux-x86_64-2.5/osgeo copying osgeo/gdal_array.py -> build/lib.linux-x86_64-2.5/osgeo copying osgeo/gdal.py -> build/lib.linux-x86_64-2.5/osgeo copying osgeo/ogr.py -> build/lib.linux-x86_64-2.5/osgeo running build_ext building 'osgeo._gdal' extension C compiler: x86_64-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC creating build/temp.linux-x86_64-2.5 creating build/temp.linux-x86_64-2.5/extensions compile options: '-I../../port -I../../gcore -I../../alg -I../../ogr -I/usr/lib64/python2.5/site-packages/numpy/core/include -I/usr/include/python2.5 -c' x86_64-pc-linux-gnu-gcc: extensions/gdal_wrap.cpp extensions/gdal_wrap.cpp: In function ‘PyObject* _wrap_MajorObject_SetMetadata__SWIG_0(PyObject*, PyObject*)’: extensions/gdal_wrap.cpp:4570: warning: deprecated conversion from string constant to ‘char*’ x86_64-pc-linux-gnu-g++ -pthread -shared build/temp.linux-x86_64-2.5/extensions/gdal_wrap.o -L../../.libs -L../.. -L/usr/lib64 -lgdal -lpython2.5 -o build/lib.linux-x86_64-2.5/osgeo/_gdal.so building 'osgeo._gdalconst' extension C compiler: x86_64-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC compile options: '-I../../port -I../../gcore -I../../alg -I../../ogr -I/usr/lib64/python2.5/site-packages/numpy/core/include -I/usr/include/python2.5 -c' x86_64-pc-linux-gnu-gcc: extensions/gdalconst_wrap.c x86_64-pc-linux-gnu-gcc -pthread -shared build/temp.linux-x86_64-2.5/extensions/gdalconst_wrap.o -L../../.libs -L../.. -L/usr/lib64 -lgdal -lpython2.5 -o build/lib.linux-x86_64-2.5/osgeo/_gdalconst.so building 'osgeo._osr' extension C compiler: x86_64-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC compile options: '-I../../port -I../../gcore -I../../alg -I../../ogr -I/usr/lib64/python2.5/site-packages/numpy/core/include -I/usr/include/python2.5 -c' x86_64-pc-linux-gnu-gcc: extensions/osr_wrap.cpp x86_64-pc-linux-gnu-g++ -pthread -shared build/temp.linux-x86_64-2.5/extensions/osr_wrap.o -L../../.libs -L../..
-L/usr/lib64 -lgdal -lpython2.5 -o build/lib.linux-x86_64-2.5/osgeo/_osr.so building 'osgeo._ogr' extension C compiler: x86_64-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC compile options: '-I../../port -I../../gcore -I../../alg -I../../ogr -I/usr/lib64/python2.5/site-packages/numpy/core/include -I/usr/include/python2.5 -c' x86_64-pc-linux-gnu-gcc: extensions/ogr_wrap.cpp x86_64-pc-linux-gnu-g++ -pthread -shared build/temp.linux-x86_64-2.5/extensions/ogr_wrap.o -L../../.libs -L../.. -L/usr/lib64 -lgdal -lpython2.5 -o build/lib.linux-x86_64-2.5/osgeo/_ogr.so building 'osgeo._gdal_array' extension C compiler: x86_64-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC compile options: '-I../../port -I../../gcore -I../../alg -I../../ogr -I/usr/lib64/python2.5/site-packages/numpy/core/include -I/usr/include/python2.5 -c' x86_64-pc-linux-gnu-gcc: extensions/_gdal_array.cpp In file included from /usr/lib64/python2.5/site-packages/numpy/core/include/numpy/arrayobject.h:14, from extensions/gdal_array.h:10, from extensions/_gdal_array.cpp:33: /usr/lib64/python2.5/site-packages/numpy/core/include/numpy/ndarrayobject.h:92:2: error: #error Must use Python with unicode enabled. In file included from /usr/include/python2.5/Python.h:62, from extensions/gdal_array.h:16, from extensions/_gdal_array.cpp:33: /usr/include/python2.5/pyport.h:105:1: warning: PY_SSIZE_T_MAX redefined In file included from
Re: [Numpy-discussion] strange problem compiling against ndarrayobject.h
Hi all, It appeared to be a gdal issue after all: the arrayobject header file was being included before the python headers... Glad it wasn't something like me having borked my numpy build :) Cheers, Vincent. Vincent Schut wrote: snip