Re: [Numpy-discussion] Numpy logo in VTK
You're now reminding me of the old spinning SGI logo: http://www.youtube.com/watch?v=Nqf6TjE49N8

Brennan

On 27/06/2012 10:40 a.m., klo uo wrote:
I continued in this mpl trip, with a small animation sequence:

    # animation
    ax.view_init(90, -90)
    plt.ion()
    plt.draw()
    plt.show()
    for l in arange(25):
        ax.set_xlim3d(1.5 - .1*l, 2.5 + .1*l)
        ax.set_ylim3d(1.5 - .1*l, 2.5 + .1*l)
        ax.view_init(90 - 3*l, -90 + l)
        plt.draw()
    plt.title("NumPy")
    plt.ioff()
    plt.show()

Try it, or check it out on YouTube: www.youtube.com/watch?v=mpYPS_zXAFw

Whole script in attachment.
Re: [Numpy-discussion] Statistical distributions on samples
You can use scipy.stats.truncnorm, can't you? Unless I misread, you want to sample a normal distribution, but with generated values only being within a specified range? However, you also say you want to do this with triangular and log-normal distributions, and for these I presume the easiest way is to sample and then accept/reject.

Brennan

On 13/08/2011 2:53 a.m., Christopher Jordan-Squire wrote:
Hi Andrea -- An easy way to get something like this would be

    import numpy as np
    import scipy.stats as stats

    sigma = ...  # some reasonable standard deviation for your application
    x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
    x = x[x > 50]
    x = x[x < 200]

That will give a roughly normal distribution to your velocities, as long as, say, sigma < 25. (I'm using the rule of thumb for the normal distribution that normal random samples lie more than 3 standard deviations away from the mean about 1 out of 350 times.) Though you won't be able to get exactly normal errors about your mean, since normal random samples can theoretically take any value. You can use this same process for any other distribution, as long as you've chosen a scale parameter so that the probability of samples being outside your desired interval is really small. Of course, once again, your random errors won't be exactly from the distribution you got your original samples from. -Chris JS

On Fri, Aug 12, 2011 at 8:32 AM, Andrea Gavana andrea.gav...@gmail.com wrote:
Hi All, I am working on something that appeared to be a no-brainer issue (at the beginning), but my complete ignorance of statistics is overwhelming and I got stuck. What I am trying to do can be summarized as follows. Let's assume that I have to generate a sample of 1,000 values for a variable (let's say, velocity) using a normal distribution (but later I will have to do it with log-normal, triangular and a couple of others). The only thing I know about this velocity sample is the minimum and maximum values (let's say 50 and 200 respectively) and, obviously for the normal distribution (but not so for the other distributions), the mean value (125 in this case). Now, I would like to generate this sample of 1,000 points, in which none of the points has velocity smaller than 50 or bigger than 200, and the number of samples close to the mean (125) should be higher than the number of samples close to the minimum and the maximum, following some kind of normal distribution. What I have tried up to now is summarized in the code below but, as you can easily see, I don't really know what I am doing. I am open to every suggestion, and I apologize for the dumbness of my question.

    import numpy
    from scipy import stats
    import matplotlib.pyplot as plt

    minval, maxval = 50.0, 250.0
    x = numpy.linspace(minval, maxval, 500)
    samp = stats.norm.rvs(size=len(x))
    pdf = stats.norm.pdf(x)
    cdf = stats.norm.cdf(x)
    ppf = stats.norm.ppf(x)
    ax1 = plt.subplot(2, 2, 1)
    ax1.plot(range(len(x)), samp)
    ax2 = plt.subplot(2, 2, 2)
    ax2.plot(x, pdf)
    ax3 = plt.subplot(2, 2, 3)
    ax3.plot(x, cdf)
    ax4 = plt.subplot(2, 2, 4)
    ax4.plot(x, ppf)
    plt.show()

Andrea.
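A minimal sketch of the truncnorm route suggested at the top of this message, for the normal case only; sigma = 25 is an assumed scale, since the thread only fixes the mean (125) and the bounds (50, 200):

    from scipy.stats import truncnorm

    mean, sigma, lo, hi = 125.0, 25.0, 50.0, 200.0   # sigma is an assumption
    a, b = (lo - mean) / sigma, (hi - mean) / sigma  # bounds in std units
    vel = truncnorm.rvs(a, b, loc=mean, scale=sigma, size=1000)
    print(vel.min())   # always >= 50
    print(vel.max())   # always <= 200

Unlike the sample-then-clip approach, every one of the 1,000 draws is guaranteed to land inside [50, 200].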
Re: [Numpy-discussion] Developer NumPy list versus User NumPy list
On 28/01/2011 1:07 p.m., Sturla Molden wrote:
On 28.01.2011 00:23, Robert Kern wrote:
We've resisted it for years. I don't think the split has done scipy much good.

The scope of NumPy is narrower development-wise and wider user-wise. While SciPy does not benefit, as use and development are still quite entangled there, this is not the case for NumPy. Perhaps we could split the NumPy list and merge the SciPy lists?

As a user of NumPy and SciPy (and there are probably a lot of people who use both), why not have one developer list for both NumPy and SciPy and one user list for both NumPy and SciPy? It might not work from a developer point of view, but I think it does from a user point of view - mind you, I just put everything of interest from both mailing lists into one folder.

Brennan
[Numpy-discussion] quadratic function
I have used both linear least squares and radial basis functions as a proxy equation, calculated from the results of computer simulations which calculate some objective function value based on a number of varied input parameters. As an alternative option I want to add a quadratic function, so if there are parameters/variables x, y, z then rather than just having a linear function f = a + b*x + c*y + d*z I'll have f = a + b*x + c*x**2 + d*x*y + ... I'd like to have the option not to include all the different second-order terms. Where should I start looking?

Thanks

Brennan
Re: [Numpy-discussion] quadratic function
On 29/10/2010 2:34 a.m., Robert Kern wrote:
On Thu, Oct 28, 2010 at 06:38, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I have used both linear least squares and radial basis functions as a proxy equation, calculated from the results of computer simulations which calculate some objective function value based on a number of varied input parameters. As an alternative option I want to add a quadratic function, so if there are parameters/variables x, y, z then rather than just having a linear function f = a + b*x + c*y + d*z I'll have f = a + b*x + c*x**2 + d*x*y + ... I'd like to have the option not to include all the different second-order terms.

    A = np.column_stack([
        np.ones_like(x),
        x, y, z,
        x*x, y*y, z*z,
        x*y, y*z, x*z,
    ])
    coeffs, res, rank, s = np.linalg.lstsq(A, f)

OK, so in other words, you can use linalg.lstsq with whatever higher-order terms you want to include or exclude. Very nice. Thanks.

On a related topic, I also use the Rbf radial basis function as a proxy equation. I have one set of data that it fails to return an Rbf for, and I've just realised that in the set of simulations used to build the proxy equation I have some duplicate runs. I'm wondering if Rbf doesn't like duplicate points? It obviously doesn't affect linalg.lstsq.

Brennan
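To make the design-matrix approach concrete, here is a self-contained sketch with synthetic data (the coefficient values are made up), keeping only the second-order terms actually wanted:

    import numpy as np

    # Synthetic data: the true model is f = 1 + 2x + 3x**2 + 4xy plus noise,
    # so the design matrix below keeps only those terms.
    rng = np.random.RandomState(0)
    x, y = rng.uniform(0.0, 1.0, (2, 50))
    f = 1.0 + 2.0*x + 3.0*x*x + 4.0*x*y + 0.01*rng.normal(size=50)

    # Include or exclude whichever second-order columns you want
    A = np.column_stack([np.ones_like(x), x, x*x, x*y])
    coeffs = np.linalg.lstsq(A, f)[0]
    print(coeffs)   # close to [1, 2, 3, 4]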
Re: [Numpy-discussion] quadratic function
On 29/10/2010 6:35 a.m., Robert Kern wrote:
On Thu, Oct 28, 2010 at 12:33, Brennan Williams brennan.willi...@visualreservoir.com wrote:
On 29/10/2010 2:34 a.m., Robert Kern wrote:

    A = np.column_stack([
        np.ones_like(x),
        x, y, z,
        x*x, y*y, z*z,
        x*y, y*z, x*z,
    ])
    coeffs, res, rank, s = np.linalg.lstsq(A, f)

OK, so in other words, you can use linalg.lstsq with whatever higher-order terms you want to include or exclude. Very nice. Thanks.

Right. Just as long as the problem is linear in the coefficients, the design matrix can be derived however you like.

So I could optionally put log terms in if I thought the response was linear in log(x), for example?

On a related topic, I also use the Rbf radial basis function as a proxy equation. I have one set of data that it fails to return an Rbf for, and I've just realised that in the set of simulations used to build the proxy equation I have some duplicate runs. I'm wondering if Rbf doesn't like duplicate points? It obviously doesn't affect linalg.lstsq.

Rbf doesn't like duplicate points. :-)

OK, fair enough, I just need to add a bit of housekeeping to remove duplicate simulations. Thanks for the confirmation.
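A minimal sketch of that housekeeping (the array names are made up): pts holds one row of input parameters per simulation, vals the matching responses, and exact duplicate rows are dropped before building the Rbf.

    import numpy as np

    pts = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]])
    vals = np.array([10.0, 10.0, 20.0])

    # View each row as one opaque record so np.unique compares whole rows.
    # Note this is exact byte equality; near-duplicates need a tolerance.
    rowbytes = np.ascontiguousarray(pts).view(
        np.dtype((np.void, pts.dtype.itemsize * pts.shape[1])))
    keep = np.sort(np.unique(rowbytes.ravel(), return_index=True)[1])
    pts, vals = pts[keep], vals[keep]
    print(pts)    # [[ 1.  2.] [ 3.  4.]]
    print(vals)   # [ 10.  20.]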
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
Hi Chris and All,

On 29 March 2010 22:35, Christopher Barker wrote:
Andrea Gavana wrote:
Scaling each axis by its standard deviation is a typical first start. Shifting and scaling the values such that they each go from 0 to 1 is another useful thing to try.
Ah, magnifico! Thank you Robert and Friedrich, it seems to be working now...

One other thought -- core to much engineering is dimensional analysis -- you know how we like those non-dimensional numbers! I think this situation is less critical, as you are interpolating, not optimizing or something, but many interpolation methods are built on the idea of some data points being closer than others to your point of interest. Who is to say if a point that is 2 hours away is closer or farther than one 2 meters away? This is essentially what you are doing. Scaling everything to the same range is a start, but then you've still given them an implicit weighting. An alternative is to figure out a way to non-dimensionalize your parameters -- that *may* give you a more physically based scaling. And you might invent the Gavana Number in the process ;-)

Might be :-D. At the moment I am pretty content with what I have got; it seems to be working fairly well, although I didn't examine all the possible cases and it is very likely that my little tool will break disastrously for some combinations of parameters. However, I am not sure I am allowed to post an image comparing the real simulation with the prediction of the interpolated proxy model, but if you could see it, you would surely agree that it is a very reasonable approach. It seems too good to be true :-D. Again, this is mainly due to the fact that we have a very extensive set of simulations which cover a wide range of combinations of parameters, so the interpolation itself is only doing its job correctly. I don't think the technique can be applied blindly to whatever oil/gas/condensate/whatever reservoir, as non-linearities in fluid flow simulations appear where you least expect them, but it seems to be working OK (up to now) for our field (which is, by the way, one of the most complex and least understood condensate reservoirs out there).

Andrea.

And of course that proxy simulator only deals with the input variables that you decided on 1,000+ simulations ago. All you need is for someone to suggest something else, like "how about gas injection?", and you're back to having to do more real simulation runs (which is where a good experimental design comes in). It would be interesting to know how well your proxy simulator compares to the real simulator for a combination of input variable values that is a good distance outside your original parameter space.

Brennan
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
Hi All,

On 28 March 2010 19:22, Robert Kern wrote:
On Sun, Mar 28, 2010 at 03:26, Anne Archibald peridot.face...@gmail.com wrote:
On 27 March 2010 20:24, Andrea Gavana andrea.gav...@gmail.com wrote:
Hi All, I have an interpolation problem and I am having some difficulties in tackling it. I hope I can explain myself clearly enough. Basically, I have a whole bunch of 3D fluid flow simulations (close to 1000), and they are the result of different combinations of parameters. I was planning to use the Radial Basis Functions in scipy, but for the moment let's assume, to simplify things, that I am dealing only with one parameter (x). In 1000 simulations, this parameter x has 1000 values, obviously. The problem is, the outcome of every single simulation is a vector of oil production over time (let's say 40 values per simulation, one per year), and I would like to be able to interpolate my x parameter (1000 values) against all the simulations (1000x40) and get an approximating function that, given another x parameter (of size 1x1), will give me back an interpolated production profile (of size 1x40).

If I understand your problem correctly, you have a function taking one value as input (or one 3D vector) and returning a vector of length 40. You want to know whether there are tools in scipy to support this. I'll say first that it's not strictly necessary for there to be: you could always just build 40 different interpolators, one for each component of the output. After all, there's no interaction in the calculations between the output coordinates. This is of course awkward, in that you'd like to just call F(x) and get back a vector of length 40, but that can be remedied by writing a short wrapper function that simply calls all 40 interpolators. A problem that may be more serious is that the python loop over the 40 interpolators can be slow, while a C implementation would give vector-valued results rapidly.

40 is not a bad number to loop over. The thing I would be worried about is the repeated calculation of the (1000, 1000) radial function evaluation. I think that a small modification of Rbf could be made to handle the vector-valued case. I leave that as an exercise to Andrea, of course. :-)

It seems like this whole interpolation stuff is not working as I thought. In particular, considering scalar-valued interpolation (i.e., looking at the final oil recovery only and not the time-based oil production profile), interpolation with RBFs is giving counter-intuitive and meaningless answers. The issues I am seeing are basically these:

    # Interpolate the cumulative oil production
    rbf = Rbf(x1, x2, x3, x4, x5, x6, final_oil_recovery)

    # Try to use an existing combination of parameters to get back
    # the original result (more or less)
    possible_recovery = rbf(x1[10], x2[10], x3[10], x4[10], x5[10], x6[10])

1) If I attempt to use the resulting interpolation function (rbf), inputting a single value for each of x1, x2, ..., x6 that is *already present* in the original x1, x2, ..., x6 vectors, I get meaningless results (i.e., negative oil productions, 1000% error, and so on) in all cases, with some (rare) exceptions when using the thin-plate interpolation option;

2) Using a combination of parameters x1, x2, ..., x6 that is *not* in the original set, I get non-physical (i.e., unrealistic) results: it appears that RBFs think that if I increase the number of production wells I get less oil recovered (!), sometimes by a factor of 10.
I am pretty sure I am missing something in this whole interpolation thing (I am no expert at all), but honestly it is the first time I have got such counter-intuitive results in my entire Python numerical life ;-)

Andrea.

Hi Andrea, I'm looking at doing this as well, i.e. using RBFs to create what is effectively a proxy simulator. I'm going to have a read through the scipy Rbf documentation to get up to speed. I take it you are using an experimental design algorithm such as Plackett-Burman or similar?

Brennan
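Anne's "short wrapper" idea from earlier in this thread is easy to sketch. Everything below (the parameter values, the profile shapes) is synthetic stand-in data:

    import numpy as np
    from scipy.interpolate import Rbf

    # 100 runs of one parameter, each producing a 40-value production profile
    rng = np.random.RandomState(1)
    x = rng.uniform(0.0, 1.0, 100)
    profiles = np.cumsum(rng.uniform(0.0, 1.0, (100, 40)), axis=1)

    # One scalar interpolator per yearly value...
    interps = [Rbf(x, profiles[:, j]) for j in range(profiles.shape[1])]

    # ...wrapped so one call returns the whole profile
    def predict_profile(xnew):
        return np.array([f(xnew) for f in interps])

    print(predict_profile(0.5).shape)   # (40,)

As Robert notes, the per-component Python loop is cheap for 40 outputs; the dominant cost is the 1000x1000 radial-distance solve, which this naive version repeats once per component.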
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
Hi All,

On 28 March 2010 22:14, Pierre GM wrote:
On Mar 28, 2010, at 4:47 PM, Andrea Gavana wrote:
Hi All,

On 28 March 2010 19:22, Robert Kern wrote:
On Sun, Mar 28, 2010 at 03:26, Anne Archibald peridot.face...@gmail.com wrote:
On 27 March 2010 20:24, Andrea Gavana andrea.gav...@gmail.com wrote:
Hi All, I have an interpolation problem and I am having some difficulties in tackling it. I hope I can explain myself clearly enough. Basically, I have a whole bunch of 3D fluid flow simulations (close to 1000), and they are the result of different combinations of parameters.

It seems like this whole interpolation stuff is not working as I thought. In particular, considering scalar-valued interpolation (i.e., looking at the final oil recovery only and not the time-based oil production profile), interpolation with RBFs is giving counter-intuitive and meaningless answers. The issues I am seeing are basically these:

Which is hardly surprising: you're working with a physical process, and you must have some constraints on your parameters (whether dependence between parameters, bounds on the estimates...) that are not taken into account by the interpolation scheme you're using. So, it's back to the drawing board.

Sorry, this laptop has gone mad and sent the message before I was finished...

The curious thing is, when using the rbf interpolated function to find a new approximation, I am not giving RBFs input values that are outside the bounds of the existing parameters. Either they are exactly the same as the input ones (for a single simulation), or they are slightly different but always inside the bounds. I always thought that, at least for the same input values, it should give (approximately) the same answer as the real one.

What are you actually trying to achieve? Find the best estimates of your 10 parameters to match an observed production timeline? Find a space for your 10 parameters that gives some realistic production?

I have this bunch of simulations of possible future oil production (forecasts), run over the past 4 years, which have different combinations of number of oil producing wells, gas injection wells, oil plateau values, gas injection plateau values and gas production plateau values (6 parameters in total, not 10). Now, the combinations themselves are pretty nicely spread, meaning that I seem to have a solid base on which to build an interpolator that will give me a reasonable answer when I ask it for parameters inside the bounds (i.e., no extrapolation involved).

Assuming that your 10 parameters are actually independent, did you run 1000**10 simulations to test all the possible combinations?

I hope you're kidding :-D. Each of these simulations takes from 7 to 12 hours, not counting the huge amount of time needed to build their data files by hand...

Let's see a couple of practical examples (I can share the data if someone is interested).

Definitely interested in helping solve this one, so feel free to email the data (obviously not 1,000 Eclipse smspec and unsmry files, although I can handle that without any problems).

Example 1:

    # o2 and o3 are the number of production wells, split into 2
    # different categories
    # inj is the number of injection wells
    # fomts is the final oil recovery

    rbf = Rbf(oilPlateau, gasPlateau, gasInjPlateau, o2, o3, inj, fomts)

    op = [50380]
    gp = [103014000]
    gi = [53151000]
    o2w = [45]
    o3w = [20]
    inw = [15]

I take it all your inputs are float arrays? i.e. you just treat the number of production wells as a floating point value.
I suppose strictly speaking o2 and o3 are not independent of each other, but I would guess that for the purposes of using Rbf it wouldn't be an issue. I'm just having a look at the cookbook example (http://www.scipy.org/Cookbook/RadialBasisFunctions), adding a couple more variables, bumping up the number of points to 1000 and setting the z values to be up around 1.0e+09 - so far it still seems to work, although if you increase the sample points to 2000+ it falls over with a memory error. I've got a smaller bunch of Eclipse simulations I could try Rbf out on - 19 runs with 9 variables (it's a Tornado algorithm with -1, 0, +1 values for each variable). FOPTs will be in a similar range to yours.

Brennan

    fi = rbf(op, gp, gi, o2w, o3w, inw)
    # I *KNOW* the answer to be close to +3.5e8
    print fi

    [ -1.00663296e+08]

(yeah right...)

Example 2: changing o2w from 45 to 25 (again, the answer should be close to 3e8; fewer wells = less production):

    fi = rbf(op, gp, gi, o2w, o3w, inw)
    print fi

    [ 1.30023424e+08]

And keep in mind that nowhere do I have such low values of oil recovery in my data... the lowest ones are close to 2.8e8...

Andrea.
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
Hi Friedrich & All,

On 28 March 2010 23:51, Friedrich Romstedt wrote:
2010/3/28 Andrea Gavana andrea.gav...@gmail.com:
Example 1:

    # o2 and o3 are the number of production wells, split into 2
    # different categories
    # inj is the number of injection wells
    # fomts is the final oil recovery

    rbf = Rbf(oilPlateau, gasPlateau, gasInjPlateau, o2, o3, inj, fomts)

    op = [50380]
    gp = [103014000]
    gi = [53151000]
    o2w = [45]
    o3w = [20]
    inw = [15]

    fi = rbf(op, gp, gi, o2w, o3w, inw)
    # I *KNOW* the answer to be close to +3.5e8
    print fi

    [ -1.00663296e+08]

(yeah right...)

Example 2: changing o2w from 45 to 25 (again, the answer should be close to 3e8; fewer wells = less production):

    fi = rbf(op, gp, gi, o2w, o3w, inw)
    print fi

    [ 1.30023424e+08]

And keep in mind that nowhere do I have such low values of oil recovery in my data... the lowest ones are close to 2.8e8...

I want to put my 2 cents in, FWIW... What I see from http://docs.scipy.org/doc/scipy-0.7.x/reference/generated/scipy.interpolate.Rbf.html#scipy.interpolate.Rbf are three things:

1. Rbf uses some weighting based on the radial functions.
2. Rbf results go through the nodal points without *smooth* set to some value != 0.
3. Rbf is isotropic.

(3.) is most important. I see from your e-mail that the values you pass in to Rbf are of very different orders of magnitude. But the default norm used in Rbf is for sure isotropic, i.e., it will result in strange and useless mean distances in R^N where there are N parameters. You have to either pass in a *norm* which weights the coords according to their extent, or scale the data such that the aspect ratios of the hypercube's edges are sensible. You can imagine it as the following ascii plot:

[ascii scatter plot of * points, garbled in the archive]

The x dimension is, say, the gas plateau dimension (~1e10); the y dimension is the year dimension (~1e1). In an isotropic plot, using the data limits and aspect = 1, they may be well homogeneous, but on this scaling, as used by Rbf, they are lumped. I don't know if it's clear what I mean from my description? Are the corresponding parameter values spread completely randomly, or is only one axis varied at a time? If random, you could try a fractal analysis to find out the optimal scaling of the axes for N-d coverage of N-d space. In the case above, it is something between points and lines, I guess. When scaling it properly, it becomes a plane-like coverage of the xy plane. When scaling further, it will become points again (lumped together). So you can find an optimum. There are quantitative methods to obtain the fractal dimension, and they are quite simple.

I believe I need a technical dictionary to properly understand all that... :-D. Sorry, I am no expert at all, really, just an amateur with some imagination, but your suggestion about the different magnitudes of the matrix is a very interesting one. Although I have absolutely no idea how to re-scale them properly to avoid RBFs going crazy. As for your question, the parameters are not spread completely randomly, as this is a collection of simulations done over the years, trying different scenarios manually, without having in mind a proper experimental design or any other technique. Nor do the parameter values vary only on one axis in each simulation (a few of them are like that).

I assume that there is a default norm that calculates the distance between points irrespective of the order of the input coordinates?
So if that isn't working, leading to the spurious results, the next step is to normalise all the inputs so they are in the same range, e.g. max - min = 1.0.

On a related note, what approach would be best if one of the input parameters wasn't continuous? e.g. I have three quite different geological distributions, called say A, B and C, so some of my simulations use distribution A, some use B and some use C. I could assign them the numbers 1, 2, 3, but a value of 1.5 is meaningless.

Andrea, if you have 1TB of data for 1,000 simulation runs then, if I assume you only mean the smspec/unsmry files, each of your summary files is 1GB in size? Are those o2w, o3w and inw figures the number of new wells only, or existing + new? It's fun dealing with this amount of data, isn't it?

Brennan
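A minimal sketch of that max - min = 1.0 normalisation, with synthetic columns standing in for the real parameters:

    import numpy as np

    # cols: one column per parameter, one row per simulation (synthetic here)
    rng = np.random.RandomState(2)
    cols = np.column_stack([rng.uniform(5.0e7, 6.0e7, 30),   # oil plateau
                            rng.uniform(1.0e8, 2.0e8, 30),   # gas plateau
                            rng.uniform(10.0, 50.0, 30)])    # well count
    lo = cols.min(axis=0)
    span = cols.max(axis=0) - lo
    norm = (cols - lo) / span          # every column now runs 0..1
    print(norm.min(axis=0))            # [ 0.  0.  0.]
    print(norm.max(axis=0))            # [ 1.  1.  1.]

For the non-continuous A/B/C parameter, one standard dodge (not from this thread) is to use one 0/1 indicator column per category instead of the values 1, 2, 3, so that a meaningless in-between value like 1.5 never arises.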
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
On 29 March 2010 00:34, Robert Kern wrote:
On Sun, Mar 28, 2010 at 18:30, Andrea Gavana andrea.gav...@gmail.com wrote:
Hi Friedrich & All,

On 28 March 2010 23:51, Friedrich Romstedt wrote:
I believe I need a technical dictionary to properly understand all that... :-D. Sorry, I am no expert at all, really, just an amateur with some imagination, but your suggestion about the different magnitudes of the matrix is a very interesting one. Although I have absolutely no idea how to re-scale them properly to avoid RBFs going crazy.

Scaling each axis by its standard deviation is a typical first start. Shifting and scaling the values such that they each go from 0 to 1 is another useful thing to try.

Ah, magnifico! Thank you Robert and Friedrich, it seems to be working now... I get reasonable values for various combinations of parameters by scaling the input data using the standard deviation of each of them. It seems also that the other interpolation schemes are much less erratic now, and in fact (using input values equal to the original data) I get these ranges of errors for the various schemes:

    inverse multiquadric    -15.6098482614          15.7194674906
    linear                   -1.76157336073e-010     1.24949181055e-010
    cubic                    -0.000709860285963      0.018385394661
    gaussian               -293.930336611           282.058111404
    quintic                  -0.176381494531         5.37780806549
    multiquadric            -30.9515933446          58.3786105046
    thin-plate               -7.06755391536e-006     8.71407169821e-005

(in percent). Some of them are still off the mark, but you should have seen them before ;-).

That's great news. Going to give it a try myself. Interesting that the linear scheme gives the lowest error range.

I'll do some more analysis tomorrow, and if it works I am going to try the bigger profile-over-time interpolation. Thank you so much guys for your suggestions.

Andrea.
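For reference, the standard-deviation scaling described above is easy to sketch; everything here is synthetic stand-in data for the six parameters, and fomts is a toy response:

    import numpy as np
    from scipy.interpolate import Rbf

    # Six parameters on wildly different scales
    rng = np.random.RandomState(3)
    params = [rng.uniform(1.0, 10.0, 200) * s
              for s in (1e8, 1e9, 1e7, 1.0, 1.0, 1.0)]
    fomts = 1e8 * sum(p / p.max() for p in params)   # toy response

    # Scale each axis by its standard deviation before interpolating
    scaled = [p / p.std() for p in params]
    rbf = Rbf(*(scaled + [fomts]), function='linear')

    # Percentage error evaluated back at the training points
    pct = 100.0 * (rbf(*scaled) - fomts) / fomts
    print(pct.min())
    print(pct.max())

The same scaling factors must, of course, be applied to any new query point before handing it to the interpolator.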
Re: [Numpy-discussion] how to efficiently build an array of x, y, z points
Christopher Barker wrote:
Bruce Southey wrote:
Christopher Barker provided some code late last year on appending ndarrays, e.g.: http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046634.html

Yup, I'd love someone else to pick that up and test/improve it. Anyway, that code only handles 1-d arrays, though they can be structured arrays. I'd like to extend it to handle n-d arrays, though you could only grow them in the first dimension, which may work for your case. As for performance: my numpy code is a bit slower than using python lists if you add elements one at a time and the elements are a standard python data type. It should use less memory, though, if that matters. If you add the data in big enough chunks, my method gets better performance.

Currently I'm adding all my corner point xyz's into a list and then converting to an array of shape (npoints, 3). And I'm creating a cell list with the point indices for each cell and then converting that into an array of shape (nactivecells, 8). Then I'm creating an unstructured grid.

Ultimately I'm trying to build a tvtk unstructured grid to view in a Traits/tvtk/Mayavi app.

I'd love to see that working, once you've got it!

So will I.

The grid is ni*nj*nk cells with 8 xyz's per cell (a hexahedral cell with 6 faces). However, some cells are inactive and therefore don't have geometry. Cells also have connectivity to other cells, usually to adjacent cells (e.g. cell i,j,k connected to cell i-1,j,k), but not always.

I'm confused now -- what does the array need to look like in the end? Maybe ni*nj*nk x 8 x 3? How is "inactive" indicated?

I made a typo in my first posting. Each cell has 8 corners, each corner an x,y,z, so yes, if all the cells in the grid are active then ni*nj*nk*8*3, but usually not all cells are active, and it is optional whether inactive cell geometry is written out to the grid file, so it is actually nactivecells*8*3.

Is the connectivity somehow in the same array, or is that stored separately?

Bit of both - there is separate connectivity info and also implicit connectivity info. Often a cell will be fully connected to its adjacent cell(s) as they share a common face. But often there is no connectivity (e.g. a fault) and the +I face of a cell does not match up against the -I face of the adjacent cell. At the moment I'm not removing duplicate points (of which there are a lot, probably 25-50% depending on the degree of faulting).

One other thing I need to do is to re-order my xyz coordinates - in the attached image, taken from the VTK file format PDF, you can see the 0,1,2,3 and 4,5,6,7 node ordering. In my grid it is 0,1,3,2 and 4,5,7,6, so you can see that I need to swap round some of the coordinates. I need to do this for each cell, and there may be 10,000 of them or there may be 2,000,000 of them. So I think it is probably best not to do it on a cell-by-cell basis, but to wait until I've built my full point list, convert it to an array, probably of shape (nactivecells, 8, 3), and then somehow rearrange/reorder the 8 columns. Sound the right way to go?

Brennan

[attachment: hexahedron.png]
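That reordering takes one fancy-index operation over the whole array; a minimal sketch on a toy (2, 8, 3) array:

    import numpy as np

    # Stored order 0,1,3,2,4,5,7,6 versus VTK's 0,1,2,3,4,5,6,7 just means
    # swapping corners 2<->3 and 6<->7, and one fancy index does that for
    # every cell at once (the swap is its own inverse, so the same list
    # works in either direction).
    points = np.arange(2 * 8 * 3, dtype=float).reshape(2, 8, 3)
    vtk_order = [0, 1, 3, 2, 4, 5, 7, 6]
    points = points[:, vtk_order, :]
    print(points.shape)   # still (2, 8, 3)

No Python loop over cells is needed, so 10,000 or 2,000,000 cells cost one vectorised copy either way.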
Re: [Numpy-discussion] how to efficiently build an array of x, y, z points
Bruce Southey wrote:
On 03/02/2010 09:47 PM, David Goldsmith wrote:
On Tue, Mar 2, 2010 at 6:59 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
David Goldsmith wrote:
On Tue, Mar 2, 2010 at 6:29 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I'm reading a file which contains a grid definition. Each cell in the grid, apart from having an i,j,k index, also has 8 x,y,z coordinates. I'm reading each set of coordinates into a numpy array. I then want to add/append those coordinates to what will be my large points array. Due to the orientation/order of the 8 corners of each hexahedral cell, I may have to reorder them before adding them to my large points array (not sure about that yet). Should I create a numpy array with nothing in it and then .append to it? But this is probably expensive, isn't it, as it creates a new copy of the array each time? Or should I create a zero or empty array of sufficient size and then put each set of 8 coordinates into the correct position in that big array? I don't know exactly how big the array will be (some cells are inactive and therefore don't have a geometry defined), but I do know what its maximum size is: (ni*nj*nk, 3).

Someone will correct me if I'm wrong, but this problem - the best way to build a large array whose size is not known beforehand - came up in one of the tutorials at SciPyCon '09, and IIRC the answer was, perhaps surprisingly, to build the thing as a Python list (which is optimized for this kind of indeterminate sequence building) and convert to a numpy array when you're done. Isn't that what was recommended, folks?

Build a list of floating point values, then convert to an array and shape accordingly? Or build a list of small arrays and then somehow convert that into a big numpy array? My guess is that either way will be better than iteratively appending to an existing array.

Hi, Christopher Barker provided some code late last year on appending ndarrays, e.g.: http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046634.html

A lot depends on your final usage of the array, otherwise there are no suitable suggestions. That is, do you need just to index the array using i, j, k indices (this gives you an i by j by k array that contains the x, y, z coordinates), or do you also need to index the x, y, z coordinates as well (giving you an i by j by k by x by y by z array)? If it is just plain storage, then perhaps a Python list, dict or sqlite object may be sufficient.

Ultimately I'm trying to build a tvtk unstructured grid to view in a Traits/tvtk/Mayavi app. The grid is ni*nj*nk cells with 8 xyz's per cell (a hexahedral cell with 6 faces). However, some cells are inactive and therefore don't have geometry. Cells also have connectivity to other cells, usually to adjacent cells (e.g. cell i,j,k connected to cell i-1,j,k), but not always. I'll post more comments/questions as I go.

Brennan

There are also time and memory constraints, as you can spend large effort just getting the input into a suitable format and memory usage. If you use secondary storage like a Python list, then you need memory to store the list, the ndarray and all intermediate components and overheads. If you use scipy, then you should look at using sparse arrays, where space is only added as you need it.
Bruce
[Numpy-discussion] how to efficiently build an array of x, y, z points
I'm reading a file which contains a grid definition. Each cell in the grid, apart from having an i,j,k index, also has 8 x,y,z coordinates. I'm reading each set of coordinates into a numpy array. I then want to add/append those coordinates to what will be my large points array. Due to the orientation/order of the 8 corners of each hexahedral cell, I may have to reorder them before adding them to my large points array (not sure about that yet). Should I create a numpy array with nothing in it and then .append to it? But this is probably expensive, isn't it, as it creates a new copy of the array each time? Or should I create a zero or empty array of sufficient size and then put each set of 8 coordinates into the correct position in that big array? I don't know exactly how big the array will be (some cells are inactive and therefore don't have a geometry defined), but I do know what its maximum size is: (ni*nj*nk, 3).

Thanks

Brennan
Re: [Numpy-discussion] how to efficiently build an array of x, y, z points
David Goldsmith wrote:
On Tue, Mar 2, 2010 at 6:29 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I'm reading a file which contains a grid definition. Each cell in the grid, apart from having an i,j,k index, also has 8 x,y,z coordinates. ... I don't know exactly how big the array will be (some cells are inactive and therefore don't have a geometry defined), but I do know what its maximum size is: (ni*nj*nk, 3).

Someone will correct me if I'm wrong, but this problem - the best way to build a large array whose size is not known beforehand - came up in one of the tutorials at SciPyCon '09, and IIRC the answer was, perhaps surprisingly, to build the thing as a Python list (which is optimized for this kind of indeterminate sequence building) and convert to a numpy array when you're done. Isn't that what was recommended, folks?

DG

Build a list of floating point values, then convert to an array and shape accordingly? Or build a list of small arrays and then somehow convert that into a big numpy array? I've got 24 floating point values, which I've got in an array of shape (8,3), but I could easily have those in a list rather than an array, and then just keep appending each small list of values to a big list and do the final conversion to the array at the end - I'll try that and see how it goes.

Brennan
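A minimal sketch of that list-then-convert pattern; the toy loop stands in for the real grid-file parser:

    import numpy as np

    # Accumulate plain Python lists while reading, convert to ndarray once
    corners = []
    for icell in range(4):   # stand-in for the per-cell file read
        cell = [[float(icell), float(c), 0.0] for c in range(8)]  # 8 corners
        corners.extend(cell)
    points = np.asarray(corners, dtype=float).reshape(-1, 8, 3)
    print(points.shape)      # (4, 8, 3)

The list grows in amortized constant time per append, and the single asarray call at the end does one allocation instead of one per cell.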
[Numpy-discussion] problem with FortranFile
I'm using FortranFile to read a binary Fortran file. It has a bit of header data at the top of the file, which I'm reading with a combination of readString and struct.unpack. This is then followed by a number of lines/records, each of which has 20 double precision reals/floats. For some reason it reads the first 19 OK and then things start going wrong. To read the double precision data I'm using:

    for il in range(0, nlines):
        try:
            darray = f.readReals('d')
        except:
            print 'problem reading well data line', il

I've added print statements to my code and to fortranfile.py, and the print output I get is:

    line 19
    readRecord: l= 160
    readRecord: len(data_str)= 160
    readRecord: check_size= 160
    readReals: len(data_str)= 160
    calcsizeprec= 8
    num= 20
    numbers (4.2843303680419922, 0.0, 0.0, 0.0, 4955.73974609375, 0.0,
      -1000.0, -1000.0, 0.0, 0.0, 0.0, 21.22749137878418, 0.0, 0.0, 0.0,
      0.0, 0.2054678201675415, 0.0, 6386.78271484375, 6356.27001953125)
    il= 19 dsize= 20
    darray= [ 4.28433037e+00 0.e+00 0.e+00 0.e+00 4.95573975e+03 0.e+00
      -1.e+03 -1.e+03 0.e+00 0.e+00 0.e+00 2.12274914e+01 0.e+00 0.e+00
      0.e+00 0.e+00 2.05467820e-01 0.e+00 6.38678271e+03 6.35627002e+03]
    ----------------------------------------------------------------
    line 20
    readRecord: l= 160
    readRecord: len(data_str)= 132
    problem reading well data line 20

Line 19 is OK: readReals calls readRecord, the length of the data is 160, and check_size equals l, etc. Line 20 is not OK: the 4-byte length value at the start of the record is 160, but the data_str = self.read(l) line only gets 132 bytes, not 160. The data file itself is OK, as I've written a small Fortran program to read it. So, am I doing something wrong? I have a little test .py file and data file I can include if it helps.

Regards

Brennan
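One way to narrow this down is to walk the raw record markers yourself; a debugging sketch under stated assumptions (the file name is hypothetical, little-endian markers are assumed, and each record is taken to be <int32 length> <payload> <int32 length>). Opening with 'rb' matters on Windows, since text mode silently translates bytes in binary data:

    import struct

    fh = open('welldata.bin', 'rb')   # hypothetical file name
    nrec = 0
    while True:
        head = fh.read(4)
        if len(head) < 4:
            break
        (nbytes,) = struct.unpack('<i', head)
        payload = fh.read(nbytes)
        if len(payload) < nbytes:
            print('record %d truncated: wanted %d, got %d'
                  % (nrec, nbytes, len(payload)))
            break
        (tail,) = struct.unpack('<i', fh.read(4))
        if tail != nbytes:
            print('record %d marker mismatch: %d vs %d'
                  % (nrec, nbytes, tail))
            break
        nrec += 1
    fh.close()
    print('%d complete records read' % nrec)

If this walker also comes up short at record 20, the bytes really are being consumed somewhere (e.g. a mode or offset problem) rather than by FortranFile's parsing.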
Re: [Numpy-discussion] using FortranFile to read data from a binary Fortran file
Brennan Williams wrote:
Hi, I'm using FortranFile on 32-bit XP. The first record in the file has both 32-bit integer values and double precision float values. I've used readInts('i') to read the data into what I presume is a 32-bit integer numpy array. Items 0 and 1 of the array are the first double precision value. What's the best way to convert/cast these to get the double precision float value? I assume I need to do some sort of dpval = ival.astype('float64'). So far:

    f = FortranFile(fname, endian='')
    ihdr = f.readInts('i')

OK, I took a closer look at FortranFile and I'm now doing the following. Note that the first record in the file I'm reading has two double precision reals/floats followed by 8 32-bit integers.

    f = FortranFile(fullfilename, endian='')
    if f:
        hdr = f.readString()
        print 'hdr=', hdr
        print 'len=', len(hdr)
        t = struct.unpack('2d', hdr[0:16])
        print 't=', t
        i = struct.unpack('8i', hdr[16:])
        print 'i=', i

This gives me:

    len= 48
    t= (0.0, 2000.0)
    i= (0, 0, 0, 5, 0, 0, 1, 213)

which is correct. So is that the best way to do it, i.e. if I have a record of mixed data types, use readString and then do my own unpacking?

Brennan
[Numpy-discussion] using FortranFile to read data from a binary Fortran file
Hi, I'm using FortranFile on 32-bit XP. The first record in the file has both 32-bit integer values and double precision float values. I've used readInts('i') to read the data into what I presume is a 32-bit integer numpy array. Items 0 and 1 of the array are the first double precision value. What's the best way to convert/cast these to get the double precision float value? I assume I need to do some sort of dpval = ival.astype('float64'). So far:

    f = FortranFile(fname, endian='')
    ihdr = f.readInts('i')

Brennan
Re: [Numpy-discussion] speeding up getting a subset of a data array
Brennan Williams wrote:
Hi, no doubt asked many times before, so apologies. I'm pulling a subset array out of a data array, where I have a list of the indices I want (it could be an array rather than a list actually - I have it in both forms). Potentially the number of points and the number of times I do this can get very large, so any saving in time is good. So, paraphrasing what I've currently got, say I have...

    subsetpointerlist = [0, 1, 2, 5, 8, 15, 25, ...]
    subsetsize = len(subsetpointerlist)
    subsetarray = zeros(subsetsize, dtype=float)
    for index, pos in enumerate(subsetpointerlist):
        subsetarray[index] = dataarray[pos]

How do I speed this up in numpy, i.e. by removing the for loop? It's not as simple as...

    subsetarray = dataarray[subsetpointerarray]

is it? Do I set up some sort of subsetpointerarray as a mask and then somehow apply that to dataarray to get the values into subsetarray?

Thanks

Brennan
Re: [Numpy-discussion] speeding up getting a subset of a data array
josef.p...@gmail.com wrote:
On Mon, Aug 10, 2009 at 8:52 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
Hi, no doubt asked many times before, so apologies. I'm pulling a subset array out of a data array, where I have a list of the indices I want. ... How do I speed this up in numpy, i.e. by removing the for loop? Do I set up some sort of subsetpointerarray as a mask and then somehow apply that to dataarray to get the values into subsetarray?

Looks to me like

    subsetarray = dataarray[subsetpointerlist]

or, with type conversion,

    subsetarray = dataarray[subsetpointerlist].astype(float)

Josef

Thanks, with a little bit of googling/RTFM I'm getting there. Think I overdid my thinking on masks, based on something else that Robert Kern helped me out with.
[Numpy-discussion] npfile deprecation warning
Hi, I'm using npfile, which is giving me a deprecation warning. For the time being I want to continue using it, but I would like to suppress the warning messages. Is it possible to trap the deprecation warning but still have the npfile call go ahead?

Thanks

Brennan
Re: [Numpy-discussion] npfile deprecation warning
Robert Kern wrote:
On Mon, Jun 15, 2009 at 17:27, Brennan Williams brennan.willi...@visualreservoir.com wrote:
Hi, I'm using npfile, which is giving me a deprecation warning. For the time being I want to continue using it, but I would like to suppress the warning messages. Is it possible to trap the deprecation warning but still have the npfile call go ahead?

http://docs.python.org/library/warnings

Thanks. OK, I've put the following in my code...

    import warnings

    def fxn():
        warnings.warn("deprecated", DeprecationWarning)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        fxn()

but I'm getting an invalid syntax error...

    with warnings.catch_warnings():
                                  ^
    SyntaxError: invalid syntax

I haven't used with before. Is this supposed to go in the function def where I use npfile? I've put it near the top of my .py file, after my imports and before my class definitions. BTW, I'm using Python 2.5.4.

Brennan
Re: [Numpy-discussion] npfile deprecation warning
Robert Kern wrote:
On Mon, Jun 15, 2009 at 18:48, Brennan Williams brennan.willi...@visualreservoir.com wrote:
OK, I've put the following in my code...

    import warnings

    def fxn():
        warnings.warn("deprecated", DeprecationWarning)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        fxn()

catch_warnings() was added in Python 2.6, as stated in the documentation.

My mistake. I saw the "New in 2.1" at the top of the page but didn't read all the way to the bottom, where catch_warnings is documented ("New in 2.6").

I recommend setting up the simplefilter in your main() function, and only for DeprecationWarnings.

Done, and it works. Thanks.

but I'm getting an invalid syntax error... I haven't used with before. Is this supposed to go in the function def where I use npfile?

You would use the with statement only around code that calls the function.

BTW, I'm using Python 2.5.4.

In Python 2.5, you need this at the top of your file (after docstrings but before any other code):

    from __future__ import with_statement
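A minimal sketch of the main()-level filter Robert recommends, scoped to DeprecationWarnings only so other warnings stay visible:

    import warnings

    def main():
        # Silence only this category, as recommended
        warnings.simplefilter('ignore', DeprecationWarning)
        warnings.warn('npfile is deprecated', DeprecationWarning)  # now silent

    if __name__ == '__main__':
        main()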
[Numpy-discussion] CUDA
Not a question really, just for discussion/pie-in-the-sky etc. This is a news item on VizWorld about getting Matlab code to run on a CUDA-enabled GPU:

http://www.vizworld.com/2009/05/cuda-enable-matlab-with-gpumat/

If the use of GPUs for numerical tasks takes off (has it already?), then I'd be interested to know the views of the numpy experts out there.

Cheers

Brennan
[Numpy-discussion] replacing Nan's in a string array converted from a float array
I've created an array of strings using something like

    stringarray = self.karray.astype('|S8')

If the array value is a NaN, I get '1.#QNAN' in my string array. For cosmetic reasons I'd like to change this to something else, e.g. 'invalid' or 'inactive'. My string array can hold 100,000+ values. Is there a fast way to do this?

Thanks

Brennan
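One fast way, sketched with made-up values: locate the NaNs in the original float array, then overwrite those slots in the string array with one vectorised assignment.

    import numpy as np

    karray = np.array([1.25, np.nan, 3.5, np.nan])
    stringarray = karray.astype('|S8')
    stringarray[np.isnan(karray)] = 'inactive'   # 8 chars, fits |S8
    print(stringarray)

The mask comes from the floats rather than from string-matching '1.#QNAN', so it works regardless of how the platform happens to format NaN.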
Re: [Numpy-discussion] replacing Nan's in a string array converted from a float array
Charles R Harris wrote:
On Thu, May 7, 2009 at 5:36 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I've created an array of strings using something like stringarray = self.karray.astype('|S8'). If the array value is a NaN, I get '1.#QNAN' in my string array. For cosmetic reasons I'd like to change this to something else, e.g. 'invalid' or 'inactive'. My string array can hold 100,000+ values. Is there a fast way to do this?

I think this is a bug. Making the printing of NaNs uniform was one of the goals of numpy 1.3, although a few bits were unfixable. However, this looks fixable. If you are using 1.3, please open a ticket and note the OS and numpy version.

OK - it's numpy 1.3.0rc1 on WinXP.

Chuck
[Numpy-discussion] recommendations on goodness of fit functions?
Hi numpy/scipy users, I'm looking to add some basic goodness-of-fit functions/plots to my app. I have a set of simulated y vs time data and a set of observed y vs time data. The time values aren't always the same, i.e. there are often fewer observed data points. Some variables will be in a 0.0...1.0 range, others in a 0.0...1.0e+12 range. I'm also hoping to update the calculated goodness-of-fit value at each simulated timestep, the idea being to allow the user to set a tolerance level which, if exceeded, stops the simulation (which otherwise can keep running for many hours/days).

Thanks

Brennan
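A minimal sketch of one possible misfit measure (not the thread's final answer): interpolate the simulated profile onto the observed times with np.interp, then compute a relative RMS error so that variables in a 0..1 range and a 0..1e12 range are judged on the same footing. All the data below is synthetic.

    import numpy as np

    t_sim = np.linspace(0.0, 30.0, 31)
    y_sim = 1.0e12 * (1.0 - np.exp(-t_sim / 10.0))
    t_obs = np.array([2.0, 7.5, 13.0, 21.0])           # fewer observed points
    y_obs = 1.0e12 * (1.0 - np.exp(-t_obs / 10.0)) * 1.02   # ~2% off

    y_fit = np.interp(t_obs, t_sim, y_sim)             # simulated at obs times
    rms = np.sqrt(np.mean(((y_fit - y_obs) / y_obs) ** 2))
    print(rms)   # dimensionless; compare against the user's tolerance

Because the measure is dimensionless, the same tolerance threshold can be applied at every timestep as new simulated values arrive.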
Re: [Numpy-discussion] recommendations on goodness of fit functions?
Charles R Harris wrote:
On Mon, Apr 13, 2009 at 6:28 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
Hi numpy/scipy users, I'm looking to add some basic goodness-of-fit functions/plots to my app. ... stops the simulation (which otherwise can keep running for many hours/days).

Before I try and answer the following: attached is an example of a suggested GOF function.

[attachment: goodnessoffit.png]

Some questions: 1) What kind of fit are you doing? 2) What is the measurement model? 3) What do you know a priori about the measurement errors? 4) How is the time series modeled?

The simulated data are output by an oil reservoir simulator. The time series typically has monthly or annual timesteps over anything from 5-30 years, but it could also, in some cases, be 10-minute timesteps over 24 hours. The timesteps output by the simulator are controlled by the user and are not always even; e.g. for a simulation over 30 years you may have annual timesteps from year 0 to year 25, then 3-monthly from year 26-29, and then monthly for the most recent year. Not sure about measurement errors - the older the data, the higher the errors, due to changes in oilfield measurement technology. And the error range varies depending on the data type as well; e.g. the error range for a water meter is likely to be higher than that for an oil or gas meter. I'll try and find out more about that.

Brennan

Chuck
Re: [Numpy-discussion] trying to speed up the following....
Robert Kern wrote:
On Wed, Mar 25, 2009 at 00:09, Brennan Williams brennan.willi...@visualreservoir.com wrote:
Robert Kern wrote:
On Tue, Mar 24, 2009 at 18:29, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I have an array (porvatt.yarray) of ni*nj*nk values. ... I may often have a million+ cells, so the code above is slow. How can I speed it up?

    mask = (porvatt.yarray.flat > 0)
    ijkatt.yarray = np.nonzero(mask)

    # This is not what your code does, but what I think you want.
    # Where porvatt.yarray is inactive, activeatt.yarray is -1.
    # 0 might be an active cell.
    activeatt.yarray = np.empty(ncells, dtype=int)
    activeatt.yarray.fill(-1)
    activeatt.yarray[mask] = ijkatt.yarray

Thanks. Concise & fast. This is what I've got so far (minor mods from the above):

    from numpy import *
    ...
    mask = porvatt.yarray > 0.0
    ijkatt.yarray = nonzero(mask)[0]
    activeindices = arange(0, ijkatt.yarray.size)
    activeatt.yarray = empty(ncells, dtype=int)
    activeatt.yarray.fill(-1)
    activeatt.yarray[mask] = activeindices

I have ijkatt.yarray = nonzero(mask)[0] because it looks like nonzero returns a tuple of arrays rather than an array.

Yes. Apologies.

Apology accepted. Don't do it again.

On a more serious note, it is clear that, as expected, operating on elements of an array inside a Python for loop is slow for large arrays. Soon I will be writing an import interface to read corner-point grid geometries, and I'm currently looking at vtk unstructured grids etc.

Most of the numpy vectorization is aimed at relatively simply structured arrays, on the basis that you'll never meet everyone's needs/data structures.

So I presume that if I find a bottleneck in my code which looks specific to my data structures, I should then look at offloading that to C or Fortran (assuming I can't find it in numpy or scipy)? I'm already doing this to read in data from Fortran binary files, although I actually decided to code it in C rather than use Fortran.

I used activeindices = arange(0, ijkatt.yarray.size) and activeatt.yarray[mask] = activeindices.

Yes. You are correct.
[Numpy-discussion] trying to speed up the following....
I have an array (porvatt.yarray) of ni*nj*nk values. I want to create two further arrays. activeatt.yarray is of size ni*nj*nk and is a pointer array to an active cell number. If a cell is inactive then its activeatt.yarray value will be 0. ijkatt.yarray is of size nactive, the number of active cells (which I already know). ijkatt.yarray holds the ijk cell number for each active cell. My code looks something like...

    activeatt.yarray = zeros(ncells, dtype=int)
    ijkatt.yarray = zeros(nactivecells, dtype=int)
    iactive = -1
    ni = currentgrid.ni
    nj = currentgrid.nj
    nk = currentgrid.nk
    for ijk in range(0, ni*nj*nk):
        if porvatt.yarray[ijk] > 0:
            iactive += 1
            activeatt.yarray[ijk] = iactive
            ijkatt.yarray[iactive] = ijk

I may often have a million+ cells. So the code above is slow. How can I speed it up?

TIA

Brennan
Re: [Numpy-discussion] trying to speed up the following....
Robert Kern wrote:
On Tue, Mar 24, 2009 at 18:29, Brennan Williams brennan.willi...@visualreservoir.com wrote:

I have an array (porvatt.yarray) of ni*nj*nk values. I want to create two further arrays. activeatt.yarray is of size ni*nj*nk and is a pointer array to an active cell number. If a cell is inactive then its activeatt.yarray value will be 0. ijkatt.yarray is of size nactive, the number of active cells (which I already know). ijkatt.yarray holds the ijk cell number for each active cell. My code looks something like...

    activeatt.yarray = zeros(ncells, dtype=int)
    ijkatt.yarray = zeros(nactivecells, dtype=int)
    iactive = -1
    ni = currentgrid.ni
    nj = currentgrid.nj
    nk = currentgrid.nk
    for ijk in range(0, ni*nj*nk):
        if porvatt.yarray[ijk] > 0:
            iactive += 1
            activeatt.yarray[ijk] = iactive
            ijkatt.yarray[iactive] = ijk

I may often have a million+ cells. So the code above is slow. How can I speed it up?

    mask = (porvatt.yarray.flat > 0)
    ijkatt.yarray = np.nonzero(mask)
    # This is not what your code does, but what I think you want.
    # Where porvatt.yarray is inactive, activeatt.yarray is -1.
    # 0 might be an active cell.
    activeatt.yarray = np.empty(ncells, dtype=int)
    activeatt.yarray.fill(-1)
    activeatt.yarray[mask] = ijkatt.yarray

Thanks. Concise & fast. This is what I've got so far (minor mods from the above):

    from numpy import *
    ...
    mask = porvatt.yarray > 0.0
    ijkatt.yarray = nonzero(mask)[0]
    activeindices = arange(0, ijkatt.yarray.size)
    activeatt.yarray = empty(ncells, dtype=int)
    activeatt.yarray.fill(-1)
    activeatt.yarray[mask] = activeindices

I have ijkatt.yarray = nonzero(mask)[0] because it looks like nonzero returns a tuple of arrays rather than an array. I used activeindices = arange(0, ijkatt.yarray.size) and activeatt.yarray[mask] = activeindices, as I have 686000 cells of which 129881 are 'active', so my activeatt.yarray values range from -1 for inactive, through 0 for the first active cell, up to 129880 for the last active cell.

About to test it out by replacing my old for loop. Looks like it will be about 20x faster for 1m cells.

Brennan
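To put a rough number on the speedup yourself, a timing sketch on synthetic data (illustrative only; the ~20x figure above is from this thread, and actual ratios depend on array size and hardware):

    import numpy as np
    from timeit import default_timer as timer

    porv = np.random.rand(1000000) - 0.3   # roughly 70% of cells "active"
    ncells = porv.size

    # Original approach: explicit Python loop over every cell.
    t0 = timer()
    active_loop = np.empty(ncells, dtype=int)
    active_loop.fill(-1)
    iactive = -1
    for ijk in range(ncells):
        if porv[ijk] > 0.0:
            iactive += 1
            active_loop[ijk] = iactive
    t1 = timer()

    # Vectorized approach from the thread.
    mask = porv > 0.0
    active_vec = np.full(ncells, -1, dtype=int)
    active_vec[mask] = np.arange(mask.sum())
    t2 = timer()

    assert (active_loop == active_vec).all()
    print("loop %.3fs  vectorized %.3fs" % (t1 - t0, t2 - t1))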
[Numpy-discussion] populating an array
Ok... I'm using Traits and numpy. I have a 3D grid with directions I, J and K. I have NI,NJ,NK cells in the I,J,K directions, so I have NI*NJ*NK cells overall. I have data arrays with a value for each cell in the grid. I'm going to store this as a 1D array, i.e. 1..ncells where ncells=NI*NJ*NK, rather than as a 3D array.

Apart from lots of other data arrays that will be read in from external files, I want to create I, J and K data arrays where the 'I' array contains the I index for the cell, the 'J' array the J index etc. I haven't used numpy extensively and I'm coming from a Fortran/C background, so I'm hideously inefficient. At the moment I have something like

    self.iarray = zeros(self.ncells, dtype=int)
    for icell in arange(1, self.ncells+1):
        # icell = i + (j-1)*ni + (k-1)*ni*nj  (1-based)
        i, j, k = rn2ijk(icell, self.ni, self.nj)
        self.iarray[icell-1] = i

rn2ijk is defined as...

    def rn2ijk(n, ni, nj):
        # Python 2 integer division throughout
        nij = ni*nj
        k = (n+nij-1)/nij
        ij = n-(k-1)*nij
        j = (ij+ni-1)/ni
        i = ij-(j-1)*ni
        return i, j, k

Obviously I can improve my use of rn2ijk, as for a start I'm recalculating nij for every cell (and I can have anything from a few thousand to a few million cells). But I'm sure I could do this more efficiently, probably by starting off with a 3D array, looping over i,j,k and then reshaping it into a 1D array. Ideas?

Brennan
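One way to build all three index arrays without any Python loop (a sketch; it assumes the 1-based, i-varies-fastest ordering implied by the icell formula above, and toy grid dimensions):

    import numpy as np

    ni, nj, nk = 3, 2, 2
    ncells = ni * nj * nk

    # 0-based linear cell numbers; i varies fastest, matching
    # icell = i + (j-1)*ni + (k-1)*ni*nj in the 1-based convention.
    c = np.arange(ncells)
    iarr = c % ni + 1
    jarr = (c // ni) % nj + 1
    karr = c // (ni * nj) + 1

    # Equivalent route via a 3D array reshaped to 1D, as suggested
    # above: np.indices builds the index grids, and raveling in
    # (k, j, i) axis order makes i vary fastest.
    kk, jj, ii = np.indices((nk, nj, ni)) + 1
    assert (ii.ravel() == iarr).all()
    assert (jj.ravel() == jarr).all()
    assert (kk.ravel() == karr).all()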
Re: [Numpy-discussion] astype
Nils Wagner wrote:
On Sat, 14 Feb 2009 17:22:43 +0100 Nils Wagner nwag...@iam.uni-stuttgart.de wrote:

Hi all,

How can I convert an array with string elements to an array with float entries?

    >>> coord_info[:,1]
    array(['0,0', '100,0', '200,0', '300,0', '400,0', '500,0', '600,0',
           '700,0', '800,0', '0.0', '100.0', '200.0', '300.0', '400.0',
           '500.0', '600.0', '700.0', '800.0', '0.0', '100.0', '200.0',
           '300.0', '400.0', '500.0', '600.0', '700.0', '800.0', '0.0',
           '100.0', '200.0', '300.0', '400.0', '500.0', '600.0', '700.0',
           '800.0', '0.0', '100.0', '200.0', '300.0', '400.0', '500.0',
           '600.0', '700.0', '800.0'],
          dtype='|S50')
    >>> coord_info[:,1].astype(float)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for float(): 0,0

Nils

Sorry for the noise - dots and commas were mixed up in the input file.

Mind you, it raises an interesting point. What if the string was an amount in currency, e.g. '$1,000.00', or perhaps just '1,000.00', etc.? How would one convert a string array in that format to a float array?
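One straightforward approach (a sketch; np.char.replace and np.char.lstrip operate elementwise on string arrays, and the '$' handling is illustrative):

    import numpy as np

    s = np.array(['$1,000.00', '1,000.00', '250.50'])
    # Strip a leading currency symbol and remove thousands separators
    # elementwise, then convert the cleaned strings to floats.
    cleaned = np.char.replace(np.char.lstrip(s, '$'), ',', '')
    values = cleaned.astype(float)
    print(values)  # [1000.   1000.    250.5]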
Re: [Numpy-discussion] checksum on numpy float array
OK so maybe I should:
(1) not add some sort of checksum-type functionality to my read/write methods - these read/write methods simply read/write numpy arrays to a binary file which contains one or more numpy arrays (and nothing else);
(2) replace my binary files with either HDF5 or PyTables.

But my app is being used by clients on existing projects - in one case there are over 900 of these numpy binary files in just one project, albeit each file is pretty small (200KB or so). So... questions. How can I transparently (or at least with minimum user pain) replace my existing read/write methods with PyTables or HDF5? My initial thoughts are...
(a) have an app version number and a data format version number which I can check against;
(b) if data format version <= 1.0 then read from old binary files;
(c) if app version number > 1.0 then write to new PyTables or HDF5 files;
(d) get clients to open an existing project and then save it, to semi-transparently convert from old to new formats.

Francesc Alted wrote:
On Friday 05 December 2008, Andrew Collette wrote:

Another possibility would be to use HDF5 as a data container. It supports the fletcher32 filter [1], which basically computes a checksum for every data chunk written to disk and then always checks that the data read satisfies the checksum kept on-disk. So, if the HDF5 layer doesn't complain, you are basically safe. There are at least two usable HDF5 interfaces for Python and NumPy: PyTables [2] and h5py [3]. PyTables does have support for that right out-of-the-box. Not sure about h5py though (a quick search in the docs doesn't reveal anything).

[1] http://rfc.sunsite.dk/rfc/rfc1071.html
[2] http://www.pytables.org
[3] http://h5py.alfven.org

Hope it helps,

Just to confirm that h5py does in fact have fletcher32; it's one of the options you can specify when creating a dataset, although it could use better documentation:
http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset

My bad. I've searched for 'fletcher' instead of 'fletcher32'. I naively thought that the search tool in Sphinx allowed for partial name finding. In fact, it is a pity it does not.

Cheers,
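For reference, a minimal h5py sketch of writing an array with the fletcher32 checksum filter enabled (the filter requires chunked storage; the file and dataset names are illustrative):

    import numpy as np
    import h5py

    arr = np.random.rand(1000)

    # Write: fletcher32 stores a checksum per chunk; reads fail loudly
    # if the stored checksum no longer matches the data.
    with h5py.File('project.h5', 'w') as f:
        f.create_dataset('velocity', data=arr, chunks=True, fletcher32=True)

    # Read back: any chunk corruption raises an error here.
    with h5py.File('project.h5', 'r') as f:
        arr2 = f['velocity'][:]

    assert np.array_equal(arr, arr2)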
[Numpy-discussion] checksum on numpy float array
My app reads in one or more float arrays from a binary file. Sometimes, due to network timeouts etc., the array is not read correctly. What would be the best way of checking the validity of the data? Would some sort of checksum approach be a good idea? Would that work with an array of floating-point values, or are checksums more for int/byte/string type data?
Re: [Numpy-discussion] checksum on numpy float array
[EMAIL PROTECTED] wrote:
On Thu, Dec 4, 2008 at 6:17 PM, Brennan Williams [EMAIL PROTECTED] wrote:

My app reads in one or more float arrays from a binary file. Sometimes, due to network timeouts etc., the array is not read correctly. What would be the best way of checking the validity of the data? Would some sort of checksum approach be a good idea? Would that work with an array of floating-point values, or are checksums more for int/byte/string type data?

If you want to verify the file itself, then python provides several more or less secure checksums; my experience was that zlib.crc32 was pretty fast on moderate file sizes. crc32 is common inside archive files and for binary newsgroups. If you have large files transported over the network, e.g. GB size, I would work with par2 repair files, which verify and repair at the same time.

The file has multiple arrays stored in it, so I want to have some sort of validity check on just the array that I'm reading. I will need to add a check on the file as well, as of course network problems could affect writing to the file as well as reading from it.

Josef
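A minimal sketch of the crc32 idea applied to a float array's raw bytes (checksums operate on bytes, so the float dtype is no obstacle; the file round-trip is stubbed out here):

    import zlib
    import numpy as np

    arr = np.random.rand(10000)

    # Checksum the raw bytes before writing...
    crc_written = zlib.crc32(arr.tobytes()) & 0xffffffff

    # ...and again after reading back; a mismatch means corruption.
    arr2 = arr.copy()  # stand-in for the round-trip through the file
    crc_read = zlib.crc32(arr2.tobytes()) & 0xffffffff
    assert crc_read == crc_written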
Re: [Numpy-discussion] checksum on numpy float array
Thanks

[EMAIL PROTECTED] wrote:

I didn't check what this does behind the scenes, but try this:

    import hashlib  # standard python library
    import numpy as np

    m = hashlib.md5()
    m.update(np.array(range(100)))
    m.update(np.array(range(200)))

    m2 = hashlib.md5()
    m2.update(np.array(range(100)))
    m2.update(np.array(range(200)))

    print m.hexdigest()
    print m2.hexdigest()
    assert m.hexdigest() == m2.hexdigest()

    m3 = hashlib.md5()
    m3.update(np.array(range(100)))
    m3.update(np.array(range(199)))
    print m3.hexdigest()
    # different data, so this assert is expected to fail:
    assert m.hexdigest() == m3.hexdigest()

Josef
Re: [Numpy-discussion] checksum on numpy float array
Robert Kern wrote:
On Thu, Dec 4, 2008 at 18:54, Brennan Williams [EMAIL PROTECTED] wrote:

Thanks

[EMAIL PROTECTED] wrote:
I didn't check what this does behind the scenes, but try this:

    import hashlib  # standard python library
    import numpy as np

    m = hashlib.md5()
    m.update(np.array(range(100)))
    m.update(np.array(range(200)))

I would recommend doing this on the strings before you make arrays from them. You don't know if the network cut out in the middle of an 8-byte double. Of course, sending the lengths and other metadata first, then the data, would let you check without needing to do expensivish hashes or checksums. If truncation is your problem rather than corruption, then that would be sufficient. You may also consider using the NPY format in numpy 1.2 to implement that.

Thanks for the ideas. I'm definitely going to add some more basic checks on lengths etc. as well. Unfortunately the problem is happening at a client site, so (a) I can't reproduce it and (b) most of the time they can't reproduce it either. This is a Windows Python app running on Citrix, reading/writing data to a Linux networked drive.

Brennan
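A sketch of the lengths-first idea for catching truncation without hashing (the on-disk format and helper names here are illustrative, not from the thread):

    import struct
    import numpy as np

    def write_array(f, arr):
        # Prefix the payload with its byte length so a truncated
        # read can be detected cheaply on the other end.
        payload = arr.tobytes()
        f.write(struct.pack('<Q', len(payload)))
        f.write(payload)

    def read_array(f, dtype=np.float64):
        (nbytes,) = struct.unpack('<Q', f.read(8))
        payload = f.read(nbytes)
        if len(payload) != nbytes:
            raise IOError("truncated array: expected %d bytes, got %d"
                          % (nbytes, len(payload)))
        return np.frombuffer(payload, dtype=dtype)

np.save/np.load (the NPY format Robert mentions) does essentially this, storing shape and dtype in a header ahead of the data, so a short read fails rather than silently returning a partial array.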