Re: [Numpy-discussion] Numpy logo in VTK
You're now reminding me of the old spinning SGI logo: http://www.youtube.com/watch?v=Nqf6TjE49N8

Brennan

On 27/06/2012 10:40 a.m., klo uo wrote:
I continued in this mpl trip, with a small animation sequence:

    # animation
    ax.view_init(90, -90)
    plt.ion()
    plt.draw()
    plt.show()
    for l in arange(25):
        ax.set_xlim3d(1.5 - .1*l, 2.5 + .1*l)
        ax.set_ylim3d(1.5 - .1*l, 2.5 + .1*l)
        ax.view_init(90 - 3*l, -90 + l)
        plt.draw()
    plt.title("NumPy")
    plt.ioff()
    plt.show()

Try it, or check it out on YouTube: www.youtube.com/watch?v=mpYPS_zXAFw

Whole script in attachment.
Re: [Numpy-discussion] Statistical distributions on samples
You can use scipy.stats.truncnorm, can't you? Unless I misread, you want to sample a normal distribution, but with generated values only being within a specified range? However, you also say you want to do this with triangular and log-normal distributions, and for these I presume the easiest way is to sample and then accept/reject.

Brennan

On 13/08/2011 2:53 a.m., Christopher Jordan-Squire wrote:
Hi Andrea -- An easy way to get something like this would be

    import numpy as np
    import scipy.stats as stats

    sigma = ...  # some reasonable standard deviation for your application
    x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
    x = x[x > 50]
    x = x[x < 200]

That will give a roughly normal distribution to your velocities, as long as, say, sigma < 25. (I'm using the rule of thumb for the normal distribution that normal random samples lie more than 3 standard deviations away from the mean about 1 out of 350 times.) Though you won't be able to get exactly normal errors about your mean, since normal random samples can theoretically take any value. You can use this same process for any other distribution, as long as you've chosen a scale parameter so that the probability of samples being outside your desired interval is really small. Of course, once again, your random errors won't be exactly from the distribution you got your original samples from. -Chris JS

On Fri, Aug 12, 2011 at 8:32 AM, Andrea Gavana andrea.gav...@gmail.com wrote:
Hi All, I am working on something that appeared to be a no-brainer issue (at the beginning), but my complete ignorance of statistics is overwhelming and I got stuck. What I am trying to do can be summarized as follows. Let's assume that I have to generate a sample of 1,000 values for a variable (let's say, velocity) using a normal distribution (but later I will have to do it with log-normal, triangular and a couple of others). The only thing I know about this velocity sample is the minimum and maximum values (let's say 50 and 200 respectively) and, obviously for the normal distribution (but not so for the other distributions), the mean value (125 in this case). Now, I would like to generate this sample of 1,000 points, in which none of the points has velocity smaller than 50 or bigger than 200, and the number of samples close to the mean (125) should be higher than the number of samples close to the minimum and the maximum, following some kind of normal distribution. What I have tried up to now is summarized in the code below but, as you can easily see, I don't really know what I am doing. I am open to every suggestion, and I apologize for the dumbness of my question.

    import numpy
    from scipy import stats
    import matplotlib.pyplot as plt

    minval, maxval = 50.0, 250.0
    x = numpy.linspace(minval, maxval, 500)
    samp = stats.norm.rvs(size=len(x))
    pdf = stats.norm.pdf(x)
    cdf = stats.norm.cdf(x)
    ppf = stats.norm.ppf(x)
    ax1 = plt.subplot(2, 2, 1)
    ax1.plot(range(len(x)), samp)
    ax2 = plt.subplot(2, 2, 2)
    ax2.plot(x, pdf)
    ax3 = plt.subplot(2, 2, 3)
    ax3.plot(x, cdf)
    ax4 = plt.subplot(2, 2, 4)
    ax4.plot(x, ppf)
    plt.show()

Andrea.
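A minimal sketch of the truncnorm route suggested at the top of this message, for the normal case only; sigma = 25 is an assumed scale, since the thread only fixes the mean (125) and the bounds (50, 200):

    from scipy.stats import truncnorm

    mean, sigma, lo, hi = 125.0, 25.0, 50.0, 200.0   # sigma is an assumption
    a, b = (lo - mean) / sigma, (hi - mean) / sigma  # bounds in std units
    vel = truncnorm.rvs(a, b, loc=mean, scale=sigma, size=1000)
    print(vel.min())   # always >= 50
    print(vel.max())   # always <= 200

Unlike the sample-then-clip approach, every one of the 1,000 draws is guaranteed to land inside [50, 200].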
Re: [Numpy-discussion] Developer NumPy list versus User NumPy list
On 28/01/2011 1:07 p.m., Sturla Molden wrote:
On 28.01.2011 00:23, Robert Kern wrote:
We've resisted it for years. I don't think the split has done scipy much good.

The scope of NumPy is narrower development-wise and wider user-wise. While SciPy does not benefit, as use and development are still quite entangled there, this is not the case for NumPy. Perhaps we could split the NumPy list and merge the SciPy lists?

As a user of NumPy and SciPy (and there are probably a lot of people who use both), why not have one developer list for both NumPy and SciPy and one user list for both NumPy and SciPy? It might not work from a developer point of view, but I think it does from a user point of view - mind you, I just put everything of interest from both mailing lists into one folder.

Brennan
[Numpy-discussion] quadratic function
I have used both linear least squares and radial basis functions as a proxy equation, calculated from the results of computer simulations which calculate some objective function value based on a number of varied input parameters. As an alternative option I want to add a quadratic function, so if there are parameters/variables x, y, z then rather than just having a linear function f = a + b*x + c*y + d*z I'll have f = a + b*x + c*x**2 + d*x*y + ... I'd like to have the option not to include all the different second-order terms. Where should I start looking?

Thanks

Brennan
Re: [Numpy-discussion] quadratic function
On 29/10/2010 2:34 a.m., Robert Kern wrote:
On Thu, Oct 28, 2010 at 06:38, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I have used both linear least squares and radial basis functions as a proxy equation, calculated from the results of computer simulations which calculate some objective function value based on a number of varied input parameters. As an alternative option I want to add a quadratic function, so if there are parameters/variables x, y, z then rather than just having a linear function f = a + b*x + c*y + d*z I'll have f = a + b*x + c*x**2 + d*x*y + ... I'd like to have the option not to include all the different second-order terms.

    A = np.column_stack([
        np.ones_like(x),
        x, y, z,
        x*x, y*y, z*z,
        x*y, y*z, x*z,
    ])
    coeffs, res, rank, s = np.linalg.lstsq(A, f)

OK, so in other words, you can use linalg.lstsq with whatever higher-order terms you want to include or exclude. Very nice. Thanks.

On a related topic, I also use the Rbf radial basis function as a proxy equation. I have one set of data that it fails to return an Rbf for, and I've just realised that in the set of simulations used to build the proxy equation I have some duplicate runs. I'm wondering if Rbf doesn't like duplicate points? It obviously doesn't affect linalg.lstsq.

Brennan
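To make the design-matrix approach concrete, here is a self-contained sketch with synthetic data (the coefficient values are made up), keeping only the second-order terms actually wanted:

    import numpy as np

    # Synthetic data: the true model is f = 1 + 2x + 3x**2 + 4xy plus noise,
    # so the design matrix below keeps only those terms.
    rng = np.random.RandomState(0)
    x, y = rng.uniform(0.0, 1.0, (2, 50))
    f = 1.0 + 2.0*x + 3.0*x*x + 4.0*x*y + 0.01*rng.normal(size=50)

    # Include or exclude whichever second-order columns you want
    A = np.column_stack([np.ones_like(x), x, x*x, x*y])
    coeffs = np.linalg.lstsq(A, f)[0]
    print(coeffs)   # close to [1, 2, 3, 4]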
Re: [Numpy-discussion] quadratic function
On 29/10/2010 6:35 a.m., Robert Kern wrote:
On Thu, Oct 28, 2010 at 12:33, Brennan Williams brennan.willi...@visualreservoir.com wrote:
On 29/10/2010 2:34 a.m., Robert Kern wrote:

    A = np.column_stack([
        np.ones_like(x),
        x, y, z,
        x*x, y*y, z*z,
        x*y, y*z, x*z,
    ])
    coeffs, res, rank, s = np.linalg.lstsq(A, f)

OK, so in other words, you can use linalg.lstsq with whatever higher-order terms you want to include or exclude. Very nice. Thanks.

Right. Just as long as the problem is linear in the coefficients, the design matrix can be derived however you like.

So I could optionally put log terms in if I thought the response was linear in log(x), for example?

On a related topic, I also use the Rbf radial basis function as a proxy equation. I have one set of data that it fails to return an Rbf for, and I've just realised that in the set of simulations used to build the proxy equation I have some duplicate runs. I'm wondering if Rbf doesn't like duplicate points? It obviously doesn't affect linalg.lstsq.

Rbf doesn't like duplicate points. :-)

OK, fair enough, I just need to add a bit of housekeeping to remove duplicate simulations. Thanks for the confirmation.
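A minimal sketch of that housekeeping (the array names are made up): pts holds one row of input parameters per simulation, vals the matching responses, and exact duplicate rows are dropped before building the Rbf.

    import numpy as np

    pts = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]])
    vals = np.array([10.0, 10.0, 20.0])

    # View each row as one opaque record so np.unique compares whole rows.
    # Note this is exact byte equality; near-duplicates need a tolerance.
    rowbytes = np.ascontiguousarray(pts).view(
        np.dtype((np.void, pts.dtype.itemsize * pts.shape[1])))
    keep = np.sort(np.unique(rowbytes.ravel(), return_index=True)[1])
    pts, vals = pts[keep], vals[keep]
    print(pts)    # [[ 1.  2.] [ 3.  4.]]
    print(vals)   # [ 10.  20.]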
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
Hi Chris and All,

On 29 March 2010 22:35, Christopher Barker wrote:
Andrea Gavana wrote:
Scaling each axis by its standard deviation is a typical first start. Shifting and scaling the values such that they each go from 0 to 1 is another useful thing to try.
Ah, magnifico! Thank you Robert and Friedrich, it seems to be working now...

One other thought -- core to much engineering is dimensional analysis -- you know how we like those non-dimensional numbers! I think this situation is less critical, as you are interpolating, not optimizing or something, but many interpolation methods are built on the idea of some data points being closer than others to your point of interest. Who is to say if a point that is 2 hours away is closer or farther than one 2 meters away? This is essentially what you are doing. Scaling everything to the same range is a start, but then you've still given them an implicit weighting. An alternative is to figure out a way to non-dimensionalize your parameters -- that *may* give you a more physically based scaling. And you might invent the Gavana Number in the process ;-)

Might be :-D. At the moment I am pretty content with what I have got; it seems to be working fairly well, although I didn't examine all the possible cases and it is very likely that my little tool will break disastrously for some combinations of parameters. However, I am not sure I am allowed to post an image comparing the real simulation with the prediction of the interpolated proxy model, but if you could see it, you would surely agree that it is a very reasonable approach. It seems too good to be true :-D. Again, this is mainly due to the fact that we have a very extensive set of simulations which cover a wide range of combinations of parameters, so the interpolation itself is only doing its job correctly. I don't think the technique can be applied blindly to whatever oil/gas/condensate/whatever reservoir, as non-linearities in fluid flow simulations appear where you least expect them, but it seems to be working OK (up to now) for our field (which is, by the way, one of the most complex and least understood condensate reservoirs out there).

Andrea.

And of course that proxy simulator only deals with the input variables that you decided on 1,000+ simulations ago. All you need is for someone to suggest something else, like "how about gas injection?", and you're back to having to do more real simulation runs (which is where a good experimental design comes in). It would be interesting to know how well your proxy simulator compares to the real simulator for a combination of input variable values that is a good distance outside your original parameter space.

Brennan
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
Hi All,

On 28 March 2010 19:22, Robert Kern wrote:
On Sun, Mar 28, 2010 at 03:26, Anne Archibald peridot.face...@gmail.com wrote:
On 27 March 2010 20:24, Andrea Gavana andrea.gav...@gmail.com wrote:
Hi All, I have an interpolation problem and I am having some difficulties in tackling it. I hope I can explain myself clearly enough. Basically, I have a whole bunch of 3D fluid flow simulations (close to 1000), and they are the result of different combinations of parameters. I was planning to use the Radial Basis Functions in scipy, but for the moment let's assume, to simplify things, that I am dealing only with one parameter (x). In 1000 simulations, this parameter x has 1000 values, obviously. The problem is, the outcome of every single simulation is a vector of oil production over time (let's say 40 values per simulation, one per year), and I would like to be able to interpolate my x parameter (1000 values) against all the simulations (1000x40) and get an approximating function that, given another x parameter (of size 1x1), will give me back an interpolated production profile (of size 1x40).

If I understand your problem correctly, you have a function taking one value as input (or one 3D vector) and returning a vector of length 40. You want to know whether there are tools in scipy to support this. I'll say first that it's not strictly necessary for there to be: you could always just build 40 different interpolators, one for each component of the output. After all, there's no interaction in the calculations between the output coordinates. This is of course awkward, in that you'd like to just call F(x) and get back a vector of length 40, but that can be remedied by writing a short wrapper function that simply calls all 40 interpolators. A problem that may be more serious is that the python loop over the 40 interpolators can be slow, while a C implementation would give vector-valued results rapidly.

40 is not a bad number to loop over. The thing I would be worried about is the repeated calculation of the (1000, 1000) radial function evaluation. I think that a small modification of Rbf could be made to handle the vector-valued case. I leave that as an exercise to Andrea, of course. :-)

It seems like this whole interpolation stuff is not working as I thought. In particular, considering scalar-valued interpolation (i.e., looking at the final oil recovery only and not the time-based oil production profile), interpolation with RBFs is giving counter-intuitive and meaningless answers. The issues I am seeing are basically these:

    # Interpolate the cumulative oil production
    rbf = Rbf(x1, x2, x3, x4, x5, x6, final_oil_recovery)

    # Try to use an existing combination of parameters to get back
    # the original result (more or less)
    possible_recovery = rbf(x1[10], x2[10], x3[10], x4[10], x5[10], x6[10])

1) If I attempt to use the resulting interpolation function (rbf), inputting a single value for each of x1, x2, ..., x6 that is *already present* in the original x1, x2, ..., x6 vectors, I get meaningless results (i.e., negative oil productions, 1000% error, and so on) in all cases, with some (rare) exceptions when using the thin-plate interpolation option;

2) Using a combination of parameters x1, x2, ..., x6 that is *not* in the original set, I get non-physical (i.e., unrealistic) results: it appears that RBFs think that if I increase the number of production wells I get less oil recovered (!), sometimes by a factor of 10.
I am pretty sure I am missing something in this whole interpolation thing (I am no expert at all), but honestly it is the first time I have got such counter-intuitive results in my entire Python numerical life ;-)

Andrea.

Hi Andrea, I'm looking at doing this as well, i.e. using RBFs to create what is effectively a proxy simulator. I'm going to have a read through the scipy Rbf documentation to get up to speed. I take it you are using an experimental design algorithm such as Plackett-Burman or similar?

Brennan
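Anne's "short wrapper" idea from earlier in this thread is easy to sketch. Everything below (the parameter values, the profile shapes) is synthetic stand-in data:

    import numpy as np
    from scipy.interpolate import Rbf

    # 100 runs of one parameter, each producing a 40-value production profile
    rng = np.random.RandomState(1)
    x = rng.uniform(0.0, 1.0, 100)
    profiles = np.cumsum(rng.uniform(0.0, 1.0, (100, 40)), axis=1)

    # One scalar interpolator per yearly value...
    interps = [Rbf(x, profiles[:, j]) for j in range(profiles.shape[1])]

    # ...wrapped so one call returns the whole profile
    def predict_profile(xnew):
        return np.array([f(xnew) for f in interps])

    print(predict_profile(0.5).shape)   # (40,)

As Robert notes, the per-component Python loop is cheap for 40 outputs; the dominant cost is the 1000x1000 radial-distance solve, which this naive version repeats once per component.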
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
Hi All,

On 28 March 2010 22:14, Pierre GM wrote:
On Mar 28, 2010, at 4:47 PM, Andrea Gavana wrote:
Hi All,

On 28 March 2010 19:22, Robert Kern wrote:
On Sun, Mar 28, 2010 at 03:26, Anne Archibald peridot.face...@gmail.com wrote:
On 27 March 2010 20:24, Andrea Gavana andrea.gav...@gmail.com wrote:
Hi All, I have an interpolation problem and I am having some difficulties in tackling it. I hope I can explain myself clearly enough. Basically, I have a whole bunch of 3D fluid flow simulations (close to 1000), and they are the result of different combinations of parameters.

It seems like this whole interpolation stuff is not working as I thought. In particular, considering scalar-valued interpolation (i.e., looking at the final oil recovery only and not the time-based oil production profile), interpolation with RBFs is giving counter-intuitive and meaningless answers. The issues I am seeing are basically these:

Which is hardly surprising: you're working with a physical process, and you must have some constraints on your parameters (whether dependence between parameters, bounds on the estimates...) that are not taken into account by the interpolation scheme you're using. So, it's back to the drawing board.

Sorry, this laptop has gone mad and sent the message before I was finished...

The curious thing is, when using the rbf interpolated function to find a new approximation, I am not giving RBFs input values that are outside the bounds of the existing parameters. Either they are exactly the same as the input ones (for a single simulation), or they are slightly different but always inside the bounds. I always thought that, at least for the same input values, it should give (approximately) the same answer as the real one.

What are you actually trying to achieve? Find the best estimates of your 10 parameters to match an observed production timeline? Find a space for your 10 parameters that gives some realistic production?

I have this bunch of simulations of possible future oil production (forecasts), run over the past 4 years, which have different combinations of number of oil producing wells, gas injection wells, oil plateau values, gas injection plateau values and gas production plateau values (6 parameters in total, not 10). Now, the combinations themselves are pretty nicely spread, meaning that I seem to have a solid base on which to build an interpolator that will give me a reasonable answer when I ask it for parameters inside the bounds (i.e., no extrapolation involved).

Assuming that your 10 parameters are actually independent, did you run 1000**10 simulations to test all the possible combinations?

I hope you're kidding :-D. Each of these simulations takes from 7 to 12 hours, not counting the huge amount of time needed to build their data files by hand...

Let's see a couple of practical examples (I can share the data if someone is interested).

Definitely interested in helping solve this one, so feel free to email the data (obviously not 1,000 Eclipse smspec and unsmry files, although I can handle that without any problems).

Example 1:

    # o2 and o3 are the number of production wells, split into 2
    # different categories
    # inj is the number of injection wells
    # fomts is the final oil recovery

    rbf = Rbf(oilPlateau, gasPlateau, gasInjPlateau, o2, o3, inj, fomts)

    op = [50380]
    gp = [103014000]
    gi = [53151000]
    o2w = [45]
    o3w = [20]
    inw = [15]

I take it all your inputs are float arrays? i.e. you just treat the number of production wells as a floating point value.
I suppose strictly speaking o2 and o3 are not independent of each other, but I would guess that for the purposes of using Rbf it wouldn't be an issue. I'm just having a look at the cookbook example (http://www.scipy.org/Cookbook/RadialBasisFunctions), adding a couple more variables, bumping up the number of points to 1000 and setting the z values to be up around 1.0e+09 - so far it still seems to work, although if you increase the sample points to 2000+ it falls over with a memory error. I've got a smaller bunch of Eclipse simulations I could try Rbf out on - 19 runs with 9 variables (it's a Tornado algorithm with -1, 0, +1 values for each variable). FOPTs will be in a similar range to yours.

Brennan

    fi = rbf(op, gp, gi, o2w, o3w, inw)
    # I *KNOW* the answer to be close to +3.5e8
    print fi

    [ -1.00663296e+08]

(yeah right...)

Example 2: changing o2w from 45 to 25 (again, the answer should be close to 3e8; fewer wells = less production):

    fi = rbf(op, gp, gi, o2w, o3w, inw)
    print fi

    [ 1.30023424e+08]

And keep in mind that nowhere do I have such low values of oil recovery in my data... the lowest ones are close to 2.8e8...

Andrea.
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
Hi Friedrich & All,

On 28 March 2010 23:51, Friedrich Romstedt wrote:
2010/3/28 Andrea Gavana andrea.gav...@gmail.com:
Example 1:

    # o2 and o3 are the number of production wells, split into 2
    # different categories
    # inj is the number of injection wells
    # fomts is the final oil recovery

    rbf = Rbf(oilPlateau, gasPlateau, gasInjPlateau, o2, o3, inj, fomts)

    op = [50380]
    gp = [103014000]
    gi = [53151000]
    o2w = [45]
    o3w = [20]
    inw = [15]

    fi = rbf(op, gp, gi, o2w, o3w, inw)
    # I *KNOW* the answer to be close to +3.5e8
    print fi

    [ -1.00663296e+08]

(yeah right...)

Example 2: changing o2w from 45 to 25 (again, the answer should be close to 3e8; fewer wells = less production):

    fi = rbf(op, gp, gi, o2w, o3w, inw)
    print fi

    [ 1.30023424e+08]

And keep in mind that nowhere do I have such low values of oil recovery in my data... the lowest ones are close to 2.8e8...

I want to put my 2 cents in, FWIW... What I see from http://docs.scipy.org/doc/scipy-0.7.x/reference/generated/scipy.interpolate.Rbf.html#scipy.interpolate.Rbf are three things:

1. Rbf uses some weighting based on the radial functions.
2. Rbf results go through the nodal points without *smooth* set to some value != 0.
3. Rbf is isotropic.

(3.) is most important. I see from your e-mail that the values you pass in to Rbf are of very different orders of magnitude. But the default norm used in Rbf is for sure isotropic, i.e., it will result in strange and useless mean distances in R^N where there are N parameters. You have to either pass in a *norm* which weights the coords according to their extent, or scale the data such that the aspect ratios of the hypercube's edges are sensible. You can imagine it as the following ascii plot:

[ascii scatter plot of * points, garbled in the archive]

The x dimension is, say, the gas plateau dimension (~1e10); the y dimension is the year dimension (~1e1). In an isotropic plot, using the data limits and aspect = 1, they may be well homogeneous, but on this scaling, as used by Rbf, they are lumped. I don't know if it's clear what I mean from my description? Are the corresponding parameter values spread completely randomly, or is only one axis varied at a time? If random, you could try a fractal analysis to find out the optimal scaling of the axes for N-d coverage of N-d space. In the case above, it is something between points and lines, I guess. When scaling it properly, it becomes a plane-like coverage of the xy plane. When scaling further, it will become points again (lumped together). So you can find an optimum. There are quantitative methods to obtain the fractal dimension, and they are quite simple.

I believe I need a technical dictionary to properly understand all that... :-D. Sorry, I am no expert at all, really, just an amateur with some imagination, but your suggestion about the different magnitudes of the matrix is a very interesting one. Although I have absolutely no idea how to re-scale them properly to avoid RBFs going crazy. As for your question, the parameters are not spread completely randomly, as this is a collection of simulations done over the years, trying different scenarios manually, without having in mind a proper experimental design or any other technique. Nor do the parameter values vary only on one axis in each simulation (a few of them are like that).

I assume that there is a default norm that calculates the distance between points irrespective of the order of the input coordinates?
So if that isn't working, leading to the spurious results, the next step is to normalise all the inputs so they are in the same range, e.g. max - min = 1.0.

On a related note, what approach would be best if one of the input parameters wasn't continuous? e.g. I have three quite different geological distributions, called say A, B and C, so some of my simulations use distribution A, some use B and some use C. I could assign them the numbers 1, 2, 3, but a value of 1.5 is meaningless.

Andrea, if you have 1TB of data for 1,000 simulation runs then, if I assume you only mean the smspec/unsmry files, each of your summary files is 1GB in size? Are those o2w, o3w and inw figures the number of new wells only, or existing + new? It's fun dealing with this amount of data, isn't it?

Brennan
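A minimal sketch of that max - min = 1.0 normalisation, with synthetic columns standing in for the real parameters:

    import numpy as np

    # cols: one column per parameter, one row per simulation (synthetic here)
    rng = np.random.RandomState(2)
    cols = np.column_stack([rng.uniform(5.0e7, 6.0e7, 30),   # oil plateau
                            rng.uniform(1.0e8, 2.0e8, 30),   # gas plateau
                            rng.uniform(10.0, 50.0, 30)])    # well count
    lo = cols.min(axis=0)
    span = cols.max(axis=0) - lo
    norm = (cols - lo) / span          # every column now runs 0..1
    print(norm.min(axis=0))            # [ 0.  0.  0.]
    print(norm.max(axis=0))            # [ 1.  1.  1.]

For the non-continuous A/B/C parameter, one standard dodge (not from this thread) is to use one 0/1 indicator column per category instead of the values 1, 2, 3, so that a meaningless in-between value like 1.5 never arises.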
Re: [Numpy-discussion] Interpolation question
Andrea Gavana wrote:
On 29 March 2010 00:34, Robert Kern wrote:
On Sun, Mar 28, 2010 at 18:30, Andrea Gavana andrea.gav...@gmail.com wrote:
Hi Friedrich & All,

On 28 March 2010 23:51, Friedrich Romstedt wrote:
I believe I need a technical dictionary to properly understand all that... :-D. Sorry, I am no expert at all, really, just an amateur with some imagination, but your suggestion about the different magnitudes of the matrix is a very interesting one. Although I have absolutely no idea how to re-scale them properly to avoid RBFs going crazy.

Scaling each axis by its standard deviation is a typical first start. Shifting and scaling the values such that they each go from 0 to 1 is another useful thing to try.

Ah, magnifico! Thank you Robert and Friedrich, it seems to be working now... I get reasonable values for various combinations of parameters by scaling the input data using the standard deviation of each of them. It seems also that the other interpolation schemes are much less erratic now, and in fact (using input values equal to the original data) I get these ranges of errors for the various schemes:

    inverse multiquadric    -15.6098482614          15.7194674906
    linear                   -1.76157336073e-010     1.24949181055e-010
    cubic                    -0.000709860285963      0.018385394661
    gaussian               -293.930336611           282.058111404
    quintic                  -0.176381494531         5.37780806549
    multiquadric            -30.9515933446          58.3786105046
    thin-plate               -7.06755391536e-006     8.71407169821e-005

(in percent). Some of them are still off the mark, but you should have seen them before ;-).

That's great news. Going to give it a try myself. Interesting that the linear scheme gives the lowest error range.

I'll do some more analysis tomorrow, and if it works I am going to try the bigger profile-over-time interpolation. Thank you so much guys for your suggestions.

Andrea.
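For reference, the standard-deviation scaling described above is easy to sketch; everything here is synthetic stand-in data for the six parameters, and fomts is a toy response:

    import numpy as np
    from scipy.interpolate import Rbf

    # Six parameters on wildly different scales
    rng = np.random.RandomState(3)
    params = [rng.uniform(1.0, 10.0, 200) * s
              for s in (1e8, 1e9, 1e7, 1.0, 1.0, 1.0)]
    fomts = 1e8 * sum(p / p.max() for p in params)   # toy response

    # Scale each axis by its standard deviation before interpolating
    scaled = [p / p.std() for p in params]
    rbf = Rbf(*(scaled + [fomts]), function='linear')

    # Percentage error evaluated back at the training points
    pct = 100.0 * (rbf(*scaled) - fomts) / fomts
    print(pct.min())
    print(pct.max())

The same scaling factors must, of course, be applied to any new query point before handing it to the interpolator.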
Re: [Numpy-discussion] how to efficiently build an array of x, y, z points
Christopher Barker wrote:
Bruce Southey wrote:
Christopher Barker provided some code late last year on appending ndarrays, e.g.: http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046634.html

Yup, I'd love someone else to pick that up and test/improve it. Anyway, that code only handles 1-d arrays, though they can be structured arrays. I'd like to extend it to handle n-d arrays, though you could only grow them in the first dimension, which may work for your case. As for performance: my numpy code is a bit slower than using python lists if you add elements one at a time and the elements are a standard python data type. It should use less memory, though, if that matters. If you add the data in big enough chunks, my method gets better performance.

Currently I'm adding all my corner point xyz's into a list and then converting to an array of shape (npoints, 3). And I'm creating a cell list with the point indices for each cell and then converting that into an array of shape (nactivecells, 8). Then I'm creating an unstructured grid.

Ultimately I'm trying to build a tvtk unstructured grid to view in a Traits/tvtk/Mayavi app.

I'd love to see that working, once you've got it!

So will I.

The grid is ni*nj*nk cells with 8 xyz's per cell (a hexahedral cell with 6 faces). However, some cells are inactive and therefore don't have geometry. Cells also have connectivity to other cells, usually to adjacent cells (e.g. cell i,j,k connected to cell i-1,j,k), but not always.

I'm confused now -- what does the array need to look like in the end? Maybe ni*nj*nk x 8 x 3? How is "inactive" indicated?

I made a typo in my first posting. Each cell has 8 corners, each corner an x,y,z, so yes, if all the cells in the grid are active then ni*nj*nk*8*3, but usually not all cells are active, and it is optional whether inactive cell geometry is written out to the grid file, so it is actually nactivecells*8*3.

Is the connectivity somehow in the same array, or is that stored separately?

Bit of both - there is separate connectivity info and also implicit connectivity info. Often a cell will be fully connected to its adjacent cell(s) as they share a common face. But often there is no connectivity (e.g. a fault) and the +I face of a cell does not match up against the -I face of the adjacent cell. At the moment I'm not removing duplicate points (of which there are a lot, probably 25-50% depending on the degree of faulting).

One other thing I need to do is to re-order my xyz coordinates - in the attached image, taken from the VTK file format PDF, you can see the 0,1,2,3 and 4,5,6,7 node ordering. In my grid it is 0,1,3,2 and 4,5,7,6, so you can see that I need to swap round some of the coordinates. I need to do this for each cell, and there may be 10,000 of them or there may be 2,000,000 of them. So I think it is probably best not to do it on a cell-by-cell basis, but to wait until I've built my full point list, convert it to an array, probably of shape (nactivecells, 8, 3), and then somehow rearrange/reorder the 8 columns. Sound the right way to go?

Brennan

[attachment: hexahedron.png]
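That reordering takes one fancy-index operation over the whole array; a minimal sketch on a toy (2, 8, 3) array:

    import numpy as np

    # Stored order 0,1,3,2,4,5,7,6 versus VTK's 0,1,2,3,4,5,6,7 just means
    # swapping corners 2<->3 and 6<->7, and one fancy index does that for
    # every cell at once (the swap is its own inverse, so the same list
    # works in either direction).
    points = np.arange(2 * 8 * 3, dtype=float).reshape(2, 8, 3)
    vtk_order = [0, 1, 3, 2, 4, 5, 7, 6]
    points = points[:, vtk_order, :]
    print(points.shape)   # still (2, 8, 3)

No Python loop over cells is needed, so 10,000 or 2,000,000 cells cost one vectorised copy either way.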
Re: [Numpy-discussion] how to efficiently build an array of x, y, z points
Bruce Southey wrote:
On 03/02/2010 09:47 PM, David Goldsmith wrote:
On Tue, Mar 2, 2010 at 6:59 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
David Goldsmith wrote:
On Tue, Mar 2, 2010 at 6:29 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I'm reading a file which contains a grid definition. Each cell in the grid, apart from having an i,j,k index, also has 8 x,y,z coordinates. I'm reading each set of coordinates into a numpy array. I then want to add/append those coordinates to what will be my large points array. Due to the orientation/order of the 8 corners of each hexahedral cell, I may have to reorder them before adding them to my large points array (not sure about that yet). Should I create a numpy array with nothing in it and then .append to it? But this is probably expensive, isn't it, as it creates a new copy of the array each time? Or should I create a zero or empty array of sufficient size and then put each set of 8 coordinates into the correct position in that big array? I don't know exactly how big the array will be (some cells are inactive and therefore don't have a geometry defined), but I do know what its maximum size is: (ni*nj*nk, 3).

Someone will correct me if I'm wrong, but this problem - the best way to build a large array whose size is not known beforehand - came up in one of the tutorials at SciPyCon '09, and IIRC the answer was, perhaps surprisingly, to build the thing as a Python list (which is optimized for this kind of indeterminate sequence building) and convert to a numpy array when you're done. Isn't that what was recommended, folks?

Build a list of floating point values, then convert to an array and shape accordingly? Or build a list of small arrays and then somehow convert that into a big numpy array? My guess is that either way will be better than iteratively appending to an existing array.

Hi, Christopher Barker provided some code late last year on appending ndarrays, e.g.: http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046634.html

A lot depends on your final usage of the array, otherwise there are no suitable suggestions. That is, do you need just to index the array using i, j, k indices (this gives you an i by j by k array that contains the x, y, z coordinates), or do you also need to index the x, y, z coordinates as well (giving you an i by j by k by x by y by z array)? If it is just plain storage, then perhaps a Python list, dict or sqlite object may be sufficient.

Ultimately I'm trying to build a tvtk unstructured grid to view in a Traits/tvtk/Mayavi app. The grid is ni*nj*nk cells with 8 xyz's per cell (a hexahedral cell with 6 faces). However, some cells are inactive and therefore don't have geometry. Cells also have connectivity to other cells, usually to adjacent cells (e.g. cell i,j,k connected to cell i-1,j,k), but not always. I'll post more comments/questions as I go.

Brennan

There are also time and memory constraints, as you can spend large effort just getting the input into a suitable format and memory usage. If you use secondary storage like a Python list, then you need memory to store the list, the ndarray and all intermediate components and overheads. If you use scipy, then you should look at using sparse arrays, where space is only added as you need it.
Bruce
[Numpy-discussion] how to efficiently build an array of x, y, z points
I'm reading a file which contains a grid definition. Each cell in the grid, apart from having an i,j,k index, also has 8 x,y,z coordinates. I'm reading each set of coordinates into a numpy array. I then want to add/append those coordinates to what will be my large points array. Due to the orientation/order of the 8 corners of each hexahedral cell, I may have to reorder them before adding them to my large points array (not sure about that yet). Should I create a numpy array with nothing in it and then .append to it? But this is probably expensive, isn't it, as it creates a new copy of the array each time? Or should I create a zero or empty array of sufficient size and then put each set of 8 coordinates into the correct position in that big array? I don't know exactly how big the array will be (some cells are inactive and therefore don't have a geometry defined), but I do know what its maximum size is: (ni*nj*nk, 3).

Thanks

Brennan
Re: [Numpy-discussion] how to efficiently build an array of x, y, z points
David Goldsmith wrote:
On Tue, Mar 2, 2010 at 6:29 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I'm reading a file which contains a grid definition. Each cell in the grid, apart from having an i,j,k index, also has 8 x,y,z coordinates. ... I don't know exactly how big the array will be (some cells are inactive and therefore don't have a geometry defined), but I do know what its maximum size is: (ni*nj*nk, 3).

Someone will correct me if I'm wrong, but this problem - the best way to build a large array whose size is not known beforehand - came up in one of the tutorials at SciPyCon '09, and IIRC the answer was, perhaps surprisingly, to build the thing as a Python list (which is optimized for this kind of indeterminate sequence building) and convert to a numpy array when you're done. Isn't that what was recommended, folks?

DG

Build a list of floating point values, then convert to an array and shape accordingly? Or build a list of small arrays and then somehow convert that into a big numpy array? I've got 24 floating point values, which I've got in an array of shape (8,3), but I could easily have those in a list rather than an array, and then just keep appending each small list of values to a big list and do the final conversion to the array at the end - I'll try that and see how it goes.

Brennan
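A minimal sketch of that list-then-convert pattern; the toy loop stands in for the real grid-file parser:

    import numpy as np

    # Accumulate plain Python lists while reading, convert to ndarray once
    corners = []
    for icell in range(4):   # stand-in for the per-cell file read
        cell = [[float(icell), float(c), 0.0] for c in range(8)]  # 8 corners
        corners.extend(cell)
    points = np.asarray(corners, dtype=float).reshape(-1, 8, 3)
    print(points.shape)      # (4, 8, 3)

The list grows in amortized constant time per append, and the single asarray call at the end does one allocation instead of one per cell.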
[Numpy-discussion] problem with FortranFile
I'm using FortranFile to read a binary Fortran file. It has a bit of header data at the top of the file, which I'm reading with a combination of readString and struct.unpack. This is then followed by a number of lines/records, each of which has 20 double precision reals/floats. For some reason it reads the first 19 OK and then things start going wrong. To read the double precision data I'm using:

    for il in range(0, nlines):
        try:
            darray = f.readReals('d')
        except:
            print 'problem reading well data line', il

I've added print statements to my code and to fortranfile.py, and the print output I get is:

    line 19
    readRecord: l= 160
    readRecord: len(data_str)= 160
    readRecord: check_size= 160
    readReals: len(data_str)= 160
    calcsizeprec= 8
    num= 20
    numbers (4.2843303680419922, 0.0, 0.0, 0.0, 4955.73974609375, 0.0,
      -1000.0, -1000.0, 0.0, 0.0, 0.0, 21.22749137878418, 0.0, 0.0, 0.0,
      0.0, 0.2054678201675415, 0.0, 6386.78271484375, 6356.27001953125)
    il= 19 dsize= 20
    darray= [ 4.28433037e+00 0.e+00 0.e+00 0.e+00 4.95573975e+03 0.e+00
      -1.e+03 -1.e+03 0.e+00 0.e+00 0.e+00 2.12274914e+01 0.e+00 0.e+00
      0.e+00 0.e+00 2.05467820e-01 0.e+00 6.38678271e+03 6.35627002e+03]
    ----------------------------------------------------------------
    line 20
    readRecord: l= 160
    readRecord: len(data_str)= 132
    problem reading well data line 20

Line 19 is OK: readReals calls readRecord, the length of the data is 160, and check_size equals l, etc. Line 20 is not OK: the 4-byte length value at the start of the record is 160, but the data_str = self.read(l) line only gets 132 bytes, not 160. The data file itself is OK, as I've written a small Fortran program to read it. So, am I doing something wrong? I have a little test .py file and data file I can include if it helps.

Regards

Brennan
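One way to narrow this down is to walk the raw record markers yourself; a debugging sketch under stated assumptions (the file name is hypothetical, little-endian markers are assumed, and each record is taken to be <int32 length> <payload> <int32 length>). Opening with 'rb' matters on Windows, since text mode silently translates bytes in binary data:

    import struct

    fh = open('welldata.bin', 'rb')   # hypothetical file name
    nrec = 0
    while True:
        head = fh.read(4)
        if len(head) < 4:
            break
        (nbytes,) = struct.unpack('<i', head)
        payload = fh.read(nbytes)
        if len(payload) < nbytes:
            print('record %d truncated: wanted %d, got %d'
                  % (nrec, nbytes, len(payload)))
            break
        (tail,) = struct.unpack('<i', fh.read(4))
        if tail != nbytes:
            print('record %d marker mismatch: %d vs %d'
                  % (nrec, nbytes, tail))
            break
        nrec += 1
    fh.close()
    print('%d complete records read' % nrec)

If this walker also comes up short at record 20, the bytes really are being consumed somewhere (e.g. a mode or offset problem) rather than by FortranFile's parsing.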
Re: [Numpy-discussion] using FortranFile to read data from a binary Fortran file
Brennan Williams wrote:
Hi, I'm using FortranFile on 32-bit XP. The first record in the file has both 32-bit integer values and double precision float values. I've used readInts('i') to read the data into what I presume is a 32-bit integer numpy array. Items 0 and 1 of the array are the first double precision value. What's the best way to convert/cast these to get the double precision float value? I assume I need to do some sort of dpval = ival.astype('float64'). So far:

    f = FortranFile(fname, endian='')
    ihdr = f.readInts('i')

OK, I took a closer look at FortranFile and I'm now doing the following. Note that the first record in the file I'm reading has two double precision reals/floats followed by 8 32-bit integers.

    f = FortranFile(fullfilename, endian='')
    if f:
        hdr = f.readString()
        print 'hdr=', hdr
        print 'len=', len(hdr)
        t = struct.unpack('2d', hdr[0:16])
        print 't=', t
        i = struct.unpack('8i', hdr[16:])
        print 'i=', i

This gives me:

    len= 48
    t= (0.0, 2000.0)
    i= (0, 0, 0, 5, 0, 0, 1, 213)

which is correct. So is that the best way to do it, i.e. if I have a record of mixed data types, use readString and then do my own unpacking?

Brennan
[Numpy-discussion] using FortranFile to read data from a binary Fortran file
Hi, I'm using FortranFile on 32-bit XP. The first record in the file has both 32-bit integer values and double precision float values. I've used readInts('i') to read the data into what I presume is a 32-bit integer numpy array. Items 0 and 1 of the array are the first double precision value. What's the best way to convert/cast these to get the double precision float value? I assume I need to do some sort of dpval = ival.astype('float64'). So far:

    f = FortranFile(fname, endian='')
    ihdr = f.readInts('i')

Brennan
Re: [Numpy-discussion] speeding up getting a subset of a data array
Brennan Williams wrote:
Hi, no doubt asked many times before, so apologies. I'm pulling a subset array out of a data array, where I have a list of the indices I want (it could be an array rather than a list actually - I have it in both forms). Potentially the number of points and the number of times I do this can get very large, so any saving in time is good. So, paraphrasing what I've currently got, say I have...

    subsetpointerlist = [0, 1, 2, 5, 8, 15, 25, ...]
    subsetsize = len(subsetpointerlist)
    subsetarray = zeros(subsetsize, dtype=float)
    for index, pos in enumerate(subsetpointerlist):
        subsetarray[index] = dataarray[pos]

How do I speed this up in numpy, i.e. by removing the for loop? It's not as simple as...

    subsetarray = dataarray[subsetpointerarray]

is it? Do I set up some sort of subsetpointerarray as a mask and then somehow apply that to dataarray to get the values into subsetarray?

Thanks

Brennan
Re: [Numpy-discussion] speeding up getting a subset of a data array
josef.p...@gmail.com wrote:
On Mon, Aug 10, 2009 at 8:52 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
Hi, no doubt asked many times before, so apologies. I'm pulling a subset array out of a data array, where I have a list of the indices I want. ... How do I speed this up in numpy, i.e. by removing the for loop? Do I set up some sort of subsetpointerarray as a mask and then somehow apply that to dataarray to get the values into subsetarray?

Looks to me like

    subsetarray = dataarray[subsetpointerlist]

or, with type conversion,

    subsetarray = dataarray[subsetpointerlist].astype(float)

Josef

Thanks, with a little bit of googling/RTFM I'm getting there. Think I overdid my thinking on masks, based on something else that Robert Kern helped me out with.
[Numpy-discussion] npfile deprecation warning
Hi, I'm using npfile, which is giving me a deprecation warning. For the time being I want to continue using it, but I would like to suppress the warning messages. Is it possible to trap the deprecation warning but still have the npfile call go ahead?

Thanks

Brennan
Re: [Numpy-discussion] npfile deprecation warning
Robert Kern wrote:
On Mon, Jun 15, 2009 at 17:27, Brennan Williams brennan.willi...@visualreservoir.com wrote:
Hi, I'm using npfile, which is giving me a deprecation warning. For the time being I want to continue using it, but I would like to suppress the warning messages. Is it possible to trap the deprecation warning but still have the npfile call go ahead?

http://docs.python.org/library/warnings

Thanks. OK, I've put the following in my code...

    import warnings

    def fxn():
        warnings.warn("deprecated", DeprecationWarning)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        fxn()

but I'm getting an invalid syntax error...

    with warnings.catch_warnings():
                                  ^
    SyntaxError: invalid syntax

I haven't used with before. Is this supposed to go in the function def where I use npfile? I've put it near the top of my .py file, after my imports and before my class definitions. BTW, I'm using Python 2.5.4.

Brennan
Re: [Numpy-discussion] npfile deprecation warning
Robert Kern wrote:
On Mon, Jun 15, 2009 at 18:48, Brennan Williams brennan.willi...@visualreservoir.com wrote:
OK, I've put the following in my code...

    import warnings

    def fxn():
        warnings.warn("deprecated", DeprecationWarning)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        fxn()

catch_warnings() was added in Python 2.6, as stated in the documentation.

My mistake. I saw the "New in 2.1" at the top of the page but didn't read all the way to the bottom, where catch_warnings is documented ("New in 2.6").

I recommend setting up the simplefilter in your main() function, and only for DeprecationWarnings.

Done, and it works. Thanks.

but I'm getting an invalid syntax error... I haven't used with before. Is this supposed to go in the function def where I use npfile?

You would use the with statement only around code that calls the function.

BTW, I'm using Python 2.5.4.

In Python 2.5, you need this at the top of your file (after docstrings but before any other code):

    from __future__ import with_statement
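A minimal sketch of the main()-level filter Robert recommends, scoped to DeprecationWarnings only so other warnings stay visible:

    import warnings

    def main():
        # Silence only this category, as recommended
        warnings.simplefilter('ignore', DeprecationWarning)
        warnings.warn('npfile is deprecated', DeprecationWarning)  # now silent

    if __name__ == '__main__':
        main()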
[Numpy-discussion] CUDA
Not a question really, just for discussion/pie-in-the-sky etc. This is a news item on VizWorld about getting Matlab code to run on a CUDA-enabled GPU:

http://www.vizworld.com/2009/05/cuda-enable-matlab-with-gpumat/

If the use of GPUs for numerical tasks takes off (has it already?), then I'd be interested to know the views of the numpy experts out there.

Cheers

Brennan
[Numpy-discussion] replacing Nan's in a string array converted from a float array
I've created an array of strings using something like

    stringarray = self.karray.astype('|S8')

If the array value is a NaN, I get '1.#QNAN' in my string array. For cosmetic reasons I'd like to change this to something else, e.g. 'invalid' or 'inactive'. My string array can hold 100,000+ values. Is there a fast way to do this?

Thanks

Brennan
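One fast way, sketched with made-up values: locate the NaNs in the original float array, then overwrite those slots in the string array with one vectorised assignment.

    import numpy as np

    karray = np.array([1.25, np.nan, 3.5, np.nan])
    stringarray = karray.astype('|S8')
    stringarray[np.isnan(karray)] = 'inactive'   # 8 chars, fits |S8
    print(stringarray)

The mask comes from the floats rather than from string-matching '1.#QNAN', so it works regardless of how the platform happens to format NaN.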
Re: [Numpy-discussion] replacing Nan's in a string array converted from a float array
Charles R Harris wrote:
On Thu, May 7, 2009 at 5:36 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I've created an array of strings using something like stringarray = self.karray.astype('|S8'). If the array value is a NaN, I get '1.#QNAN' in my string array. For cosmetic reasons I'd like to change this to something else, e.g. 'invalid' or 'inactive'. My string array can hold 100,000+ values. Is there a fast way to do this?

I think this is a bug. Making the printing of NaNs uniform was one of the goals of numpy 1.3, although a few bits were unfixable. However, this looks fixable. If you are using 1.3, please open a ticket and note the OS and numpy version.

OK - it's numpy 1.3.0rc1 on WinXP.

Chuck
[Numpy-discussion] recommendations on goodness of fit functions?
Hi numpy/scipy users, I'm looking to add some basic goodness-of-fit functions/plots to my app. I have a set of simulated y vs time data and a set of observed y vs time data. The time values aren't always the same, i.e. there are often fewer observed data points. Some variables will be in a 0.0...1.0 range, others in a 0.0...1.0e+12 range. I'm also hoping to update the calculated goodness-of-fit value at each simulated timestep, the idea being to allow the user to set a tolerance level which, if exceeded, stops the simulation (which otherwise can keep running for many hours/days).

Thanks

Brennan
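A minimal sketch of one possible misfit measure (not the thread's final answer): interpolate the simulated profile onto the observed times with np.interp, then compute a relative RMS error so that variables in a 0..1 range and a 0..1e12 range are judged on the same footing. All the data below is synthetic.

    import numpy as np

    t_sim = np.linspace(0.0, 30.0, 31)
    y_sim = 1.0e12 * (1.0 - np.exp(-t_sim / 10.0))
    t_obs = np.array([2.0, 7.5, 13.0, 21.0])           # fewer observed points
    y_obs = 1.0e12 * (1.0 - np.exp(-t_obs / 10.0)) * 1.02   # ~2% off

    y_fit = np.interp(t_obs, t_sim, y_sim)             # simulated at obs times
    rms = np.sqrt(np.mean(((y_fit - y_obs) / y_obs) ** 2))
    print(rms)   # dimensionless; compare against the user's tolerance

Because the measure is dimensionless, the same tolerance threshold can be applied at every timestep as new simulated values arrive.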
Re: [Numpy-discussion] recommendations on goodness of fit functions?
Charles R Harris wrote:
On Mon, Apr 13, 2009 at 6:28 PM, Brennan Williams brennan.willi...@visualreservoir.com wrote:
Hi numpy/scipy users, I'm looking to add some basic goodness-of-fit functions/plots to my app. ... stops the simulation (which otherwise can keep running for many hours/days).

Before I try and answer the following: attached is an example of a suggested GOF function.

[attachment: goodnessoffit.png]

Some questions: 1) What kind of fit are you doing? 2) What is the measurement model? 3) What do you know a priori about the measurement errors? 4) How is the time series modeled?

The simulated data are output by an oil reservoir simulator. The time series typically has monthly or annual timesteps over anything from 5-30 years, but it could also, in some cases, be 10-minute timesteps over 24 hours. The timesteps output by the simulator are controlled by the user and are not always even; e.g. for a simulation over 30 years you may have annual timesteps from year 0 to year 25, then 3-monthly from year 26-29, and then monthly for the most recent year. Not sure about measurement errors - the older the data, the higher the errors, due to changes in oilfield measurement technology. And the error range varies depending on the data type as well; e.g. the error range for a water meter is likely to be higher than that for an oil or gas meter. I'll try and find out more about that.

Brennan

Chuck
Re: [Numpy-discussion] trying to speed up the following....
Robert Kern wrote:
On Wed, Mar 25, 2009 at 00:09, Brennan Williams brennan.willi...@visualreservoir.com wrote:
Robert Kern wrote:
On Tue, Mar 24, 2009 at 18:29, Brennan Williams brennan.willi...@visualreservoir.com wrote:
I have an array (porvatt.yarray) of ni*nj*nk values. ... I may often have a million+ cells, so the code above is slow. How can I speed it up?

    mask = (porvatt.yarray.flat > 0)
    ijkatt.yarray = np.nonzero(mask)

    # This is not what your code does, but what I think you want.
    # Where porvatt.yarray is inactive, activeatt.yarray is -1.
    # 0 might be an active cell.
    activeatt.yarray = np.empty(ncells, dtype=int)
    activeatt.yarray.fill(-1)
    activeatt.yarray[mask] = ijkatt.yarray

Thanks. Concise & fast. This is what I've got so far (minor mods from the above):

    from numpy import *
    ...
    mask = porvatt.yarray > 0.0
    ijkatt.yarray = nonzero(mask)[0]
    activeindices = arange(0, ijkatt.yarray.size)
    activeatt.yarray = empty(ncells, dtype=int)
    activeatt.yarray.fill(-1)
    activeatt.yarray[mask] = activeindices

I have ijkatt.yarray = nonzero(mask)[0] because it looks like nonzero returns a tuple of arrays rather than an array.

Yes. Apologies.

Apology accepted. Don't do it again.

On a more serious note, it is clear that, as expected, operating on elements of an array inside a Python for loop is slow for large arrays. Soon I will be writing an import interface to read corner-point grid geometries, and I'm currently looking at vtk unstructured grids etc.

Most of the numpy vectorization is aimed at relatively simply structured arrays, on the basis that you'll never meet everyone's needs/data structures.

So I presume that if I find a bottleneck in my code which looks specific to my data structures, I should then look at offloading that to C or Fortran (assuming I can't find it in numpy or scipy)? I'm already doing this to read in data from Fortran binary files, although I actually decided to code it in C rather than use Fortran.

I used activeindices = arange(0, ijkatt.yarray.size) and activeatt.yarray[mask] = activeindices.

Yes. You are correct.
[Numpy-discussion] trying to speed up the following....
I have an array (porvatt.yarray) of ni*nj*nk values. I want to create two further arrays. activeatt.yarray is of size ni*nj*nk and is a pointer array to an active cell number. If a cell is inactive then its activeatt.yarray value will be 0. ijkatt.yarray is of size nactive, the number of active cells (which I already know). ijkatt.yarray holds the ijk cell number for each active cell. My code looks something like...

    activeatt.yarray = zeros(ncells, dtype=int)
    ijkatt.yarray = zeros(nactivecells, dtype=int)
    iactive = -1
    ni = currentgrid.ni
    nj = currentgrid.nj
    nk = currentgrid.nk
    for ijk in range(0, ni*nj*nk):
        if porvatt.yarray[ijk] > 0:
            iactive += 1
            activeatt.yarray[ijk] = iactive
            ijkatt.yarray[iactive] = ijk

I may often have a million+ cells. So the code above is slow. How can I speed it up?

TIA

Brennan
Re: [Numpy-discussion] trying to speed up the following....
Robert Kern wrote:
On Tue, Mar 24, 2009 at 18:29, Brennan Williams brennan.willi...@visualreservoir.com wrote:

I have an array (porvatt.yarray) of ni*nj*nk values. I want to create two further arrays. activeatt.yarray is of size ni*nj*nk and is a pointer array to an active cell number. If a cell is inactive then its activeatt.yarray value will be 0. ijkatt.yarray is of size nactive, the number of active cells (which I already know). ijkatt.yarray holds the ijk cell number for each active cell. My code looks something like...

    activeatt.yarray = zeros(ncells, dtype=int)
    ijkatt.yarray = zeros(nactivecells, dtype=int)
    iactive = -1
    ni = currentgrid.ni
    nj = currentgrid.nj
    nk = currentgrid.nk
    for ijk in range(0, ni*nj*nk):
        if porvatt.yarray[ijk] > 0:
            iactive += 1
            activeatt.yarray[ijk] = iactive
            ijkatt.yarray[iactive] = ijk

I may often have a million+ cells. So the code above is slow. How can I speed it up?

    mask = (porvatt.yarray.flat > 0)
    ijkatt.yarray = np.nonzero(mask)
    # This is not what your code does, but what I think you want.
    # Where porvatt.yarray is inactive, activeatt.yarray is -1.
    # 0 might be an active cell.
    activeatt.yarray = np.empty(ncells, dtype=int)
    activeatt.yarray.fill(-1)
    activeatt.yarray[mask] = ijkatt.yarray

Thanks. Concise & fast. This is what I've got so far (minor mods from the above):

    from numpy import *
    ...
    mask = porvatt.yarray > 0.0
    ijkatt.yarray = nonzero(mask)[0]
    activeindices = arange(0, ijkatt.yarray.size)
    activeatt.yarray = empty(ncells, dtype=int)
    activeatt.yarray.fill(-1)
    activeatt.yarray[mask] = activeindices

I have ijkatt.yarray = nonzero(mask)[0] because it looks like nonzero returns a tuple of arrays rather than an array. I used activeindices = arange(0, ijkatt.yarray.size) and activeatt.yarray[mask] = activeindices, as I have 686000 cells of which 129881 are 'active', so my activeatt.yarray values range from -1 for inactive, through 0 for the first active cell, up to 129880 for the last active cell.

About to test it out by replacing my old for loop. Looks like it will be about 20x faster for 1m cells.

Brennan
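To put a rough number on the speedup yourself, a timing sketch on synthetic data (illustrative only; the ~20x figure above is from this thread, and actual ratios depend on array size and hardware):

    import numpy as np
    from timeit import default_timer as timer

    porv = np.random.rand(1000000) - 0.3   # roughly 70% of cells "active"
    ncells = porv.size

    # Original approach: explicit Python loop over every cell.
    t0 = timer()
    active_loop = np.empty(ncells, dtype=int)
    active_loop.fill(-1)
    iactive = -1
    for ijk in range(ncells):
        if porv[ijk] > 0.0:
            iactive += 1
            active_loop[ijk] = iactive
    t1 = timer()

    # Vectorized approach from the thread.
    mask = porv > 0.0
    active_vec = np.full(ncells, -1, dtype=int)
    active_vec[mask] = np.arange(mask.sum())
    t2 = timer()

    assert (active_loop == active_vec).all()
    print("loop %.3fs  vectorized %.3fs" % (t1 - t0, t2 - t1))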
[Numpy-discussion] populating an array
Ok... I'm using Traits and numpy. I have a 3D grid with directions I, J and K. I have NI,NJ,NK cells in the I,J,K directions, so I have NI*NJ*NK cells overall. I have data arrays with a value for each cell in the grid. I'm going to store this as a 1D array, i.e. 1..ncells where ncells=NI*NJ*NK, rather than as a 3D array.

Apart from lots of other data arrays that will be read in from external files, I want to create I, J and K data arrays where the 'I' array contains the I index for the cell, the 'J' array the J index etc. I haven't used numpy extensively and I'm coming from a Fortran/C background, so I'm hideously inefficient. At the moment I have something like

    self.iarray = zeros(self.ncells, dtype=int)
    for icell in arange(1, self.ncells+1):
        # icell = i + (j-1)*ni + (k-1)*ni*nj  (1-based)
        i, j, k = rn2ijk(icell, self.ni, self.nj)
        self.iarray[icell-1] = i

rn2ijk is defined as...

    def rn2ijk(n, ni, nj):
        # Python 2 integer division throughout
        nij = ni*nj
        k = (n+nij-1)/nij
        ij = n-(k-1)*nij
        j = (ij+ni-1)/ni
        i = ij-(j-1)*ni
        return i, j, k

Obviously I can improve my use of rn2ijk, as for a start I'm recalculating nij for every cell (and I can have anything from a few thousand to a few million cells). But I'm sure I could do this more efficiently, probably by starting off with a 3D array, looping over i,j,k and then reshaping it into a 1D array. Ideas?

Brennan
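One way to build all three index arrays without any Python loop (a sketch; it assumes the 1-based, i-varies-fastest ordering implied by the icell formula above, and toy grid dimensions):

    import numpy as np

    ni, nj, nk = 3, 2, 2
    ncells = ni * nj * nk

    # 0-based linear cell numbers; i varies fastest, matching
    # icell = i + (j-1)*ni + (k-1)*ni*nj in the 1-based convention.
    c = np.arange(ncells)
    iarr = c % ni + 1
    jarr = (c // ni) % nj + 1
    karr = c // (ni * nj) + 1

    # Equivalent route via a 3D array reshaped to 1D, as suggested
    # above: np.indices builds the index grids, and raveling in
    # (k, j, i) axis order makes i vary fastest.
    kk, jj, ii = np.indices((nk, nj, ni)) + 1
    assert (ii.ravel() == iarr).all()
    assert (jj.ravel() == jarr).all()
    assert (kk.ravel() == karr).all()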
Re: [Numpy-discussion] astype
Nils Wagner wrote:
On Sat, 14 Feb 2009 17:22:43 +0100 Nils Wagner nwag...@iam.uni-stuttgart.de wrote:

Hi all,

How can I convert an array with string elements to an array with float entries?

    >>> coord_info[:,1]
    array(['0,0', '100,0', '200,0', '300,0', '400,0', '500,0', '600,0',
           '700,0', '800,0', '0.0', '100.0', '200.0', '300.0', '400.0',
           '500.0', '600.0', '700.0', '800.0', '0.0', '100.0', '200.0',
           '300.0', '400.0', '500.0', '600.0', '700.0', '800.0', '0.0',
           '100.0', '200.0', '300.0', '400.0', '500.0', '600.0', '700.0',
           '800.0', '0.0', '100.0', '200.0', '300.0', '400.0', '500.0',
           '600.0', '700.0', '800.0'],
          dtype='|S50')
    >>> coord_info[:,1].astype(float)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for float(): 0,0

Nils

Sorry for the noise - dots and commas were mixed up in the input file.

Mind you, it raises an interesting point. What if the string was an amount in currency, e.g. '$1,000.00', or perhaps just '1,000.00', etc.? How would one convert a string array in that format to a float array?
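One straightforward approach (a sketch; np.char.replace and np.char.lstrip operate elementwise on string arrays, and the '$' handling is illustrative):

    import numpy as np

    s = np.array(['$1,000.00', '1,000.00', '250.50'])
    # Strip a leading currency symbol and remove thousands separators
    # elementwise, then convert the cleaned strings to floats.
    cleaned = np.char.replace(np.char.lstrip(s, '$'), ',', '')
    values = cleaned.astype(float)
    print(values)  # [1000.   1000.    250.5]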
Re: [Numpy-discussion] checksum on numpy float array
OK so maybe I should:
(1) not add some sort of checksum-type functionality to my read/write methods - these read/write methods simply read/write numpy arrays to a binary file which contains one or more numpy arrays (and nothing else);
(2) replace my binary files with either HDF5 or PyTables.

But my app is being used by clients on existing projects - in one case there are over 900 of these numpy binary files in just one project, albeit each file is pretty small (200KB or so). So... questions. How can I transparently (or at least with minimum user pain) replace my existing read/write methods with PyTables or HDF5? My initial thoughts are...
(a) have an app version number and a data format version number which I can check against;
(b) if data format version <= 1.0 then read from old binary files;
(c) if app version number > 1.0 then write to new PyTables or HDF5 files;
(d) get clients to open an existing project and then save it, to semi-transparently convert from old to new formats.

Francesc Alted wrote:
On Friday 05 December 2008, Andrew Collette wrote:

Another possibility would be to use HDF5 as a data container. It supports the fletcher32 filter [1], which basically computes a checksum for every data chunk written to disk and then always checks that the data read satisfies the checksum kept on-disk. So, if the HDF5 layer doesn't complain, you are basically safe. There are at least two usable HDF5 interfaces for Python and NumPy: PyTables [2] and h5py [3]. PyTables does have support for that right out-of-the-box. Not sure about h5py though (a quick search in the docs doesn't reveal anything).

[1] http://rfc.sunsite.dk/rfc/rfc1071.html
[2] http://www.pytables.org
[3] http://h5py.alfven.org

Hope it helps,

Just to confirm that h5py does in fact have fletcher32; it's one of the options you can specify when creating a dataset, although it could use better documentation:
http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset

My bad. I've searched for 'fletcher' instead of 'fletcher32'. I naively thought that the search tool in Sphinx allowed for partial name finding. In fact, it is a pity it does not.

Cheers,
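For reference, a minimal h5py sketch of writing an array with the fletcher32 checksum filter enabled (the filter requires chunked storage; the file and dataset names are illustrative):

    import numpy as np
    import h5py

    arr = np.random.rand(1000)

    # Write: fletcher32 stores a checksum per chunk; reads fail loudly
    # if the stored checksum no longer matches the data.
    with h5py.File('project.h5', 'w') as f:
        f.create_dataset('velocity', data=arr, chunks=True, fletcher32=True)

    # Read back: any chunk corruption raises an error here.
    with h5py.File('project.h5', 'r') as f:
        arr2 = f['velocity'][:]

    assert np.array_equal(arr, arr2)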
[Numpy-discussion] checksum on numpy float array
My app reads in one or more float arrays from a binary file. Sometimes, due to network timeouts etc., the array is not read correctly. What would be the best way of checking the validity of the data? Would some sort of checksum approach be a good idea? Would that work with an array of floating-point values, or are checksums more for int/byte/string type data?
Re: [Numpy-discussion] checksum on numpy float array
[EMAIL PROTECTED] wrote:
On Thu, Dec 4, 2008 at 6:17 PM, Brennan Williams [EMAIL PROTECTED] wrote:

My app reads in one or more float arrays from a binary file. Sometimes, due to network timeouts etc., the array is not read correctly. What would be the best way of checking the validity of the data? Would some sort of checksum approach be a good idea? Would that work with an array of floating-point values, or are checksums more for int/byte/string type data?

If you want to verify the file itself, then python provides several more or less secure checksums; my experience was that zlib.crc32 was pretty fast on moderate file sizes. crc32 is common inside archive files and for binary newsgroups. If you have large files transported over the network, e.g. GB size, I would work with par2 repair files, which verify and repair at the same time.

The file has multiple arrays stored in it, so I want to have some sort of validity check on just the array that I'm reading. I will need to add a check on the file as well, as of course network problems could affect writing to the file as well as reading from it.

Josef
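A minimal sketch of the crc32 idea applied to a float array's raw bytes (checksums operate on bytes, so the float dtype is no obstacle; the file round-trip is stubbed out here):

    import zlib
    import numpy as np

    arr = np.random.rand(10000)

    # Checksum the raw bytes before writing...
    crc_written = zlib.crc32(arr.tobytes()) & 0xffffffff

    # ...and again after reading back; a mismatch means corruption.
    arr2 = arr.copy()  # stand-in for the round-trip through the file
    crc_read = zlib.crc32(arr2.tobytes()) & 0xffffffff
    assert crc_read == crc_written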
Re: [Numpy-discussion] checksum on numpy float array
Thanks

[EMAIL PROTECTED] wrote:

I didn't check what this does behind the scenes, but try this:

    import hashlib  # standard python library
    import numpy as np

    m = hashlib.md5()
    m.update(np.array(range(100)))
    m.update(np.array(range(200)))

    m2 = hashlib.md5()
    m2.update(np.array(range(100)))
    m2.update(np.array(range(200)))

    print m.hexdigest()
    print m2.hexdigest()
    assert m.hexdigest() == m2.hexdigest()

    m3 = hashlib.md5()
    m3.update(np.array(range(100)))
    m3.update(np.array(range(199)))
    print m3.hexdigest()
    # different data, so this assert is expected to fail:
    assert m.hexdigest() == m3.hexdigest()

Josef
Re: [Numpy-discussion] checksum on numpy float array
Robert Kern wrote:
On Thu, Dec 4, 2008 at 18:54, Brennan Williams [EMAIL PROTECTED] wrote:

Thanks

[EMAIL PROTECTED] wrote:
I didn't check what this does behind the scenes, but try this:

    import hashlib  # standard python library
    import numpy as np

    m = hashlib.md5()
    m.update(np.array(range(100)))
    m.update(np.array(range(200)))

I would recommend doing this on the strings before you make arrays from them. You don't know if the network cut out in the middle of an 8-byte double. Of course, sending the lengths and other metadata first, then the data, would let you check without needing to do expensivish hashes or checksums. If truncation is your problem rather than corruption, then that would be sufficient. You may also consider using the NPY format in numpy 1.2 to implement that.

Thanks for the ideas. I'm definitely going to add some more basic checks on lengths etc. as well. Unfortunately the problem is happening at a client site, so (a) I can't reproduce it and (b) most of the time they can't reproduce it either. This is a Windows Python app running on Citrix, reading/writing data to a Linux networked drive.

Brennan
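A sketch of the lengths-first idea for catching truncation without hashing (the on-disk format and helper names here are illustrative, not from the thread):

    import struct
    import numpy as np

    def write_array(f, arr):
        # Prefix the payload with its byte length so a truncated
        # read can be detected cheaply on the other end.
        payload = arr.tobytes()
        f.write(struct.pack('<Q', len(payload)))
        f.write(payload)

    def read_array(f, dtype=np.float64):
        (nbytes,) = struct.unpack('<Q', f.read(8))
        payload = f.read(nbytes)
        if len(payload) != nbytes:
            raise IOError("truncated array: expected %d bytes, got %d"
                          % (nbytes, len(payload)))
        return np.frombuffer(payload, dtype=dtype)

np.save/np.load (the NPY format Robert mentions) does essentially this, storing shape and dtype in a header ahead of the data, so a short read fails rather than silently returning a partial array.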