Re: [Numpy-discussion] Py3k: making a py3k compat header available in installed numpy for scipy

2010-03-30 Thread Pauli Virtanen
2010/3/30 David Cournapeau da...@silveregg.co.jp
 Pauli Virtanen wrote:
[clip]
  At least, I don't see what I would like to change there. The only thing
  I wouldn't perhaps like to have in the long run are the PyString and
  possibly PyInt redefinition macros.

 I would also prefer a new name, instead of macro redefinition, but I can
 do it.

For strings, the new names are PyBytes, PyUnicode and PyUString
(unicode on Py3, bytes on Py2). One of these should be used instead of PyString
in all places, but redefining the macro reduced the work in making a
quick'n'dirty
port of numpy.

For integers, perhaps a separate type (PyInt on Py2, PyLong on Py3) should
be defined. Not sure if this would reduce work in practice.

 The header would not be part of the public API proper anyway (I
 will put it somewhere else than numpy/include), it should never be
 pulled implicitly in when including one of the .h in numpy/include.

Agreed.

Pauli


Re: [Numpy-discussion] Making a 2to3 distutils command ?

2010-03-30 Thread Pauli Virtanen
2010/3/30 David Cournapeau da...@silveregg.co.jp:
 Currently, when building numpy with python 3, the 2to3 conversion
 happens before calling any distutils command. Was there a reason for
 doing it as it is done now ?

This allowed 2to3 to also port the various setup*.py files and
numpy.distutils, and implementing it this way required the minimum
amount of work and understanding of distutils -- you need to force it
to proceed with the build using the set of output files from 2to3.

 I would like to make a proper numpy.distutils command for it, so that it
 can be more finely controlled (in particular, using the -j option). It
 would also avoid duplication in scipy.

Are you sure you want to mix distutils in this? Wouldn't it only
obscure how things work?

If the aim is to make the 2to3 processing reusable, I'd rather
simply move tools/py3tool.py under numpy.distutils (+ perhaps do some
cleanups), and otherwise keep it completely separate from distutils.
It could be nice to have the 2to3 conversion parallelizable, but there
are probably simple ways to do it without mixing distutils in.
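
For reference, a minimal standalone conversion step (roughly the job
tools/py3tool.py does, here using only the stock lib2to3 machinery; the
build directory name is made up):

import lib2to3.main

# run the default fixers over an already-copied build tree, rewriting
# the files in place, keeping no backups and printing no diffs
lib2to3.main.main("lib2to3.fixes",
                  args=["--write", "--nobackups", "--no-diffs",
                        "build/py3k"])

2to3's own -j option (the one David mentions) could then also be passed
through here, without touching distutils.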

But if you think this is really worth doing, go ahead.

Pauli


Re: [Numpy-discussion] numpy.distutils/f2py: forcing 8-bit reals

2010-03-30 Thread Dag Sverre Seljebotn
David Warde-Farley wrote:
 Hi,

 In my setup.py, I have
 from numpy.distutils.misc_util import Configuration

 fflags= '-fdefault-real-8 -ffixed-form'
 config = Configuration(
  'foo',
  parent_package=None,
  top_path=None,
  f2py_options='--f77flags=\'%s\' --f90flags=\'%s\'' % (fflags,  
 fflags)
 )

 However I am still getting stuff returned in 'real' variables as  
 dtype=float32. Am I doing something wrong?
   
Unless f2py is (too) smart, it probably just passes along --f77flags and 
--f90flags to the Fortran compiler, but doesn't use them when creating the 
wrapping C Python extension. So, Fortran uses real(8) and the type in C 
is float -- and what NumPy ends up seeing is float32.

Unless you're dealing with arrays, you are likely blowing your stack here...

I wouldn't think there's a way around this except fixing the original 
source. f2py is very much based on assumptions about type sizes which 
you then violate when passing -fdefault-real-8.

Dag Sverre


Re: [Numpy-discussion] numpy.distutils/f2py: forcing 8-bit reals

2010-03-30 Thread Dag Sverre Seljebotn
Dag Sverre Seljebotn wrote:
 David Warde-Farley wrote:
 Hi,

 In my setup.py, I have
 from numpy.distutils.misc_util import Configuration

 fflags= '-fdefault-real-8 -ffixed-form'
 config = Configuration(
  'foo',
  parent_package=None,
  top_path=None,
  f2py_options='--f77flags=\'%s\' --f90flags=\'%s\'' % (fflags,  
 fflags)
 )

 However I am still getting stuff returned in 'real' variables as  
 dtype=float32. Am I doing something wrong?
   
 Unless f2py is (too) smart, it probably just passes along --f77flags and 
 --f90flags to the Fortran compiler, but doesn't use them when creating 
 the wrapping C Python extension. So, Fortran uses real(8) and the type 
 in C is float -- and what NumPy ends up seeing is float32.

 Unless you're dealing with arrays, you are likely blowing your stack 
 here...

 I wouldn't think there's a way around this except fixing the original 
 source. f2py is very much based on assumptions about type sizes which 
 you then violate when passing -fdefault-real-8.
Well, you can pass -fdefault-real-8 and then write .pyf headers where 
real(8) is always given explicitly.

Dag Sverre


Re: [Numpy-discussion] Making a 2to3 distutils command ?

2010-03-30 Thread Ryan May
On Tue, Mar 30, 2010 at 1:09 AM, Pauli Virtanen p...@iki.fi wrote:
 2010/3/30 David Cournapeau da...@silveregg.co.jp:
 Currently, when building numpy with python 3, the 2to3 conversion
 happens before calling any distutils command. Was there a reason for
 doing it as it is done now ?

 This allowed 2to3 to also port the various setup*.py files and
 numpy.distutils, and implementing it this way required the minimum
 amount of work and understanding of distutils -- you need to force it
 to proceed with the build using the set of output files from 2to3.

 I would like to make a proper numpy.distutils command for it, so that it
 can be more finely controlled (in particular, using the -j option). It
 would also avoid duplication in scipy.

 Are you sure you want to mix distutils in this? Wouldn't it only
 obscure how things work?

 If the aim is to make the 2to3 processing reusable, I'd rather
 simply move tools/py3tool.py under numpy.distutils (+ perhaps do some
 cleanups), and otherwise keep it completely separate from distutils.
 It could be nice to have the 2to3 conversion parallelizable, but there
 are probably simple ways to do it without mixing distutils in.

Out of curiosity, is there something wrong with the support for 2to3
that already exists within distutils? (Other than it just being
distutils)

http://bruynooghe.blogspot.com/2010/03/using-lib2to3-in-setuppy.html

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Making a 2to3 distutils command ?

2010-03-30 Thread Pauli Virtanen
Tue, 2010-03-30 at 07:18 -0600, Ryan May wrote:
 Out of curiosity, is there something wrong with the support for 2to3
 that already exists within distutils? (Other than it just being
 distutils)
 
 http://bruynooghe.blogspot.com/2010/03/using-lib2to3-in-setuppy.html

That AFAIK converts only those Python files that will be installed, and
I don't know how to tell it to disable some conversion on a per-file
basis. (Numpy also contains some files necessary for building that will
not be installed...)

But OK, to be honest, I didn't look closely at whether one could make it work
using the bundled 2to3 command. Things might be cleaner if it can be
made to work.
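
For concreteness, the bundled hook is just a cmdclass override; a minimal
sketch of a plain-distutils setup.py using it, with an illustrative package
name:

from distutils.core import setup

try:
    # Python 3: runs lib2to3 over the files copied into build/lib
    from distutils.command.build_py import build_py_2to3 as build_py
except ImportError:
    # Python 2: plain copy, no conversion needed
    from distutils.command.build_py import build_py

setup(name='example', packages=['example'],
      cmdclass={'build_py': build_py})

As noted above, this only converts what passes through build_py, which is
exactly where the build-only helper files fall through the cracks.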

Pauli




[Numpy-discussion] SciPy 2010: Vote on Tutorials & Sprints

2010-03-30 Thread Amenity Applewhite


Hello Amenity,

Spring is upon us and arrangements for SciPy 2010 are in full swing.   
We're already nearing some important deadlines for conference  
participants: April 11th is the deadline for submitting an abstract  
for a paper, and April 15th is the deadline for submitting a tutorial  
proposal.


Help choose tutorials for SciPy 2010...
We set up a UserVoice page to brainstorm tutorial topics last week and  
we already have some great ideas. The top ones at the moment are:


Effective multi-core programming with Cython and Python
Building your own app with Mayavi
High performance computing with Python
Propose your own or vote on the existing suggestions here.

...Or instruct a tutorial and cover your conference costs.

Did you know that we're awarding generous stipends to tutorial  
instructors this year?  So if you believe you could lead a tutorial,  
by all means submit your proposal — soon! They're due April 15th.


Call for Papers Continues
Submitting a paper to present at SciPy 2010 is easy, so remember  
to prepare one and have your friends and colleagues follow suit. Send  
us your abstract before April 11th and let us know whether you'd like  
to speak at the main conference or one of the specialized tracks.   
Details here.


Have you registered?

Booking your tickets early should save you money — not to mention the  
early registration prices you will qualify for if you register before  
May 10th.


Best,

The SciPy 2010 Team
@SciPy2010 on Twitter

You are receiving this email because you have registered for the SciPy  
2010 conference in Austin, TX.



Our mailing address is:
Enthought, Inc.
515 Congress Ave.
Austin, TX 78701


Copyright (C) 2010 Enthought, Inc. All rights reserved.




[Numpy-discussion] indexing question

2010-03-30 Thread Tom K.

This one bit me again, and I am trying to understand it better so I can
anticipate when it will happen.

What I want to do is get rid of singleton dimensions, and index into the
last dimension with an array.  

In [1]: import numpy as np

In [2]: x=np.zeros((10,1,1,1,14,1024))

In [3]: x[:,0,0,0,:,[1,2,3]].shape
Out[3]: (3, 10, 14)

Whoa!  Trimming my array to a desired number ends up moving the last
dimension to the first!

In [4]: np.__version__
Out[4]: '1.3.0'

...
In [7]: x[:,:,:,:,:,[1,2,3]].shape
Out[7]: (10, 1, 1, 1, 14, 3)

This looks right...

In [8]: x[...,[1,2,3]].shape
Out[8]: (10, 1, 1, 1, 14, 3)

and this...

In [9]: x[...,[1,2,3]][:,0,0,0].shape
Out[9]: (10, 14, 3)

...
In [11]: x[:,0,0,0][...,[1,2,3]].shape
Out[11]: (10, 14, 3)

Either of the last 2 attempts above results in what I want, so I can do
that... I just need some help deciphering when and why the first thing
happens.



Re: [Numpy-discussion] indexing question

2010-03-30 Thread Alan G Isaac
On 3/30/2010 10:13 AM, Tom K. wrote:
 What I want to do is get rid of singleton dimensions, and index into the
 last dimension with an array.

>>> x=np.zeros((10,1,1,1,14,1024))
>>> np.squeeze(x).shape
(10, 14, 1024)

hth,
Alan Isaac


Re: [Numpy-discussion] indexing question

2010-03-30 Thread Charles R Harris
On Tue, Mar 30, 2010 at 8:13 AM, Tom K. t...@kraussfamily.org wrote:


 This one bit me again, and I am trying to understand it better so I can
 anticipate when it will happen.

 What I want to do is get rid of singleton dimensions, and index into the
 last dimension with an array.

 In [1]: import numpy as np

 In [2]: x=np.zeros((10,1,1,1,14,1024))

 In [3]: x[:,0,0,0,:,[1,2,3]].shape
 Out[3]: (3, 10, 14)


Hmm... That doesn't look right.

snip

Chuck


Re: [Numpy-discussion] indexing question

2010-03-30 Thread Robert Kern
On Tue, Mar 30, 2010 at 09:46, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Tue, Mar 30, 2010 at 8:13 AM, Tom K. t...@kraussfamily.org wrote:

 This one bit me again, and I am trying to understand it better so I can
 anticipate when it will happen.

 What I want to do is get rid of singleton dimensions, and index into the
 last dimension with an array.

 In [1]: import numpy as np

 In [2]: x=np.zeros((10,1,1,1,14,1024))

 In [3]: x[:,0,0,0,:,[1,2,3]].shape
 Out[3]: (3, 10, 14)


 Hmm... That doesn't look right.

It's a known feature. Slicing and list indexing are separate
subsystems. The list indexing takes priority so the list-indexed axes
end up first in the result. The sliced axes follow them.
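
A minimal illustration of that rule, using the shapes from Tom's example:

import numpy as np

x = np.zeros((10, 1, 1, 1, 14, 1024))

# the list-indexed axis jumps to the front because the other axes are sliced
print(x[:, 0, 0, 0, :, [1, 2, 3]].shape)      # (3, 10, 14)

# apply the list index on its own and the axis order is preserved
print(x[..., [1, 2, 3]][:, 0, 0, 0].shape)    # (10, 14, 3)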

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] indexing question

2010-03-30 Thread josef . pktd
On Tue, Mar 30, 2010 at 10:13 AM, Tom K. t...@kraussfamily.org wrote:

 This one bit me again, and I am trying to understand it better so I can
 anticipate when it will happen.

 What I want to do is get rid of singleton dimensions, and index into the
 last dimension with an array.

 In [1]: import numpy as np

 In [2]: x=np.zeros((10,1,1,1,14,1024))

 In [3]: x[:,0,0,0,:,[1,2,3]].shape
 Out[3]: (3, 10, 14)

 Whoa!  Trimming my array to a desired number ends up moving the last
 dimension to the first!

 In [4]: np.__version__
 Out[4]: '1.3.0'

 ...
 In [7]: x[:,:,:,:,:,[1,2,3]].shape
 Out[7]: (10, 1, 1, 1, 14, 3)

 This looks right...

 In [8]: x[...,[1,2,3]].shape
 Out[8]: (10, 1, 1, 1, 14, 3)

 and this...

 In [9]: x[...,[1,2,3]][:,0,0,0].shape
 Out[9]: (10, 14, 3)

 ...
 In [11]: x[:,0,0,0][...,[1,2,3]].shape
 Out[11]: (10, 14, 3)

 Either of the last 2 attempts above results in what I want, so I can do
 that... I just need some help deciphering when and why the first thing
 happens.

An explanation of the surprising behavior when slicing and fancy
indexing are mixed with more than 2 dimensions is in this thread:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg16299.html

More examples show up every once in a while on the mailing list.

Josef




[Numpy-discussion] Set values of a matrix within a specified range to zero

2010-03-30 Thread Sean Mulcahy
Hello all,

I'm relatively new to numpy.  I'm working with text images as 512x512 arrays.  
I would like to set elements of the array whose value fall within a specified 
range to zero (eg 23 < x < 45).  Any advice is much appreciated.

Sean


Re: [Numpy-discussion] Set values of a matrix within a specified range to zero

2010-03-30 Thread Alan G Isaac
On 3/30/2010 12:56 PM, Sean Mulcahy wrote:
 512x512 arrays.  I would like to set elements of the array whose value fall 
 within a specified range to zero (eg 23 < x < 45).

x[(23<x)*(x<45)]=0

hth,
Alan Isaac



Re: [Numpy-discussion] Set values of a matrix within a specified range to zero

2010-03-30 Thread Ryan May
On Tue, Mar 30, 2010 at 11:12 AM, Alan G Isaac ais...@american.edu wrote:
 On 3/30/2010 12:56 PM, Sean Mulcahy wrote:
 512x512 arrays.  I would like to set elements of the array whose value fall 
 within a specified range to zero (eg 23 < x < 45).

 x[(23<x)*(x<45)]=0

Or a version that seems a bit more obvious (doing a multiply between
boolean arrays to get an AND operator seems a tad odd):

x[(23<x) & (x<45)] = 0
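
A self-contained version, with a functional alternative if you prefer not
to modify x in place (the sample data here is made up):

import numpy as np

x = np.random.uniform(0, 70, (512, 512))
y = np.where((23 < x) & (x < 45), 0, x)   # returns a new array, x untouched
x[(23 < x) & (x < 45)] = 0                # in-place boolean-mask assignment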

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] numpy.distutils/f2py: forcing 8-bit reals

2010-03-30 Thread David Warde-Farley
Hey Dag,

On 30-Mar-10, at 5:02 AM, Dag Sverre Seljebotn wrote:

 Well, you can pass -fdefault-real-8 and then write .pyf headers where
 real(8) is always given explicitly.


Actually I've gotten it to work this way, with real(8) in the wrappers.

BUT... for some reason it requires me to set the environment variable  
G77='gfortran -fdefault-real-8'. Simply putting it in
f2py_options='--f77flags=\'-fdefault-real-8\' --f90flags=\'-fdefault-real-8\''
doesn't seem to do the trick. I don't really understand why.

David


Re: [Numpy-discussion] numpy.distutils/f2py: forcing 8-bit reals

2010-03-30 Thread David Warde-Farley

On 30-Mar-10, at 2:14 PM, David Warde-Farley wrote:

 Hey Dag,

 On 30-Mar-10, at 5:02 AM, Dag Sverre Seljebotn wrote:

 Well, you can pass -fdefault-real-8 and then write .pyf headers where
 real(8) is always given explicitly.


 Actually I've gotten it to work this way, with real(8) in the  
 wrappers.

 BUT... for some reason it requires me to set the environment  
 variable G77='gfortran -fdefault-real-8'. Simply putting it in  
 f2py_options='--f77flags=\'-fdefault-real-8\' --f90flags=\'-fdefault-real-8\''
 doesn't seem to do the trick. I don't really understand why.

Sorry, that should say F77='gfortran -fdefault-real-8'. Setting G77  
obviously doesn't do anything. ;)

Without that environment variable, I think it is blowing the stack;  
the program just hangs.

David


[Numpy-discussion] Replacing NANs

2010-03-30 Thread Vishal Rana
Hi,

In an array I want to replace all NANs with some number say 100, I found a
method *nan_to_num* but it only replaces with zero.
Any solution for this?

Thanks
Vishal


Re: [Numpy-discussion] Replacing NANs

2010-03-30 Thread Angus McMorland
On 30 March 2010 14:59, Vishal Rana ranavis...@gmail.com wrote:
 Hi,
 In an array I want to replace all NANs with some number say 100, I found a
 method nan_to_num but it only replaces with zero.
 Any solution for this?

ar[np.isnan(ar)] = my_num

where ar is your array and my_num is the number you want to replace nans with.

Angus.

 Thanks
 Vishal





-- 
AJC McMorland
Post-doctoral research fellow
Neurobiology, University of Pittsburgh


Re: [Numpy-discussion] Replacing NANs

2010-03-30 Thread Zachary Pincus
 In an array I want to replace all NANs with some number say 100, I  
 found a method nan_to_num but it only replaces with zero.
 Any solution for this?

Indexing with a mask is one approach here:

a[numpy.isnan(a)] = 100

also cf. numpy.isfinite in case you want the same with infs.
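
For example (tiny self-contained demo):

import numpy as np

a = np.array([1.0, np.nan, np.inf, 4.0])
a[np.isnan(a)] = 100         # only the NaN is replaced
a[~np.isfinite(a)] = 100     # also catches +/-inf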

Zach


On Mar 30, 2010, at 2:59 PM, Vishal Rana wrote:

 Hi,

 In an array I want to replace all NANs with some number say 100, I  
 found a method nan_to_num but it only replaces with zero.
 Any solution for this?

 Thanks
 Vishal


Re: [Numpy-discussion] Replacing NANs

2010-03-30 Thread Vishal Rana
That was quick!

Thanks Angus and Zachary

On Tue, Mar 30, 2010 at 11:59 AM, Vishal Rana ranavis...@gmail.com wrote:

 Hi,

 In an array I want to replace all NANs with some number say 100, I found a
  method *nan_to_num* but it only replaces with zero.
  Any solution for this?

  Thanks
 Vishal



Re: [Numpy-discussion] Interpolation question

2010-03-30 Thread Kevin Dunn
Andrea Gavana andrea.gavana at gmail.com writes:

 
 Hi Kevin,
 
 On 29 March 2010 01:38, Kevin Dunn wrote:
  Message: 5
  Date: Sun, 28 Mar 2010 00:24:01 +
  From: Andrea Gavana andrea.gavana at gmail.com
  Subject: [Numpy-discussion] Interpolation question
  To: Discussion of Numerical Python numpy-discussion at scipy.org
  Message-ID:
         d5ff27201003271724o6c82ec75v225d819c84140b46 at mail.gmail.com
  Content-Type: text/plain; charset=ISO-8859-1
 
  Hi All,
 
     I have an interpolation problem and I am having some difficulties
  in tackling it. I hope I can explain myself clearly enough.
 
  Basically, I have a whole bunch of 3D fluid flow simulations (close to
  1000), and they are a result of different combinations of parameters.
  I was planning to use the Radial Basis Functions in scipy, but for the
  moment let's assume, to simplify things, that I am dealing only with
  one parameter (x). In 1000 simulations, this parameter x has 1000
  values, obviously. The problem is, the outcome of every single
  simulation is a vector of oil production over time (let's say 40
  values per simulation, one per year), and I would like to be able to
  interpolate my x parameter (1000 values) against all the simulations
  (1000x40) and get an approximating function that, given another x
  parameter (of size 1x1) will give me back an interpolated production
  profile (of size 1x40).
 
  [I posted the following earlier but forgot to change the subject - it
  appears as a new thread called NumPy-Discussion Digest, Vol 42, Issue
  85 - please ignore that thread]
 
  Andrea, may I suggest a different approach to RBF's.
 
  Realize that your vector of 40 values for each row in y are not
  independent of each other (they will be correlated).  First build a
  principal component analysis (PCA) model on this 1000 x 40 matrix and
  reduce it down to a 1000 x A matrix, called your scores matrix, where
  A is the number of independent components. A is selected so that it
  adequately summarizes Y without over-fitting and you will find A <<
  40, maybe A = 2 or 3. There are tools, such as cross-validation, that
  will help select a reasonable value of A.
 
  Then you can relate your single column of X to these independent
  columns in A using a tool such as least squares: one least squares
  model per column in the scores matrix.  This works because each column
  in the score vector is independent (contains totally orthogonal
  information) to the others.  But I would be surprised if this works
  well enough, unless A = 1.
 
  But it sounds like you don't just have a single column in your
  X-variables (you hinted that the single column was just for
  simplification).  In that case, I would build a projection to latent
  structures model (PLS) model that builds a single latent-variable
  model that simultaneously models the X-matrix, the Y-matrix as well as
  providing the maximal covariance between these two matrices.
 
  If you need some references and an outline of code, then I can readily
  provide these.
 
  This is a standard problem with data from spectroscopic instruments
  and with batch processes.  They produce hundreds, sometimes 1000's of
  samples per row. PCA and PLS are very effective at summarizing these
  down to a much smaller number of independent columns, very often just
  a handful, and relating them (i.e. building a predictive model) to
  other data matrices.
 
  I also just saw the suggestions of others to center the data by
  subtracting the mean from each column in Y and scaling (by dividing
  through by the standard deviation).  This is a standard data
  preprocessing step, called autoscaling and makes sense for any data
  analysis, as you already discovered.
 
 I have got some success by using time-based RBFs interpolations, but I
 am always open to other possible implementations (as the one I am
 using can easily fail for strange combinations of input parameters).
 Unfortunately, my understanding of your explanation is very very
 limited: I am not an expert at all, so it's a bit hard for me to
 translate the mathematical technical stuff into something I can
 understand. If you have an example code (even a very trivial one) for
 me to study so that I can understand what the code is actually doing,
 I would be more than grateful for your help 


PCA can be calculated in several different ways.  The code below works well
enough in many cases, but it can't handle missing data.  If you need that
capability, let me know.

You can read more about PCA here (shameless plug for the course I teach on this
material):
http://stats4.eng.mcmaster.ca/wiki/Latent_variable_methods_and_applications 

And here is some Python code for a fake data set and an actual data set.  Hope
that gets you started, Kevin.

import numpy as np

use_fake_data = False

if use_fake_data:
    N = 1000
    K = 40
    A = 3   # extract 3 principal components

    # Replace this garbage data with your actual, raw data
    X_raw = 
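
A rough sketch of the PCA-then-regress idea described above, done with a
plain SVD on stand-in data (this is not Kevin's code; the array names and
sizes are illustrative):

import numpy as np

# stand-in data: 1000 simulations, 40 production values each, one input x
Y = np.random.randn(1000, 40)
x = np.random.randn(1000)

# autoscale Y: centre each column and divide by its standard deviation
Y_mean, Y_std = Y.mean(axis=0), Y.std(axis=0)
Yc = (Y - Y_mean) / Y_std

# PCA via SVD; keep A components
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
A = 3
scores = U[:, :A] * s[:A]                  # 1000 x A scores matrix

# one least-squares fit per score column against the single x parameter
X = np.column_stack([np.ones_like(x), x])
coefs = np.linalg.lstsq(X, scores)[0]      # 2 x A coefficients

# predict a full 40-value production profile for a new x value
x_new = 0.5
scores_new = np.array([1.0, x_new]).dot(coefs)
Y_new = scores_new.dot(Vt[:A]) * Y_std + Y_mean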

Re: [Numpy-discussion] numpy.distutils/f2py: forcing 8-bit reals

2010-03-30 Thread David Warde-Farley

On 30-Mar-10, at 5:02 AM, Dag Sverre Seljebotn wrote:

 Well, you can pass -fdefault-real-8 and then write .pyf headers where
 real(8) is always given explicitly.

Okay, the answer (without setting the F77 environment variable) is  
basically to expect real-8's in the .pyf file and compile the whole  
package with

python setup.py config_fc --fcompiler=gnu95 --f77flags='-fdefault-real-8' --f90flags='-fdefault-real-8' build

Still not sure if there's a sane way to make this the default in my  
setup.py (I found some old mailing list posts where Pearu suggests  
modifying sys.argv but I think this is widely considered a bad idea).  
The right way might involve instantiating a GnuF95Compiler object and  
messing with its fields or something like that. Anyway, this works for  
now.
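
One quick way to confirm the flag actually took effect (module and function
names here are made-up placeholders for the wrapped code):

import numpy as np
import foo                                # the f2py-built extension from above

out = foo.compute(np.zeros(10, dtype=np.float64))
print(out.dtype)                          # should be float64, not float32
assert out.dtype == np.float64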

Thanks,

David


Re: [Numpy-discussion] Interpolation question

2010-03-30 Thread Friedrich Romstedt
2010/3/30 Andrea Gavana andrea.gav...@gmail.com:
 On 29 March 2010 23:44, Friedrich Romstedt wrote:
 When you have nice results using 40 Rbfs for each time instant, this
 procedure means that the values for one time instant will not be
 influenced by adjacent-year data.  I.e., you would probably get the
 same result using a norm that extraordinarily blows up the time coordinate.
  To make it clear in code, when the time is your first coordinate, and
 you have three other coordinates, the *norm* would be:

 def norm(x1, x2):
    return numpy.sqrt((((x1 - x2) * [1e3, 1, 1]) ** 2).sum())

 In this case, the epsilon should be fixed, to avoid the influence of
 the changing distances on the epsilon determination inside of Rbf,
 which would spoil the whole thing.

Of course, there are two and not three other variables here.
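
To make the quoted suggestion concrete: pre-scaling the time coordinate is
equivalent to using such a weighted norm and avoids touching Rbf's
internals; a sketch with stand-in data and an arbitrary fixed epsilon:

import numpy as np
from scipy.interpolate import Rbf

rng = np.random.RandomState(0)
t, p1, p2 = rng.rand(3, 200)        # stand-ins for the simulation inputs
production = rng.rand(200)          # stand-in for the simulated outcome

time_scale = 1e3                    # blow up the time coordinate
rbf = Rbf(t * time_scale, p1, p2, production, epsilon=1.0)

# apply the same scaling to the time coordinate when evaluating
print(rbf(np.array([0.5]) * time_scale, np.array([0.5]), np.array([0.5])))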

 I have an idea how to tune your model:  Take, say, the half or three
 thirds of your simulation data as interpolation database, and try to
 reproduce the remaining part.  I have some ideas how to tune using
 this in practice.

Here, of course, it is three quarters and not three thirds :-)

 This is a very good idea indeed: I am actually running out of test
 cases (it takes a while to run a simulation, and I need to do it every
 time I try a new combination of parameters to check if the
 interpolation is good enough or rubbish). I'll give it a go tomorrow
 at work and I'll report back (even if I get very bad results :-D ).

I refined the idea a bit.  Select one simulation, and use the complete
rest as the interpolation base.  Then repeat this for each
simulation.  Calculate some joint value for all the results; the
simplest would maybe be to calculate:

def joint_ln_density(simulation_results, interpolation_results):
    return (-((interpolation_results - simulation_results) ** 2)
            / (simulation_results ** 2))

In fact, this calculates the logarithm of the Gaussians centered at
*simulation_results* and taken at the observations
*interpolation_results*.  It is the logarithm of the product of these
Gaussians.  The standard deviation of the Gaussians is assumed to be
the value of the *simulation_results*, which means that I assume that
low-valued outcomes are much more precise in absolute numbers than
high-valued outcomes, but /relative/ to their nominal value they are
all equally precise.  (NB: a scaling of the stddevs wouldn't make a
significant difference /for you/.  The same goes for the neglected
coefficients of the Gaussians.)

I don't know, which method you like the most.  Robert's and Kevin's
proposals are hard to compete with ...

You could optimise (maximise) the joint_ln_density outcome as a
function of *epsilon* and the different scalings.  afaic, scipy comes
with some optimisation algorithms included.  I checked it:
http://docs.scipy.org/doc/scipy-0.7.x/reference/optimize.html#general-purpose
.

Friedrich


Re: [Numpy-discussion] Interpolation question

2010-03-30 Thread Andrea Gavana
Hi Friedrich  All,

On 30 March 2010 21:48, Friedrich Romstedt wrote:
 2010/3/30 Andrea Gavana andrea.gav...@gmail.com:
 On 29 March 2010 23:44, Friedrich Romstedt wrote:
 When you have nice results using 40 Rbfs for each time instant, this
 procedure means that the values for one time instant will not be
 influenced by adjacent-year data.  I.e., you would probably get the
  same result using a norm that extraordinarily blows up the time coordinate.
  To make it clear in code, when the time is your first coordinate, and
 you have three other coordinates, the *norm* would be:

 def norm(x1, x2):
     return numpy.sqrt((((x1 - x2) * [1e3, 1, 1]) ** 2).sum())

 In this case, the epsilon should be fixed, to avoid the influence of
 the changing distances on the epsilon determination inside of Rbf,
 which would spoil the whole thing.

 Of course, there are two and not three other variables here.

 I have an idea how to tune your model:  Take, say, the half or three
 thirds of your simulation data as interpolation database, and try to
 reproduce the remaining part.  I have some ideas how to tune using
 this in practice.

 Here, of course, it is three quarters and not three thirds :-)

 This is a very good idea indeed: I am actually running out of test
 cases (it takes a while to run a simulation, and I need to do it every
 time I try a new combination of parameters to check if the
 interpolation is good enough or rubbish). I'll give it a go tomorrow
 at work and I'll report back (even if I get very bad results :-D ).

 I refined the idea a bit.  Select one simulation, and use the complete
 rest as the interpolation base.  Then repeat this for each
 simulation.  Calculate some joint value for all the results; the
 simplest would maybe be to calculate:

 def joint_ln_density(simulation_results, interpolation_results):
     return (-((interpolation_results - simulation_results) ** 2)
             / (simulation_results ** 2))

 In fact, this calculates the logarithm of the Gaussians centered at
 *simulation_results* and taken at the observations
 *interpolation_results*.  It is the logarithm of the product of these
 Gaussians.  The standard deviation of the Gaussians is assumed to be
 the value of the *simulation_results*, which means that I assume that
 low-valued outcomes are much more precise in absolute numbers than
 high-valued outcomes, but /relative/ to their nominal value they are
 all equally precise.  (NB: a scaling of the stddevs wouldn't make a
 significant difference /for you/.  The same goes for the neglected
 coefficients of the Gaussians.)

 I don't know, which method you like the most.  Robert's and Kevin's
 proposals are hard to compete with ...

 You could optimise (maximise) the joint_ln_density outcome as a
 function of *epsilon* and the different scalings.  afaic, scipy comes
 with some optimisation algorithms included.  I checked it:
 http://docs.scipy.org/doc/scipy-0.7.x/reference/optimize.html#general-purpose

I had planned to show some of the results based on the suggestion you
gave me yesterday: I took two thirds ( :-D ) of the simulations
database to use them as interpolation base and tried to reproduce the
rest using the interpolation. Unfortunately it seems like my computer
at work has blown up (maybe a BSOD, I was doing way too many heavy
things at once) and I can't access it from home at the moment. I can't
show the real field profiles, but at least I can show you how good or
bad the interpolation performs (in terms of relative errors), and I
was planning to post a matplotlib composite graph to do just that. I
am still hopeful my PC will resurrect at some point :-D

However, from the first 100 or so interpolated simulations, I could
gather these findings:

1) Interpolations on *cumulative* productions on oil and gas are
extremely good, with a maximum range of relative error of -3% / +2%:
most of them (95% more or less) show errors < 1%, but for a few of them
I get the aforementioned range of errors in the interpolations;
2) Interpolations on oil and gas *rates* (time dependent), I usually
get a -5% / +3% error per timestep, which is very good for my
purposes. I still need to check these values but the first set of
results were very promising;
3) Interpolations on gas injection (cumulative and rate) are a bit
more shaky (15% error more or less), but this is essentially due to a
particularly complex behaviour of the reservoir simulator when it needs
to decide (based on user input) if the gas is going to be re-injected,
sold, used as excess gas and a few more; I am not that worried about
this issue for the moment.

I'll be off for a week for Easter (I'll leave for Greece in few
hours), but I am really looking forward to try the suggestion Kevin
posted and to investigate Robert's idea: I know about kriging, but I
initially thought it wouldn't do a good job in this case. I'll
reconsider for sure. And I'll post a screenshot of the results as soon
as my PC get out of the Emergency Room.

Thank you again!

Andrea.


Re: [Numpy-discussion] Set values of a matrix within a specified range to zero

2010-03-30 Thread Friedrich Romstedt
2010/3/30 Ryan May rma...@gmail.com:
 On Tue, Mar 30, 2010 at 11:12 AM, Alan G Isaac ais...@american.edu wrote:
 On 3/30/2010 12:56 PM, Sean Mulcahy wrote:
 512x512 arrays.  I would like to set elements of the array whose value fall 
  within a specified range to zero (eg 23 < x < 45).

  x[(23<x)*(x<45)]=0

 Or a version that seems a bit more obvious (doing a multiply between
 boolean arrays to get an AND operator seems a tad odd):

 x[(23<x) & (x<45)] = 0

We recently found out that it executes faster using:

x *= ((x <= 23) | (x >= 45))

Friedrich


Re: [Numpy-discussion] Fourier transform

2010-03-30 Thread Pascal
On Mon, 29 Mar 2010 16:12:56 -0600,
Charles R Harris charlesr.har...@gmail.com wrote:

 On Mon, Mar 29, 2010 at 3:00 PM, Pascal pascal...@parois.net wrote:
 
  Hi,
 
  Does anyone have an idea how fft functions are implemented? Is it
  pure python? based on BLAS/LAPACK? or is it using fftw?
 
  I successfully used numpy.fft in 3D. I would like to know if I can
  calculate a specific a plane using the numpy.fft.
 
  I have in 3D:
   r(x, y, z) = \sum_{h=0}^{N-1} \sum_{k=0}^{M-1} \sum_{l=0}^{O-1} f_{hkl}
    \exp(-2\pi i (hx/N + ky/M + lz/O))
  
   So for the plane, z is no longer independent.
   I need to solve the system:
   ax + by + cz + d = 0
   r(x, y, z) = \sum_{h=0}^{N-1} \sum_{k=0}^{M-1} \sum_{l=0}^{O-1} f_{hkl}
    \exp(-2\pi i (hx/N + ky/M + lz/O))
 
  Do you think it's possible to use numpy.fft for this?
 
 
  I'm not clear on what you want to do here, but note that the term in
  the exponent is of the form <k, x>, i.e., the inner product of
  the vectors k and x. So if you rotate x by O so that the plane is
  defined by z = 0, then <k, Ox> = <O.T k, x>. That is, you can apply the
  transpose of the rotation to the result of the fft.

In other words, z is no longer independent but depends on x and y.

Apparently, nobody is calculating the exact plane but they are making a
slice in the 3D grid and doing some interpolation.
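
A sketch of that slice-and-interpolate approach using
scipy.ndimage.map_coordinates; the grid size, plane coefficients and the use
of the magnitude are all just placeholders:

import numpy as np
from scipy.ndimage import map_coordinates

data = np.random.rand(32, 32, 32)
r = np.abs(np.fft.fftn(data))             # 3D grid to be sliced

# sample r on the plane z = 0.3*x + 0.1*y + 5 by linear interpolation
x, y = np.mgrid[0:32, 0:32]
z = 0.3 * x + 0.1 * y + 5
plane = map_coordinates(r, [x, y, z], order=1, mode='wrap')
print(plane.shape)                        # (32, 32)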

However, your answer really helped me with something completely different :)

Thanks,
Pascal


Re: [Numpy-discussion] Set values of a matrix within a specified range to zero

2010-03-30 Thread Ryan May
On Tue, Mar 30, 2010 at 3:16 PM, Friedrich Romstedt
friedrichromst...@gmail.com wrote:
 2010/3/30 Ryan May rma...@gmail.com:
 On Tue, Mar 30, 2010 at 11:12 AM, Alan G Isaac ais...@american.edu wrote:
 On 3/30/2010 12:56 PM, Sean Mulcahy wrote:
 512x512 arrays.  I would like to set elements of the array whose value 
  fall within a specified range to zero (eg 23 < x < 45).

  x[(23<x)*(x<45)]=0

 Or a version that seems a bit more obvious (doing a multiply between
 boolean arrays to get an AND operator seems a tad odd):

 x[(23<x) & (x<45)] = 0

 We recently found out that it executes faster using:

 x *= ((x <= 23) | (x >= 45))

Interesting. In an ideal world, I'd love to see why exactly that is,
because I don't think multiplication should be faster than a boolean
op.  If you need speed, then by all means go for it.  But if you don't
need speed I'd use the & since that will be more obvious to the person
who ends up reading your code later and has to spend time decoding
what that line does.
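
A rough way to check this on your own machine (array contents and sizes are
arbitrary, and the numbers will vary):

import timeit

setup = "import numpy as np; x = np.random.uniform(0, 70, (512, 512))"
masked     = "y = x.copy(); y[(23 < y) & (y < 45)] = 0"
multiplied = "y = x.copy(); y *= (y <= 23) | (y >= 45)"

print(timeit.timeit(masked, setup=setup, number=200))
print(timeit.timeit(multiplied, setup=setup, number=200))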

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Interpolation question

2010-03-30 Thread Friedrich Romstedt
2010/3/30 Andrea Gavana andrea.gav...@gmail.com:
 However, from the first 100 or so interpolated simulations, I could
 gather these findings:

 1) Interpolations on *cumulative* productions on oil and gas are
 extremely good, with a maximum range of relative error of -3% / +2%:
 most of them (95% more or less) show errors < 1%, but for a few of them
 I get the aforementioned range of errors in the interpolations;
 2) Interpolations on oil and gas *rates* (time dependent), I usually
 get a -5% / +3% error per timestep, which is very good for my
 purposes. I still need to check these values but the first set of
 results were very promising;
 3) Interpolations on gas injection (cumulative and rate) are a bit
 more shaky (15% error more or less), but this is essentially due to a
 particularly complex behaviour of the reservoir simulator when it needs
 to decide (based on user input) if the gas is going to be re-injected,
 sold, used as excess gas and a few more; I am not that worried about
 this issue for the moment.

Have a nice time in Greece, and what you write makes me laugh. :-)
When you are back, you should maybe elaborate a bit on what gas
injections, wells, re-injected gas and so on are; I don't know about
them.

Friedrich


Re: [Numpy-discussion] Set values of a matrix within a specified range to zero

2010-03-30 Thread Robert Kern
On Tue, Mar 30, 2010 at 16:35, Ryan May rma...@gmail.com wrote:
 On Tue, Mar 30, 2010 at 3:16 PM, Friedrich Romstedt
 friedrichromst...@gmail.com wrote:

 x *= ((x <= 23) | (x >= 45))

 Interesting. In an ideal world, I'd love to see why exactly that is,
 because I don't think multiplication should be faster than a boolean
 op.

Branch prediction failures are really costly in modern CPUs.

http://en.wikipedia.org/wiki/Branch_prediction

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Set values of a matrix within a specified range to zero

2010-03-30 Thread Friedrich Romstedt
2010/3/30 Ryan May rma...@gmail.com:
 On Tue, Mar 30, 2010 at 3:16 PM, Friedrich Romstedt
 friedrichromst...@gmail.com wrote:
 We recently found out that it executes faster using:

 x *= ((x <= 23) | (x >= 45))

 Interesting. In an ideal world, I'd love to see why exactly that is,
 because I don't think multiplication should be faster than a boolean
 op.  If you need speed, then by all means go for it.  But if you don't
 need speed I'd use the & since that will be more obvious to the person
 who ends up reading your code later and has to spend time decoding
 what that line does.

Hmm, I'm not familiar with numpy's internals, but I guess it is
because numpy doesn't have to evaluate the indexing boolean array?
When the indexing array is applied to x via x[...] = 0?

Robert's reply just came in when writing this.

Friedrich


Re: [Numpy-discussion] Set values of a matrix within a specified range to zero

2010-03-30 Thread Ryan May
On Tue, Mar 30, 2010 at 3:40 PM, Robert Kern robert.k...@gmail.com wrote:
 On Tue, Mar 30, 2010 at 16:35, Ryan May rma...@gmail.com wrote:
 On Tue, Mar 30, 2010 at 3:16 PM, Friedrich Romstedt
 friedrichromst...@gmail.com wrote:

  x *= ((x <= 23) | (x >= 45))

 Interesting. In an ideal world, I'd love to see why exactly that is,
 because I don't think multiplication should be faster than a boolean
 op.

 Branch prediction failures are really costly in modern CPUs.

 http://en.wikipedia.org/wiki/Branch_prediction

That makes sense.

I still maintain that for 95% of code, easy to understand code is more
important than performance differences due to branch misprediction.
(And more importantly, we don't want to be teaching new users to code
like that from the beginning.)

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma