Re: [Numpy-discussion] add xirr to numpy financial functions?

2009-05-25 Thread Pierre GM

On May 25, 2009, at 8:06 PM, josef.p...@gmail.com wrote:

 The problem is, if the functions are enhanced in the current numpy,
 then scikits.timeseries is not (yet) available.

 Mmh, I'm not following you here...

 The original question was how we can enhance numpy.financial,
 e.g. np.irr.
 So we are restricted to use only what is available in numpy and in
 standard python.

Ah OK. But it seems that you're now running into a pb w/ dates  
handling, which might be a bit too specialized for numpy. Anyway, the  
call isn't mine.


 I looked at your moving functions, autocorrelation function and so on
 a while ago. That's where I learned how to use np.correlate or the
 scipy versions of it, and the filter functions. I've written the
 standard array versions for the moving functions and acf, ccf, in one
 of my experiments.

The moving functions were written in C and they work even w/  
timeseries (they work quite OK w/ pure MaskedArrays). We put them in
scikits.timeseries because it was easier to have them there than in  
scipy, for example.


 If Skipper has enough time in his google summer of code, we would like
 to include some basic timeseries econometrics (ARMA, VAR, ...?)
 however most likely only for regularly spaced data.

Well, we can easily restrict the functions to the case where there's no
missing data nor missing dates. Checking the mask is easy, and we have  
a method to check the dates (is_valid).


 Anyhow, if the pb you have is just to specify dates, I really think
 you should give the scikits a try. And send feedback, of course...

 Skipper intends to write some examples to show how to work with the
 extensions to scipy.stats, which, I think, will include examples using
 time series, besides recarrays, and other array types.


Dealing with TimeSeries is pretty much the same thing as dealing with  
MaskedArray, with the extra convenience of converting from one  
frequency to another and so forth. Quite often, an analysis can be
performed by dropping the .dates part,  working on the .series part  
(the underlying MaskedArray), and repatching the dates at the end...
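
For instance, a minimal sketch of that pattern (assuming
scikits.timeseries is installed; the data and the monthly frequency are
purely illustrative):

import numpy as np
import scikits.timeseries as ts

# a monthly series with two masked entries (made-up data)
series = ts.time_series(np.ma.array(np.arange(12.), mask=[0]*10 + [1, 1]),
                        start_date=ts.Date('M', year=2008, month=1))
data = series.series           # drop the .dates part: a plain MaskedArray
result = np.sqrt(data)         # run the analysis on the MaskedArray
out = ts.time_series(result, dates=series.dates)   # repatch the dates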



 Is there a time line for including the timeseries scikits in numpy/ 
 scipy?
 With code that is intended for incorporation in numpy/scipy, we are
 restricted in our external dependencies.

I can't tell, because the decision is not mine. From what I understood,
there could be an inclusion in scipy if there's a need for it. For  
that, we need more users and more feedback... If you catch my drift...



 Josef







Re: [Numpy-discussion] List/location of consecutive integers

2009-05-22 Thread Pierre GM

On May 22, 2009, at 12:31 PM, Andrea Gavana wrote:

 Hi All,

this should be a very easy question but I am trying to make a
 script run as fast as possible, so please bear with me if the solution
 is easy and I just overlooked it.

 I have a list of integers, like this one:

 indices = [1,2,3,4,5,6,7,8,9,255,256,257,258,10001,10002,10003,10004]

 From this list, I would like to find out which values are consecutive
 and store them in another list of tuples (begin_consecutive,
 end_consecutive) or a simple list: as an example, the previous list
 will become:

 new_list = [(1, 9), (255, 258), (10001, 10004)]


Josef's and Chris's solutions are pretty neat in this case. I've been  
recently working on a more generic case where integers are grouped  
depending on some condition (equality, differing by 1 or 2...). A pure
Python/numpy version, the `Cluster` class, is available in
scikits.hydroclimpy.core.tools (hydroclimpy.sourceforge.net).
Otherwise, here's a Cython version of the same class.
Let me know if it works. And I'm not ultra happy with the name, so if  
you have any suggestions...


import numpy as np
cimport numpy as np

cdef class Brackets:
    """
    Groups consecutive data from an array according to a clustering
    condition.

    A cluster is defined as a group of consecutive values differing by
    at most the increment value.

    Missing values are **not** handled: the input sequence must
    therefore be free of missing values.

    Parameters
    ----------
    darray : ndarray
        Input data array to clusterize.
    increment : {float}, optional
        Increment between two consecutive values to group.
        By default, use a value of 1.
    operator : {function}, optional
        Comparison operator for the definition of clusters.
        By default, use :func:`numpy.less_equal`.

    Attributes
    ----------
    inishape
        Shape of the argument array (stored for resizing).
    inisize
        Size of the argument array.
    uniques : sequence
        List of unique cluster values, as they appear in
        chronological order.
    slices : sequence
        List of the slices corresponding to each cluster of data.
    starts : ndarray
        List of the indices at which the clusters start.
    ends : ndarray
        List of the indices at which the clusters end.
    clustered : list
        List of clustered data.

    Examples
    --------
    >>> A = [0, 0, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4]
    >>> klust = Brackets(A, 0)
    >>> [list(_) for _ in klust.clustered]
    [[0, 0], [1], [2, 2, 2], [3], [4], [3], [4, 4, 4]]
    >>> klust.uniques
    array([0, 1, 2, 3, 4, 3, 4])

    >>> x = [1.8, 1.3, 2.4, 1.2, 2.5, 3.9, 1. , 3.8, 4.2, 3.3,
    ...      1.2, 0.2, 0.9, 2.7, 2.4, 2.8, 2.7, 4.7, 4.2, 0.4]
    >>> Brackets(x, 1).starts
    array([ 0,  2,  3,  4,  5,  6,  7, 10, 11, 13, 17, 19])
    >>> Brackets(x, 1.5).starts
    array([ 0,  6,  7, 10, 13, 17, 19])
    >>> Brackets(x, 2.5).starts
    array([ 0,  6,  7, 19])
    >>> Brackets(x, 2.5, np.greater).starts
    array([ 0,  1,  2,  3,  4,  5,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
    >>> y = [0, -1, 0, 0, 0, 1, 1, -1, -1, -1, 1, 1, 0, 0, 0, 0, 1,
    ...      1, 0, 0]
    >>> Brackets(y, 1).starts
    array([ 0,  1,  2,  5,  7, 10, 12, 16, 18])
    """

    cdef readonly double increment
    cdef readonly np.ndarray data
    cdef readonly list _starts
    cdef readonly list _ends

    def __init__(Brackets self, object data, double increment=1,
                 object operator=np.less_equal):
        cdef int i, n, ifirst, ilast, test
        cdef double last
        cdef list starts, ends
        #
        self.increment = increment
        self.data = np.asanyarray(data)
        data = np.asarray(data)
        #
        n = len(data)
        starts = []
        ends = []
        #
        last = data[0]
        ifirst = 0
        ilast = 0
        for 1 <= i < n:
            test = operator(abs(data[i] - last), increment)
            ilast = i
            if not test:
                starts.append(ifirst)
                ends.append(ilast - 1)
                ifirst = i
            last = data[i]
        starts.append(ifirst)
        ends.append(n - 1)
        self._starts = starts
        self._ends = ends

    def __len__(self):
        return len(self._starts)

    property starts:
        #
        def __get__(Brackets self):
            return np.asarray(self._starts)

    property ends:
        #
        def __get__(Brackets self):
            return np.asarray(self._ends)

    property sizes:
        #
        def __get__(Brackets self):
            return np.asarray(self._ends) - np.asarray(self._starts)

    property slices:
        #
        def __get__(Brackets self):
            cdef int i
            cdef list starts = self._starts, ends = self._ends
            cdef list slices = []
            # the stored ends are inclusive, hence the +1 for the slice stop
            for 0 <= i < len(starts):
                slices.append(slice(starts[i], ends[i] + 1))
            return slices
 
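For the simple consecutive-integers case of the original question, a
loop-free pure-numpy sketch (one possible approach, not necessarily the
code Josef or Chris posted) could look like:

import numpy as np

indices = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 255, 256, 257, 258,
                    10001, 10002, 10003, 10004])
# a run breaks wherever the step between neighbours is not 1
breaks = np.where(np.diff(indices) != 1)[0]
starts = np.concatenate(([0], breaks + 1))
ends = np.concatenate((breaks, [len(indices) - 1]))
new_list = zip(indices[starts], indices[ends])
# -> [(1, 9), (255, 258), (10001, 10004)]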

Re: [Numpy-discussion] skiprows option in loadtxt

2009-05-20 Thread Pierre GM

On May 20, 2009, at 11:04 AM, Nils Wagner wrote:

 Hi all,

 Is the value of skiprows in loadtxt restricted to values
 in [0-10] ?

 It doesn't work for skiprows=11.

Please post an example.


Re: [Numpy-discussion] view takes no keyword arguments exception

2009-05-20 Thread Pierre GM

On May 20, 2009, at 9:57 PM, Jochen Schroeder wrote:
 Sorry maybe I phrased my question wrongly.  I don't want to change  
 the code (This was just a short example).
 I just want to know why it is failing on his system and what he
 can do so that a.view(dtype='...') is working. I suspected it was an  
 old
 numpy installation but the person is saying that he installed a new
 version and is still seeing the same problem (or does he just have an
 old version of numpy floating around).

Likely to be the second possibility, the ghost of a previous
installation. AFAIR, the keywords in .view were introduced in 1.2 or
just after.
A safe way to check would be to install numpy 1.3 in a virtualenv and  
check that it works. If it does (expected), then you may want to ask  
your user to start afresh (remove 1.1.1 and 1.3 and then reinstall 1.3  
from a clean slate).
My 2c.
P.


Re: [Numpy-discussion] Are masked arrays slower for processing than ndarrays?

2009-05-13 Thread Pierre GM
All,
I just committed (r6994) some modifications to numpy.ma.getdata (Eric  
Firing's patch) and to the ufunc wrappers that were too slow with  
large arrays. We're roughly 3 times faster than we used to be, but still
slower than the equivalent classic ufuncs (no surprise here).

Here's the catch: it's basically cheating. I got rid of the
pre-processing (where a mask was calculated depending on the domain and
the input set to a filling value depending on this mask, before the
actual computation). Instead, I force
np.seterr(divide='ignore', invalid='ignore') before calling the ufunc
on the .data part, then mask the invalid values (if any) and reset the  
corresponding entries in .data to the input. Finally, I reset the  
error status. All in all, we're still data-friendly, meaning that the  
value below a masked entry is the same as the input, but we can't say  
that values initially masked are discarded (they're used in the  
computation but reset to their initial value)...
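
In pseudo-numpy terms, the new scheme looks roughly like this (a sketch
for a divide-like ufunc, not the actual numpy.ma source):

import numpy as np
import numpy.ma as ma

def masked_divide(a, b):
    # skip the domain pre-processing, silence the errors instead
    saved = np.seterr(divide='ignore', invalid='ignore')
    try:
        result = a.data / b.data
        invalid = ~np.isfinite(result)
        result[invalid] = a.data[invalid]   # stay data-friendly
        mask = ma.getmaskarray(a) | ma.getmaskarray(b) | invalid
    finally:
        np.seterr(**saved)                  # reset the error status
    return ma.array(result, mask=mask)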

This playing around with the error status may (or may not, I don't  
know) cause some problems down the road.
It's still faaar faster than computing the domain (especially  
_DomainSafeDivide) when the inputs are large...
  I'd be happy if you could give it a try and send some feedback.

Cheers
P.

On May 9, 2009, at 8:17 PM, Eric Firing wrote:

 Eric Firing wrote:

 Pierre,

 ... I pressed send too soon.  There are test failures with the  
 patch I attached to my last message.  I think the basic ideas are  
 correct, but evidently there are wrinkles to be worked out.  Maybe  
 putmask() has to be used instead of where() (putmask is much faster)  
 to maintain the ability to do *= and similar, and maybe there are  
 other adjustments. Somehow, though, it should be possible to get  
 decent speed for simple multiplication and division; a 10x penalty  
 relative to ndarray operations is just too much.

 Eric


 Eli Bressert wrote:
 Hi,

 I'm using masked arrays to compute large-scale standard deviation,
 multiplication, gaussian, and weighted averages. At first I thought
 using the masked arrays would be a great way to sidestep looping
 (which it is), but it's still slower than expected. Here's a snippet
 of the code that I'm using it for.
 [...]
 # Like the spatial_weight section, this takes about 20 seconds
 W = spatial_weight / Rho2

 # Takes less than one second.
 Ave = np.average(av_good,axis=1,weights=W)

 Any ideas on why it would take such a long time for processing?
 A part of the slowdown is what looks to me like unnecessary copying  
 in _MaskedBinaryOperation.__call__.  It is using getdata, which  
 applies numpy.array to its input, forcing a copy.  I think the copy  
 is actually unintentional, in at least one sense, and possibly two:  
 first, because the default argument of getattr is always evaluated,  
 even if it is not needed; and second, because the call to np.array  
 is used where np.asarray or equivalent would suffice.
 The first file attached below shows the kernprof in the case of  
 multiplying two masked arrays, shape (10,50), with no masked  
 elements; 2/3 of the time is taken copying the data.
 Now, if there are actually masked elements in the arrays, it gets  
 much worse: see the second attachment.  The total time has  
 increased by more than a factor of 3, and the culprit is  
 numpy.which(), a very slow function.  It looks to me like it is  
 doing nothing useful at all; the numpy binary operation is still  
 being executed for all elements, regardless of mask, contrary to  
 the intention implied by the comment in the code.
 The third attached file has a patch that fixes the getdata problem  
 and eliminates the which().
 With this patch applied we get the profile in the 4th file, to be  
 compared to the second profile.  Much better.  I am pretty sure it  
 could still be sped up quite a bit, though.  It looks like the  
 masks are essentially being calculated twice for no good reason,  
 but I don't completely understand all the mask considerations, so  
 at this point I am not trying to fix that problem.
 Eric
 Especially the spatial_weight and W variables? Would there be a  
 faster
 way to do this? Or is there a way that numpy.std can ignore
 nan's when processing?

 Thanks,

 Eli Bressert




Re: [Numpy-discussion] Are masked arrays slower for processing than ndarrays?

2009-05-13 Thread Pierre GM

On May 13, 2009, at 7:36 PM, Matt Knox wrote:

 Here's the catch: it's basically cheating. I got rid of the pre-
 processing (where a mask was calculated depending on the domain and
 the input set to a filling value depending on this mask, before the
 actual computation). Instead, I  force
 np.seterr(divide='ignore',invalid='ignore') before calling the ufunc

 This isn't a thread safe approach and could cause weird side effects
 in a
 multi-threaded application. I think modifying global options/ 
 variables inside
 any function where it generally wouldn't be expected by the user is  
 a bad idea.

Whine. I was afraid of something like that...
2 options, then:
* We revert to computing a mask beforehand. That looks like the part  
that takes the most time w/ domained operations (according to Robert  
K's profiler. Robert, you deserve a statue for this tool). And that  
doesn't solve the pb of power, anyway: how do you compute the domain  
of power?
* We reimplement masked versions of the ufuncs in C. Won't happen from  
me anytime soon (this fall or winter, maybe...)
Also, importing numpy.ma currently calls numpy.seterr(all='ignore')  
anyway...

So that's a -1 from Matt. Anybody else ?



Re: [Numpy-discussion] Are masked arrays slower for processing than ndarrays?

2009-05-13 Thread Pierre GM

On May 13, 2009, at 8:07 PM, Matt Knox wrote:

 hmm. While this doesn't affect me personally... I wonder if everyone  
 is aware of
 this. Importing modules generally shouldn't have side effects either  
 I would
 think. Has this always been the case for the masked array module?

Well, can't remember, actually... I was indeed surprised to see it was
there. I guess I must have added it when working on the power section. I
will get rid of it on the next commit, this is clearly bad practice on
my part. Bad, bad Pierre.


Re: [Numpy-discussion] How to merge or SQL join record arrays in Python?

2009-05-11 Thread Pierre GM

On May 11, 2009, at 5:44 PM, Wei Su wrote:

 Coming from SAS and R, this is probably the first thing I want to do  
 now that I can convert my data into record arrays. But I could not  
 find any clues after googling for a while. Any hint or suggestions  
 will be great!

That depends what you want, actually, but this should get you started:
http://docs.scipy.org/doc/numpy/user/basics.rec.html

Note the slight difference between a structured array (fields  
accessible as items) and a recarray (fields accessible as items and  
attributes). 


Re: [Numpy-discussion] How to merge or SQL join record arrays in Python?

2009-05-11 Thread Pierre GM

On May 11, 2009, at 6:18 PM, Wei Su wrote:

 Thanks for the reply. I can now actually turn a big list into a  
 record array. My question is actually how to join related record  
 arrays in Python. This is done in SAS by MERGE and PROC SQL and by
 merge() in R. But I have no idea how to do it in Python.

OK. Try numpy.lib.recfunctions.join_by, and let me know if you have  
any problem. It's a rewritten version of an equivalent function in  
matplotlib (matplotlib.mlab.rec_join), that should work (maybe not,  
there hasn't been enough testing feedback to judge...)
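
A minimal usage sketch (field names and data invented for illustration):

import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(1, 10.), (2, 20.), (3, 30.)],
             dtype=[('key', int), ('x', float)])
b = np.array([(1, 'a'), (3, 'c')], dtype=[('key', int), ('y', 'S1')])
# an inner join on 'key', much like R's merge() by default
joined = rfn.join_by('key', a, b, jointype='inner')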



Re: [Numpy-discussion] How to merge or SQL join record arrays in Python?

2009-05-11 Thread Pierre GM

On May 11, 2009, at 6:36 PM, Skipper Seabold wrote:

 On Mon, May 11, 2009 at 6:18 PM, Wei Su taste_o...@yahoo.com wrote:

 Hi, Pierre:

 Thanks for the reply. I can now actually turn a big list into a  
 record
 array. My question is actually how to join related record arrays in  
 Python..
 This is done in SAS by MERGE and PROC SQL and by merge() in R. But  
 I have no
 idea how to do it in Python.

 Thanks.

 Wei Su


 Does merge_arrays in numpy.lib.recfunctions do what you want?

Probably not. merge_arrays is close to concatenate, and will raise an  
exception if 2 fields have the same name (in the flattened version).  
Testing R's merge(), join_by looks like the corresponding function.



Re: [Numpy-discussion] Are masked arrays slower for processing than ndarrays?

2009-05-09 Thread Pierre GM
Short answer to the subject: Oh yes.
Basically, MaskedArray in its current implementation is more of a
convenience class than anything. Most of the functions manipulating
masked arrays create a lot of temporaries. When performance is needed,  
I must advise you to work directly on the data and the mask.

For example, let's examine the division of two MaskedArrays, a and b:
* We take the 2 ndarrays of data (da and db) and the 2 ndarrays of  
mask (ma and mb)
* we create a new array for db using np.where, putting 1 where db==0  
and keeping db otherwise (if we were not doing that, we would get some  
NaNs down the road)
* we create a new mask by combining ma and mb
* we create the result array using np.where, using da where m is True,  
da/db otherwise (if we were not doing that, we would be processing the  
masked data and we may not want that)
* Then, we add the mask to the result array.

I suspect that the np.where functions are sub-optimal, and there might  
be a smarter way to achieve the same result while keeping all the  
functionalities (no NaNs (even masked) in the result, data kept when  
it should). I agree that these functionalities might be a bit overkill  
in simpler cases, such as yours. You may then want to use something like

  ma.masked_array(a.data/b.data, mask=(a.mask | b.mask | (b.data==0)))

Using Eric's example, I have 229ms/loop when dividing 2 ndarrays,  
2.83s/loop when dividing 2 masked arrays, and down to 493ms/loop when  
using the quick-and-dirty function above. So anyway, you'll still be
slower using MA than ndarrays, but not as slow...





On May 9, 2009, at 5:22 PM, Eli Bressert wrote:

 Hi,

 I'm using masked arrays to compute large-scale standard deviation,
 multiplication, gaussian, and weighted averages. At first I thought
 using the masked arrays would be a great way to sidestep looping
 (which it is), but it's still slower than expected. Here's a snippet
 of the code that I'm using it for.

 # Computing nearest neighbor distances.
 # Output will be about 270,000 rows long for the index
 # and 270,000x50 for the dist array.
 tree = ann.kd_tree(np.column_stack([l,b]))
 index, dist = tree.search(np.column_stack([l,b]),k=nth)

 # Clipping bad values by replacing them with acceptable values
 av[np.where(av <= -10)] = -10
 av[np.where(av >= 50)] = 50

 # Distance clipping and creating mask
 dist_arcsec = np.sqrt(dist)*3600
 mask = dist_arcsec >= d_thresh

 # Creating masked array
 av_good = ma.array(av[index],mask=mask)
 dist_good = ma.array(dist_arcsec,mask=mask)

 # Reason why I'm using masked arrays. If these were
 # ndarrays with nan's, then the output would be nan.
 Std = np.array(np.std(av_good,axis=1))
 Var = Std*Std

 Rho = np.zeros( (len(av), nth) )
 Rho2  = np.zeros( (len(av), nth) )

 dist_std = np.std(dist_good,axis=1)

 for j in range(nth):
 Rho[:,j] = dist_std
 Rho2[:,j] = Var

 # This part takes about 20 seconds to compute for a 270,000x50  
 masked array.
 # Using ndarrays of the same size takes about 2 second
 spatial_weight = 1.0 / (Rho*np.sqrt(2*np.pi)) * np.exp( - dist_good /
 (2*Rho**2))

 # Like the spatial_weight section, this takes about 20 seconds
 W = spatial_weight / Rho2

 # Takes less than one second.
 Ave = np.average(av_good,axis=1,weights=W)

 Any ideas on why it would take such a long time for processing?
 Especially the spatial_weight and W variables? Would there be a faster
 way to do this? Or is there a way that numpy.std can ignore
 nan's when processing?

 Thanks,

 Eli Bressert


Re: [Numpy-discussion] Are masked arrays slower for processing than ndarrays?

2009-05-09 Thread Pierre GM

On May 9, 2009, at 8:17 PM, Eric Firing wrote:

 Eric Firing wrote:


 A part of the slowdown is what looks to me like unnecessary copying  
 in _MaskedBinaryOperation.__call__.  It is using getdata, which  
 applies numpy.array to its input, forcing a copy.  I think the copy  
 is actually unintentional, in at least one sense, and possibly two:  
 first, because the default argument of getattr is always evaluated,  
 even if it is not needed; and second, because the call to np.array  
 is used where np.asarray or equivalent would suffice.

Yep, good call. The try/except should be better, and yes, I forgot to
force copy=False (thought it was on by default...). I didn't know that  
getattr always evaluated the default, the docs are scarce on that  
subject...

 Pierre,

 ... I pressed send too soon.  There are test failures with the  
 patch I attached to my last message.  I think the basic ideas are  
 correct, but evidently there are wrinkles to be worked out.  Maybe  
 putmask() has to be used instead of where() (putmask is much faster)  
 to maintain the ability to do *= and similar, and maybe there are  
 other adjustments. Somehow, though, it should be possible to get  
 decent speed for simple multiplication and division; a 10x penalty  
 relative to ndarray operations is just too much.

Quite agreed. It was a shock to realize that we were that slow. I'm
gonna have to start testing w/ large arrays...

I'm confident we can significantly speed up the _MaskedOperations  
without losing any of the features. Yes, putmask may be a better  
option. We could probably use the following MO:
* result = a.data/b.data
* putmask(result, m, a)
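
Something along these lines, as a sketch (not the actual ma internals):

import numpy as np
import numpy.ma as ma

a = ma.array([1., 2., 4.], mask=[0, 1, 0])
b = ma.array([2., 8., 5.], mask=[0, 0, 1])
m = ma.getmaskarray(a) | ma.getmaskarray(b)

result = a.data / b.data        # raw operation on the .data parts
np.putmask(result, m, a.data)   # restore the input under the mask
out = ma.array(result, mask=m)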

However, I'm gonna need a good couple of weeks before being able to
really look into it...


Re: [Numpy-discussion] How to download data directly from SQL into NumPy as a record array or structured array.

2009-05-05 Thread Pierre GM

On May 5, 2009, at 2:42 PM, Wei Su wrote:


 Hi, Everyone:

 This is what I need to do everyday. Now I have to first save data  
 as a .csv file and then use csv2rec() to read the data as a record
 array. Anybody can give me some advice on how to directly get the  
 data as record arrays? It will save me tons of time.

Wei,
Have a look at numpy.lib.io.genfromtxt, that should give you some ideas.
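
In the meantime, you can skip the .csv step by filling the array
straight from a DB cursor; a sketch with sqlite3 (the table name and
column types are made up):

import sqlite3
import numpy as np

conn = sqlite3.connect('data.db')   # hypothetical database
rows = conn.execute("SELECT a, b FROM table1").fetchall()
data = np.array([tuple(r) for r in rows],
                dtype=[('a', float), ('b', float)])  # assumed types
rec = data.view(np.recarray)        # fields accessible as attributes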


Re: [Numpy-discussion] after building from source, how to register numpy with synaptic?

2009-04-25 Thread Pierre GM

On Apr 25, 2009, at 5:36 AM, Gael Varoquaux wrote:

 On Fri, Apr 24, 2009 at 10:11:07PM -0400, Chris Colbert wrote:
   Like the subject says, is there a way to register numpy with  
 synaptic
   after building numpy from source?

 Don't play with the system's packaging system unless you really know  
 what
 you are doing.

 Just install the numpy you are building outside of /usr/lib/... (you
 should never be installing home-build stuff in there).


One link:
http://www.doughellmann.com/projects/virtualenvwrapper/

I became a fan of virtualenvs, which let you install different
packages (not always compatible) without messing up the system's  
Python. Quite useful for tests and/or having multiple numpy versions  
in parallel.







 For instance
 install it in /usr/local:

sudo python setup.py install --prefix /usr/local

 Now it will override the system's numpy. So you can install  
 matplotlib,
 which will drag along the system's numpy, but you won't see it.

 On a side note, I tend to install home-built packages that override
 system
 packages only in my home. I have a $HOME/usr directory, with a small
 directory hierarchy (usr/lib, usr/bin, ...), it is added in my PATH  
 and
 PYTHONPATH, and I install there.

 Gaël


Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread Pierre GM

On Apr 22, 2009, at 5:21 PM, Gökhan SEVER wrote:

 Hello,

 Could you please give me some hints about how to mask an array using  
 another arrays like in the following example.

What about that?
numpy.logical_or.reduce([a==i for i in b])




Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread Pierre GM

On Apr 22, 2009, at 9:03 PM, josef.p...@gmail.com wrote:

 I prefer broadcasting to list comprehension in numpy:

Pretty neat! I still don't have the broadcasting reflex. Now, any idea
which one is more efficient in terms of speed? in terms of temporaries?
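
For reference, the two variants side by side (a small sketch; actual
timings will depend on the sizes of a and b):

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4])

# list comprehension: one temporary boolean array per element of b
mask1 = np.logical_or.reduce([a == i for i in b])

# broadcasting: one (len(a), len(b)) temporary, collapsed along axis 1
mask2 = (a[:, np.newaxis] == b).any(axis=1)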



Re: [Numpy-discussion] dtype field renaming

2009-04-08 Thread Pierre GM

On Apr 8, 2009, at 5:57 PM, Elaine Angelino wrote:

 hi there --

 for a numpy.recarray, is it possible to rename the fields in the  
 dtype?

Take a new view:
>>> a = np.array([(1, 1)], dtype=[('a', int), ('b', int)])
>>> b = a.view([('A', int), ('b', int)])

or:

use numpy.lib.recfunctions.rename_fields


Re: [Numpy-discussion] dtype field renaming

2009-04-08 Thread Pierre GM

On Apr 8, 2009, at 6:18 PM, Stéfan van der Walt wrote:

 2009/4/9 Pierre GM pgmdevl...@gmail.com:
 for a numpy.recarray, is it possible to rename the fields in the
 dtype?

  Take a new view:
  >>> a = np.array([(1, 1)], dtype=[('a', int), ('b', int)])
  >>> b = a.view([('A', int), ('b', int)])

 or:

 use numpy.lib.recfunctions.rename_fields

 Or change the names tuple:

 a.dtype.names = ('c', 'd')

Now that's a wicked neat trick! I love it! Faster than taking a view,
for sure.
Note that rename_fields should work also w/ nested fields (not that  
common, true).


[Numpy-discussion] Construct symmetric matrix

2009-04-04 Thread Pierre GM
All,
I'm trying to build a relatively complicated symmetric matrix. I can  
build the upper-right block without pb. What is the fastest way to get  
the LL corner from the UR one?
Thanks a lot in advance for any idea.
P.
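
For the record, one common answer, assuming the upper-right block lives
in an upper-triangular array: mirror it with the transpose and subtract
the doubled diagonal. A sketch:

import numpy as np

n = 4
ur = np.triu(np.ones((n, n)))             # stand-in for the UR block
sym = ur + ur.T - np.diag(ur.diagonal())  # fill in the LL corner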


Re: [Numpy-discussion] trouble building docs with sphinx-0.6.1

2009-04-01 Thread Pierre GM

On Apr 1, 2009, at 11:57 AM, David Cournapeau wrote:

 preparing documents... done
 writing output... [  0%] contents
 Exception occurred:
   File "/usr/lib64/python2.6/site-packages/docutils/nodes.py", line
 471, in __getitem__
     return self.attributes[key]
 KeyError: 'entries'
 The full traceback has been saved in /tmp/sphinx-err-RDe0NL.log, if
 you want to report the issue to the author.
 Please also report this if it was a user error, so that a better  
 error
 message can be provided next time.
 Send reports to sphinx-...@googlegroups.com. Thanks!

 This often happens for non-clean build. The only solution I got so far
 was to start the doc build from scratch...

David, it won't work here; there's a bug indeed.
Part of it comes from numpydoc, which isn't completely compatible w/
Sphinx-0.6.1. In particular, the code doesn't know what to do w/ this
'entries' parameter.
Part of it comes from Sphinx. Georg said he made the 'entries'
parameter optional, but that doesn't solve everything. Matt Knox
actually came across the 'best' solution:
Edit Sphinx/environment.py, L1051:
replace
refs = [(e[0], str(e[1])) for e in toctreenode['entries']]
by
refs = [(e[0], str(e[1])) for e in toctreenode.get('entries', [])]



 Has anyone else tried building numpy's docs with sphinx-0.6.1? Is
 there any interest in sorting these issues out before 1.3 is  
 released?

 I am afraid it is too late for the 1.3 release,

 cheers,

 David


Re: [Numpy-discussion] Summer of Code: Proposal for Implementing date/time types in NumPy

2009-03-25 Thread Pierre GM
Ciao Marty,
Great idea indeed! However, I'd really like to have an easy way to
plug the suggested dtype w/ the existing Date class from the  
scikits.timeseries package (Date is implemented in C, you can find the  
sources through the link on http://pytseries.sourceforge.net). I agree  
that this particular aspect is not a priority, but it'd be nice to  
keep it in a corner of the mind.
In any case, keep me in the loop.
Cheers,
P.


On Mar 25, 2009, at 12:06 PM, Francesc Alted wrote:

 Hello Marty,

 A Tuesday 24 March 2009, Marty Fuhry escrigué:
 Hello,

 Sorry for any overlap, as I've been referred here from the scipy-dev
 mailing list.
 I was reading through the Summer of Code ideas and I'm terribly
 interested in the date/time proposal
 (http://projects.scipy.org/numpy/browser/trunk/doc/neps/datetime-prop
 osal3.rst). I would love to work on this for a Google Summer of Code
 project. I'm a sophmore studying Computer Science and Mathematics at
 Kent State University in Ohio, so this project directly relates to my
 studies. Is there anyone looking into this proposal yet?

 To my knowledge, nobody is actively working on this anymore.  As a
 matter of fact, during the discussions that led to the proposal, many
 people showed a real interest in the implementation of date/time
 types in NumPy.  So it would be great if you could have a stab at this.

 Luck!

 -- 
 Francesc Alted


Re: [Numpy-discussion] Help on subclassing numpy.ma: __array_wrap__

2009-03-03 Thread Pierre GM
Kevin,
Sorry for the delayed answer.

 (a) Is MA intended to be subclassed?


Yes, that's actually the reason why the class was rewritten, to  
simplify subclassing. As Josef suggested, you can check the  
scikits.timeseries package that makes an extensive use of MaskedArray  
as baseclass.


 (b) If so, perhaps I'm missing something to make this work.  Any  
 pointers will be appreciated.


As you've run a debugger on your sources, you must have noticed the  
calls to MaskedArray._update_from. In your case, the simplest is to  
define DTMA._update_from as such:
def _update_from(self, obj):
    ma.MaskedArray._update_from(self, obj)
    self._attr = getattr(obj, '_attr', {'EmptyDict': []})

Now, because MaskedArray.__array_wrap__() itself calls _update_from,  
you don't actually need a specific DTMA.__array_wrap__ (unless you  
have some specific operations to perform, but it doesn't seem to be  
the case).

Now for a word of explanation:
__array_wrap__ is intended to transform the output of a numpy function  
to an object of your class. When we use the numpy.ma functions, we  
don't need that, we just need to retrieve some of the attributes of  
the initial MA. That's why _update_from was introduced.
Of course, I'm to blame for not having made that aspect explicit in the
doc. I'm gonna try to correct that.
In any case, let me know how it goes.
P.



On Mar 1, 2009, at 10:37 AM, Kevin Dunn wrote:

 Hi everyone,

 I'm subclassing Numpy's MaskedArray to create a data class that  
 handles missing data, but adds some extra info I need to carry
 around. However I've been having problems keeping this extra info  
 attached to the subclass instances after performing operations on  
 them.
 The bare-bones script that I've copied here shows the basic issue: 
 http://pastebin.com/f69b979b8 
   There are 2 classes: one where I am able to subclass numpy (with  
 help from the great description at http://www.scipy.org/Subclasses),  
 and the other where I subclass numpy.ma, using the same ideas again.

 When stepping through the code in a debugger, lines 76 to 96, I can  
 see that the numpy subclass, called DT, calls DT.__array_wrap__()  
 after it completes unary and binary operations. But the numpy.ma  
 subclass, called DTMA, does not seem to call DTMA.__array_wrap__(),  
 especially line 111.

 Just to test this idea, I overrode the __mul__ function in my DTMA  
 subclass to call DTMA.__array_wrap__() and it returns my extra  
 attributes, in the same way that Numpy did.

 My questions are:

 (b) If so, perhaps I'm missing something to make this work.  Any  
 pointers will be appreciated.

 So far it seems the only way for me to sub-class numpy.ma is to  
 override all numpy.ma functions of interest for my class and add a  
 DTMA.__array_wrap__() call to the end of them.  Hopefully there is an
 easier way.
 Related to this question, was there a particular outcome from this
 archived discussion (I only joined the list recently): 
 http://article.gmane.org/gmane.comp.python.numeric.general/24315 
   because that dictionary object would be exactly what I'm after here.
 Thanks,

 Kevin





Re: [Numpy-discussion] Proposed schedule for numpy 1.3.0

2009-03-03 Thread Pierre GM
David,
 I also started to update the release notes:

 http://projects.scipy.org/scipy/numpy/browser/trunk/doc/release/1.3.0-notes.rst

I get a 404.

Anyhow, on the ma side:
* structured arrays should now be fully supported by MaskedArray
   (r6463, r6324, r6305, r6300, r6294...)
* Minor bug fixes (r6356, r6352, r6335, r6299, r6298)
* Improved support for __iter__ (r6326)
* made baseclass, sharedmask and hardmask accessible to the user (but
read-only)

+ doc update


Re: [Numpy-discussion] problem with assigning to recarrays

2009-02-27 Thread Pierre GM
As a follow-up to Robert's answer:

  r[r.field1 == 1].field2 = 1
doesn't work, but

 r.field2[r.field1==1] = 1
does.
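
The difference: fancy indexing returns a copy, while attribute access
gives a view on the original data. A small demo (array contents
invented):

import numpy as np

r = np.rec.fromarrays([np.array([1., 1., 0., 0., 0.]), np.zeros(5)],
                      names='field1,field2')

r[r.field1 == 1].field2 = 1   # assigns into a copy, silently discarded
print r.field2                # [ 0.  0.  0.  0.  0.]

r.field2[r.field1 == 1] = 1   # r.field2 is a view, so this sticks
print r.field2                # [ 1.  1.  0.  0.  0.]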


 So far, so good.
 Now I want to change the value of field2 for those same elements:

   In [128]: r[where(r.field1 == 1.)].field2 = 1







 Ok, so now the values of field 2 have been changed, for those elements
 right?

   In [129]: r.field2

   Out[129]: array([ 0.,  0.,  0.,  0.,  0.])

 Wait.  What?
 That can't be right.  Let's check again:

   In [130]: print r[where(r.field1 == 1.)].field2
   [ 0. 0.]

 Ok, so it appears that I can *access* fields in this array with an
 array of indices, but I can't assign new values to fields so
 accessed.  However, I *can* change the values if I use a scalar
 index.  This is different from the behavior of ordinary arrays, for
 which I can reassign elements' values either way.

 Moreover, when I try to reassign record array fields by indexing with
 an array of indices, it would appear that nothing at all happens.
 This syntax is equivalent to the pass command.

 So, my question is this:  is there some reason for this behavior in
 record arrays, which is unexpectedly different from the behavior of
 normal arrays, and rather confusing.   If so, why does the attempt to
 assign values to fields of an indexed subarray not raise some kind of
 error, rather than doing nothing?  I think it's unlikely that I've
 actually found a bug in numpy, but this behavior does not make sense
 to me.


 Thanks for any insights,

 Brian





Re: [Numpy-discussion] possible bug: __array_wrap__ is not called during arithmetic operations in some cases

2009-02-22 Thread Pierre GM

On Feb 22, 2009, at 6:21 PM, Eric Firing wrote:

 Darren Dale wrote:
 Does anyone know why __array_wrap__ is not called for subclasses  
 during
 arithmetic operations where an iterable like a list or tuple  
 appears to
 the right of the subclass? When I do mine*[1,2,3], array_wrap is  
 not
 called and I get an ndarray instead of a MyArray. [1,2,3]*mine is
 fine, as is mine*array([1,2,3]). I see the same issue with  
 division,

 The masked array subclass does not show this behavior:

Because MaskedArray.__mul__ and others are redefined.

Darren, you can fix your problem by redefining MyArray.__mul__ as:

 def __mul__(self, other):
 return np.ndarray.__mul__(self, np.asanyarray(other))

forcing the second term to be a ndarray (or a subclass of). You can do  
the same thing for the other functions (__add__, __radd__, ...)


Re: [Numpy-discussion] Filling gaps

2009-02-12 Thread Pierre GM

On Feb 12, 2009, at 8:22 PM, A B wrote:

 Hi,
 Are there any routines to fill in the gaps in an array. The simplest
 would be by carrying the last known observation forward.
 0,0,10,8,0,0,7,0
 0,0,10,8,8,8,7,7
 Or by somehow interpolating the missing values based on the previous
 and next known observations (mean).
 Thanks.


The functions `forward_fill` and `backward_fill` in scikits.timeseries  
should do what you want. They work also on MaskedArray objects,  
meaning that you don't need to have actual series.
The catch is that you need to install scikits.timeseries, of course.  
More info here: http://pytseries.sourceforge.net/
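
If installing the scikit is not an option, here is a pure-numpy sketch
of carrying the last observation forward (treating 0 as the missing
marker, as in your example):

import numpy as np

def forward_fill(a, missing=0):
    a = np.asarray(a, dtype=float)
    # index of each valid entry; 0 where the value is missing
    idx = np.where(a != missing, np.arange(len(a)), 0)
    np.maximum.accumulate(idx, out=idx)   # last valid index so far
    return a[idx]

print forward_fill([0, 0, 10, 8, 0, 0, 7, 0])
# [  0.   0.  10.   8.   8.   8.   7.   7.]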


Re: [Numpy-discussion] genloadtxt: dtype=None and unpack=True

2009-02-11 Thread Pierre GM

On Feb 11, 2009, at 11:38 PM, Ryan May wrote:

 Pierre,

 I noticed that using dtype=None with a heterogeneous set of data,  
 trying to use unpack=True to get the columns into separate arrays  
 (instead of a structured array) doesn't work.  I've attached a patch  
 that, in the case of dtype=None, unpacks the fields in the final  
 array into a list of separate arrays.  Does this seem like a good  
 idea to you?

Nope, as it breaks consistency: depending on some input parameters,  
you either get an array or a list. I think it's better to leave it as  
it is, maybe adding an extra line in the doc noting that
unpack=True doesn't do anything for structured arrays.


Re: [Numpy-discussion] ERROR: Test flat on masked_matrices

2009-02-07 Thread Pierre GM

On Feb 7, 2009, at 8:03 AM, Nils Wagner wrote:


 ======================================================================
 ERROR: Test flat on masked_matrices
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/usr/local/lib64/python2.5/site-packages/numpy/ma/tests/test_core.py",
 line 1127, in test_flat
     test = ma.array(np.matrix([[1, 2, 3]]), mask=[0, 0, 1])
 NameError: global name 'ma' is not defined

Oops, sorry about that...


Re: [Numpy-discussion] question about ufuncs

2009-02-06 Thread Pierre GM

On Feb 6, 2009, at 4:25 PM, Darren Dale wrote:


 I've been looking at how ma implements things like multiply() and  
 MaskedArray.__mul__. I'm surprised that MaskedArray.__mul__ actually  
 calls ma.multiply() rather than calling  
 super(MaskedArray,self).__mul__().

There's some under-the-hood machinery to deal with the data, and we  
need to be able to manipulate it *before* the operation takes place.  
The super() approach calls __array_wrap__ on the result, so *after*  
the operation took place, and that's not what we wanted...

 Maybe that is the way ndarray does it, but I don't think this is the  
 right approach for my quantity subclasses. If I want to make a  
 MaskedQuantity (someday), MaskedQuantity.__mul__ should be calling  
 super(MaskedQuantity,self).__mul__(), not reimplementations of  
 numpy.multiply or ma.multiply, right?

You'll end up calling ma.multiply anyway
(super(MaskedQuantity,self).__mul__ will call MaskedArray.__mul__,
which calls ma.multiply...). So yes, I think you can stick to the
super() approach in your case.


 There are some cases where the default numpy function expects  
 certain units on the way in, like the trig functions, which I think  
 would have to be reimplemented.

And you can probably define a generic class to deal with that instead  
of reimplementing the functions individually (and we're back to the  
initial advice).


 But aside from that, is there anything wrong with taking this  
 approach? It seems to allow quantities to integrate pretty well with  
 the numpy builtins.

Go and try, the problems (if any) will show up...



Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-05 Thread Pierre GM

On Feb 5, 2009, at 6:08 PM, Travis E. Oliphant wrote:


 Hi all,

 I've been fairly quiet on this list for awhile due to work and family
 schedule, but I think about how things can improve regularly.One
 feature that's been requested by a few people is the ability to select
 multiple fields from a structured array.


 [...]

+1 for #2.

Note that we now have a drop_fields function in np.lib.recfunctions, a  
reimplementation of the equivalent function in matplotlib. It works  
along the lines of your proposition #1 (create a new array w/ a new  
dtype and fill it).



Re: [Numpy-discussion] poly1d left versus right multiplication with np numbers

2009-02-04 Thread Pierre GM

On Feb 4, 2009, at 11:00 AM, josef.p...@gmail.com wrote:

 I just had a hard to find bug in my program. poly1d  treats numpy
 scalars differently than python numbers when left or right
 multiplication is used.

 Essentially, if the first term is the numpy scalar, multiplied by a
 polynomial, then the result is an np.array.
 If the order is reversed, then the result is an instance of np.poly1d.
 The return types are also the same for numpy arrays, which is at least
 understandable, although a warning would be good.

 When using plain (python) numbers, then both left and right
 multiplication of the number with the polynomial returns a polynomial.

 Is this a bug or a feature? I didn't see it mentioned in the docs.

Looks like yet another example of ticket #826:
http://scipy.org/scipy/numpy/ticket/826
This one is becoming quite a problem, and I have no idea how to fix
it...


Re: [Numpy-discussion] ImportError: No module named dateutil.parser

2009-02-04 Thread Pierre GM

On Feb 4, 2009, at 3:56 PM, Robert Kern wrote:


 No, rewrite the test to not use external libraries, please. Test the
 functionality without needing dateutils.

OK then, should be fixed in r6340.


[Numpy-discussion] Renaming a field of an object array

2009-02-04 Thread Pierre GM
All,
I'm a tad puzzled by the following behavior (I'm trying to correct a  
bug in genfromtxt):

I'm creating an empty structured ndarray, using np.object as dtype.

>>> a = np.empty(1, dtype=[('', np.object)])
>>> a
array([(None,)],
      dtype=[('f0', '|O4')])

Now, I'd like to rename the field:
>>> a.view([('NAME', np.object)])
TypeError: Cannot change data-type for object array.

I understand why I can't change the *type* of the field, but not why I  
can't change its name that way. What would be an option that wouldn't  
involve creating a new array?
Thx in advance.
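
One workaround, per the "dtype field renaming" thread earlier in this
digest: rebind the names tuple in place, which avoids the view entirely.
A sketch:

import numpy as np

a = np.empty(1, dtype=[('', np.object)])   # field auto-named 'f0'
a.dtype.names = ('NAME',)                  # rename without taking a view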



Re: [Numpy-discussion] genfromtxt view with object dtype

2009-02-04 Thread Pierre GM
OK, Brent, try r6341.
I fixed genfromtxt for cases like yours (explicit dtype involving a  
np.object).
Note that the fix won't work if the dtype is nested and involves  
np.objects (as we would hit the pb of renaming fields we observed...).
Let me know how it goes.
P.

On Feb 4, 2009, at 4:03 PM, Brent Pedersen wrote:

 On Wed, Feb 4, 2009 at 9:36 AM, Pierre GM pgmdevl...@gmail.com  
 wrote:

 On Feb 4, 2009, at 12:09 PM, Brent Pedersen wrote:

 hi, i am using genfromtxt, with a dtype like this:
 [('seqid', '|S24'), ('source', '|S16'), ('type', '|S16'), ('start',
 'i4'), ('end', 'i4'), ('score', 'f8'), ('strand', '|S1'),  
 ('phase',
 'i4'), ('attrs', '|O4')]

 Brent,
 Please post a simple, self-contained example with a few lines of the
 file you want to load.



 hi pierre, here is an example.
 thanks,
 -brent

 ##

 import numpy as np
 from cStringIO import StringIO

 gffstr = """\
 ##gff-version 3
 1\tucb\tgene\t2234602\t2234702\t.\t-\t.\tID=grape_1_2234602_2234702;match=EVM_prediction_supercontig_1.248,EVM_prediction_supercontig_1.248.mRNA
 1\tucb\tgene\t2300292\t2302123\t.\t+\t.\tID=grape_1_2300292_2302123;match=EVM_prediction_supercontig_244.8
 1\tucb\tgene\t2303615\t2303967\t.\t+\t.\tID=grape_1_2303615_2303967;match=EVM_prediction_supercontig_244.8
 1\tucb\tgene\t2303616\t2303966\t.\t+\t.\tParent=grape_1_2303615_2303967
 1\tucb\tgene\t3596400\t3596503\t.\t-\t.\tID=grape_1_3596400_3596503;match=evm.TU.supercontig_167.27
 1\tucb\tgene\t3600651\t3600977\t.\t-\t.\tmatch=evm.model.supercontig_1217.1,evm.model.supercontig_1217.1.mRNA
 """

 dtype = {'names' :
  ('seqid', 'source', 'type', 'start', 'end',
'score', 'strand', 'phase', 'attrs') ,
'formats':
  ['S24', 'S16', 'S16', 'i4', 'i4', 'f8',
  'S1', 'i4', 'S128']}

 #OK with S128 for attrs
 print np.genfromtxt(StringIO(gffstr), dtype = dtype)



  def _attr(kvstr):
     pairs = [kv.split("=") for kv in kvstr.split(";")]
     return dict(pairs)

 # change S128 to object to have col attrs as dictionary
 dtype['formats'][-1] = 'O'
 converters = {8: _attr }
 #NOT OK
 print np.genfromtxt(StringIO(gffstr), dtype = dtype,  
 converters=converters)


[Numpy-discussion] Numpy 1.3 release date ?

2009-02-03 Thread Pierre GM
All,
When can we expect numpy 1.3 to be released ?
Sincerely,
P.


Re: [Numpy-discussion] genloadtxt question

2009-02-03 Thread Pierre GM

On Feb 3, 2009, at 11:24 AM, Ryan May wrote:

 Pierre,

 Should the following work?

 import numpy as np
 from StringIO import StringIO
 from datetime import datetime

 converter = {'date': lambda s: datetime.strptime(s, '%Y-%m-%d %H:%M:%SZ')}
 data = np.ndfromtxt(StringIO('2009-02-03 12:00:00Z,72214.0'),
                     delimiter=',', names=['date', 'stid'], dtype=None,
                     converters=converter)

Well, yes, it should work. That's indeed a problem with the  
getsubdtype method of the converter.
The problem is that we need to estimate the datatype of the output of  
the converter. In most cases, trying to convert '0' works properly,  
not in yours however. In r6338, I force the type to object if  
converting '0' does not work. That's a patch till the next corner  
case...



Re: [Numpy-discussion] Operations on masked items

2009-02-03 Thread Pierre GM

On Feb 3, 2009, at 4:00 PM, Ryan May wrote:

 Well, I guess I hit send too soon.  Here's one easy solution  
 (consistent with
 what you did for __radd__), change the code for __rmul__ to do:

   return multiply(self, other)

 instead of:

   return multiply(other, self)

 That fixes it for me, and I don't see how it would break anything.

Good call, but once again: Thou shalt not put trust in ye masked  
values [1].

>>> a = np.ma.array([1, 2, 3], mask=[0, 1, 0])
>>> b = np.ma.array([10, 20, 30], mask=[0, 1, 0])
>>> (a*b).data
array([10,  2, 90])
>>> (b*a).data
array([10, 20, 90])

So yes, __mul__ is not commutative when you deal w/ masked arrays (at  
least, when you try to access the data under a mask). Nothing I can  
do. Remember that preventing the underlying data from being modified is
NEVER guaranteed...

[1] Epistle of Paul (Dubois).


Re: [Numpy-discussion] question about ufuncs

2009-02-01 Thread Pierre GM

On Feb 1, 2009, at 6:32 PM, Darren Dale wrote:


 Is there an analog to __array_wrap__ for preprocessing arrays on  
 their way *into* a ufunc? For example, it would be nice if one could  
 do something like:

 numpy.sin([1,2,3]*arcseconds)

 where we have the opportunity to inspect the context, convert the  
 Quantity to units of radians, and then actually call the ufunc. Is  
 this possible, or does one have to reimplement such functions?

Just an idea: look at the code for numpy.ma ufuncs (in numpy.ma.core).  
By defining a few classes for unary, binary and domained functions,  
you could probably do what you want, without having to recode all the  
functions by hand.
Another idea would be to define some specific __mul__ or __rmul__  
rules for your units, so that the list would be transformed into a  
UnitArray...
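
A sketch of that second idea (the class and unit names are invented for
illustration):

import numpy as np

class UnitArray(np.ndarray):
    # minimal array subclass that remembers a unit string
    def __new__(cls, data, unit=''):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.unit = unit
        return obj
    def __array_finalize__(self, obj):
        self.unit = getattr(obj, 'unit', '')

class Unit(object):
    def __init__(self, name):
        self.name = name
    def __rmul__(self, other):
        # [1, 2, 3] * arcseconds -> UnitArray with unit 'arcsec'
        return UnitArray(other, self.name)

arcseconds = Unit('arcsec')
x = [1, 2, 3] * arcseconds   # x.unit == 'arcsec'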


Re: [Numpy-discussion] puzzle: generate index with many ranges

2009-01-30 Thread Pierre GM

On Jan 30, 2009, at 1:11 PM, Raik Gruenberg wrote:


 Mhm, I got this far. But how do I get from here to a single index  
 array

 [ 4, 5, 6, ... 10, 0, 1, 2, 3, 11, 12, 13, 14 ] ?

np.concatenate([np.arange(aa,bb) for (aa,bb) in zip(a,b)])


Re: [Numpy-discussion] puzzle: generate index with many ranges

2009-01-30 Thread Pierre GM

On Jan 30, 2009, at 1:53 PM, Raik Gruenberg wrote:

 Pierre GM wrote:
 On Jan 30, 2009, at 1:11 PM, Raik Gruenberg wrote:

 Mhm, I got this far. But how do I get from here to a single index
 array

 [ 4, 5, 6, ... 10, 0, 1, 2, 3, 11, 12, 13, 14 ] ?

 np.concatenate([np.arange(aa,bb) for (aa,bb) in zip(a,b)])

 exactly! Now, the question was, is there a way to do this only using  
 numpy
 functions (sum, repeat, ...), that means without any python for  
 loop?

Can't really see it right now. Make np.arange(max(b)) and take the  
slices you need? But you still have to look in 2 arrays to find the
beginning and end of slices, so...


 Sorry about being so insistent on this one but, in my experience,  
 eliminating
 those for loops makes a huge difference in terms of speed. The zip  
 is probably
 also quite costly on a very large data set.

Yeah, but it's in a list comprehension, which may make things a tad
faster. If you prefer, use itertools.izip instead of zip, but I wonder  
where the advantage would be. Anyway, are you sure this particular  
part is your bottleneck? You know the saying about premature
optimization...
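
For the record, a loop-free construction is possible with repeat() and
cumsum(); a sketch (a and b inferred from the example in the thread):

import numpy as np

a = np.array([4, 0, 11])    # starts of the ranges
b = np.array([11, 4, 15])   # ends of the ranges (exclusive)

lengths = b - a
# where each sub-range begins inside the concatenated result
offsets = np.concatenate(([0], lengths.cumsum()[:-1]))
idx = (np.arange(lengths.sum())
       - np.repeat(offsets, lengths)
       + np.repeat(a, lengths))
# -> [ 4  5  6  7  8  9 10  0  1  2  3 11 12 13 14]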


Re: [Numpy-discussion] Documentation: objects.inv ?

2009-01-29 Thread Pierre GM

On Jan 29, 2009, at 3:17 AM, Pauli Virtanen wrote:

 Thu, 29 Jan 2009 00:28:46 -0500, Pierre GM wrote:
 Is there an objects.inv lying around for the numpy reference guide,  
 or
 should I start one from scratch ?

 It's automatically generated by Sphinx, and can be found at

   http://docs.scipy.org/doc/numpy/objects.inv

 Let's make the promise that it shall be found there in the future,  
 too.


Got it, thanks a lot.
Pauli, how often is the documentation on docs.scipy.org updated from  
SVN?
Thx again
P.



Re: [Numpy-discussion] optimise operation in array with datetime objects

2009-01-28 Thread Pierre GM

On Jan 28, 2009, at 3:56 PM, Timmie wrote:
 ### this is the loop I would like to optimize:
 ### looping over arrays is considered inefficient.
 ### what could be a better way?
 hours_array = dates_array.copy()
 for i in range(0, dates_array.size):
   hours_array[i] = dates_array[i].hour

You could try:
np.fromiter((_.hour for _ in dates_li), dtype=np.int)
or
np.array([_.hour for _ in dates_li], dtype=np.int)



Re: [Numpy-discussion] optimise operation in array with datetime objects

2009-01-28 Thread Pierre GM

On Jan 28, 2009, at 5:43 PM, Timmie wrote:

 You could try:
 np.fromiter((_.hour for _ in dates_li), dtype=np.int)
 or
 np.array([_.hour for _ in dates_li], dtype=np.int)

 I used dates_li only for the preparation of example data.

 So let's suppose I have the array dates_array returned from a
 a function.

Just use dates_array instead of dates_li, then.


 hours_array = dates_array.copy()
 for i in range(0, dates_array.size):
   hours_array[i] = dates_array[i].hour


* What's the point of making a copy of dates_array? dates_array is a
ndarray of objects, right? And you want to take the hours, so you
should have an ndarray of integers for hours_array.
* The issue I have with this part is that you have several calls to  
__getitem__ at each iteration. It might be faster to create
hours_array as a block:
hours_array=np.array([_.hour for _ in dates_array], dtype=np.int)




Re: [Numpy-discussion] recfunctions.stack_arrays

2009-01-27 Thread Pierre GM
[Some background: we're talking about numpy.lib.recfunctions, a set of  
functions to manipulate structured arrays]

Ryan,
If the two files have the same structure, you can use that fact and  
specify the dtype of the output directly with the dtype parameter of  
mafromtxt. That way, you're sure that the two arrays will have the  
same dtype. If you don't know the structure beforehand, you could try  
to load one array and use its dtype as input of mafromtxt to load the  
second one.
Now, we could also try to modify stack_arrays so that it would take  
the largest dtype when several fields have the same name. I'm not  
completely satisfied by this approach, as it makes dtype conversions  
under the hood. Maybe we could provide the functionality as an option  
(w/ a forced_conversion boolean input parameter) ?
I'm a bit surprised by the error message you get. If I try:

>>> a = ma.array([(1, 2, 3)], mask=[(0, 1, 0)],
...              dtype=[('a', int), ('b', bool), ('c', float)])
>>> b = ma.array([(4, 5, 6)],
...              dtype=[('a', int), ('b', float), ('c', float)])
>>> test = np.stack_arrays((a, b))

I get a TypeError instead (the field 'b' hasn't the same type in a and  
b). Now, I get the 'two fields w/ the same name' when I use  
np.merge_arrays (with the flatten option). Could you send a small  
example ?


 P.S.  Thanks so much for your work on putting those utility  
 functions in
 recfunctions.py  It makes it so much easier to have these functions  
 available in
 the library itself rather than needing to reinvent the wheel over  
 and over.

Indeed. Note that most of the job had been done by John Hunter and the  
matplotlib developers in their matplotlib.mlab module, so you should  
thank them and not me. I just cleaned up some of the functions.


Re: [Numpy-discussion] recfunctions.stack_arrays

2009-01-27 Thread Pierre GM

On Jan 27, 2009, at 4:23 PM, Ryan May wrote:


 I definitely wouldn't advocate magic by default, but I think it  
 would be nice to
 be able to get the functionality if one wanted to.

OK. Put on the TODO list.


 There is one problem I
 noticed, however.  I found common_type and lib.mintypecode, but both  
 raise errors
 when trying to find a dtype to match both bool and float.  I don't  
 know if
 there's another function somewhere that would work for what I want.

I'm not familiar with these functions, I'll check that.

 Apparently, I get my error as a result of my use of titles in the  
 dtype to store
 an alternate name for the field.  (If you're not familiar with  
 titles, they're
 nice because you can get fields by either name, so for the following  
 example,
 a['a'] and a['A'] both return array([1]).)  The following version of  
 your case
 gives me the ValueError:

Ah OK. You found a bug. There's a frustrating feature of dtypes:  
dtype.names doesn't always match [_[0] for _ in dtype.descr].
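
A quick illustration of the mismatch, with a made-up titled dtype:

import numpy as np

dt = np.dtype([(('Title A', 'a'), int), (('Title B', 'b'), float)])
print dt.names                   # ('a', 'b')
print [_[0] for _ in dt.descr]   # [('Title A', 'a'), ('Title B', 'b')]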


 As a side question, do you have some local mods to your numpy SVN so  
 that some of
 the functions in recfunctions are available in numpy's top level?

Probably. I used the develop option of setuptools to install numpy on  
a virtual environment.

 On mine, I
 can't get to them except by importing them from  
 numpy.lib.recfunctions.  I don't
 see any mention of recfunctions in lib/__init__.py.


Well, till some problems are ironed out, I'm not really in favor of  
advertising them too much...



Re: [Numpy-discussion] Academic citation ?

2009-01-26 Thread Pierre GM
JH,
Thx for the links, but I'm afraid I need something more basic than  
that. For example, I'm referring to Python as:

van Rossum, G. and Drake, F. L. (eds), 2006. Python Reference Manual,  
Python Software Foundation,. http://docs.python.org/ref/ref.html.

I could indeed use http://www.scipy.org/Citing_SciPy to cite Scipy  
(although the citation is incomplete), and define something similar  
for Numpy... Or refer to the Computing in Science and Engineering  
special issue. I'm just a bit surprised there's no official standard.

Thx,
P.



On Jan 26, 2009, at 10:56 AM, j...@physics.ucf.edu wrote:

 What is the most up-to-date way to cite Numpy and Scipy in an  
 academic
 journal ?

 Cite our conference articles here:

 http://conference.scipy.org/proceedings/SciPy2008/index.html

 It would be nice if someone involved in the proceedings could post a
 bibtex on the citations page.  And link the citations page
 to...something...easily navigated to from the front page.

 This brings up a related point:

 When someone goes to scipy.org, there is no way to navigate to
 conferences.scipy.org from scipy.org except by finding the link buried
 in the intro text.  Ipython and all the whatever.scipy.org domains,
 except for docs.scipy.org, are completely absent; you have to know
 about them to find them.  I don't even know where to find a complete
 list of these.  They should all have a presence on at least the front
 page and maybe the navigation.

 --jh--


Re: [Numpy-discussion] Bug with mafromtxt

2009-01-26 Thread Pierre GM

On Jan 24, 2009, at 6:23 PM, Ryan May wrote:


 Ok, thanks.  I've dug a little further, and it seems like the  
 problem is that a
 column of all missing values ends up as a column of all None's.   
 When you create
 a (masked) array from a list of None's, you end up with an object  
 array.  On one
 hand I'd love for things to behave differently in this case, but on  
 the other I
 understand why things work this way.

Ryan,
Mind giving r6434 a try? As usual, don't hesitate to report any problem.


[Numpy-discussion] Academic citation ?

2009-01-25 Thread Pierre GM
All,
What is the most up-to-date way to cite Numpy and Scipy in an academic  
journal ?
Thanks a lot in advance
P.


Re: [Numpy-discussion] Academic citation ?

2009-01-25 Thread Pierre GM
David,
Thanks, but that's only part of what I need. I could also refer to  
Travis O's paper in Computing in Science and Engineering, but I  
wondered whether there wasn't something more up-to-date.
So, other answers are still welcome.
P.


On Jan 25, 2009, at 8:17 PM, David Warde-Farley wrote:

 I believe this is what you're looking for:

   http://www.scipy.org/Citing_SciPy


 On 25-Jan-09, at 6:45 PM, Pierre GM wrote:

 All,
 What is the most up-to-date way to cite Numpy and Scipy in an  
 academic
 journal ?
 Thanks a lot in advance
 P.


Re: [Numpy-discussion] Bug with mafromtxt

2009-01-24 Thread Pierre GM
Ryan,
Thanks for reporting. An idea would be to force the dtype of the  
masked column to the largest dtype of the other columns (in your  
example, that would be int). I'll try to see how easily it can be done  
early next week. Meanwhile, you can always give an explicit dtype at  
creation.

On Jan 24, 2009, at 5:58 PM, Ryan May wrote:

 Pierre,

 I've found what I consider to be a bug in the new mafromtxt (though  
 apparently it
 existed in earlier versions as well).  If you have an entire column  
 of data in a
 file that contains only masked data, and try to get mafromtxt to  
 automatically
 choose the dtype, the dtype gets selected to be object type.  In  
 this case, I'd
 think the better behavior would be float, but I'm not sure how hard  
 it would be
 to make this the case.  Here's a test case:

 import numpy as np
 from StringIO import StringIO
 s = StringIO('1 2 3\n4 5 6\n')
 a = np.mafromtxt(s, missing='2,5', dtype=None)
 print a.dtype

 Ryan

 -- 
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma


Re: [Numpy-discussion] numpy.array and subok kwarg

2009-01-22 Thread Pierre GM
Darren,


 The type returned by np.array is ndarray, unless I specifically set  
 subok=True, in which case I get a MyArray. The default value of  
 subok is True, so I don't understand why I have to specify subok  
 unless I want it to be False. Is my subclass missing something  
 important?

Blame the doc: the default for subok in array is False, as explicit in  
the _array_fromobject Cfunction (in multiarray). So no, you're not  
doing anything wrong. Note that by default subok=True for  
numpy.ma.array.
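
A quick check (MyArray standing in for your subclass):

import numpy as np

class MyArray(np.ndarray):
    pass

m = np.zeros(3).view(MyArray)
print type(np.array(m))               # ndarray: subok defaults to False
print type(np.array(m, subok=True))   # MyArray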


Re: [Numpy-discussion] Failures in test_recfunctions.py

2009-01-22 Thread Pierre GM
Interesting. The tests pass on my machine:
OS X, Python version 2.5.4 (r254:67916, Dec 29 2008, 17:02:44) [GCC 4.0.1 (Apple Inc. build 5488)], nose version 0.10.4

For
   File
 /home/nwagner/local/lib64/python2.6/site-packages/numpy/lib/tests/ 
 test_recfunctions.py,
 line 34, in test_zip_descr
 np.dtype([('', 'i4'), ('', 'i4')]))

I guess I can change 'i4' to int, which should work.

For:

 ==
 FAIL: Test the ignoremask option of find_duplicates
 --
 Traceback (most recent call last):
   File
 /home/nwagner/local/lib64/python2.6/site-packages/numpy/lib/tests/ 
 test_recfunctions.py,
 line 186, in test_find_duplicates_ignoremask
 assert_equal(test[-1], control)

 (mismatch 33.33%)
  x: array([0, 1, 3, 4, 2, 6])
  y: array([0, 1, 3, 4, 6, 2])


there's obviously a machine-dependent element somewhere. I'd blame  
argsort: the last 2 indices that are switched correspond to the masked  
elements in the input of the test. Note that the result is basically  
correct.
I should have access to a linux box, I'll see what I can do.




Re: [Numpy-discussion] Failures in test_recfunctions.py

2009-01-22 Thread Pierre GM

On Jan 22, 2009, at 1:31 PM, Nils Wagner wrote:

 Hi Pierre,

 Thank you. Works for me.

You're welcome, thanks for reporting!


Re: [Numpy-discussion] strange multiplication behavior with numpy.float64 and ndarray subclass

2009-01-21 Thread Pierre GM

On Jan 21, 2009, at 11:34 AM, Darren Dale wrote:

 I have a simple test script here that multiplies an ndarray subclass  
 with another number. Can anyone help me understand why each of these  
 combinations returns a new instance of MyArray:

 mine = MyArray()
 print type(np.float32(1)*mine)
 print type(mine*np.float32(1))
 print type(mine*np.float64(1))
 print type(1*mine)
 print type(mine*1)

 but this one returns a np.float64 instance?

FYI, that's the same behavior as observed in ticket #826. A first  
thread addressed that issue
http://www.mail-archive.com/numpy-discussion@scipy.org/msg13235.html
But so far, no answer has been suggested.
Any help welcome.


Re: [Numpy-discussion] genfromtxt

2009-01-21 Thread Pierre GM
Brent,
Currently, no, you won't be able to retrieve the header if it's  
commented.
I'll see what I can do.
P.



Re: [Numpy-discussion] genfromtxt

2009-01-21 Thread Pierre GM
Brent,
Mind trying r6330 and let me know if it works for you ? Make sure that  
you use names=True to detect a header.
P.



Re: [Numpy-discussion] Examples for numpy.genfromtxt

2009-01-20 Thread Pierre GM
Till I write some proper doc, you can check the examples in tests/test_io (the TestFromTxt suite)


On Jan 20, 2009, at 4:17 AM, Nils Wagner wrote:

 Hi all,

 Where can I find some sophisticated examples for the usage
 of numpy.genfromtxt ?


 Nils


Re: [Numpy-discussion] numpy.testing.asserts and masked array

2009-01-16 Thread Pierre GM

On Jan 16, 2009, at 10:51 AM, josef.p...@gmail.com wrote:

 I have a regression result with masked arrays that produces a masked
 array output, estm5.yhat, and I want to test equality to the benchmark
 case, estm1.yhat, with the asserts in numpy.testing, but I am getting
 strange results.

 ...
 Whats the trick to assert_almost_equal for masked arrays?

Use numpy.ma.testutils.assert_almost_equal instead.
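
For instance, a small sketch:

import numpy.ma as ma
from numpy.ma.testutils import assert_almost_equal

a = ma.array([1., 2., 3.], mask=[0, 1, 0])
b = ma.array([1., 999., 3.], mask=[0, 1, 0])
assert_almost_equal(a, b)   # passes: masked entries are not compared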


Re: [Numpy-discussion] unique1d and asarray

2009-01-04 Thread Pierre GM

On Jan 4, 2009, at 4:47 PM, Robert Kern wrote:

 On Sun, Jan 4, 2009 at 15:44, Pierre GM pgmdevl...@gmail.com wrote:

 If we used np.asanyarray instead, subclasses are recognized properly,
 the mask is recognized by argsort and the result correct.
 Is there a reason why we use np.asarray instead of np.asanyarray ?

 Probably not.

So there wouldn't be any objections to make the switch ? We can wait a  
couple of days if anybody has a pb with that...


[Numpy-discussion] genloadtxt : ready for inclusion

2009-01-03 Thread Pierre GM
All,
You'll probably remember that last December, I started rewriting  
np.loadtxt and came up with a series of functions that support missing  
data. I tried to copy/paste the code in numpy.lib.io.py but ran into  
dependency problems and left it at that. I think that part of the  
reason is that the code relies on numpy.ma which can't be loaded when  
numpy.lib gets loaded.

As I needed a way to grant access to the code to anybody, I created a  
small project on launchpad: you can access it to:

https://code.launchpad.net/~pierregm/numpy/numpy_addons

The loadtxt reimplementation functions can be found in the  
numpy.io.fromascii module, their unittest in the corresponding test  
directory. In addition, you'll find several other functions and their  
unittest to manipulate arrays w/ flexible data-type. They are  
basically rewritten version of some functions in matplotlib.mlab.

Would anybody be willing to try inserting the new functions in numpy ?  
I was hoping the genfromtxt and consorts would make it to numpy 1.3.x  
(I'd need the code for the scikits.timeseries package).

As usual, I'd need all the feedback you can share.

Thanks a lot in advance.
P.


Re: [Numpy-discussion] Alternative to record array

2008-12-29 Thread Pierre GM
Jean-Baptiste,
As you stated, everything depends on what you want to do.
If you need to keep the correspondence age/weight for each entry,  
then yes, record arrays, or at least flexible-type arrays, are the  
best. (The difference between a recarray and a flexible-type array is  
that fields can be accessed by attributes (data.age) or items  
(data['age']) with recarrays, but only with items with flexible-type  
arrays.)

Using your example, you could very well do:
data['age'] += 1
and still keep the correspondence age/weight.

Your FieldArray class returns an object that is not a ndarray, which  
may have some undesired side-effects.

As Ryan noted, flexible-type arrays are usually faster, because they  
lack the overhead brought by the possibility of accessing data by  
attributes. So, if you don't mind using the 'access-by-fields' syntax,  
you're good to go.
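
For instance, a minimal sketch of the flexible-type approach (sizes and names made up):

import numpy as np

age = np.random.randint(0, 99, 1000)
weight = np.random.randint(0, 200, 1000)
# A flexible-type array: fields are accessed as items only
data = np.empty(age.size, dtype=[('age', int), ('weight', int)])
data['age'] = age
data['weight'] = weight
data['age'] += 1                # updates the field in place, rows stay aligned
rec = data.view(np.recarray)    # same memory, attribute access on top
print rec.age[:3]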


On Dec 29, 2008, at 10:58 AM, Jean-Baptiste Rudant wrote:

 Hello,

 I like to use record arrays to access fields by their name, and  
 because they are esay to use with pytables. But I think it's not  
 very effiicient for what I have to do. Maybe I'm misunderstanding  
 something.

 Example :

 import numpy as np
 age = np.random.randint(0, 99, 10e6)
 weight = np.random.randint(0, 200, 10e6)
 data = np.rec.fromarrays((age, weight), names='age, weight')
 # the kind of operations I do is :
 data.age += 1
 # but it's far less efficient than doing :
 age += 1
 # because I think the record array stores [(age_0, weight_0) ... (age_n, weight_n)]
 # and not [age_0 ... age_n] then [weight_0 ... weight_n].

 So I think I don't use record arrays for the right purpose. I only  
 need something which would make me esasy to manipulate data by  
 accessing fields by their name.

 Am I wrong ? Is their something in numpy for my purpose ? Do I have  
 to implement my own class, with something like :


 class FieldArray:
     def __init__(self, array_dict):
         self.array_list = array_dict

     def __getitem__(self, field):
         return self.array_list[field]

     def __setitem__(self, field, value):
         self.array_list[field] = value

 my_arrays = {'age': age, 'weight' : weight}
 data = FieldArray(my_arrays)

 data['age'] += 1

 Thank you for the help,

 Jean-Baptiste Rudant
   






Re: [Numpy-discussion] is there a sortrows

2008-12-21 Thread Pierre GM

On Dec 21, 2008, at 10:19 PM, josef.p...@gmail.com wrote:

 From the examples that I tried out, np.sort sorts each column
 separately (with axis = 0). If the elements of a row are supposed to
 stay together, then np.sort doesn't work.

Well, if the elements are supposed to stay together, why wouldn't you  
tie them first, sort, and then untie them ?

>>> np.sort(a.view([('',int),('',int)]),0).view(int)

The first view transforms your 2D array into a 1D array of tuples, the  
second one retransforms the 1D array to 2D.

Not sure it's better than your lexsort, haven't timed it.
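
A complete round-trip on a small example (assuming a C-contiguous, homogeneous int array):

import numpy as np

a = np.array([[3, 0], [1, 9], [1, 2]])
b = np.sort(a.view([('', int), ('', int)]), 0).view(int)
print b
# [[1 2]
#  [1 9]
#  [3 0]]  -- rows sorted lexicographically, each row kept together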


Re: [Numpy-discussion] Unexpected MaskedArray behavior

2008-12-17 Thread Pierre GM

On Dec 17, 2008, at 12:13 PM, Jim Vickroy wrote:

 Sorry for being dense about this, but I really do not understand why  
 masked values should not be trusted.  If I apply a procedure to an  
 array with elements designated as untouchable, I would expect that  
 contract to be honored.  What am I missing here?

 Thanks for your patience!
 -- jv

Everything depends on your interpretation of masked data.  
Traditionally, masked data indicate invalid data, whatever the cause  
of the invalidity. Operations involving invalid data yield invalid  
data, hence the presence of a mask on the result. However, the value  
underneath the mask is still invalid, hence the statement "don't trust  
masked values".
Interpreting a mask as a way to prevent some elements of an array to  
be processed (designating them as untouchable) is a bit of a stretch.  
Nevertheless, I agree that this behavior is not intuitive, so I'll  
check what I can do.




Re: [Numpy-discussion] genloadtxt : last call

2008-12-16 Thread Pierre GM
Ryan,
OK, I'll look into that. I won't have time to address it before this  
next week, however. Option #2 looks like the best.

In other news, I was considering renaming genloadtxt to genfromtxt,  
and using ndfromtxt, mafromtxt, recfromtxt, recfromcsv for the  
function names. That way, loadtxt is untouched.



On Dec 16, 2008, at 6:07 PM, Ryan May wrote:

 Pierre GM wrote:
 All,
 Here's the latest version of genloadtxt, with some recent  
 corrections.
 With just a couple of tweaking, we end up with some decent speed:  
 it's
 still slower than np.loadtxt, but only 15% so according to the test  
 at
 the end of the package.

 I have one more use issue that you may or may not want to fix. My  
 problem is that
 missing values are specified by their string representation, so  
 that a string
 representing a missing value, while having the same actual numeric  
 value, may not
 compare equal when represented as a string.  For instance, if you  
 specify that
 -999.0 represents a missing value, but the value written to the file  
 is -999.00,
 you won't end up masking the -999.00 data point.  I'm sure a test  
 case will help
 here:

 def test_withmissing_float(self):
     data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
     test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0',
                     names=True)
     control = ma.array([(0, 1.5), (2, -1.)],
                        mask=[(False, False), (False, True)],
                        dtype=[('A', np.int), ('B', np.float)])
     print control
     print test
     assert_equal(test, control)
     assert_equal(test.mask, control.mask)

 Right now this fails with the latest version of genloadtxt.  I've  
 worked around
 this by specifying a whole bunch of string representations of the  
 values, but I
 wasn't sure if you knew of a better way that this could be handled  
 within
 genloadtxt.  I can only think of two ways, though I'm not thrilled  
 with either:

 1) Call the converter on the string form of the missing value and  
 compare against
 the converted value from the file to determine if missing. (Probably  
 very slow)

 2) Add a list of objects (ints, floats, etc.) to compare against  
 after conversion
 to determine if they're missing. This might needlessly complicate  
 the function,
 which I know you've already taken pains to optimize.

 If there's no good way to do it, I'm content to live with a  
 workaround.

 Ryan

 -- 
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma


Re: [Numpy-discussion] Unexpected MaskedArray behavior

2008-12-16 Thread Pierre GM

On Dec 16, 2008, at 1:57 PM, Ryan May wrote:

 I just noticed the following and I was kind of surprised:

 >>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False])
 >>> b = a*5
 >>> b
 masked_array(data = [5 -- -- 20 25],
              mask = [False  True  True False False],
        fill_value=999999)
 >>> b.data
 array([ 5, 10, 15, 20, 25])

 I was expecting that the underlying data wouldn't get modified while  
 masked.  Is
 this actual behavior expected?

Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't  
really matter one way or the other.
But I tend to agree, it'd make more sense to leave masked data untouched  
(or at least, reset them to their original value after the operation),  
which would mimic the behavior of gimp/photoshop.
Looks like there's a relatively easy fix. I need time to check whether  
it doesn't break anything elsewhere, nor that it slows things down too  
much. I won't have time to test all that before next week, though. In  
any case, that would be for 1.3.x, not for 1.2.x.
In the meantime, if you need the functionality, use something like
ma.where(a.mask,a,a*5)



Re: [Numpy-discussion] Superclassing numpy.matrix: got an unexpected keyword argument 'dtype'

2008-12-10 Thread Pierre GM
Robert,
Transforming your matrix to a list before computation isn't very  
efficient. If you do need some extra parameters in your __init__ to be  
compatible with other functions such as asmatrix, well, just add them,  
or use a coverall **kwargs
def __init__(self, instruments, **kwargs)
No guarantee it'll work all the time.

Otherwise, please have a look at:
http://docs.scipy.org/doc/numpy/user/basics.subclassing.html
and the other link at the top of that page. In your case, I'd try to  
put the initialization in the __array_finalize__.
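
For instance, here's a minimal sketch, sticking to your 2x2 example (untested, and the attribute handling is only indicative):

from numpy import matrix

class rcMatrix(matrix):
    def __new__(cls, instruments):
        # Do all the work in __new__ and drop __init__ entirely, so that
        # internal calls like matrix(data, dtype=dtype, copy=False) made by
        # asmatrix (and thus by getI) no longer hit a custom __init__.
        len_ = len(instruments)
        obj = matrix.__new__(cls, [[0.0] * len_] * len_)
        obj[0, 0] = 100
        obj[0, 1] = 100
        obj[1, 0] = 200
        obj[1, 1] = 300
        return obj
    def __array_finalize__(self, obj):
        # Keep the matrix machinery happy, then propagate any extra
        # attributes your class carries.
        matrix.__array_finalize__(self, obj)

With that, rcm.I should come back as an rcMatrix without the TypeError, since there's no __init__ signature left to conflict with.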



On Dec 10, 2008, at 7:15 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] 
  wrote:

 Hello,

 I'm using numpy-1.1.1 for Python 2.3.  I'm trying to create a class  
 that acts just like the numpy.matrix class with my own added methods  
 and attributes.  I want to pass my class a list of custom  
 instrument objects and do some math based on these objects to set  
 the matrix.  To this end I've done the following:

 from numpy import matrix

 class rcMatrix(matrix):
     def __init__(self, instruments):
         """Do some calculations and set the values of the matrix."""
         self[0,0] = 100 # Just an example
         self[0,1] = 100 # The real init method
         self[1,0] = 200 # Does some math based on the input objects
         self[1,1] = 300 #
     def __new__(cls, instruments):
         """When creating a new instance begin by creating an NxN
         matrix of zeroes."""
         len_ = len(instruments)
         return matrix.__new__(cls, [[0.0]*len_]*len_)

 It works great and I can, for example, multiply two of my custom  
 matrices seamlessly.  I can also get the transpose.  However, when I  
 try to get the inverse I get an error:

 >>> rcm = rcMatrix(['instrument1','instrument2'])
 >>> print rcm
 [[ 100.  100.]
  [ 200.  300.]]
 >>> print rcm.T
 [[ 100.  200.]
  [ 100.  300.]]
 >>> print [5,10] * rcm
 [[ 2500.  3500.]]
 >>> print rcm.I
 Traceback (most recent call last):
   File "[Standard]/deleteme", line 29, in ?
   File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line 492, in getI
     return asmatrix(func(self))
   File "C:\Python23\Lib\site-packages\numpy\core\defmatrix.py", line 52, in asmatrix
     return matrix(data, dtype=dtype, copy=False)
 TypeError: __init__() got an unexpected keyword argument 'dtype'



 I've had to overwrite the getI function in order for things to work  
 out:

     def getI(self):
         return matrix(self.tolist()).I
     I = property(getI, None, doc="inverse")

 Is this the correct way to achieve my goals?

 Please let me know if anything is unclear.

 Thanks,

 Robert Conde




Re: [Numpy-discussion] genloadtxt : last call

2008-12-09 Thread Pierre GM

On Dec 9, 2008, at 12:59 PM, Christopher Barker wrote:

 Jarrod Millman wrote:

 From the user's perspective, I would like all the NumPy IO code to  
 be
 in the same place in NumPy; and all the SciPy IO code to be in the
 same place in SciPy.

 +1

So, no problem w/ importing numpy.ma and numpy.records in numpy.lib.io ?




 So I
 wonder if it would make sense to incorporate AstroAsciiData?

 Doesn't it overlap a lot with genloadtxt? If so, that's a bit  
 confusing
 to new users.

For the little I browsed, do we need it ? We could get the same thing  
with record arrays...


 3. What about data source?

 Should we remove datasource?  Start using it more?

 start  using it more -- it sounds very handy.

Didn't know it was around. I'll adapt genloadtxt to use it.

 Documentation
 -------------
 (Let me try NumPy; this seems pretty good. Now let's see how to load in some of my data.)

 totally key -- I have a colleague that has used Matlab a fair bit in the
 past that is starting a new project -- he asked me what to use. I, of
 course, suggested python+numpy+scipy. His first question was -- can I
 load data in from excel?

So that would go in scipy.io ?



 One more comment -- for fast reading of lots of ascii data, fromfile()
 needs some help -- I wish I had more time for it -- maybe some day.

I'm afraid you'd have to count me out on this one: I don't speak C  
(yet), and don't foresee learning it soon enough to be of any help...


[Numpy-discussion] Python2.4 support

2008-12-07 Thread Pierre GM
All,

* What versions of Python should be supported by what version of  
numpy ? Are we to expect users to rely on Python2.5 for the upcoming  
1.3.x ? Could we have some kind of timeline on the trac site or  
elsewhere (and if such a timeline exists already, can I get the link?) ?

* Talking about 1.3.x, what's the timeline? Are we still shooting for  
a release in 2008 or could we wait till mid Jan. 2009 ?

Thx a lot in advance


Re: [Numpy-discussion] Python2.4 support

2008-12-07 Thread Pierre GM

On Dec 7, 2008, at 4:21 PM, Jarrod Millman wrote:
 NumPy 1.3.x should work with Python 2.4, 2.5, and 2.6.  At some point
 we can drop 2.4, but I would like to wait a bit since we just dropped
 2.3 support.  The timeline is on the trac site:
  http://projects.scipy.org/scipy/numpy/milestone/1.3.0

OK, great, thanks a lot.


 * Talking about 1.3.x, what's the timeline? Are we still shooting for
 a release in 2008 or could we wait till mid Jan. 2009 ?

 I am fine with pushing the release back, if there is interest in doing
 that.  I have been mainly focusing on getting SciPy 0.7.x out, so I
 haven't been following the NumPy development closely.  But it is good
 that you are asking for more concrete details about the next NumPy
 release.  We need to start making plans.  Does anyone have any
 suggestions about whether we should push the release back?  Is 1 month
 long enough?  What is left to do?

Well, on my side, there's some doc to be updated, of course. Then, I'd  
like to bring in the rec_functions that were developed in matplotlib to  
manipulate record arrays. I haven't started yet, but I might be able to do  
so before the end of the year (not much to do, just a cleanup and some  
examples). And what should we do with the genloadtxt function ?



 Please feel free to update the release notes, which are checked into  
 the trunk:
  http://scipy.org/scipy/numpy/browser/trunk/doc/release/1.3.0- 
 notes.rst


Will do in good time.
Thx again



[Numpy-discussion] genloadtxt : last call

2008-12-05 Thread Pierre GM

All,
Here's the latest version of genloadtxt, with some recent corrections.  
With just a couple of tweaks, we end up with some decent speed: it's  
still slower than np.loadtxt, but only by 15% or so according to the test  
at the end of the package.


And so, now what ? Should I put the module in numpy.lib.io ? Elsewhere ?

Thx for any comment and suggestions.


Proposal :
Here's an extension to np.loadtxt, designed to take missing values into account.


import itertools
import numpy as np
import numpy.ma as ma


def _is_string_like(obj):
    """
    Check whether obj behaves like a string.
    """
    try:
        obj + ''
    except (TypeError, ValueError):
        return False
    return True


def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _is_string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        elif fname.endswith('.bz2'):
            import bz2
            fhd = bz2.BZ2File(fname)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd


def flatten_dtype(ndtype):
    """
    Unpack a structured data-type.
    """
    names = ndtype.names
    if names is None:
        return [ndtype]
    else:
        types = []
        for field in names:
            (typ, _) = ndtype.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types


def nested_masktype(datatype):
    """
    Construct the dtype of a mask for nested elements.
    """
    names = datatype.names
    if names:
        descr = []
        for name in names:
            (ndtype, _) = datatype.fields[name]
            descr.append((name, nested_masktype(ndtype)))
        return descr
    # Is this some kind of composite a la (np.float,2)
    elif datatype.subdtype:
        mdescr = list(datatype.subdtype)
        mdescr[0] = np.dtype(bool)
        return tuple(mdescr)
    else:
        return np.bool


class LineSplitter:
    """
    Defines a function to split a string at a given delimiter or at given places.

    Parameters
    ----------
    comment : {'#', string}
        Character used to mark the beginning of a comment.
    delimiter : var, optional
        If a string, character used to delimit consecutive fields.
        If an integer or a sequence of integers, width(s) of each field.
    autostrip : boolean, optional
        Whether to strip each individual field.
    """
    #
    def autostrip(self, method):
        "Wrapper to strip each member of the output of `method`."
        return lambda input: [_.strip() for _ in method(input)]
    #
    def __init__(self, delimiter=None, comments='#', autostrip=True):
        self.comments = comments
        # Delimiter is a character
        if (delimiter is None) or _is_string_like(delimiter):
            delimiter = delimiter or None
            _handyman = self._delimited_splitter
        # Delimiter is a list of field widths
        elif hasattr(delimiter, '__iter__'):
            _handyman = self._variablewidth_splitter
            idx = np.cumsum([0] + list(delimiter))
            delimiter = [slice(i, j) for (i, j) in zip(idx[:-1], idx[1:])]
        # Delimiter is a single integer
        elif int(delimiter):
            (_handyman, delimiter) = (self._fixedwidth_splitter, int(delimiter))
        else:
            (_handyman, delimiter) = (self._delimited_splitter, None)
        self.delimiter = delimiter
        if autostrip:
            self._handyman = self.autostrip(_handyman)
        else:
            self._handyman = _handyman
    #
    def _delimited_splitter(self, line):
        line = line.split(self.comments)[0].strip()
        if not line:
            return []
        return line.split(self.delimiter)
    #
    def _fixedwidth_splitter(self, line):
        line = line.split(self.comments)[0]
        if not line:
            return []
        fixed = self.delimiter
        slices = [slice(i, i + fixed) for i in range(len(line))[::fixed]]
        return [line[s] for s in slices]
    #
    def _variablewidth_splitter(self, line):
        line = line.split(self.comments)[0]
        if not line:
            return []
        slices = self.delimiter
        return [line[s] for s in slices]
    #
    def __call__(self, line):
        return self._handyman(line)


class 

[Numpy-discussion] genloadtxt: second serving

2008-12-04 Thread Pierre GM

All,
Here's the second round of genloadtxt. That's a tad cleaner version  
than the previous one, where I tried to take  into account the  
different comments and suggestions that were posted. So, tabs should  
be supported and explicit whitespaces are not collapsed.
FYI, in the __main__ section, you'll find 2 hotshot tests and a timeit  
comparison: same input, no missing data, one with genloadtxt, one with  
np.loadtxt and a last one with matplotlib.mlab.csv2rec.


As you'll see, genloadtxt is roughly twice as slow as np.loadtxt, but  
twice as fast as csv2rec. One explanation for the slowness is  
indeed the use of classes for splitting lines and converting values.  
Instead of a basic function, we use the __call__ method of the class,  
which itself calls another function depending on the attribute values.  
I'd like to reduce this overhead; any suggestion is more than welcome,  
as usual.


Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in  
numpy.ma, with an alias recfromcsv for John, using his defaults.  
Unless somebody comes with a brilliant optimization.


Let me know how it goes,
Cheers,
P.





Proposal :
Here's an extension to np.loadtxt, designed to take missing values into account.


import itertools
import numpy as np
import numpy.ma as ma


def _is_string_like(obj):
    """
    Check whether obj behaves like a string.
    """
    try:
        obj + ''
    except (TypeError, ValueError):
        return False
    return True


def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _is_string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        elif fname.endswith('.bz2'):
            import bz2
            fhd = bz2.BZ2File(fname)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd


def flatten_dtype(ndtype):
    """
    Unpack a structured data-type.
    """
    names = ndtype.names
    if names is None:
        return [ndtype]
    else:
        types = []
        for field in names:
            (typ, _) = ndtype.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types


def nested_masktype(datatype):
    """
    Construct the dtype of a mask for nested elements.
    """
    names = datatype.names
    if names:
        descr = []
        for name in names:
            (ndtype, _) = datatype.fields[name]
            descr.append((name, nested_masktype(ndtype)))
        return descr
    # Is this some kind of composite a la (np.float,2)
    elif datatype.subdtype:
        mdescr = list(datatype.subdtype)
        mdescr[0] = np.dtype(bool)
        return tuple(mdescr)
    else:
        return np.bool


class LineSplitter:
    """
    Defines a function to split a string at a given delimiter or at given places.

    Parameters
    ----------
    comment : {'#', string}
        Character used to mark the beginning of a comment.
    delimiter : var, optional
        If a string, character used to delimit consecutive fields.
        If an integer or a sequence of integers, width(s) of each field.
    autostrip : boolean, optional
        Whether to strip each individual field.
    """
    #
    def autostrip(self, method):
        "Wrapper to strip each member of the output of `method`."
        return lambda input: [_.strip() for _ in method(input)]
    #
    def __init__(self, delimiter=None, comments='#', autostrip=True):
        self.comments = comments
        # Delimiter is a character
        if (delimiter is None) or _is_string_like(delimiter):
            delimiter = delimiter or None
            _called = self._delimited_splitter
        # Delimiter is a list of field widths
        elif hasattr(delimiter, '__iter__'):
            _called = self._variablewidth_splitter
            idx = np.cumsum([0] + list(delimiter))
            delimiter = [slice(i, j) for (i, j) in zip(idx[:-1], idx[1:])]
        # Delimiter is a single integer
        elif int(delimiter):
            (_called, delimiter) = (self._fixedwidth_splitter, int(delimiter))
        else:
            (_called, delimiter) = (self._delimited_splitter, None)
        self.delimiter = delimiter
        if autostrip:
            self._called = self.autostrip(_called)
        else:

[Numpy-discussion] genloadtxt: second serving (tests)

2008-12-04 Thread Pierre GM

And now for the tests:

# pylint disable-msg=E1101, W0212, W0621

import numpy as np
import numpy.ma as ma

from numpy.ma.testutils import *

from StringIO import StringIO

from _preview import *


class TestLineSplitter(TestCase):
    "Tests the LineSplitter class."
    #
    def test_no_delimiter(self):
        "Test LineSplitter w/o delimiter"
        strg = " 1 2 3 4  5 # test"
        test = LineSplitter()(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])
        test = LineSplitter('')(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])

    def test_space_delimiter(self):
        "Test space delimiter"
        strg = " 1 2 3 4  5 # test"
        test = LineSplitter(' ')(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5'])
        test = LineSplitter('  ')(strg)
        assert_equal(test, ['1 2 3 4', '5'])

    def test_tab_delimiter(self):
        "Test tab delimiter"
        strg = " 1\t 2\t 3\t 4\t 5  6"
        test = LineSplitter('\t')(strg)
        assert_equal(test, ['1', '2', '3', '4', '5  6'])
        strg = " 1  2\t 3  4\t 5  6"
        test = LineSplitter('\t')(strg)
        assert_equal(test, ['1  2', '3  4', '5  6'])

    def test_other_delimiter(self):
        "Test LineSplitter on delimiter"
        strg = "1,2,3,4,,5"
        test = LineSplitter(',')(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5'])
        #
        strg = " 1,2,3,4,,5 # test"
        test = LineSplitter(',')(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5'])

    def test_constant_fixed_width(self):
        "Test LineSplitter w/ fixed-width fields"
        strg = "  1  2  3  4     5   # test"
        test = LineSplitter(3)(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5', ''])
        #
        strg = "  1     3  4  5  6# test"
        test = LineSplitter(20)(strg)
        assert_equal(test, ['1     3  4  5  6'])
        #
        strg = "  1     3  4  5  6# test"
        test = LineSplitter(30)(strg)
        assert_equal(test, ['1     3  4  5  6'])

    def test_variable_fixed_width(self):
        strg = "  1     3  4  5  6# test"
        test = LineSplitter((3, 6, 6, 3))(strg)
        assert_equal(test, ['1', '3', '4  5', '6'])
        #
        strg = "  1     3  4  5  6# test"
        test = LineSplitter((6, 6, 9))(strg)
        assert_equal(test, ['1', '3  4', '5  6'])


#---

class TestNameValidator(TestCase):
    #
    def test_case_sensitivity(self):
        "Test case sensitivity"
        names = ['A', 'a', 'b', 'c']
        test = NameValidator().validate(names)
        assert_equal(test, ['A', 'a', 'b', 'c'])
        test = NameValidator(case_sensitive=False).validate(names)
        assert_equal(test, ['A', 'A_1', 'B', 'C'])
    #
    def test_excludelist(self):
        "Test excludelist"
        names = ['dates', 'data', 'Other Data', 'mask']
        validator = NameValidator(excludelist=['dates', 'data', 'mask'])
        test = validator.validate(names)
        assert_equal(test, ['dates_', 'data_', 'Other_Data', 'mask_'])


#---

class TestStringConverter(TestCase):
    "Test StringConverter"
    #
    def test_creation(self):
        "Test creation of a StringConverter"
        converter = StringConverter(int, -9)
        assert_equal(converter._status, 1)
        assert_equal(converter.default, -9)
    #
    def test_upgrade(self):
        "Tests the upgrade method."
        converter = StringConverter()
        assert_equal(converter._status, 0)
        converter.upgrade('0')
        assert_equal(converter._status, 1)
        converter.upgrade('0.')
        assert_equal(converter._status, 2)
        converter.upgrade('0j')
        assert_equal(converter._status, 3)
        converter.upgrade('a')
        assert_equal(converter._status, len(converter._mapper) - 1)
    #
    def test_missing(self):
        "Tests the use of missing values."
        converter = StringConverter(missing_values=('missing', 'missed'))
        converter.upgrade('0')
        assert_equal(converter('0'), 0)
        assert_equal(converter(''), converter.default)
        assert_equal(converter('missing'), converter.default)
        assert_equal(converter('missed'), converter.default)
        try:
            converter('miss')
        except ValueError:
            pass
    #
    def test_upgrademapper(self):
        "Tests updatemapper"
        import dateutil.parser
        import datetime
        dateparser = dateutil.parser.parse
        StringConverter.upgrade_mapper(dateparser, datetime.date(2000, 1, 1))
        convert = StringConverter(dateparser, datetime.date(2000, 1, 1))
        test = convert('2001-01-01')
        assert_equal(test, datetime.datetime(2001, 01, 01, 00, 00, 00))


#---

class TestLoadTxt(TestCase):
    #
    def test_record(self):
        "Test w/ explicit 

Re: [Numpy-discussion] genloadtxt: second serving

2008-12-04 Thread Pierre GM

On Dec 4, 2008, at 7:22 AM, Manuel Metz wrote:

 Will loadtxt in that case remain as is? Or will the _faulttolerantconv
 class be used?

No idea, we need to discuss it. There's a problem with  
_faulttolerantconv: using np.nan as default value will not work in  
Python2.6 if the output is to be int, as an exception will be raised.  
Therefore, we'd need to change the default to something else when  
defining _faulttolerantconv. The easiest would be to define a class  
and set the argument at instantiation, but then we're going back  
dangerously close to StringConverter...


Re: [Numpy-discussion] in(np.nan) on python 2.6

2008-12-04 Thread Pierre GM

On Nov 25, 2008, at 12:23 PM, Pierre GM wrote:

 All,
 Sorry to bump my own post, and I was kinda threadjacking anyway:

 Some functions of numy.ma (eg, ma.max, ma.min...) accept explicit  
 outputs that may not be MaskedArrays.
 When such an explicit output is not a MaskedArray, a value that  
 should have been masked is transformed into np.nan.

 That worked great in 2.5, with np.nan automatically transformed to 0  
 when the explicit output had an int dtype. With Python 2.6, a  
 ValueError is raised instead, as np.nan can no longer be cast to int.

 What should be the recommended behavior in this case ? Raise a  
 ValueError or some other exception, to follow the new Python2.6  
 convention, or silently replace np.nan by some value acceptable by  
 int dtype (0, or something else) ?


Second bump, sorry. Any consensus on what the behavior should be ?  
Raise a ValueError (even in 2.5, therefore risking to break something)  
or just go with the flow and switch np.nan to an acceptable value  
(like 0), under the hood ? I'd like to close the corresponding ticket...
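
For reference, the pattern that changed boils down to assigning np.nan into an int array (a minimal sketch):

>>> out = np.zeros(3, dtype=int)
>>> out[0] = np.nan
# Python 2.5: silently casts, out[0] becomes 0
# Python 2.6: ValueError: cannot convert float NaN to integer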


Re: [Numpy-discussion] in(np.nan) on python 2.6

2008-12-04 Thread Pierre GM

On Dec 4, 2008, at 3:24 PM, [EMAIL PROTECTED] wrote:

 On Thu, Dec 4, 2008 at 2:40 PM, Jarrod Millman  
 [EMAIL PROTECTED] wrote:
 On Thu, Dec 4, 2008 at 11:14 AM, Pierre GM [EMAIL PROTECTED]  
 wrote:
 Raise a ValueError (even in 2.5, therefore risking to break  
 something)

 +1


 +1

OK then, I'll do that and update the SVN later tonight or early tmw...


Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Pierre GM

On Dec 3, 2008, at 12:48 PM, Christopher Barker wrote:

 Pierre GM wrote:
 I can try, but in that case, please write me a unittest, so that I
 have a clear and unambiguous idea of what you expect.

 fair enough, though I'm not sure when I'll have time to do it.

Oh, don;t worry, nothing too fancy: give me a couple lines of input  
data and a line with what you expect. Using Ryan's recent example:

>>> f = StringIO('stid stnm relh tair\nnrmn 121 45 9.1')
>>> test = loadtxt(f, usecols=('stid', 'relh', 'tair'), names=True,
...                dtype=None)
>>> control = array(('nrmn', 45, 9.0996),
...                 dtype=[('stid', '|S4'), ('relh', 'i8'), ('tair', 'f8')])

That's quite enough for a test.

 I do wonder if anyone else thinks it would be useful to have multiple
 delimiters as an option. I got the idea because with fromfile(), if  
 you
 specify, say ',' as the delimiter, it won't use '\n', only  a comma,  
 so
 there is no way to quickly read a whole bunch of comma delimited  
 data like:

 1,2,3,4
 5,6,7,8
 

 so I'd like to be able to say to use either ',' or '\n' as the  
 delimiter.

I'm not quite sure I follow you.
Do you want two delimiters, one for the fields of a record (','), one  
for the records ('\n') ?




 However, if I understand loadtxt() correctly, it's handling the new
 lines separately anyway (to get a 2-d array), so this use case isn't  
 an
 issue. So how likely is it that someone would have:

 1  2  3, 4, 5
 6  7  8, 8, 9

 and want to read that into a single 2-d array?

With the current behaviour, you'd get
[('1 2 3', 4, 5), ('6 7 8', 8, 9)] if you use ',' as a delimiter,
[(1,2,3,,4,,5), (6,7,8,,8,,9)] if you use ' ' as a delimiter.

Mixing delimiters is doable, but I don't think it's that good an idea.  
I'm in favor of sticking to one and only one field delimiter, and the  
default line separator for the record delimiter. In other terms, not  
changing anything.



Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Pierre GM

On Dec 3, 2008, at 12:32 PM, Alan G Isaac wrote:

 If I know my data is already clean
 and is handled nicely by the
 old loadtxt, will I be able to turn
 off the special handling in
 order to retain the old load speed?

Hopefully. I'm looking for the best way to do it. Do you have an  
example you could send me off-list so that I can play with timers ?  
Thx in advance.
P.



Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Pierre GM

On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote:

 by the way, should this work:

 io.loadtxt('junk.dat', delimiter=' ')

 for more than one space between numbers, like:

 1  2  3  4   5
 6  7  8  9  10


On the version I'm working on, both delimiter='' and delimiter=None  
(default) would give you the expected output. delimiter=' ' would  
fail, delimiter='  ' would work.


Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Pierre GM
Manuel,
Looks nice, I'm gonna try to see how I can incorporate yours. Note that  
returning np.nan by default will not work w/ Python 2.6 if you want an  
int...




Re: [Numpy-discussion] bug in ma.masked_all()?

2008-12-02 Thread Pierre GM
Eric,
That's quite a handful you have with this dtype...
So yes, the fix I gave works with nested dtypes and flexible dtypes  
with a simple name (string, not tuple). I'm a bit surprised with  
numpy, here.
Consider:

>>> dt.names
('P', 'D', 'T', 'w', 'S', 'sigtheta', 'theta')

So we lose the tuple and get a single string instead, corresponding to  
the right-hand element of the name..
But this single string is one of the keys of dt.fields, whereas the  
tuple is not. Puzzling. I'm sure there must be some reference in the  
numpy book, but I can't look for it now.

Anyway:
Prior to version 6127, make_mask_descr was substituting the 2nd  
element of each tuple of a dtype.descr by a bool. Which failed for  
nested dtypes. Now, we check the field corresponding to a name, which  
fails in our particular case.


I'll be working on it...



On Dec 2, 2008, at 1:59 AM, Eric Firing wrote:

 dt = np.dtype([((' Pressure, Digiquartz [db]', 'P'), 'f4'), (('  
 Depth [salt water, m]', 'D'), 'f4'), ((' Temperature [ITS-90, deg  
 C]', 'T'), 'f4'), ((' Descent Rate [m/s]', 'w'), 'f4'), (('  
 Salinity [PSU]', 'S'), 'f4'), ((' Density [sigma-theta, Kg/m^3]',  
 'sigtheta'), 'f4'), ((' Potential Temperature [ITS-90, deg C]',  
 'theta'), 'f4')])

 np.ma.zeros((2,2), dt)



Re: [Numpy-discussion] bug in ma.masked_all()?

2008-12-02 Thread Pierre GM

On Dec 2, 2008, at 4:26 AM, Eric Firing wrote:

 From page 132 in the numpy book:

  The fields dictionary is indexed by keys that are the names of the
 fields. Each entry in the dictionary is a tuple fully describing the
 field: (dtype, offset[,title]). If present, the optional title can
 actually be any object (if it is string or unicode then it will also  
 be
 a key in the fields dictionary, otherwise it’s meta-data).

I should read it more often...


 I put the titles in as a sort of additional documentation, and  
 thinking
 that they might be useful for labeling plots;

That's actually quite a good idea...

 but it is rather hard to
 get the titles back out, since they are not directly accessible as an
 attribute, like names.  Probably I should just omit them.


We could perhaps try a function:
def gettitle(dtype, name):
    try:
        field = dtype.fields[name]
    except (TypeError, KeyError):
        return None
    else:
        # a fields entry is (dtype, offset[, title]): a title is
        # present only when the tuple has more than 2 elements
        if len(field) > 2:
            return field[-1]
        return None
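
With your dtype from above, that would give something like (sketch):

print gettitle(dt, 'P')          # ' Pressure, Digiquartz [db]'
print gettitle(dt, 'not_there')  # None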




Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Pierre GM

On Dec 2, 2008, at 3:12 PM, Ryan May wrote:

 Pierre GM wrote:
 Well, looks like the attachment is too big, so here's the
 implementation. The tests will come in another message.

 A couple of quick nitpicks:

 1) On line 186 (in the NameValidator class), you use
 excludelist.append() to append a list to the end of a list.  I think  
 you
 meant to use excludelist.extend()

Good call.

 2) When validating a list of names, why do you insist on lower casing
 them? (I'm referring to the call to lower() on line 207).  On one  
 hand,
 this would seem nicer than all upper case, but on the other hand this
 can cause confusion for someone who sees certain casing of names in  
 the
 file and expects that data to be laid out the same.

I recall a life where names were case-insensitive, so 'dates' and  
'Dates' and 'DATES' were the same field. It should be easy enough to  
get rid of that limitation, or add a parameter for case-sensitivity.


On Dec 2, 2008, at 2:47 PM, Zachary Pincus wrote:

 Specifically, on line 115 in LineSplitter, we have:
 self.delimiter = delimiter.strip() or None
 so if I pass in, say, '\t' as the delimiter, self.delimiter gets set
 to None, which then causes the default behavior of any-whitespace-is-
 delimiter to be used. This makes lines like Gene Name\tPubMed ID
 \tStarting Position get split wrong, even when I explicitly pass in
 '\t' as the delimiter!

OK, I'll check that.


 I think that treating an explicitly-passed-in ' ' delimiter as
 identical to 'no delimiter' is a bad idea. If I say that ' ' is the
 delimiter, or '\t' is the delimiter, this should be treated *just*
 like ',' being the delimiter, where the expected output is:
 ['1', '2', '3', '4', '', '5']


Valid point.
Well, all, stay tuned for yet another yet another implementation...






 Other than those, it's working fine for me here.

 Ryan



Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Pierre GM
Chris,
I can try, but in that case, please write me a unittest, so that I  
have a clear and unambiguous idea of what you expect.
ANFSCD, have you tried the missing_values option ?


On Dec 2, 2008, at 5:36 PM, Christopher Barker wrote:

 Pierre GM wrote:
 I think that treating an explicitly-passed-in ' ' delimiter as
 identical to 'no delimiter' is a bad idea. If I say that ' ' is the
 delimiter, or '\t' is the delimiter, this should be treated *just*
 like ',' being the delimiter, where the expected output is:
 ['1', '2', '3', '4', '', '5']


 Valid point.
 Well, all, stay tuned for yet another yet another implementation...

 While we're at it, it might be nice to be able to pass in more than  
 one
 delimiter: ('\t',' '). though maybe that only combination that I'd
 really want would be something and '\n', which I think is being  
 treated
 specially already.

 -Chris




 -- 
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 [EMAIL PROTECTED]
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
All,

Please find attached to this message another implementation of  
np.loadtxt, which focuses on missing values. It's basically a  
combination of John Hunter's et al mlab.csv2rec, Ryan May's patches  
and pieces of code I'd been working on over the last few weeks.
Besides some helper classes (StringConverter to convert a string into  
something else, NameValidator to check names...), you'll find 3  
functions:

* `genloadtxt` is the base function that makes all the work. It  
outputs 2 arrays, one for the data (missing values being substituted  
by the appropriate default) and one for the mask. It would go in  
np.lib.io

* `loadtxt` would replace the current np.loadtxt. It outputs a  
ndarray, where missing data being filled. It would also go in np.lib.io

* `mloadtxt` would go into np.ma.io (to be created) and renamed  
`loadtxt`. Right now, I needed a different name to avoid conflicts. It  
combines the outputs of `genloadtxt` into a single masked array.

You'll also find several series of tests that you can use as examples.

Please give it a try and send me some feedback (bugs, wishes,  
suggestions). I'd like it to make the 1.3.0 release (I need some of  
the functionalities to improve the corresponding function in  
scikits.timeseries, currently fubar...)

P.



Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM

And now for the tests:


"""
Proposal :
Here's an extension to np.loadtxt, designed to take missing values into account.
"""



from genload_proposal import *
from numpy.ma.testutils import *
import StringIO

class TestLineSplitter(TestCase):
    #
    def test_nodelimiter(self):
        "Test LineSplitter w/o delimiter"
        strg = " 1 2 3 4  5 # test"
        test = LineSplitter(' ')(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])
        test = LineSplitter()(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])

    #
    def test_delimiter(self):
        "Test LineSplitter on delimiter"
        strg = "1,2,3,4,,5"
        test = LineSplitter(',')(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5'])
        #
        strg = " 1,2,3,4,,5 # test"
        test = LineSplitter(',')(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5'])
        #
        strg = " 1 2 3 4  5 # test"
        test = LineSplitter(' ')(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])

    #
    def test_fixedwidth(self):
        "Test LineSplitter w/ fixed-width fields"
        strg = "  1  2  3  4     5   # test"
        test = LineSplitter(3)(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5', ''])
        #
        strg = "  1     3  4  5  6# test"
        test = LineSplitter((3, 6, 6, 3))(strg)
        assert_equal(test, ['1', '3', '4  5', '6'])
        #
        strg = "  1     3  4  5  6# test"
        test = LineSplitter((6, 6, 9))(strg)
        assert_equal(test, ['1', '3  4', '5  6'])
        #
        strg = "  1     3  4  5  6# test"
        test = LineSplitter(20)(strg)
        assert_equal(test, ['1     3  4  5  6'])
        #
        strg = "  1     3  4  5  6# test"
        test = LineSplitter(30)(strg)
        assert_equal(test, ['1     3  4  5  6'])


class TestStringConverter(TestCase):
    "Test StringConverter"
    #
    def test_creation(self):
        "Test creation of a StringConverter"
        converter = StringConverter(int, -9)
        assert_equal(converter._status, 1)
        assert_equal(converter.default, -9)

    #
    def test_upgrade(self):
        "Tests the upgrade method."
        converter = StringConverter()
        assert_equal(converter._status, 0)
        converter.upgrade('0')
        assert_equal(converter._status, 1)
        converter.upgrade('0.')
        assert_equal(converter._status, 2)
        converter.upgrade('0j')
        assert_equal(converter._status, 3)
        converter.upgrade('a')
        assert_equal(converter._status, len(converter._mapper) - 1)

    #
    def test_missing(self):
        "Tests the use of missing values."
        converter = StringConverter(missing_values=('missing', 'missed'))
        converter.upgrade('0')
        assert_equal(converter('0'), 0)
        assert_equal(converter(''), converter.default)
        assert_equal(converter('missing'), converter.default)
        assert_equal(converter('missed'), converter.default)
        try:
            converter('miss')
        except ValueError:
            pass

    #
    def test_upgrademapper(self):
        "Tests upgrade_mapper"
        import dateutil.parser
        import datetime
        dateparser = dateutil.parser.parse
        StringConverter.upgrade_mapper(dateparser, datetime.date(2000, 1, 1))
        convert = StringConverter(dateparser, datetime.date(2000, 1, 1))
        test = convert('2001-01-01')
        assert_equal(test, datetime.datetime(2001, 1, 1, 0, 0, 0))



class TestLoadTxt(TestCase):
    #
    def test_record(self):
        "Test w/ explicit dtype"
        data = StringIO.StringIO('1 2\n3 4')
        #data.seek(0)
        test = loadtxt(data, dtype=[('x', np.int32), ('y', np.int32)])
        control = np.array([(1, 2), (3, 4)], dtype=[('x', 'i4'), ('y', 'i4')])
        assert_equal(test, control)
        #
        data = StringIO.StringIO('M 64.0 75.0\nF 25.0 60.0')
        #data.seek(0)
        descriptor = {'names': ('gender', 'age', 'weight'),
                      'formats': ('S1', 'i4', 'f4')}
        control = np.array([('M', 64.0, 75.0), ('F', 25.0, 60.0)],
                           dtype=descriptor)
        test = loadtxt(data, dtype=descriptor)
        assert_equal(test, control)

    def test_array(self):
        "Test outputting a standard ndarray"
        data = StringIO.StringIO('1 2\n3 4')
        control = np.array([[1, 2], [3, 4]], dtype=int)
        test = loadtxt(data, dtype=int)
        assert_array_equal(test, control)
        #
        data.seek(0)
        control = np.array([[1, 2], [3, 4]], dtype=float)
        test = np.loadtxt(data, dtype=float)
        assert_array_equal(test, control)

    def test_1D(self):
        "Test squeezing to 1D"
        control = np.array([1, 2, 3, 4], int)
        #
        data = StringIO.StringIO('1\n2\n3\n4\n')
        test = loadtxt(data, dtype=int)
        assert_array_equal(test, control)
        #
        data = StringIO.StringIO('1,2,3,4\n')
        test = loadtxt(data, dtype=int, delimiter=',')
        assert_array_equal(test, control)

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
Well, looks like the attachment is too big, so here's the  
implementation. The tests will come in another message.



"""
Proposal :
Here's an extension to np.loadtxt, designed to take missing values into account.
"""





import itertools
import numpy as np
import numpy.ma as ma


def _is_string_like(obj):
    """
    Check whether obj behaves like a string.
    """
    try:
        obj + ''
    except (TypeError, ValueError):
        return False
    return True

def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _is_string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        elif fname.endswith('.bz2'):
            import bz2
            fhd = bz2.BZ2File(fname)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd
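# Example (hedged): both spellings below should hand back a usable
# filehandle -- the file names are illustrative:
#   fhd = _to_filehandle('data.txt.gz')     # opened through gzip
#   fhd = _to_filehandle(open('data.txt'))  # passed through unchanged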


def flatten_dtype(ndtype):
    """
    Unpack a structured data-type.
    """
    names = ndtype.names
    if names is None:
        return [ndtype]
    else:
        types = []
        for field in names:
            (typ, _) = ndtype.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types
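# Example (hedged): a nested dtype unpacks into its leaf dtypes, roughly:
#   flatten_dtype(np.dtype([('a', int), ('b', [('x', float), ('y', float)])]))
#   -> [dtype(int), dtype(float), dtype(float)]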


def nested_masktype(datatype):
    """
    Construct the dtype of a mask for nested elements.
    """
    names = datatype.names
    if names:
        descr = []
        for name in names:
            (ndtype, _) = datatype.fields[name]
            descr.append((name, nested_masktype(ndtype)))
        return descr
    # Is this some kind of composite a la (np.float,2) ?
    elif datatype.subdtype:
        mdescr = list(datatype.subdtype)
        mdescr[0] = np.dtype(bool)
        return tuple(mdescr)
    else:
        return np.bool
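# Example (hedged): the mask description mirrors the data dtype, with
# booleans at the leaves, roughly:
#   nested_masktype(np.dtype([('x', float), ('y', (int, 2))]))
#   -> [('x', numpy.bool), ('y', (dtype(bool), (2,)))]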


class LineSplitter:
    """
    Defines a function to split a string at a given delimiter or at given places.

    Parameters
    ----------
    comments : {'#', string}, optional
        Character used to mark the beginning of a comment.
    delimiter : {string, integer, sequence of integers}, optional
        The string used to separate fields, a single field width, or a
        sequence of field widths.
    """

    def __init__(self, delimiter=None, comments='#'):
        self.comments = comments
        # Delimiter is a character
        if delimiter is None:
            self._isfixed = False
            self.delimiter = None
        elif _is_string_like(delimiter):
            self._isfixed = False
            self.delimiter = delimiter.strip() or None
        # Delimiter is a list of field widths
        elif hasattr(delimiter, '__iter__'):
            self._isfixed = True
            idx = np.cumsum([0] + list(delimiter))
            self.slices = [slice(i, j) for (i, j) in zip(idx[:-1], idx[1:])]
        # Delimiter is a single integer
        elif int(delimiter):
            self._isfixed = True
            self.slices = None
            self.delimiter = delimiter
        else:
            self._isfixed = False
            self.delimiter = None
    #
    def __call__(self, line):
        """
        Splits the line at each current delimiter.
        Comments are stripped beforehand.
        """
        # Strip the comments
        line = line.split(self.comments)[0]
        if not line:
            return []
        # Fixed-width fields
        if self._isfixed:
            # Fields have different widths
            if self.slices is None:
                fixed = self.delimiter
                slices = [slice(i, i + fixed)
                          for i in range(len(line))[::fixed]]
            else:
                slices = self.slices
            return [line[s].strip() for s in slices]
        else:
            return [s.strip() for s in line.split(self.delimiter)]

class NameValidator:
    """
    Validates a list of strings to use as field names.
    The strings are stripped of any non-alphanumeric character, and spaces
    are replaced by `_`.

    During instantiation, the user can define a list of names to exclude, as
    well as a list of invalid characters. Names in the exclude list have a
    '_' character appended.

    Once an instance has been created, it can be called with a list of names,
    and a list of valid names will be created.
    The `__call__` method accepts an optional keyword, `default`, that sets
    the default name in case of ambiguity. By default, `default = 'f'`, so
    that names will default to `f0`, `f1`...

    Parameters
    ----------
    excludelist : sequence, optional
        A list of names to

[Numpy-discussion] Fwd: np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
(Sorry about that, I pressed Reply instead of Reply all. Not my  
day for emails...)

 On Dec 1, 2008, at 1:54 PM, John Hunter wrote:

 It looks like I am doing something wrong -- trying to parse a CSV  
 file
 with dates formatted like '2008-10-14', with::

    import datetime, sys
    import dateutil.parser
    StringConverter.upgrade_mapper(dateutil.parser.parse,
                                   default=datetime.date(1900,1,1))
    r = loadtxt(sys.argv[1], delimiter=',', names=True)

 John,
 The problem you have is that the default dtype is 'float' (for  
 backwards compatibility w/ the original np.loadtxt). What you want  
 is to automatically change the dtype according to the content of  
 your file: you should use dtype=None

 r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None)

 As you'll want a recarray, we could make a np.records.loadtxt  
 function where dtype=None would be the default...
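
Something along those lines, say (name and placement purely
hypothetical, assuming numpy is imported as np):

    def recloadtxt(fname, delimiter=',', names=True):
        # same machinery, but dtype=None by default and a recarray out
        return loadtxt(fname, delimiter=delimiter,
                       names=names, dtype=None).view(np.recarray)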



Re: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM

On Dec 1, 2008, at 2:26 PM, John Hunter wrote

 OK, that worked great.  I do think some a default impl in np.rec which
 returned a recarray would be nice.  It might also be nice to have a
 method like np.rec.fromcsv which defaults to a delimiter=',',
 names=True and dtype=None.  Since csv is one of the most common data
 interchange format in  the world, it would be nice to have some
 obvious function that works with it with little or no customization
 required.


Quite agreed. Personally, I'd ditch the default dtype=float in favor  
of dtype=None, but compatibility is an issue.
However, if we all agree on genloadtxt, we can use tailored-made  
version in different modules, like you suggest.

There's an extra issue for which we have a solution I'm not
completely satisfied with: names=True.
It might be simpler for a basic user not to set names=True, and have the
first line recognized as a header when needed (by processing the
first line after the others, and using it as a header if it's found to
be a list of names, or inserting it back at the beginning otherwise)...
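
A rough sketch of that heuristic (purely illustrative):

    def looks_like_header(values):
        # a row where at least one field does not parse as a number
        # is a plausible list of names
        for value in values:
            try:
                float(value)
            except ValueError:
                return True
        return False

    looks_like_header(['gender', 'age', 'weight'])  # True
    looks_like_header(['1.0', '2.0', '3.0'])        # False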


Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
I agree, genloadtxt is a bit bloated, and it's no surprise it's
slower than the initial one. I think that in order to be fair,
comparisons must be performed with matplotlib.mlab.csv2rec, which
also implements autodetection of the dtype. I'm quite in favor
of keeping a lite version around.



On Dec 1, 2008, at 4:47 PM, Stéfan van der Walt wrote:

 I haven't investigated the code in too much detail, but wouldn't it be
 possible to implement the current set of functionality in a
 base-class, which is then specialised to add the rest?  That way, one
 could always instantiate TextReader yourself for some added speed.

Well, one of the issues is that we need to keep the function  
compatible w/ urllib.urlretrieve (Ryan, am I right?), which means not  
being able to go back to the beginning of a file (no call to .seek).  
Another issue comes from the possibility to define the dtype  
automatically: you need to keep track of the converters, then have to  
do a second loop on the data. Those converters are likely the  
bottleneck, as you need to check whether each value can be interpreted  
as missing or not and respond appropriately.

I thought about creating a base class, with a specific subclass taking  
care of the missing values. I found out it would have duplicated a lot  
of code...


In any case, I think that's secondary: we can always optimize pieces  
of the code afterwards. I'd like more feedback on corner cases and  
usage...


Re: [Numpy-discussion] bug in ma.masked_all()?

2008-12-01 Thread Pierre GM

On Dec 1, 2008, at 6:09 PM, Eric Firing wrote:

 Pierre,

 ma.masked_all does not seem to work with fancy dtypes and more then  
 one dimension:


Eric,
Should be fixed in SVN (r6130). There were indeed problems with nested  
dtypes. Tricky beasts they are.
Thanks for reporting!


Re: [Numpy-discussion] More loadtxt() changes

2008-11-28 Thread Pierre GM
Manuel,

Give me the week-end to come up with something. What you want is
already doable with the current implementation of np.loadtxt, through
the converters keyword. Support for missing data will be covered in a
separate function, most likely to be put in numpy.ma.io eventually.
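
For example, something along these lines should already work with the
converters keyword (data and column index are illustrative):

    import StringIO
    import numpy as np

    def fval(val):
        # fall back to nan when a field cannot be parsed as a float
        try:
            return float(val)
        except ValueError:
            return np.nan

    data = StringIO.StringIO("1.0 n/a\n2.5 3.5")
    test = np.loadtxt(data, converters={1: fval})
    # -> array([[ 1. ,  nan], [ 2.5,  3.5]])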



On Nov 28, 2008, at 5:42 AM, Manuel Metz wrote:

 Pierre GM wrote:
 On Nov 27, 2008, at 3:08 AM, Manuel Metz wrote:
 Certainly, yes! Dealing with fixed-length fields would be necessary.
 The
 case I had in mind had both -- a separator (|) __and__ fixed- 
 length
 fields -- and is probably very special in that sense. But such
 data-files exists out there...

 Well, if you have a non-space delimiter, it doesn't matter if the
 fields have a fixed length or not, does it? Each field is stripped
 anyway.

 Yes. It would already be _very_ helpful (without changing loadtxt too
 much) if the current implementation uses a converter like this

 def fval(val):
     try:
         return float(val)
     except:
         return numpy.nan

 instead of float(val) by default.

 mm

 The real issue is when the delimiter is ' '... I should be able to
 take care of that over the week-end (which started earlier today over
 here :)



Re: [Numpy-discussion] More loadtxt() changes

2008-11-27 Thread Pierre GM

On Nov 27, 2008, at 3:08 AM, Manuel Metz wrote:


 Certainly, yes! Dealing with fixed-length fields would be necessary.  
 The
 case I had in mind had both -- a separator (|) __and__ fixed-length
 fields -- and is probably very special in that sense. But such
 data-files exists out there...

Well, if you have a non-space delimiter, it doesn't matter if the  
fields have a fixed length or not, does it? Each field is stripped  
anyway.
The real issue is when the delimiter is ' '... I should be able to  
take care of that over the week-end (which started earlier today over  
here :) 


Re: [Numpy-discussion] More loadtxt() changes

2008-11-26 Thread Pierre GM

On Nov 26, 2008, at 5:55 PM, Ryan May wrote:

 Manuel Metz wrote:
 Ryan May wrote:
 3) Better support for missing values.  The docstring mentions a  
 way of
 handling missing values by passing in a converter.  The problem  
 with this is
 that you have to pass in a converter for *every column* that will  
 contain
 missing values.  If you have a text file with 50 columns, writing  
 this
 dictionary of converters seems like ugly and needless  
 boilerplate.  I'm
 unsure of how best to pass in both what values indicate missing  
 values and
 what values to fill in their place.  I'd love suggestions

 Hi Ryan,
   this would be a great feature to have !!!

About missing values:

* I don't think missing values should be supported in np.loadtxt. That
should go into a specific np.ma.io.loadtxt function, a preview of
which I posted earlier. I'll modify it taking Ryan's new function into
account, and Christopher's suggestion (defining a dictionary {column
name : missing values}).

* StringConverter already defines some default filling values for each  
dtype. In  np.ma.io.loadtxt, these values can be overwritten. Note  
that you should also be able to define a filling value by specifying a  
converter (think float(x or 0) for example)

* Missing values on space-separated fields are very tricky to handle:
take a line like "a,,,d". With a comma as separator, it's clear that
the 2nd and 3rd fields are missing.
Now, imagine that commas are actually spaces ("a   d"): 'd' is now
seen as the 2nd field of a 2-field record, not as the 4th field of a 4-
field record with 2 missing values. I thought about it, and kicked in
touch.

* That said, there should be a way to deal with fixed-length fields,  
probably by taking consecutive slices of the initial string. That way,  
we should be able to keep track of missing data...
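
Concretely, the consecutive-slices idea would look something like this
(the widths are illustrative):

    import numpy as np
    widths = (3, 6, 6, 3)
    idx = np.cumsum([0] + list(widths))
    slices = [slice(i, j) for (i, j) in zip(idx[:-1], idx[1:])]
    line = "  1     3  4  5  6"
    print([line[s].strip() for s in slices])  # ['1', '3', '4  5', '6']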




[Numpy-discussion] What happened to numpy-docs ?

2008-11-26 Thread Pierre GM
All,
I'd like to update routines.ma.rst on the numpy/numpy-docs/trunk SVN,  
but the whole trunk seems to be MIA... Where has it gone ? How can I  
(where should I)  commit changes ?
Thx in advance.
P.


Re: [Numpy-discussion] What happened to numpy-docs ?

2008-11-26 Thread Pierre GM

On Nov 27, 2008, at 12:32 AM, Robert Kern wrote:

 On Wed, Nov 26, 2008 at 23:27, Pierre GM [EMAIL PROTECTED] wrote:
 All,
 I'd like to update routines.ma.rst on the numpy/numpy-docs/trunk SVN,
 but the whole trunk seems to be MIA... Where has it gone ? How can I
 (where should I)  commit changes ?

 It got moved into the numpy trunk under docs/.

Duh... Guess I fell right at the time of the change.
Robert, thx a lot!
Pauli, do you think you could put your numpyext in the doc/ directory  
as well ?
Cheers,
P.


Re: [Numpy-discussion] What happened to numpy-docs ?

2008-11-26 Thread Pierre GM

On Nov 27, 2008, at 1:39 AM, Scott Sinclair wrote:
 Looking at some recent changes made to docstrings in SVN by Pierre
 (r6110 & r6111), these are not yet reflected in the doc wiki.

  Well, I haven't committed my version yet. I'm polishing a couple of  
issues with functions that are not recognized as such by inspect  
(because they're actually instances of a factory class).



Re: [Numpy-discussion] Numpy on Mac OS X python 2.6

2008-11-25 Thread Pierre GM
FYI,
I can't reproduce David's failures on my machine (intel core2 duo w/  
10.5.5)
* python 2.6 from macports
* numpy svn 6098
* GCC 4.0.1 (Apple Inc. build 5488)

I have only 1 failure:
FAIL: test_umath.TestComplexFunctions.test_against_cmath
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/lib/python2.6/site-packages/nose-0.10.4-py2.6.egg/nose/case.py", line 182, in runTest
    self.test(*self.arg)
  File "/Users/pierregm/Computing/.pythonenvs/default26/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 423, in test_against_cmath
    assert abs(a - b) < atol, "%s %s: %s; cmath: %s" % (fname, p, a, b)
AssertionError: arcsin 2: (1.57079632679-1.31695789692j); cmath: (1.57079632679+1.31695789692j)

----------------------------------------------------------------------
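
For reference, the comparison the test performs boils down to this
(hand-rolled from the assertion above):

    import cmath
    import numpy as np
    a = np.arcsin(complex(2))
    b = cmath.asin(2)
    print(abs(a - b) < 1e-6)  # False whenever the branch cuts disagree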

(Well, there's another one in numpy.ma.min, but that's a different  
matter).



On Nov 25, 2008, at 2:19 AM, David Cournapeau wrote:

 On Mon, 2008-11-24 at 22:06 -0700, Charles R Harris wrote:


 Well, it may not be that easy to figure.  The (generated)
 pyconfig-32.h has

 /* Define to 1 if your processor stores words with the most
 significant byte
first (like Motorola and SPARC, unlike Intel and VAX).

The block below does compile-time checking for endianness on
 platforms
that use GCC and therefore allows compiling fat binaries on OSX by
 using
'-arch ppc -arch i386' as the compile flags. The phrasing was
 choosen
such that the configure-result is used on systems that don't use
 GCC.
  */
 #ifdef __BIG_ENDIAN__
 #define WORDS_BIGENDIAN 1
 #else
 #ifndef __LITTLE_ENDIAN__
 /* #undef WORDS_BIGENDIAN */
 #endif
 #endif


 Hm, interesting: just by grepping, I do have WORDS_BIGENDIAN defined  
 to
 1 on *both* python 2.5 and python 2.6 on Mac OS X (running Intel).
 Looking closer, I do have the above code (conditional) in 2.5, but not
 in 2.6: it is unconditionally defined to BIGENDIAN on 2.6 !! That's
 actually part of something I have wondered for quite some time about  
 fat
 binaries: how do you handle config headers, since they are generated
 only once for every fat binary, but they should really be generated  
 for
 each arch.

 And I guess that __BIG_ENDIAN__  is a compiler flag, it isn't in any
 of the include files. In any case, this looks like a Python bug or  
 the
 Python folks have switched their API on us.

 Hm, actually, it is a bug in numpy as much as in python: python should
 NOT include any config.h in their public namespace, and we should not
 rely on it.

 But with this info, it should be relatively easy to fix (by setting  
 the
 correct endianness by ourselves with some detection code)

 David





Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

Ryan,
FYI,  I've been coding over the last couple of weeks an extension of  
loadtxt for a better support of masked data, with the option to read  
column names in a header. Please find an example below (I also have  
unittests). Most of the work is actually inspired from matplotlib's
mlab.csv2rec. It might be worth not duplicating efforts.

Cheers,
P.




"""
:mod:`_preview`
===============

A collection of utilities from incoming versions of numpy.ma

"""





import itertools
import numpy as np
import numpy.ma as ma



_string_like = np.lib.io._string_like

def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd


def flatten_dtype(dtp):
    """
    Unpack a structured data-type.
    """
    if dtp.names is None:
        return [dtp]
    else:
        types = []
        for field in dtp.names:
            (typ, _) = dtp.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types



class LineReader:
    """
    File reader that automatically splits each line. This reader behaves like
    an iterator.

    Parameters
    ----------
    fhd : filehandle
        File handle of the underlying file.
    comment : string, optional
        The character used to indicate the start of a comment.
    delimiter : string, optional
        The string used to separate values.  By default, this is any
        whitespace.
    """
    #
    def __init__(self, fhd, comment='#', delimiter=None):
        self.fh = fhd
        self.comment = comment
        self.delimiter = delimiter
        if delimiter == ' ':
            self.delimiter = None
    #
    def close(self):
        "Close the current reader."
        self.fh.close()
    #
    def seek(self, arg):
        """
        Moves to a new position in the file.

        See Also
        --------
        file.seek
        """
        self.fh.seek(arg)
    #
    def splitter(self, line):
        """
        Splits the line at each current delimiter.
        Comments are stripped beforehand.
        """
        line = line.split(self.comment)[0].strip()
        delimiter = self.delimiter
        if line:
            return line.split(delimiter)
        else:
            return []
    #
    def next(self):
        """
        Moves to the next line or raises :exc:`StopIteration`.
        """
        return self.splitter(self.fh.next())
    #
    def __iter__(self):
        for line in self.fh:
            yield self.splitter(line)

    def readline(self):
        """
        Returns the next line of the file, split at the delimiter and stripped
        of comments.
        """
        return self.splitter(self.fh.readline())

    def skiprows(self, nbrows=1):
        """
        Skips `nbrows` from the file.
        """
        for i in range(nbrows):
            self.fh.readline()

    def get_first_valid_row(self):
        """
        Returns the values in the first valid (uncommented and not empty) line
        of the file.
        """
        first_values = None
        while not first_values:
            first_line = self.fh.readline()
            if first_line == '':  # EOF reached
                raise IOError('End-of-file reached before encountering data.')
            first_values = self.splitter(first_line)
        return first_values
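
# Usage sketch (hedged), feeding LineReader a StringIO buffer:
#   import StringIO
#   reader = LineReader(StringIO.StringIO("# header\n1 2 3\n"), comment='#')
#   reader.get_first_valid_row()   # -> ['1', '2', '3']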



itemdictionary = {'return': 'return_',
                  'file': 'file_',
                  'print': 'print_'
                  }


def process_header(headers):
    """
    Validates a list of strings to use as field names.
    The strings are stripped of any non-alphanumeric character, and spaces
    are replaced by `_`.
    """
    #
    # Define the characters to delete from the headers
    delete = set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")
    delete.add('"')

    names = []
    seen = dict()
    for i, item in enumerate(headers):
        item = item.strip().lower().replace(' ', '_')
        item = ''.join([c for c in item if c not in delete])
        if not len(item):
            item = 'column%d' % i

        item = itemdictionary.get(item, item)
        cnt = seen.get(item, 0)
        if cnt > 0:
            names.append(item + '_%d' % cnt)
        else:
            names.append(item)
        seen[item] = cnt + 1
    return names
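
# For instance (hedged), duplicate and reserved names come out mangled:
#   process_header(['Gene Name', 'return', 'x', 'x'])
#   -> ['gene_name', 'return_', 'x', 'x_1']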

[Numpy-discussion] in(np.nan) on python 2.6

2008-11-25 Thread Pierre GM
All,
Sorry to bump my own post, and I was kinda threadjacking anyway:

Some functions of numpy.ma (e.g., ma.max, ma.min...) accept explicit
outputs that may not be MaskedArrays.
When such an explicit output is not a MaskedArray, a value that should
have been masked is transformed into np.nan.

That worked great in 2.5, with np.nan automatically transformed to 0
when the explicit output had an int dtype. With Python 2.6, a
ValueError is raised instead, as np.nan can no longer be cast to int.
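
The change is easy to reproduce (illustrative snippet):

    import numpy as np
    out = np.zeros(1, dtype=int)
    out[0] = np.nan   # silently 0 on Python 2.5; ValueError on Python 2.6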

What should be the recommended behavior in this case ? Raise a  
ValueError or some other exception, to follow the new Python2.6  
convention, or silently replace np.nan by some value acceptable by int  
dtype (0, or something else) ?

Thanks for any suggestion,
P.


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

On Nov 25, 2008, at 12:30 PM, Christopher Barker wrote:

 
 missing : string, optional
 A string representing a missing value, irrespective of the
 column where it appears (e.g., ``'missing'`` or ``'unused'``).
 

 It might be nice if missing could be a sequence of strings, if there
 is more than one value for missing values, that are not clearly mapped
 to a particular field.

OK, easy enough.

 
 missing_values : {None, dictionary}, optional
 A dictionary mapping a column number to a string indicating
 whether the corresponding field should be masked.
 

 would it possible to specify column header, rather than number here?

A la mlab.csv2rec ? It could work with a bit more tweaking, basically  
following John Hunter's et al. path. What happens when the column  
names are unknown (read from the header) or wrong ?

Actually, I'd like John to comment on that, hence the CC. More
generally, wouldn't it be useful to push the recarray-manipulating
functions from matplotlib.mlab to numpy ?


<    1   2   3   4   5   6   7   >