[Numpy-discussion] recfunctions.stack_arrays

2009-01-27 Thread Ryan May
Pierre (or anyone else who cares to chime in),

I'm using stack_arrays to combine data from two different files into a single
array.  In one of these files, the data from one entire record comes back
missing, which, thanks to your recent change, ends up having a boolean dtype.
There is actual data for this same field in the 2nd file, so it ends up having
the dtype of float64.  When I try to combine the two arrays, I end up with the
following traceback:

    data = stack_arrays((old_data, data))
  File "/home/rmay/.local/lib64/python2.5/site-packages/metpy/cbook.py", line 260, in stack_arrays
    output = ma.masked_all((np.sum(nrecords),), newdescr)
  File "/home/rmay/.local/lib64/python2.5/site-packages/numpy/ma/extras.py", line 79, in masked_all
    a = masked_array(np.empty(shape, dtype),
ValueError: two fields with the same name

Which is unsurprising.  Do you think there is any reasonable way to get
stack_arrays() to find a common dtype for fields with the same name?  Or another
suggestion on how to approach this?  If you think coercing one/both of the fields
to a common dtype is the way to go, just point me to a function that could figure
out the dtype and I'll try to put together a patch.
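A minimal sketch of the coercion asked about above, assuming plain (unmasked) structured arrays whose fields have the same names in the same order. It relies on np.promote_types, which comes from later NumPy releases and is not mentioned in this thread; common_dtype and stack_with_promotion are hypothetical helper names made up for illustration, and titles are not handled.

```python
import numpy as np

def common_dtype(dt1, dt2):
    """Hypothetical helper: merge two structured dtypes that share field
    names, promoting each field to a type both inputs can be cast to."""
    if dt1.names != dt2.names:
        raise ValueError("sketch only handles identical field names/order")
    return np.dtype([(name, np.promote_types(dt1[name], dt2[name]))
                     for name in dt1.names])

def stack_with_promotion(a, b):
    """Hypothetical helper: cast both arrays to the merged dtype, then stack."""
    dt = common_dtype(a.dtype, b.dtype)
    # astype converts structured arrays field by field
    return np.concatenate([a.astype(dt), b.astype(dt)])

# Field 'b' is bool in the first "file" and float in the second.
old = np.array([(1, False, 3.0)], dtype=[('a', int), ('b', bool), ('c', float)])
new = np.array([(4, 5.0, 6.0)], dtype=[('a', int), ('b', float), ('c', float)])
out = stack_with_promotion(old, new)
```

Casting the bool placeholder to float turns False into 0.0, so with masked data the mask has to be carried along separately.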

Thanks,

Ryan

P.S.  Thanks so much for your work on putting those utility functions in
recfunctions.py.  It makes it so much easier to have these functions available in
the library itself rather than needing to reinvent the wheel over and over.

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] recfunctions.stack_arrays

2009-01-27 Thread Pierre GM
[Some background: we're talking about numpy.lib.recfunctions, a set of  
functions to manipulate structured arrays]

Ryan,
If the two files have the same structure, you can use that fact and  
specify the dtype of the output directly with the dtype parameter of  
mafromtxt. That way, you're sure that the two arrays will have the  
same dtype. If you don't know the structure beforehand, you could try  
to load one array and use its dtype as input of mafromtxt to load the  
second one.
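A sketch of the "reuse the first dtype" approach, assuming a current NumPy release, where mafromtxt was later deprecated; np.genfromtxt(..., usemask=True) returns the same kind of masked array. The two in-memory "files" are made-up stand-ins for real data.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Made-up stand-ins for the two data files
file1 = io.StringIO("x,y\n1,2.5\n3,4.0\n")
file2 = io.StringIO("x,y\n5,6\n7,8\n")

# Let genfromtxt guess the structure of the first file...
first = np.genfromtxt(file1, delimiter=",", names=True, dtype=None,
                      usemask=True)
# ...then force the second file onto the same dtype, so 'y' stays float
second = np.genfromtxt(file2, delimiter=",", names=True, dtype=first.dtype,
                       usemask=True)

combined = rfn.stack_arrays((first, second))
```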
Now, we could also try to modify stack_arrays so that it would take  
the largest dtype when several fields have the same name. I'm not  
completely satisfied by this approach, as it makes dtype conversions  
under the hood. Maybe we could provide the functionality as an option  
(w/ a forced_conversion boolean input parameter)?
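For what it's worth, current NumPy releases ship an opt-in flag of this kind: numpy.lib.recfunctions.stack_arrays takes an autoconvert parameter that defaults to False. The sketch below assumes such a release and uses arrays with a bool/float clash on field 'b'.

```python
import numpy as np
import numpy.ma as ma
from numpy.lib import recfunctions as rfn

# Field 'b' is bool (and masked) in the first array, float in the second.
a = ma.array([(1, 2, 3)], mask=[(0, 1, 0)],
             dtype=[('a', int), ('b', bool), ('c', float)])
b = ma.array([(4, 5, 6)], dtype=[('a', int), ('b', float), ('c', float)])

# Default behaviour: mismatched field types are rejected.
try:
    rfn.stack_arrays((a, b))
    rejected = False
except TypeError:
    rejected = True

# Opt-in conversion: the clashing field is upcast to the larger type.
stacked = rfn.stack_arrays((a, b), autoconvert=True)
```

Keeping the conversion behind an explicit flag matches the concern above about dtype changes happening under the hood.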
I'm a bit surprised by the error message you get. If I try:

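  import numpy.ma as ma
  # stack_arrays lives in numpy.lib.recfunctions (the original np.stack_arrays
  # call relied on Pierre's local numpy setup)
  from numpy.lib import recfunctions as rfn

  a = ma.array([(1, 2, 3)], mask=[(0, 1, 0)],
               dtype=[('a', int), ('b', bool), ('c', float)])
  b = ma.array([(4, 5, 6)], dtype=[('a', int), ('b', float), ('c', float)])
  test = rfn.stack_arrays((a, b))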

I get a TypeError instead (the field 'b' doesn't have the same type in a and
b). Now, I do get the 'two fields w/ the same name' error when I use
np.merge_arrays (with the flatten option). Could you send a small example?


 P.S.  Thanks so much for your work on putting those utility functions in
 recfunctions.py.  It makes it so much easier to have these functions available in
 the library itself rather than needing to reinvent the wheel over and over.

Indeed. Note that most of the job had been done by John Hunter and the
matplotlib developers in their matplotlib.mlab module, so you should
thank them and not me. I just cleaned up some of the functions.


Re: [Numpy-discussion] recfunctions.stack_arrays

2009-01-27 Thread Pierre GM

On Jan 27, 2009, at 4:23 PM, Ryan May wrote:


 I definitely wouldn't advocate magic by default, but I think it would be nice to
 be able to get the functionality if one wanted to.

OK. Put on the TODO list.


 There is one problem I noticed, however.  I found common_type and
 lib.mintypecode, but both raise errors when trying to find a dtype to match both
 bool and float.  I don't know if there's another function somewhere that would
 work for what I want.

I'm not familiar with these functions, I'll check that.
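For the bool/float case specifically, a sketch assuming a current NumPy release: np.common_type still rejects bool arrays with a TypeError, while np.promote_types and np.result_type (later additions, not discussed in this thread) apply NumPy's casting rules and return float64.

```python
import numpy as np

# np.common_type only accepts integer/inexact arrays, so bool raises TypeError.
try:
    np.common_type(np.array([True]), np.array([1.0]))
    common_type_ok = True
except TypeError:
    common_type_ok = False

# promote_types works on dtypes; result_type also accepts arrays.
promoted = np.promote_types(np.bool_, np.float64)
result = np.result_type(np.array([True]), np.array([1.0]))
```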

 Apparently, I get my error as a result of my use of titles in the dtype to store
 an alternate name for the field.  (If you're not familiar with titles, they're
 nice because you can get fields by either name, so for the following example,
 a['a'] and a['A'] both return array([1]).)  The following version of your case
 gives me the ValueError:

Ah OK. You found a bug. There's a frustrating feature of dtypes:  
dtype.names doesn't always match [_[0] for _ in dtype.descr].
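A small illustration of that mismatch, assuming a current NumPy release: for a field with a title, dtype.names holds only the name, while the matching dtype.descr entry is a (title, name) tuple. The field_name helper is hypothetical, one possible way to normalise descr entries.

```python
import numpy as np

# Field 'a' carries the title 'A'; field 'b' has no title.
dt = np.dtype([(('A', 'a'), 'i4'), ('b', 'f8')])

arr = np.zeros(2, dtype=dt)
arr['a'] = [1, 2]           # the same data is reachable as arr['A']

names_attr = dt.names                      # ('a', 'b')
descr_keys = [d[0] for d in dt.descr]      # titled field shows up as ('A', 'a')

def field_name(descr_entry):
    """Hypothetical helper: extract the field name from a dtype.descr entry."""
    key = descr_entry[0]
    return key[1] if isinstance(key, tuple) else key

fixed_keys = [field_name(d) for d in dt.descr]
```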


 As a side question, do you have some local mods to your numpy SVN so that some of
 the functions in recfunctions are available in numpy's top level?

Probably. I used the develop option of setuptools to install numpy on  
a virtual environment.

 On mine, I can't get to them except by importing them from
 numpy.lib.recfunctions.  I don't see any mention of recfunctions in
 lib/__init__.py.


Well, till some problems are ironed out, I'm not really in favor of  
advertising them too much...
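As far as I know this is still the case in current NumPy releases: the structured-array helpers are imported from numpy.lib.recfunctions rather than the top-level namespace. A quick check, assuming such a release:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# The helpers discussed in this thread live in the recfunctions submodule...
available = [f for f in ("stack_arrays", "merge_arrays", "append_fields")
             if hasattr(rfn, f)]
# ...and are not re-exported at the numpy top level.
top_level = hasattr(np, "stack_arrays")
```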
