[Numpy-discussion] recfunctions.stack_arrays
Pierre (or anyone else who cares to chime in),

I'm using stack_arrays to combine data from two different files into a single array. In one of these files, the data from one entire record comes back missing, which, thanks to your recent change, ends up having a boolean dtype. There is actual data for this same field in the 2nd file, so it ends up having the dtype of float64. When I try to combine the two arrays, I end up with the following traceback:

    data = stack_arrays((old_data, data))
  File /home/rmay/.local/lib64/python2.5/site-packages/metpy/cbook.py, line 260, in stack_arrays
    output = ma.masked_all((np.sum(nrecords),), newdescr)
  File /home/rmay/.local/lib64/python2.5/site-packages/numpy/ma/extras.py, line 79, in masked_all
    a = masked_array(np.empty(shape, dtype),
ValueError: two fields with the same name

Which is unsurprising. Do you think there is any reasonable way to get stack_arrays() to find a common dtype for fields with the same name? Or another suggestion on how to approach this? If you think coercing one/both of the fields to a common dtype is the way to go, just point me to a function that could figure out the dtype and I'll try to put together a patch.

Thanks,
Ryan

P.S. Thanks so much for your work on putting those utility functions in recfunctions.py. It makes it so much easier to have these functions available in the library itself rather than needing to reinvent the wheel over and over.

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] recfunctions.stack_arrays
[Some background: we're talking about numpy.lib.recfunctions, a set of functions to manipulate structured arrays]

Ryan,

If the two files have the same structure, you can use that fact and specify the dtype of the output directly with the dtype parameter of mafromtxt. That way, you're sure that the two arrays will have the same dtype. If you don't know the structure beforehand, you could try to load one array and use its dtype as input of mafromtxt to load the second one.

Now, we could also try to modify stack_arrays so that it would take the largest dtype when several fields have the same name. I'm not completely satisfied by this approach, as it makes dtype conversions under the hood. Maybe we could provide the functionality as an option (with a forced_conversion boolean input parameter)?

I'm a bit surprised by the error message you get. If I try:

    a = ma.array([(1,2,3)], mask=[(0,1,0)],
                 dtype=[('a',int), ('b',bool), ('c',float)])
    b = ma.array([(4, 5, 6)], dtype=[('a', int), ('b', float), ('c', float)])
    test = np.stack_arrays((a, b))

I get a TypeError instead (the field 'b' doesn't have the same type in a and b). Now, I get the 'two fields with the same name' error when I use np.merge_arrays (with the flatten option). Could you send a small example?

> P.S. Thanks so much for your work on putting those utility functions in
> recfunctions.py. It makes it so much easier to have these functions
> available in the library itself rather than needing to reinvent the wheel
> over and over.

Indeed. Note that most of the job had been done by John Hunter and the matplotlib developers in their matplotlib.mlab module, so you should thank them and not me. I just cleaned up some of the functions.
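As a rough illustration of the "largest dtype" idea raised in this message, the sketch below promotes same-named fields to a shared dtype and casts both arrays before combining them. It rests on several assumptions: common_struct_dtype is a hypothetical helper, not part of recfunctions; it relies on np.promote_types, which is available in later NumPy releases than the one discussed in this thread; it uses plain structured arrays rather than masked arrays, so masks are not carried along; and it expects both arrays to have the same field names.

```python
import numpy as np


def common_struct_dtype(*dtypes):
    """Hypothetical helper: promote same-named fields to a shared dtype."""
    fields = {}  # field name -> promoted dtype, kept in first-seen order
    for dt in dtypes:
        dt = np.dtype(dt)
        for name in dt.names:
            if name in fields:
                # e.g. bool and float64 promote to float64
                fields[name] = np.promote_types(fields[name], dt[name])
            else:
                fields[name] = dt[name]
    return np.dtype(list(fields.items()))


# Field 'b' is bool in the first array and float in the second,
# mirroring the situation described in the thread.
a = np.array([(1, False, 3.0)], dtype=[("a", int), ("b", bool), ("c", float)])
b = np.array([(4, 5.0, 6.0)], dtype=[("a", int), ("b", float), ("c", float)])

common = common_struct_dtype(a.dtype, b.dtype)

# Cast explicitly, then combine; nothing is converted "under the hood".
stacked = np.concatenate([a.astype(common), b.astype(common)])
```

Keeping the cast as an explicit step in the caller is one way to address the concern about hidden conversions: the promotion only happens when it is asked for.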
Re: [Numpy-discussion] recfunctions.stack_arrays
On Jan 27, 2009, at 4:23 PM, Ryan May wrote:

> I definitely wouldn't advocate magic by default, but I think it would be
> nice to be able to get the functionality if one wanted to.

OK. Put on the TODO list.

> There is one problem I noticed, however. I found common_type and
> lib.mintypecode, but both raise errors when trying to find a dtype to
> match both bool and float. I don't know if there's another function
> somewhere that would work for what I want.

I'm not familiar with these functions, I'll check that.

> Apparently, I get my error as a result of my use of titles in the dtype to
> store an alternate name for the field. (If you're not familiar with
> titles, they're nice because you can get fields by either name, so for the
> following example, a['a'] and a['A'] both return array([1]).) The
> following version of your case gives me the ValueError:

Ah OK. You found a bug. There's a frustrating feature of dtypes: dtype.names doesn't always match [_[0] for _ in dtype.descr].

> As a side question, do you have some local mods to your numpy SVN so that
> some of the functions in recfunctions are available in numpy's top level?

Probably. I used the develop option of setuptools to install numpy on a virtual environment.

> On mine, I can't get to them except by importing them from
> numpy.lib.recfunctions. I don't see any mention of recfunctions in
> lib/__init__.py.

Well, till some problems are ironed out, I'm not really in favor of advertising them too much...
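The mismatch between dtype.names and dtype.descr mentioned in this message can be reproduced with a small dtype that uses titles. This is only an illustrative sketch: the field names and titles below are made up, and it is not the example from the original message, which did not survive in the archive.

```python
import numpy as np

# Each field spec is ((title, name), format); 'A' and 'B' are the titles.
dt = np.dtype([(("A", "a"), int), (("B", "b"), float)])
a = np.array([(1, 2.0)], dtype=dt)

# A field can be reached through its name or through its title.
by_name = a["a"]
by_title = a["A"]

# dtype.names lists plain names only ...
names = list(dt.names)
# ... while the first item of each descr entry is a (title, name) tuple.
descr_first = [entry[0] for entry in dt.descr]

print(names)
print(descr_first)
```

Code that rebuilds a dtype from descr entries while comparing against plain names can therefore fail to recognise a field it has already seen, which may be how a duplicate-name error arises. Iterating over dt.names and looking up dt[name] avoids the tuple form.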