Re: [Numpy-discussion] Selection of only a certain number of fields
On Sunday 08 February 2009, Neil wrote:
>> The first one (and most important IMO) is that newarr continues to be a structured array (BTW, when did this name change from the original record array?), and you can use all the features of these beasts with it. Another reason (albeit a bit secondary) is that its data buffer can be shared through the array interface with other applications, or plain C code, in a relatively straightforward way. However, if newarr becomes a list (or dictionary), this is simply not possible.
>>
>> Cheers,
>
> That's not a sample use case ;)
>
> One of the things I love about Python is that it has a small core set of features and tries to avoid having many ways to do the same thing. This makes it extremely easy to learn. With every new feature, numpy gets a little bit harder to learn, there's more to document, and the code base gets larger and so harder to maintain. In those senses, whenever you add a new function/feature to numpy, it gets a little bit worse.

Mmm, you have made another good point. Actually, it is not very clear to me that adding too much functionality to NumPy is going to be a good idea in every case. For example, lately I was thinking that it would be a good idea to support column-wise structured arrays (the current ones are row-wise), but given that they can be trivially reproduced with a combination of dictionaries and plain arrays, I now think that implementing that in NumPy does not make much sense.

Similarly, and as you said, having:

    l = [rec[n] for n in ['name', 'age']]

or, if a dictionary is wanted instead:

    d = dict((n,rec[n]) for n in ['name', 'age'])

would admittedly cover many of the needs of users. In addition, one can easily get a record array from the above dictionary:

    newrec = np.rec.fromarrays(d.values(), names=d.keys())

Having said that, I still see some value in implementing arr[['name', 'age']], but frankly, I'm not so sure now whether this idiom would be much better than:

    d = dict((n,rec[n]) for n in ['name', 'age'])
    newrec = np.rec.fromarrays(d.values(), names=d.keys())

or than the already implemented drop_fields() function in np.lib.recfunctions. So, I'm +0 on the proposal now.

> So I think it would be nice to have some concrete examples of what the new feature will be useful for, just to show how it outweighs those negatives. As a bonus, they'd provide nice examples to put in the documentation :).

Yeah, I completely agree that this would be a nice exercise to do: for every newly requested feature, first check whether it can be done easily with a combination of the current tools of Python and NumPy together. That would lead to a simple and powerful NumPy.

> PS. Thanks for your work on pytables! I've used it quite a bit, mostly for reading hdf5 files.

My pleasure.

-- 
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
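For readers who want to try the list/dictionary idiom discussed in this message, here is a minimal sketch. The sample rows are made up for illustration, and the dtype is borrowed from the one quoted elsewhere in the thread. The message was written in the Python 2 era; on Python 3, `d.values()` and `d.keys()` are views rather than lists, so the sketch wraps them in `list()`, which is an adaptation and not part of the original message.

```python
import numpy as np

# Hypothetical sample data; only the dtype comes from the thread.
rec = np.array(
    [(b"Ann", 1.70, 31, b"F"), (b"Bob", 1.82, 45, b"M")],
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)

fields = ["name", "age"]

# A plain list holding one array per selected field.
l = [rec[n] for n in fields]

# The dictionary variant, keyed by field name.
d = dict((n, rec[n]) for n in fields)

# Rebuild a record array from the dictionary. list() is the safe form
# on Python 3, where dict views are not lists.
newrec = np.rec.fromarrays(list(d.values()), names=list(d.keys()))

print(newrec.dtype.names)
print(newrec.age)
```

Note that `newrec` is a fresh record array holding its own data, whereas the entries of `l` and `d` are obtained by single-field indexing on `rec`.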
Re: [Numpy-discussion] Selection of only a certain number of fields
Francesc Alted faltet at pytables.org writes:
>> What are some common use cases for this feature? I use structured arrays quite a lot, but I haven't found myself wanting something like this. If I do need a subset of a structured array, generally I use something like
>>
>>     [rec[n] for n in 'name age gender'.split()]
>
> Good point. However, there are still some very valid reasons for having an idiom like:
>
>     newarr = arr[['name', 'age']]
>
> returning a record array.
>
> The first one (and most important IMO) is that newarr continues to be a structured array (BTW, when did this name change from the original record array?), and you can use all the features of these beasts with it. Another reason (albeit a bit secondary) is that its data buffer can be shared through the array interface with other applications, or plain C code, in a relatively straightforward way. However, if newarr becomes a list (or dictionary), this is simply not possible.
>
> Cheers,

That's not a sample use case ;)

One of the things I love about Python is that it has a small core set of features and tries to avoid having many ways to do the same thing. This makes it extremely easy to learn. With every new feature, numpy gets a little bit harder to learn, there's more to document, and the code base gets larger and so harder to maintain. In those senses, whenever you add a new function/feature to numpy, it gets a little bit worse.

So I think it would be nice to have some concrete examples of what the new feature will be useful for, just to show how it outweighs those negatives. As a bonus, they'd provide nice examples to put in the documentation :).

Neil

PS. Thanks for your work on pytables! I've used it quite a bit, mostly for reading hdf5 files.
Re: [Numpy-discussion] Selection of only a certain number of fields
Travis E. Oliphant oliphant at enthought.com writes:
> I've been fairly quiet on this list for a while due to work and family schedule, but I think about how things can improve regularly. One feature that's been requested by a few people is the ability to select multiple fields from a structured array.
>
> Thus, suppose *arr* is a structured array with dtype:
>
>     [('name', 'S25'), ('height', float), ('age', int), ('gender', 'S8')]
>
> Then,
>
>     newarr = arr[['name', 'age']]
>
> should be a structured array with just the name and age fields.

What are some common use cases for this feature? I use structured arrays quite a lot, but I haven't found myself wanting something like this. If I do need a subset of a structured array, generally I use something like

    [rec[n] for n in 'name age gender'.split()]

For me that use case doesn't come up very often though.
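To make the two spellings in this exchange concrete, the sketch below applies both to the same array. The row values are invented for illustration. It assumes a recent NumPy release in which indexing with a list of field names is accepted; whether that result is a copy or a view has differed between NumPy releases, so the sketch only inspects field names and values.

```python
import numpy as np

# dtype as quoted in the thread; the row values are hypothetical.
arr = np.zeros(
    3,
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)
arr["name"] = [b"Ann", b"Bob", b"Cy"]
arr["age"] = [31, 45, 27]

# Proposed spelling: the result is still a structured array.
newarr = arr[["name", "age"]]

# Existing idiom from the reply: a plain list of per-field arrays.
as_list = [arr[n] for n in "name age".split()]

print(newarr.dtype.names)
print(type(as_list), len(as_list))
```

The practical difference is the type of the result: `newarr` keeps a structured dtype and can be indexed by field name again, while `as_list` is an ordinary Python list.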
Re: [Numpy-discussion] Selection of only a certain number of fields
On Saturday 07 February 2009, Neil wrote:
> Travis E. Oliphant oliphant at enthought.com writes:
>> I've been fairly quiet on this list for a while due to work and family schedule, but I think about how things can improve regularly. One feature that's been requested by a few people is the ability to select multiple fields from a structured array.
>>
>> Thus, suppose *arr* is a structured array with dtype:
>>
>>     [('name', 'S25'), ('height', float), ('age', int), ('gender', 'S8')]
>>
>> Then,
>>
>>     newarr = arr[['name', 'age']]
>>
>> should be a structured array with just the name and age fields.
>
> What are some common use cases for this feature? I use structured arrays quite a lot, but I haven't found myself wanting something like this. If I do need a subset of a structured array, generally I use something like
>
>     [rec[n] for n in 'name age gender'.split()]

Good point. However, there are still some very valid reasons for having an idiom like:

    newarr = arr[['name', 'age']]

returning a record array.

The first one (and most important IMO) is that newarr continues to be a structured array (BTW, when did this name change from the original record array?), and you can use all the features of these beasts with it. Another reason (albeit a bit secondary) is that its data buffer can be shared through the array interface with other applications, or plain C code, in a relatively straightforward way. However, if newarr becomes a list (or dictionary), this is simply not possible.

Cheers,

-- 
Francesc Alted
Re: [Numpy-discussion] Selection of only a certain number of fields
Francesc Alted wrote:
> On Saturday 07 February 2009, Neil wrote:
>> Travis E. Oliphant oliphant at enthought.com writes:
>>> I've been fairly quiet on this list for a while due to work and family schedule, but I think about how things can improve regularly. One feature that's been requested by a few people is the ability to select multiple fields from a structured array.
>>>
>>> Thus, suppose *arr* is a structured array with dtype:
>>>
>>>     [('name', 'S25'), ('height', float), ('age', int), ('gender', 'S8')]
>>>
>>> Then,
>>>
>>>     newarr = arr[['name', 'age']]
>>>
>>> should be a structured array with just the name and age fields.
>>
>> What are some common use cases for this feature? I use structured arrays quite a lot, but I haven't found myself wanting something like this. If I do need a subset of a structured array, generally I use something like
>>
>>     [rec[n] for n in 'name age gender'.split()]
>
> Good point. However, there are still some very valid reasons for having an idiom like:
>
>     newarr = arr[['name', 'age']]
>
> returning a record array.
>
> The first one (and most important IMO) is that newarr continues to be a structured array (BTW, when did this name change from the original record array?),

To avoid confusion with the record array subclass, which maps attributes to fields, Eric Jones and I have been using this terminology for about a year.

-Travis

-- 
Travis Oliphant
Enthought, Inc.
(512) 536-1057 (office)
(512) 536-1059 (fax)
http://www.enthought.com
oliph...@enthought.com
Re: [Numpy-discussion] Selection of only a certain number of fields
Hi Travis

2009/2/6 Travis Oliphant oliph...@enthought.com:
> Thus
>
>     newarr = arr[['name', 'age']].copy()
>
> would be exactly the same size as arr because elements are copied wholesale and each row is a single element in the NumPy array. Some infrastructure would have to be implemented at a fundamental level to handle partial-element manipulation, similar at least in spirit to what is needed to handle bit-level striding.

I like your suggestion! Can you think of a way to implement #2 with the correct copy semantics? Being able to create a view without copying is such a big plus that it is worth considering, even at an implementation cost.

Regards
Stéfan
Re: [Numpy-discussion] Selection of only a certain number of fields
On Feb 5, 2009, at 6:08 PM, Travis E. Oliphant wrote:
> Hi all,
>
> I've been fairly quiet on this list for a while due to work and family schedule, but I think about how things can improve regularly. One feature that's been requested by a few people is the ability to select multiple fields from a structured array.
>
> [...]

+1 for #2.

Note that we now have a drop_fields function in np.lib.recfunctions, a reimplementation of the equivalent function in matplotlib. It works along the lines of your proposition #1 (create a new array with a new dtype and fill it).
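As a rough illustration of the drop_fields route mentioned above, the following sketch removes the unwanted fields from an array with the thread's dtype. The row values are hypothetical, and `usemask=False` is passed so that a plain ndarray comes back instead of a masked array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# dtype as quoted in the thread; the row values are hypothetical.
arr = np.zeros(
    2,
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)
arr["name"] = [b"Ann", b"Bob"]
arr["age"] = [31, 45]

# Drop the fields that are not wanted; what remains is name and age.
trimmed = rfn.drop_fields(arr, ["height", "gender"], usemask=False)

print(trimmed.dtype.names)
print(trimmed.itemsize, arr.itemsize)
```

Because a new array is allocated and filled, `trimmed` does not share memory with `arr`, and its rows are smaller than the original ones.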
Re: [Numpy-discussion] Selection of only a certain number of fields
Pierre GM wrote:
> On Feb 5, 2009, at 6:08 PM, Travis E. Oliphant wrote:
>> Hi all,
>>
>> I've been fairly quiet on this list for a while due to work and family schedule, but I think about how things can improve regularly. One feature that's been requested by a few people is the ability to select multiple fields from a structured array.
>>
>> [...]
>
> +1 for #2.
>
> Note that we now have a drop_fields function in np.lib.recfunctions, a reimplementation of the equivalent function in matplotlib. It works along the lines of your proposition #1 (create a new array with a new dtype and fill it).

After more thought, I think I was too eager in my suggestion of #2. It's actually not really possible to do a view the way I would want it to work. It would be possible to create a data-type with hidden fields, but a copy would not get rid of the extra data. Thus

    newarr = arr[['name', 'age']].copy()

would be exactly the same size as arr because elements are copied wholesale and each row is a single element in the NumPy array. Some infrastructure would have to be implemented at a fundamental level to handle partial-element manipulation, similar at least in spirit to what is needed to handle bit-level striding.

Also, I don't remember if we resolved how hidden fields would be shown in the array interface.

So, I think that we may be stuck with #1, which at least is consistent with the "fancy indexing is a copy" pattern (and is just syntactic sugar for capability you've already implemented in recfunctions).

-Travis
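The size argument in this message can be sketched with a hand-built dtype. This is only an illustration under stated assumptions: the hidden-field dtype below is constructed manually from the original field offsets and is not how any proposed NumPy implementation was written; the names `view_dtype`, `hidden_view` and `packed` are hypothetical.

```python
import numpy as np

# dtype as quoted in the thread; the row values are hypothetical.
arr = np.zeros(
    4,
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)
arr["age"] = [31, 45, 27, 60]
names = ["name", "age"]

# A dtype exposing only two fields at their original offsets, with the
# original itemsize, so the other bytes of each row stay hidden.
view_dtype = np.dtype({
    "names": names,
    "formats": [arr.dtype.fields[n][0] for n in names],
    "offsets": [arr.dtype.fields[n][1] for n in names],
    "itemsize": arr.dtype.itemsize,
})
hidden_view = arr.view(view_dtype)   # no data copied
copied = hidden_view.copy()          # copies whole rows, hidden bytes included

# The copy-based alternative: a new packed dtype, filled field by field.
packed = np.empty(arr.shape, dtype=[(n, arr.dtype.fields[n][0]) for n in names])
for n in names:
    packed[n] = arr[n]

print(copied.itemsize, arr.itemsize, packed.itemsize)
```

The view shares its buffer with `arr`, but copying it does not shrink the rows; only the packed array, built by allocating a new dtype and filling it, is actually smaller.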
Re: [Numpy-discussion] Selection of only a certain number of fields
On Friday 06 February 2009, Travis Oliphant wrote:
> Pierre GM wrote:
>> On Feb 5, 2009, at 6:08 PM, Travis E. Oliphant wrote:
>>> Hi all,
>>>
>>> I've been fairly quiet on this list for a while due to work and family schedule, but I think about how things can improve regularly. One feature that's been requested by a few people is the ability to select multiple fields from a structured array.
>>>
>>> [...]
>>
>> +1 for #2.
>>
>> Note that we now have a drop_fields function in np.lib.recfunctions, a reimplementation of the equivalent function in matplotlib. It works along the lines of your proposition #1 (create a new array with a new dtype and fill it).
>
> After more thought, I think I was too eager in my suggestion of #2. It's actually not really possible to do a view the way I would want it to work. It would be possible to create a data-type with hidden fields, but a copy would not get rid of the extra data. Thus
>
>     newarr = arr[['name', 'age']].copy()
>
> would be exactly the same size as arr because elements are copied wholesale and each row is a single element in the NumPy array. Some infrastructure would have to be implemented at a fundamental level to handle partial-element manipulation, similar at least in spirit to what is needed to handle bit-level striding.
>
> Also, I don't remember if we resolved how hidden fields would be shown in the array interface.
>
> So, I think that we may be stuck with #1, which at least is consistent with the "fancy indexing is a copy" pattern (and is just syntactic sugar for capability you've already implemented in recfunctions).

Mmh, I'd also vote for #2 for performance reasons, but as the implementation seems quite involved, I suppose that #1 would be great too.

Cheers,

-- 
Francesc Alted