Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-09 Thread Francesc Alted
On Sunday 08 February 2009, Neil wrote:
  The first one (and most important IMO), is that newarr continues to
  be a structured array (BTW, when was this name changed from the
  original record array?), and you can use all the features of these
  beasts with it.  Another reason (albeit a bit secondary) is that its
  data buffer can be shared through the array interface with other
  applications, or plain C code, in a relatively straightforward way.
   However, if newarr becomes a list (or dictionary), this is simply
  not possible.
 
  Cheers,

 That's not a sample use case ;)

 One of the things I love about Python is that it has a small core set
 of features and tries to avoid having many ways to do the same thing.
  This makes it extremely easy to learn.  With every new feature,
 numpy gets a little bit harder to learn, there's more to document and
 the code base gets larger and so harder to maintain.  In those
 senses, whenever you add a new function/feature to numpy, it gets a
 little bit worse.

Mmm, you have made another good point.  Actually, it is not very clear
to me that adding more functionality to NumPy is a good idea in every
case.  For example, lately I was thinking that it would be a good idea
to support column-wise structured arrays (the current ones are
row-wise), but given that they can be trivially reproduced with a
combination of dictionaries and plain arrays, I now think that
implementing that in NumPy does not make much sense.
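As a minimal sketch of that point, a column-wise layout can be emulated with a dictionary holding one plain array per field. The sample records below are made up for illustration only.

```python
import numpy as np

# Row-wise layout: one structured array, fields interleaved in each record.
# (Illustrative data, not taken from the thread.)
rows = np.array(
    [(b"Ann", 1.70, 31), (b"Bob", 1.82, 45)],
    dtype=[("name", "S25"), ("height", float), ("age", int)],
)

# Column-wise layout: one contiguous plain array per field, held in a dict.
columns = {n: np.ascontiguousarray(rows[n]) for n in rows.dtype.names}

# Each column is now an independent, contiguous 1-D array.
mean_age = columns["age"].mean()
```

The trade-off is that the dictionary is no longer a single NumPy object, so whole-record operations have to be written by hand.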

Similarly, and as you said, having:

l = [rec[n] for n in ['name', 'age']]

or, if a dictionary is wanted instead:

d = dict((n,rec[n]) for n in ['name', 'age'])

would admittedly cover many of the needs of users.  In addition, one can 
get a record array easily from the above dictionary:

newrec = np.rec.fromarrays(list(d.values()), names=list(d.keys()))

Having said that, I still see some value in implementing 
arr[['name', 'age']], but frankly, I'm not so sure now whether this 
idiom would be much better than:

d = dict((n,rec[n]) for n in ['name', 'age'])
newrec = np.rec.fromarrays(list(d.values()), names=list(d.keys()))

or than the already implemented drop_fields() function in 
np.lib.recfunctions.
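For reference, the dictionary route can be sketched end to end as follows. The sample data is invented for illustration, and the dictionary views are converted to lists explicitly, since on Python 3 np.rec.fromarrays does not accept d.keys() directly as the names argument.

```python
import numpy as np

# Illustrative structured array using the dtype discussed in this thread.
rec = np.array(
    [(b"Ann", 1.70, 31, b"F"), (b"Bob", 1.82, 45, b"M")],
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)

# Pick the wanted fields into a dictionary of plain arrays.
d = dict((n, rec[n]) for n in ["name", "age"])

# Rebuild a record array from the selected columns.
newrec = np.rec.fromarrays(list(d.values()), names=list(d.keys()))
```

The result is a new record array containing only the two selected fields, with its own data buffer.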

So, I'm +0 on the proposal now.

 So I think it would be nice to have some concrete examples of what
 the new feature will be useful for, just to show how it outweighs
 those negatives.  As a bonus, they'd provide nice examples to put in
 the documentation :).

Yeah, I completely agree that this would be a nice exercise to do: for
every newly requested feature, first check whether it can be done easily
with a combination of the existing tools in Python and NumPy.
That would lead to a simple and powerful NumPy.

 PS.  Thanks for your work on pytables!  I've used it quite a bit,
 mostly for reading hdf5 files.

My pleasure.

-- 
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-08 Thread Neil
Francesc Alted faltet at pytables.org writes:

  What are some common use cases for this feature?
 
  I use structured arrays quite a lot, but I haven't found myself
  wanting something like this. If I do need a subset of a structured
  array generally I use something like
 
  [rec[n] for n in 'name age gender'.split()]
 
 Good point.  However, there are still some very valid reasons for having 
 an idiom like:
 
 newarr = arr[['name', 'age']]
 
 returning a record array.
 
 The first one (and most important IMO), is that newarr continues to be 
 a structured array (BTW, when was this name changed from the original
 record array?), and you can use all the features of these beasts with
 it.  Another reason (albeit a bit secondary) is that its data buffer can
 be shared through the array interface with other applications, or plain 
 C code, in a relatively straightforward way.  However, if newarr 
 becomes a list (or dictionary), this is simply not possible.
 
 Cheers,
 

That's not a sample use case ;)

One of the things I love about Python is that it has a small core set of
features and tries to avoid having many ways to do the same thing.  This makes
it extremely easy to learn.  With every new feature, numpy gets a little bit
harder to learn, there's more to document and the code base gets larger and so
harder to maintain.  In those senses, whenever you add a new function/feature to
numpy, it gets a little bit worse.

So I think it would be nice to have some concrete examples of what the new
feature will be useful for, just to show how it outweighs those negatives.  As a
bonus, they'd provide nice examples to put in the documentation :).

Neil

PS.  Thanks for your work on pytables!  I've used it quite a bit, mostly for
reading hdf5 files.




Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-07 Thread Neil
Travis E. Oliphant oliphant at enthought.com writes:

 I've been fairly quiet on this list for awhile due to work and family 
 schedule, but I think about how things can improve regularly.  One
 feature that's been requested by a few people is the ability to select 
 multiple fields from a structured array.
 
 Thus,  suppose *arr* is a structured array with dtype:
 
 [('name', 'S25'),
   ('height', float),
   ('age', int),
   ('gender', 'S8')
 ]
 
 Then,  newarr = arr[['name', 'age']]  should be a structured array with 
 just the name and age fields.
 

What are some common use cases for this feature?

I use structured arrays quite a lot, but I haven't found myself wanting
something like this. If I do need a subset of a structured array generally I use
something like 

[rec[n] for n in 'name age gender'.split()]

For me that use case doesn't come up very often though. 
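A small sketch of that idiom, with made-up sample data: each entry of the resulting list is a plain array that views the structured array's buffer, so nothing is copied.

```python
import numpy as np

# Illustrative structured array using the dtype from the proposal.
rec = np.array(
    [(b"Ann", 1.70, 31, b"F"), (b"Bob", 1.82, 45, b"M")],
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)

# One plain array per requested field, collected in an ordinary list.
fields = [rec[n] for n in "name age gender".split()]
name, age, gender = fields

# The field arrays are views, so writes go through to the original array.
age[0] = 32
```

The list itself, however, is a Python container and not an array with a structured dtype.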




Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-07 Thread Francesc Alted
On Saturday 07 February 2009, Neil wrote:
 Travis E. Oliphant oliphant at enthought.com writes:
  I've been fairly quiet on this list for awhile due to work and
  family schedule, but I think about how things can improve
  regularly.  One feature that's been requested by a few people is
  the ability to select multiple fields from a structured array.
 
  Thus,  suppose *arr* is a structured array with dtype:
 
  [('name', 'S25'),
    ('height', float),
    ('age', int),
    ('gender', 'S8')
  ]
 
  Then,  newarr = arr[['name', 'age']]  should be a structured array
  with just the name and age fields.

 What are some common use cases for this feature?

 I use structured arrays quite a lot, but I haven't found myself
 wanting something like this. If I do need a subset of a structured
 array generally I use something like

 [rec[n] for n in 'name age gender'.split()]

Good point.  However, there are still some very valid reasons for having 
an idiom like:

newarr = arr[['name', 'age']]

returning a record array.

The first one (and most important IMO), is that newarr continues to be 
a structured array (BTW, when was this name changed from the original
record array?), and you can use all the features of these beasts with
it.  Another reason (albeit a bit secondary) is that its data buffer can
be shared through the array interface with other applications, or plain 
C code, in a relatively straightforward way.  However, if newarr 
becomes a list (or dictionary), this is simply not possible.
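To illustrate the second reason, here is a minimal sketch (sample data invented) of what the array interface exposes for a structured array, and of the fact that a plain list of field arrays offers no such interface.

```python
import numpy as np

# Illustrative structured array using the dtype from the proposal.
arr = np.array(
    [(b"Ann", 1.70, 31, b"F"), (b"Bob", 1.82, 45, b"M")],
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)

# The array interface describes the buffer: address, shape and field layout.
iface = arr.__array_interface__
field_names = [entry[0] for entry in iface["descr"]]
buffer_address, read_only = iface["data"]

# A list of field arrays is an ordinary Python object with no such interface.
as_list = [arr[n] for n in ["name", "age"]]
list_has_interface = hasattr(as_list, "__array_interface__")
```

A consumer written in C only needs the address, the shape and the field description to walk the records.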

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-07 Thread Travis E. Oliphant
Francesc Alted wrote:
 On Saturday 07 February 2009, Neil wrote:
   
 Travis E. Oliphant oliphant at enthought.com writes:
 
 I've been fairly quiet on this list for awhile due to work and
 family schedule, but I think about how things can improve
 regularly.  One feature that's been requested by a few people is
 the ability to select multiple fields from a structured array.

 Thus,  suppose *arr* is a structured array with dtype:

 [('name', 'S25'),
   ('height', float),
   ('age', int),
   ('gender', 'S8')
 ]

 Then,  newarr = arr[['name', 'age']]  should be a structured array
 with just the name and age fields.
   
 What are some common use cases for this feature?

 I use structured arrays quite a lot, but I haven't found myself
 wanting something like this. If I do need a subset of a structured
 array generally I use something like

 [rec[n] for n in 'name age gender'.split()]
 

 Good point.  However, there are still some very valid reasons for having 
 an idiom like:

 newarr = arr[['name', 'age']]

 returning a record array.

 The first one (and most important IMO), is that newarr continues to be 
 a structured array (BTW, when was this name changed from the original
 record array?), 
To avoid confusion with the record array subclass which maps 
attributes to fields, Eric Jones and I have been using this terminology 
for about a year. 
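The distinction can be sketched as follows, using made-up sample data: a structured array is a plain ndarray whose fields are reached by key, while the record array subclass also exposes them as attributes.

```python
import numpy as np

# Illustrative structured array using the dtype from the proposal.
arr = np.array(
    [(b"Ann", 1.70, 31, b"F"), (b"Bob", 1.82, 45, b"M")],
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)

# Structured array: fields are accessed by key only.
ages_by_key = arr["age"]

# Record array: the np.recarray subclass maps attributes to fields.
rec = arr.view(np.recarray)
ages_by_attr = rec.age
```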

-Travis



-- 

Travis Oliphant
Enthought, Inc.
(512) 536-1057 (office)
(512) 536-1059 (fax)
http://www.enthought.com
oliph...@enthought.com



Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-06 Thread Stéfan van der Walt
Hi Travis

2009/2/6 Travis Oliphant oliph...@enthought.com:
 Thus  newarr = arr[['name', 'age']].copy() would be exactly the same
 size as arr because elements are  copied wholesale and each row is a
 single element in the NumPy array.  Some infrastructure would have to
 be implemented at a fundamental level to handle partial-element
 manipulation similar at least in spirit to what is needed to handle
 bit-level striding on a fundamental level.

I like your suggestion!  Can you think of a way to implement #2 with
the correct copy semantics?  Being able to create a view without
copying is such a big plus that it is worth considering, even at an
implementation cost.
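One way to picture the view approach and its copy problem is a dtype that names only the wanted fields but keeps the original offsets and itemsize, so the other bytes become hidden padding. This is only a sketch with invented sample data, not the implementation under discussion.

```python
import numpy as np

# Illustrative structured array using the dtype from the proposal.
arr = np.array(
    [(b"Ann", 1.70, 31, b"F"), (b"Bob", 1.82, 45, b"M")],
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)

# Describe only 'name' and 'age', keeping their original offsets and the
# full record size; 'height' and 'gender' become hidden bytes.
wanted = ["name", "age"]
hidden = np.dtype({
    "names": wanted,
    "formats": [arr.dtype.fields[n][0] for n in wanted],
    "offsets": [arr.dtype.fields[n][1] for n in wanted],
    "itemsize": arr.dtype.itemsize,
})

view = arr.view(hidden)  # shares arr's buffer, nothing is copied
copied = view.copy()     # copies whole records, hidden bytes included
```

The copy keeps the padded dtype, so it occupies as many bytes as the original array.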

Regards
Stéfan


Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-05 Thread Pierre GM

On Feb 5, 2009, at 6:08 PM, Travis E. Oliphant wrote:


 Hi all,

 I've been fairly quiet on this list for awhile due to work and family
 schedule, but I think about how things can improve regularly.  One
 feature that's been requested by a few people is the ability to select
 multiple fields from a structured array.


 [...]

+1 for #2.

Note that we now have a drop_fields function in np.lib.recfunctions, a  
reimplementation of the equivalent function in matplotlib. It works  
along the lines of your proposition #1 (create a new array w/ a new  
dtype and fill it)
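A short usage sketch of drop_fields, with invented sample data. Passing usemask=False asks for a plain ndarray rather than the masked array returned by default.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative structured array using the dtype from the proposal.
arr = np.array(
    [(b"Ann", 1.70, 31, b"F"), (b"Bob", 1.82, 45, b"M")],
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)

# Keep 'name' and 'age' by dropping the other two fields.
newarr = rfn.drop_fields(arr, ["height", "gender"], usemask=False)
```

The result is a new array with a smaller dtype, independent of the original buffer.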



Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-05 Thread Travis Oliphant
Pierre GM wrote:
 On Feb 5, 2009, at 6:08 PM, Travis E. Oliphant wrote:

   
 Hi all,

 I've been fairly quiet on this list for awhile due to work and family
 schedule, but I think about how things can improve regularly.  One
 feature that's been requested by a few people is the ability to select
 multiple fields from a structured array.
 

   
 [...]
 

 +1 for #2.

 Note that we now have a drop_fields function in np.lib.recfunctions, a  
 reimplementation of the equivalent function in matplotlib. It works  
 along the lines of your proposition #1 (create a new array w/ a new  
 dtype and fill it)
   

After more thought, I think I was too eager in my suggestion of #2.  
It's actually not really possible to do a view the way I would want it 
to work.   It would be possible to create a data-type with 
hidden-fields, but a copy would not get rid of the extra data.

Thus  newarr = arr[['name', 'age']].copy() would be exactly the same 
size as arr because elements are  copied wholesale and each row is a 
single element in the NumPy array.  Some infrastructure would have to
be implemented at a fundamental level to handle partial-element 
manipulation similar at least in spirit to what is needed to handle 
bit-level striding on a fundamental level.  

Also, I don't remember if we resolved how hidden fields would be shown 
in the array interface.  

So, I think that we may be stuck with #1, which at least is consistent
with the "fancy indexing is a copy" pattern (and is just syntactic sugar
for a capability you've already implemented in recfunctions).
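Approach #1 can be sketched by hand as below. The helper name select_fields and the sample data are hypothetical, used only to show the "new dtype, then fill" pattern.

```python
import numpy as np


def select_fields(arr, names):
    """Hypothetical helper: copy the given fields into a new packed array."""
    # Build a dtype containing only the requested fields.
    new_dtype = np.dtype([(n, arr.dtype[n]) for n in names])
    out = np.empty(arr.shape, dtype=new_dtype)
    # Fill the new array field by field.
    for n in names:
        out[n] = arr[n]
    return out


# Illustrative structured array using the dtype from the proposal.
arr = np.array(
    [(b"Ann", 1.70, 31, b"F"), (b"Bob", 1.82, 45, b"M")],
    dtype=[("name", "S25"), ("height", float), ("age", int), ("gender", "S8")],
)

newarr = select_fields(arr, ["name", "age"])
```

Because the result is a copy with a packed dtype, it carries no hidden bytes, and later changes to it do not affect the original array.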

-Travis



Re: [Numpy-discussion] Selection of only a certain number of fields

2009-02-05 Thread Francesc Alted
On Friday 06 February 2009, Travis Oliphant wrote:
 Pierre GM wrote:
  On Feb 5, 2009, at 6:08 PM, Travis E. Oliphant wrote:
  Hi all,
 
  I've been fairly quiet on this list for awhile due to work and
  family schedule, but I think about how things can improve
  regularly.  One feature that's been requested by a few people is
  the ability to select multiple fields from a structured array.
 
 
 
  [...]
 
  +1 for #2.
 
  Note that we now have a drop_fields function in
  np.lib.recfunctions, a reimplementation of the equivalent function
  in matplotlib. It works along the lines of your proposition #1
  (create a new array w/ a new dtype and fill it)

 After more thought, I think I was too eager in my suggestion of #2.
 It's actually not really possible to do a view the way I would want
 it to work.   It would be possible to create a data-type with
 hidden-fields, but a copy would not get rid of the extra data.

 Thus  newarr = arr[['name', 'age']].copy() would be exactly the same
 size as arr because elements are  copied wholesale and each row is
 a single element in the NumPy array.  Some infrastructure would
 have to be implemented at a fundamental level to handle
 partial-element manipulation similar at least in spirit to what is
 needed to handle bit-level striding on a fundamental level.

 Also, I don't remember if we resolved how hidden fields would be
 shown in the array interface.

 So, I think that we may be stuck with #1 which at least is consistent
 with the "fancy indexing is a copy" pattern (and is just syntactic
 sugar for capability you've already implemented in recfunctions).

Mmh, I'd also vote for #2 for performance reasons, but as the 
implementation seems quite involved, I suppose that #1 would be great 
too.

Cheers,

-- 
Francesc Alted