Re: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding)

2018-01-30 Thread Joe

Does anyone know of a function or a convenient way to automatically
derive a dtype object from a C typedef struct string or a cffi.typeof()?
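
One possible sketch (not a built-in; dtype_from_cffi and C2NPY are made-up
names here, the type mapping is illustrative, and only primitive scalar
members are handled) builds the dtype from the offsets cffi reports:

import numpy as np
from cffi import FFI

ffi = FFI()
ffi.cdef("typedef struct { char a; int b; double c; } mystruct_t;")

# Illustrative C -> numpy mapping; sizes of types like 'long' vary by
# platform, and 'char' signedness is compiler-dependent.
C2NPY = {'char': 'i1', 'short': 'i2', 'int': 'i4',
         'long long': 'i8', 'float': 'f4', 'double': 'f8'}

def dtype_from_cffi(ffi, ctype):
    # cffi exposes struct members as (name, field) pairs; reusing each
    # field's byte offset plus ffi.sizeof() keeps compiler padding intact.
    names, formats, offsets = [], [], []
    for name, field in ctype.fields:
        names.append(name)
        formats.append(C2NPY[field.type.cname])
        offsets.append(field.offset)
    return np.dtype({'names': names, 'formats': formats,
                     'offsets': offsets, 'itemsize': ffi.sizeof(ctype)})

dt = dtype_from_cffi(ffi, ffi.typeof("mystruct_t"))
# dt.itemsize == ffi.sizeof("mystruct_t"), so np.frombuffer on a
# cffi.buffer over an array of mystruct_t lines up field-for-field.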

On 27.01.2018 at 10:30, Joe wrote:

Thanks for your help on this! This solved my issue.


On 25.01.2018 at 19:01, Allan Haldane wrote:
There is a new section discussing alignment in the numpy 1.14 structured
array docs, which has some hints about interfacing with C structs.

These new 1.14 docs are not online yet on scipy.org, but in the meantime
you can view them here:
https://ahaldane.github.io/user/basics.rec.html#automatic-byte-offsets-and-alignment

(That links specifically to the discussion of alignments and padding).

Allan

On 01/25/2018 11:33 AM, Chris Barker - NOAA Federal wrote:




The numpy dtype constructor takes an “align” keyword that will pad it
for you.


https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html
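
For instance, a minimal sketch:

import numpy as np

packed  = np.dtype([('a', 'i1'), ('b', 'f8')])
aligned = np.dtype([('a', 'i1'), ('b', 'f8')], align=True)
print(packed.itemsize)    # 9  -- fields packed back to back
print(aligned.itemsize)   # 16 -- 7 padding bytes after 'a', matching a
                          #       typical C struct layout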

-CHB





Re: [Numpy-discussion] Extending C with Python

2018-01-30 Thread Jialin Liu
Amazing! It works! Thank you, Robert.

I've been stuck on this for days.

Best,
Jialin
LBNL/NERSC

On Tue, Jan 30, 2018 at 10:52 PM, Robert Kern  wrote:

> On Wed, Jan 31, 2018 at 3:25 PM, Jialin Liu  wrote:
>
>> Hello,
>> I'm extending C with Python (the opposite of what people usually do,
>> extending Python with C), and I'm currently stuck on passing a C array to
>> the Python layer. Could anyone please advise?
>>
>> I have a C buffer in my C code and want to pass it to a python function.
>> In the C code, I have:
>>
>> npy_intp  dims [2];
>>> dims[0] = 10;
>>> dims[1] = 20;
>>> import_array();
>>> npy_intp m=2;
>>> PyObject * py_dims = PyArray_SimpleNewFromData(1, &m, NPY_INT16 ,(void
>>> *)dims ); // I also tried NPY_INT
>>> PyObject_CallMethod(pInstance, method_name, "O", py_dims);
>>
>>
>> In the Python code, I want to just print that array:
>>
>> def f(self, dims):
>>
>>print ("np array:%d,%d"%(dims[0],dims[1]))
>>
>>
>>
>> But it only prints the first number correctly, i.e., dims[0]. The second
>> number is always 0.
>>
>
> The correct typecode would be NPY_INTP.
>
> --
> Robert Kern
>


Re: [Numpy-discussion] Extending C with Python

2018-01-30 Thread Robert Kern
On Wed, Jan 31, 2018 at 3:25 PM, Jialin Liu  wrote:

> Hello,
> I'm extending C with Python (the opposite of what people usually do,
> extending Python with C), and I'm currently stuck on passing a C array to
> the Python layer. Could anyone please advise?
>
> I have a C buffer in my C code and want to pass it to a python function.
> In the C code, I have:
>
> npy_intp  dims [2];
>> dims[0] = 10;
>> dims[1] = 20;
>> import_array();
>> npy_intp m=2;
>> PyObject * py_dims = PyArray_SimpleNewFromData(1, &m, NPY_INT16 ,(void
>> *)dims ); // I also tried NPY_INT
>> PyObject_CallMethod(pInstance, method_name, "O", py_dims);
>
>
> In the Python code, I want to just print that array:
>
> def f(self, dims):
>
>print ("np array:%d,%d"%(dims[0],dims[1]))
>
>
>
> But it only prints the first number correctly, i.e., dims[0]. The second
> number is always 0.
>

The correct typecode would be NPY_INTP.
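
(A quick way to see the symptom from Python -- a sketch assuming a 64-bit
little-endian platform, where npy_intp is 8 bytes:)

import numpy as np

dims = np.array([10, 20], dtype=np.intp)   # what the C buffer holds
# Read as int16, only the first 2 bytes of each entry carry the value,
# and a 2-element NPY_INT16 array covers just the first 4 bytes of the
# buffer: [10, 0] -- exactly the reported behavior.
print(dims.view(np.int16))   # [10  0  0  0 20  0  0  0]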

-- 
Robert Kern


[Numpy-discussion] Extending C with Python

2018-01-30 Thread Jialin Liu
Hello,
I'm extending C with Python (the opposite of what people usually do,
extending Python with C), and I'm currently stuck on passing a C array to
the Python layer. Could anyone please advise?

I have a C buffer in my C code and want to pass it to a python function. In
the C code, I have:

npy_intp  dims [2];
> dims[0] = 10;
> dims[1] = 20;
> import_array();
> npy_intp m=2;
> PyObject * py_dims = PyArray_SimpleNewFromData(1, &m, NPY_INT16 ,(void
> *)dims ); // I also tried NPY_INT
> PyObject_CallMethod(pInstance, method_name, "O", py_dims);


In the Python code, I want to just print that array:

def f(self, dims):

    print("np array:%d,%d" % (dims[0], dims[1]))



But it only prints the first number correctly, i.e., dims[0]. The second
number is always 0.


Best,
Jialin
LBNL/NERSC


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread josef . pktd
On Tue, Jan 30, 2018 at 7:33 PM, Allan Haldane wrote:

> On 01/30/2018 04:54 PM, josef.p...@gmail.com wrote:
> >
> >
> > On Tue, Jan 30, 2018 at 3:21 PM, Allan Haldane wrote:
> >
> > On 01/30/2018 01:33 PM, josef.p...@gmail.com wrote:
> > > AFAICS, one problem is that the padded view didn't come with the
> > > matching down stream usage support, the pack function as
> mentioned, an
> > > alternative way to convert to a standard ndarray, copy doesn't get
> rid
> > > of the padding and so on.
> > >
> > > eg. another mailing list thread I just found with the same problem
> > > http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html
> > >
> > > quoting Ralf:
> > > Question: is that really the recommended way to get an (N, 2) size
> float
> > > array from two columns of a larger record array? If so, why isn't
> there
> > > a better way? If you'd want to write to that (N, 2) array you have
> to
> > > append a copy, making it even uglier. Also, then there really
> should be
> > > tests for views in test_records.py.
> > >
> > >
> > > This "better way" never showed up, AFAIK. And it looks like we
> came back
> > > to this problem every few years.
> > >
> > > Josef
> >
> > Since we are at least pushing off this change to a later release
> > (1.15?), we have some time to prepare/catch up.
> >
> > What can we add to numpy.lib.recfunctions to make the multi-field
> > copy->view change smoother? We have discussed at least two functions:
> >
> >  * repack_fields - rearrange the memory layout of a structured array
> to
> > add/remove padding between fields
> >
> >  * structured_to_unstructured - turns an n-D structured array into an
> > (n+1)-D unstructured ndarray, whose dtype is the highest common type
> of
> > all the fields. May want the inverse function too.
> >
> >
> > The only sticky point with statsmodels is to have an equivalent of
> > a[['b', 'c']].view(('f8', 2)).
> >
> > Highest common dtype might be object, the main usecase for this is to
> > select some elements of a specific dtype and then use them as
> > standard,homogeneous ndarray. In our case and other cases that I have
> > seen it is mainly to select a subset of the floating point numbers.
> > Another case of this might be to combine two strings into one,
> > a[['b', 'c']].view(('S8')), if b is S5 and c is S3, but I don't think I
> > used this in serious code.
>
> I implemented and put up a draft of these functions in
> https://github.com/numpy/numpy/pull/10411


Comments based on reading the last commit


>
>
> I think they satisfy all your cases: code like
>
> >>> a = np.ones(3, dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
> >>> a[['b', 'c']].view(('f8', 2))
>
> becomes:
>
> >>> import numpy.lib.recfunctions as rf
> >>> rf.structured_to_unstructured(a[['b', 'c']])
> array([[1., 1.],
>        [1., 1.],
>        [1., 1.]])
>
> The highest common dtype is usually not "Object", since I use
> `np.result_type` to determine the output type. So two fields of 'S5' and
> 'S3' result in an 'S5' array.
>
>
structured_to_unstructured looks good to me



>
> >
> > for inverse function: I guess it is still possible to view any standard
> > homogeneous ndarray with a structured dtype as long as the itemsize
> matches.
>
> The inverse is implemented too. And it even supports varied field
> dtypes, nested fields, and subarrays, as you can see in the docstring
> examples.
>
>
> > Browsing through old mailing list threads, I saw that adding multiple
> > fields or concatenating two arrays with structured dtypes into an array
> > with a single combined dtype was missing and I guess still is. (IIRC
> > this is the usecase where we go now the pandas detour in statsmodels.)
> >
> > We might also consider
> >
> >  * apply_along_fields(arr, method) - applies the method along the
> > "field" axis, equivalent to something like
> > method(struct_to_unstructured(arr), axis=-1)
> >
> >
> > If this works on a padded view of an existing array, then this would be
> > an improvement over the current version of having to extract and copy
> > the relevant fields of an existing structured dtype or loop over
> > different numeric dtypes, ints, floats.
> >
> > In general there will need to be a way to apply `method` only to
> > selected columns, or columns of a matching dtype. (e.g. We don't want
> > the sum or mean of a string.)
> > (e.g. we use ptp() on numeric fields to check if there is already a
> > constant column in the array or dataframe)
>
> Means over selected columns are accounted for using multi-field
> indexing. For example:
>
> >>> b = np.array([(1, 2, 5), (4, 5, 7), (7, 8, 11), (10, 11, 12)],
> ...              dtype=[('x', 'i4'), ('y', 'f4'), ('z', 'f8')])

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread Allan Haldane
On 01/30/2018 04:54 PM, josef.p...@gmail.com wrote:
> 
> 
> On Tue, Jan 30, 2018 at 3:21 PM, Allan Haldane wrote:
> 
> On 01/30/2018 01:33 PM, josef.p...@gmail.com wrote:
> > AFAICS, one problem is that the padded view didn't come with the
> > matching down stream usage support, the pack function as mentioned, an
> > alternative way to convert to a standard ndarray, copy doesn't get rid
> > of the padding and so on.
> >
> > eg. another mailing list thread I just found with the same problem
> > 
> http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html
> >
> > quoting Ralf:
> > Question: is that really the recommended way to get an (N, 2) size float
> > array from two columns of a larger record array? If so, why isn't there
> > a better way? If you'd want to write to that (N, 2) array you have to
> > append a copy, making it even uglier. Also, then there really should be
> > tests for views in test_records.py.
> >
> >
> > This "better way" never showed up, AFAIK. And it looks like we came back
> > to this problem every few years.
> >
> > Josef
> 
> Since we are at least pushing off this change to a later release
> (1.15?), we have some time to prepare/catch up.
> 
> What can we add to numpy.lib.recfunctions to make the multi-field
> copy->view change smoother? We have discussed at least two functions:
> 
>  * repack_fields - rearrange the memory layout of a structured array to
> add/remove padding between fields
> 
>  * structured_to_unstructured - turns an n-D structured array into an
> (n+1)-D unstructured ndarray, whose dtype is the highest common type of
> all the fields. May want the inverse function too.
> 
> 
> The only sticky point with statsmodels is to have an equivalent of
> a[['b', 'c']].view(('f8', 2)).
> 
> Highest common dtype might be object, the main usecase for this is to
> select some elements of a specific dtype and then use them as
> standard,homogeneous ndarray. In our case and other cases that I have
> seen it is mainly to select a subset of the floating point numbers.
> Another case of this might be to combine two strings into one,
> a[['b', 'c']].view(('S8')), if b is S5 and c is S3, but I don't think I
> used this in serious code.

I implemented and put up a draft of these functions in
https://github.com/numpy/numpy/pull/10411

I think they satisfy all your cases: code like

>>> a = np.ones(3, dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
>>> a[['b', 'c']].view(('f8', 2))

becomes:

>>> import numpy.lib.recfunctions as rf
>>> rf.structured_to_unstructured(a[['b', 'c']])
array([[1., 1.],
       [1., 1.],
       [1., 1.]])

The highest common dtype is usually not "Object", since I use
`np.result_type` to determine the output type. So two fields of 'S5' and
'S3' result in an 'S5' array.
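
For example, np.result_type applies numpy's standard promotion rules:

>>> np.result_type(np.dtype('S5'), np.dtype('S3'))
dtype('S5')
>>> np.result_type('i4', 'f4')
dtype('float64')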


> 
> for inverse function: I guess it is still possible to view any standard
> homogeneous ndarray with a structured dtype as long as the itemsize matches.

The inverse is implemented too. And it even supports varied field
dtypes, nested fields, and subarrays, as you can see in the docstring
examples.


> Browsing through old mailing list threads, I saw that adding multiple
> fields or concatenating two arrays with structured dtypes into an array
> with a single combined dtype was missing and I guess still is. (IIRC
> this is the usecase where we go now the pandas detour in statsmodels.)
> 
> We might also consider
> 
>  * apply_along_fields(arr, method) - applies the method along the
> "field" axis, equivalent to something like
> method(struct_to_unstructured(arr), axis=-1)
> 
> 
> If this works on a padded view of an existing array, then this would be
> an improvement over the current version of having to extract and copy
> the relevant fields of an existing structured dtype or loop over
> different numeric dtypes, ints, floats.
> 
> In general there will need to be a way to apply `method` only to
> selected columns, or columns of a matching dtype. (e.g. We don't want
> the sum or mean of a string.)
> (e.g. we use ptp() on numeric fields to check if there is already a
> constant column in the array or dataframe)

Means over selected columns are accounted for using multi-field
indexing. For example:

>>> b = np.array([(1, 2, 5), (4, 5, 7), (7, 8 ,11), (10, 11, 12)],
...  dtype=[('x', 'i4'), ('y', 'f4'), ('z', 'f8')])

>>> rf.apply_along_fields(np.mean, b)
array([ 2.6667,  5.3333,  8.6667, 11.    ])

>>> rf.apply_along_fields(np.mean, b[['x', 'z']])
array([ 3. ,  5.5,  9. , 11. ])


This is unaffected by the 1.14 to 1.15 changes.

Allan

> I think these are pretty minimal and shouldn't be too hard to implement.

[Numpy-discussion] doc? music through mathematical relations between LPCM samples and musical elements/characteristics

2018-01-30 Thread Renato Fabbri
the half-shape suite:
archive.org/details/ShapeSuite
was synthesized entirely using psychophysical relations
for each resulting 16-bit 44 kHz sample:
https://arxiv.org/abs/1412.6853

I am thinking about how to make the documentation, at least for:
* mass (music and audio in sample sequences): the framework.
It might be considered a toolbox, but it is not a package:
https://github.com/ttm/mass/

* music: a package for making music, including the routines
used to make the half-shape suite:
https://github.com/ttm/music

==
It seems reasonable at first to:

* upload the package to PyPI ("mass" is not a package;
"music" is there, beta but there).

* Make available some summary of the file tree, including
some automated code documentation.
As I follow numpy's convention (or try to), and the package and framework
are particularly useful for proficient numpy users, I used Sphinx.
The very preliminary version is:
https://pythonmusic.github.io/

* Make a nice readthedocs documentation.
"music" project name was taken so I made:
http://music-documentation.readthedocs.io/en/latest/
"mass" project name was also taken so I made:
http://musicmass.readthedocs.io/en/latest/

And I found these URLs clumsy. Is there a standard, or would you go with
musicpackage.read... and massframework.read...?

More importantly:
* would you use readthedocs for the Sphinx/Doxygen output?
* would you use readthedocs.org, or would a gitbook be better, or...?

Should I contact the scipy community to make a scikit available, or
integrate it with numpy in any way beyond using and citing it
appropriately?

BTW, I am using vimwiki in a constant attempt to organize and track
my I/O:
https://arxiv.org/abs/1712.06933
So a unified solution for all these documentation instances,
using a wiki structure, seems reasonable at the moment.
Maybe upload the generated HTML from Sphinx and from Vimwiki
to readthedocs...?

Best,
R.


-- 
Renato Fabbri
GNU/Linux User #479299
labmacambira.sourceforge.net


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread josef . pktd
On Tue, Jan 30, 2018 at 3:21 PM, Allan Haldane wrote:

> On 01/30/2018 01:33 PM, josef.p...@gmail.com wrote:
> > AFAICS, one problem is that the padded view didn't come with the
> > matching down stream usage support, the pack function as mentioned, an
> > alternative way to convert to a standard ndarray, copy doesn't get rid
> > of the padding and so on.
> >
> > eg. another mailing list thread I just found with the same problem
> > http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html
> >
> > quoting Ralf:
> > Question: is that really the recommended way to get an (N, 2) size float
> > array from two columns of a larger record array? If so, why isn't there
> > a better way? If you'd want to write to that (N, 2) array you have to
> > append a copy, making it even uglier. Also, then there really should be
> > tests for views in test_records.py.
> >
> >
> > This "better way" never showed up, AFAIK. And it looks like we came back
> > to this problem every few years.
> >
> > Josef
>
> Since we are at least pushing off this change to a later release
> (1.15?), we have some time to prepare/catch up.
>
> What can we add to numpy.lib.recfunctions to make the multi-field
> copy->view change smoother? We have discussed at least two functions:
>
>  * repack_fields - rearrange the memory layout of a structured array to
> add/remove padding between fields
>
>  * structured_to_unstructured - turns an n-D structured array into an
> (n+1)-D unstructured ndarray, whose dtype is the highest common type of
> all the fields. May want the inverse function too.
>

The only sticky point with statsmodels is to have an equivalent of
a[['b', 'c']].view(('f8', 2)).

Highest common dtype might be object, the main usecase for this is to
select some elements of a specific dtype and then use them as
standard,homogeneous ndarray. In our case and other cases that I have seen
it is mainly to select a subset of the floating point numbers.
Another case of this might be to combine two strings into one,
a[['b', 'c']].view(('S8')), if b is S5 and c is S3, but I don't think I used
this in serious code.

for inverse function: I guess it is still possible to view any standard
homogeneous ndarray with a structured dtype as long as the itemsize matches.
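
Something like this sketch, where the structured itemsize equals the byte
width of one row of the plain array:

>>> import numpy as np
>>> a = np.ones((3, 2))                        # plain f8, 16 bytes per row
>>> dt = np.dtype([('b', 'f8'), ('c', 'f8')])  # itemsize 16
>>> a.view(dt).shape                           # one record per row
(3, 1)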

Browsing through old mailing list threads, I saw that adding multiple
fields or concatenating two arrays with structured dtypes into an array
with a single combined dtype was missing and I guess still is. (IIRC this
is the usecase where we go now the pandas detour in statsmodels.)



>
>
> We might also consider
>
>  * apply_along_fields(arr, method) - applies the method along the
> "field" axis, equivalent to something like
> method(struct_to_unstructured(arr), axis=-1)
>

If this works on a padded view of an existing array, then this would be an
improvement over the current version of having to extract and copy the
relevant fields of an existing structured dtype or loop over different
numeric dtypes, ints, floats.

In general there will need to be a way to apply `method` only to selected
columns, or columns of a matching dtype. (e.g. We don't want the sum or
mean of a string.)
(e.g. we use ptp() on numeric fields to check if there is already a
constant column in the array or dataframe)



>
>
> I think these are pretty minimal and shouldn't be too hard to implement.
>

AFAICS, it would cover the statsmodels usage.


Josef



>
> Allan


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread Allan Haldane
On 01/30/2018 01:33 PM, josef.p...@gmail.com wrote:
> AFAICS, one problem is that the padded view didn't come with the
> matching down stream usage support, the pack function as mentioned, an
> alternative way to convert to a standard ndarray, copy doesn't get rid
> of the padding and so on.
> 
> eg. another mailing list thread I just found with the same problem
> http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html
> 
> quoting Ralf:
> Question: is that really the recommended way to get an (N, 2) size float
> array from two columns of a larger record array? If so, why isn't there
> a better way? If you'd want to write to that (N, 2) array you have to
> append a copy, making it even uglier. Also, then there really should be
> tests for views in test_records.py.
> 
> 
> This "better way" never showed up, AFAIK. And it looks like we came back
> to this problem every few years.
> 
> Josef

Since we are at least pushing off this change to a later release
(1.15?), we have some time to prepare/catch up.

What can we add to numpy.lib.recfunctions to make the multi-field
copy->view change smoother? We have discussed at least two functions:

 * repack_fields - rearrange the memory layout of a structured array to
add/remove padding between fields

 * structured_to_unstructured - turns an n-D structured array into an
(n+1)-D unstructured ndarray, whose dtype is the highest common type of
all the fields. May want the inverse function too.


We might also consider

 * apply_along_fields(arr, method) - applies the method along the
"field" axis, equivalent to something like
method(struct_to_unstructured(arr), axis=-1)
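
A rough sketch of that equivalence, assuming the draft
structured_to_unstructured from gh-10411 is available:

>>> import numpy as np
>>> import numpy.lib.recfunctions as rf
>>> def apply_along_fields(method, arr):
...     # reduce across the fields of each record
...     return method(rf.structured_to_unstructured(arr), axis=-1)
>>> b = np.array([(1, 2, 5), (4, 5, 7)],
...              dtype=[('x', 'i4'), ('y', 'f4'), ('z', 'f8')])
>>> apply_along_fields(np.mean, b)
array([2.66666667, 5.33333333])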


I think these are pretty minimal and shouldn't be too hard to implement.

Allan


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread josef . pktd
On Tue, Jan 30, 2018 at 1:33 PM,  wrote:

>
>
> On Tue, Jan 30, 2018 at 12:28 PM, Allan Haldane wrote:
>
>> On 01/29/2018 11:50 PM, josef.p...@gmail.com wrote:
>>
>>>
>>>
>>> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane wrote:
>>>
>>> On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:
>>>
>>>
>>>
>>> On Mon, Jan 29, 2018 at 5:50 PM, josef.p...@gmail.com wrote:
>>>
>>>  On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote:
>>>
>>>  On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
>>>  >
>>>  > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
>>>  >
>>>  > I <3 structured arrays. I love the fact that I
>>> can access data by
>>>  > row and then by fieldname, or vice versa. There
>>> are times when I
>>>  > need to pass just a column into a function, and
>>> there are times when
>>>  > I need to process things row by row. Yes, pandas
>>> is nice if you want
>>>  > the specialized indexing features, but it becomes
>>> a bear to deal
>>>  > with if all you want is normal indexing, or even
>>> the ability to
>>>  > easily loop over the dataset.
>>>  >
>>>  >
>>>  > I don't think there is a doubt that structured
>>> arrays, arrays with
>>>  > structured dtypes, are a useful container. The
>>> question is whether they
>>>  > should be more or the foundation for more.
>>>  >
>>>  > For example, computing a mean, or reduce operation,
>>> over numeric element
>>>  > ("columns"). Before padded views it was possible to
>>> index by selecting
>>>  > the relevant "columns" and view them as standard
>>> array. With padded
>>>  > views that breaks and AFAICS, there is no way in
>>> numpy 1.14.0 to compute
>>>  > a mean of some "columns". (I don't have numpy 1.14 to
>>> try or find a
>>>  > workaround, like maybe looping over all relevant
>>> columns.)
>>>  >
>>>  > Josef
>>>
>>>  Just to clarify, structured types have always had
>>> padding bytes,
>>>  that
>>>  isn't new.
>>>
>>>  What *is* new (which we are pushing to 1.15, I think)
>>> is that it
>>>  may be
>>>  somewhat more common to end up with padding than
>>> before, and
>>>  only if you
>>>  are specifically using multi-field indexing, which is a
>>> fairly
>>>  specialized case.
>>>
>>>  I think recfunctions already account properly for
>>> padding bytes.
>>>  Except
>>>  for the bug in #8100, which we will fix, padding-bytes
>>> in
>>>  recarrays are
>>>  more or less invisible to a non-expert who only cares
>>> about
>>>  dataframe-like behavior.
>>>
>>>  In other words, padding is no obstacle at all to
>>> computing a
>>>  mean over a
>>>  column, and single-field indexes in 1.15 behave
>>> identically as
>>>  before.
>>>  The only thing that will change in 1.15 is multi-field
>>> indexing,
>>>  and it
>>>  has never been possible to compute a mean (or any binary
>>>  operation) on
>>>  multiple fields.
>>>
>>>
>>>  from the example in the other thread
>>>  a[['b', 'c']].view(('f8', 2)).mean(0)
>>>
>>>
>>>  (from the statsmodels usecase:
>>>  read csv with genfromtxt to get recarray or structured array
>>>  select/index the numeric columns
>>>  view them as standard array
>>>  do whatever we can do with standard numpy  arrays
>>

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread josef . pktd
On Tue, Jan 30, 2018 at 2:42 PM,  wrote:

>
>
> On Tue, Jan 30, 2018 at 1:33 PM,  wrote:
>
>>
>>
>> On Tue, Jan 30, 2018 at 12:28 PM, Allan Haldane wrote:
>>
>>> On 01/29/2018 11:50 PM, josef.p...@gmail.com wrote:
>>>


 On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane wrote:

 On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:



 On Mon, Jan 29, 2018 at 5:50 PM, josef.p...@gmail.com wrote:

  On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote:

  On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
  >
  > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
  >
  > I <3 structured arrays. I love the fact that I
 can access data by
  > row and then by fieldname, or vice versa. There
 are times when I
  > need to pass just a column into a function, and
 there are times when
  > I need to process things row by row. Yes, pandas
 is nice if you want
  > the specialized indexing features, but it becomes
 a bear to deal
  > with if all you want is normal indexing, or even
 the ability to
  > easily loop over the dataset.
  >
  >
  > I don't think there is a doubt that structured
 arrays, arrays with
  > structured dtypes, are a useful container. The
 question is whether they
  > should be more or the foundation for more.
  >
  > For example, computing a mean, or reduce operation,
 over numeric element
  > ("columns"). Before padded views it was possible to
 index by selecting
  > the relevant "columns" and view them as standard
 array. With padded
  > views that breaks and AFAICS, there is no way in
 numpy 1.14.0 to compute
  > a mean of some "columns". (I don't have numpy 1.14 to
 try or find a
  > workaround, like maybe looping over all relevant
 columns.)
  >
  > Josef

  Just to clarify, structured types have always had
 padding bytes,
  that
  isn't new.

  What *is* new (which we are pushing to 1.15, I think)
 is that it
  may be
  somewhat more common to end up with padding than
 before, and
  only if you
  are specifically using multi-field indexing, which is a
 fairly
  specialized case.

  I think recfunctions already account properly for
 padding bytes.
  Except
  for the bug in #8100, which we will fix, padding-bytes
 in
  recarrays are
  more or less invisible to a non-expert who only cares
 about
  dataframe-like behavior.

  In other words, padding is no obstacle at all to
 computing a
  mean over a
  column, and single-field indexes in 1.15 behave
 identically as
  before.
  The only thing that will change in 1.15 is multi-field
 indexing,
  and it
  has never been possible to compute a mean (or any
 binary
  operation) on
  multiple fields.


  from the example in the other thread
  a[['b', 'c']].view(('f8', 2)).mean(0)


  (from the statsmodels usecase:
  read csv with genfromtxt to get recarray or structured array

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread josef . pktd
On Tue, Jan 30, 2018 at 12:28 PM, Allan Haldane wrote:

> On 01/29/2018 11:50 PM, josef.p...@gmail.com wrote:
>
>>
>>
>> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane wrote:
>>
>> On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:
>>
>>
>>
>> On Mon, Jan 29, 2018 at 5:50 PM, josef.p...@gmail.com wrote:
>>
>>  On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote:
>>
>>  On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
>>  >
>>  > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
>>  >
>>  > I <3 structured arrays. I love the fact that I
>> can access data by
>>  > row and then by fieldname, or vice versa. There
>> are times when I
>>  > need to pass just a column into a function, and
>> there are times when
>>  > I need to process things row by row. Yes, pandas
>> is nice if you want
>>  > the specialized indexing features, but it becomes
>> a bear to deal
>>  > with if all you want is normal indexing, or even
>> the ability to
>>  > easily loop over the dataset.
>>  >
>>  >
>>  > I don't think there is a doubt that structured
>> arrays, arrays with
>>  > structured dtypes, are a useful container. The
>> question is whether they
>>  > should be more or the foundation for more.
>>  >
>>  > For example, computing a mean, or reduce operation,
>> over numeric element
>>  > ("columns"). Before padded views it was possible to
>> index by selecting
>>  > the relevant "columns" and view them as standard
>> array. With padded
>>  > views that breaks and AFAICS, there is no way in
>> numpy 1.14.0 to compute
>>  > a mean of some "columns". (I don't have numpy 1.14 to
>> try or find a
>>  > workaround, like maybe looping over all relevant
>> columns.)
>>  >
>>  > Josef
>>
>>  Just to clarify, structured types have always had
>> padding bytes,
>>  that
>>  isn't new.
>>
>>  What *is* new (which we are pushing to 1.15, I think)
>> is that it
>>  may be
>>  somewhat more common to end up with padding than
>> before, and
>>  only if you
>>  are specifically using multi-field indexing, which is a
>> fairly
>>  specialized case.
>>
>>  I think recfunctions already account properly for
>> padding bytes.
>>  Except
>>  for the bug in #8100, which we will fix, padding-bytes in
>>  recarrays are
>>  more or less invisible to a non-expert who only cares
>> about
>>  dataframe-like behavior.
>>
>>  In other words, padding is no obstacle at all to
>> computing a
>>  mean over a
>>  column, and single-field indexes in 1.15 behave
>> identically as
>>  before.
>>  The only thing that will change in 1.15 is multi-field
>> indexing,
>>  and it
>>  has never been possible to compute a mean (or any binary
>>  operation) on
>>  multiple fields.
>>
>>
>>  from the example in the other thread
>>  a[['b', 'c']].view(('f8', 2)).mean(0)
>>
>>
>>  (from the statsmodels usecase:
>>  read csv with genfromtxt to get recarray or structured array
>>  select/index the numeric columns
>>  view them as standard array
>>  do whatever we can do with standard numpy  arrays
>>  )
>>
>>
>> Oh ok, I misunderstood. I see your point: a mean over fields is more
>> difficult than before.
>>
>> Or, to phrase it as a question:
>>
>

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread Allan Haldane

On 01/29/2018 11:50 PM, josef.p...@gmail.com wrote:



On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane wrote:


On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:



On Mon, Jan 29, 2018 at 5:50 PM, josef.p...@gmail.com wrote:



     On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote:

         On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
         >
         >
         > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
         >
         >     I <3 structured arrays. I love the fact that I
can access data by
         >     row and then by fieldname, or vice versa. There
are times when I
         >     need to pass just a column into a function, and
there are times when
         >     I need to process things row by row. Yes, pandas
is nice if you want
         >     the specialized indexing features, but it becomes
a bear to deal
         >     with if all you want is normal indexing, or even
the ability to
         >     easily loop over the dataset.
         >
         >
         > I don't think there is a doubt that structured
arrays, arrays with
         > structured dtypes, are a useful container. The
question is whether they
         > should be more or the foundation for more.
         >
         > For example, computing a mean, or reduce operation,
over numeric element
         > ("columns"). Before padded views it was possible to
index by selecting
         > the relevant "columns" and view them as standard
array. With padded
         > views that breaks and AFAICS, there is no way in
numpy 1.14.0 to compute
         > a mean of some "columns". (I don't have numpy 1.14 to
try or find a
         > workaround, like maybe looping over all relevant
columns.)
         >
         > Josef

         Just to clarify, structured types have always had
padding bytes,
         that
         isn't new.

         What *is* new (which we are pushing to 1.15, I think)
is that it
         may be
         somewhat more common to end up with padding than
before, and
         only if you
         are specifically using multi-field indexing, which is a
fairly
         specialized case.

         I think recfunctions already account properly for
padding bytes.
         Except
         for the bug in #8100, which we will fix, padding-bytes in
         recarrays are
         more or less invisible to a non-expert who only cares about
         dataframe-like behavior.

         In other words, padding is no obstacle at all to
computing a
         mean over a
         column, and single-field indexes in 1.15 behave
identically as
         before.
         The only thing that will change in 1.15 is multi-field
indexing,
         and it
         has never been possible to compute a mean (or any binary
         operation) on
         multiple fields.


     from the example in the other thread
     a[['b', 'c']].view(('f8', 2)).mean(0)


     (from the statsmodels usecase:
     read csv with genfromtxt to get recarray or structured array
     select/index the numeric columns
     view them as standard array
     do whatever we can do with standard numpy  arrays
     )


Oh ok, I misunderstood. I see your point: a mean over fields is more
difficult than before.

Or, to phrase it as a question:

How do we get a standard array with homogeneous dtype from the
corresponding elements of a structured dtype in numpy 1.14.0?

Josef


The answer may be that "numpy has never had a way to do that",
even if in a few special cases you might hack a workaround using views.

That's what your example seems like to me. It uses an explicit view,
which is an "expert" feature, since views depend on the exact memory
layout and binary representation of the array.
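
(For the concrete question, one layout-independent workaround is to copy
the columns out explicitly, e.g.:)

>>> import numpy as np
>>> a = np.ones(3, dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
>>> np.stack([a['b'], a['c']], axis=-1)   # copies; unaffected by padding
array([[1., 1.],
       [1., 1.],
       [1., 1.]])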

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread Chris Barker
On Mon, Jan 29, 2018 at 7:44 PM, Allan Haldane wrote:

> I suggest that if we want to allow either means over fields, or conversion
> of an n-D structured array to an (n+1)-D regular ndarray, we should add a
> dedicated function to do so in numpy.lib.recfunctions
> which does not depend on the binary representation of the array.
>

IIUC, the core use-case of structured dtypes is binary compatibility with
external systems (arrays of C structs, mostly) -- at least that's how I use
them :-)

In which case, "conversion of an n-D structured array to an (n+1)-D regular
ndarray" is an important feature -- actually even more important if you
don't use recarrays.

So yes, let's have a utility to make that easy.

As for recarrays -- are we that far from having them be robust and useful?
In which case, why not keep them around, fix the few issues, but explicitly
not try to extend them into more dataframe-like domains?

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread josef . pktd
On Tue, Jan 30, 2018 at 3:24 AM, Eric Wieser wrote:

> Because dtypes were low level with clear memory layout and stayed that way
>
> Dtypes have supported padded and out-of-order fields since at least 2005
> (v0.8.4), and I would guess that the memory layout has not changed since.
>
> The house has always been made out of glass; it just didn’t look fragile
> until we showed people where the stones were.
>

Even so, I don't remember any problems with it.

There might have been stones on the side streets and alleys, but 1.14.0
puts a big padded stone right in the front of the drive way.
(Maybe only the solarium was made out of glass, now it's also the billiard
room.)

(I never had to learn about padding and I don't remember having any related
problems getting statsmodels through Debian testing on various machine
types.)

Josef



>
> On Mon, 29 Jan 2018 at 20:51, josef.p...@gmail.com wrote:
>
>> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane wrote:
>>
>>> On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:
>>>


 On Mon, Jan 29, 2018 at 5:50 PM, josef.p...@gmail.com wrote:



 On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote:

 On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
 >
 >
 > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
 >
 > I <3 structured arrays. I love the fact that I can access
 data by
 > row and then by fieldname, or vice versa. There are times
 when I
 > need to pass just a column into a function, and there are
 times when
 > I need to process things row by row. Yes, pandas is nice
 if you want
 > the specialized indexing features, but it becomes a bear
 to deal
 > with if all you want is normal indexing, or even the
 ability to
 > easily loop over the dataset.
 >
 >
 > I don't think there is a doubt that structured arrays, arrays
 with
 > structured dtypes, are a useful container. The question is
 whether they
 > should be more or the foundation for more.
 >
 > For example, computing a mean, or reduce operation, over
 numeric element
 > ("columns"). Before padded views it was possible to index by
 selecting
 > the relevant "columns" and view them as standard array. With
 padded
 > views that breaks and AFAICS, there is no way in numpy 1.14.0
 to compute
 > a mean of some "columns". (I don't have numpy 1.14 to try or
 find a
 > workaround, like maybe looping over all relevant columns.)
 >
 > Josef

 Just to clarify, structured types have always had padding bytes,
 that
 isn't new.

 What *is* new (which we are pushing to 1.15, I think) is that it
 may be
 somewhat more common to end up with padding than before, and
 only if you
 are specifically using multi-field indexing, which is a fairly
 specialized case.

 I think recfunctions already account properly for padding bytes.
 Except
 for the bug in #8100, which we will fix, padding-bytes in
 recarrays are
 more or less invisible to a non-expert who only cares about
 dataframe-like behavior.

 In other words, padding is no obstacle at all to computing a
 mean over a
 column, and single-field indexes in 1.15 behave identically as
 before.
 The only thing that will change in 1.15 is multi-field indexing,
 and it
 has never been possible to compute a mean (or any binary
 operation) on
 multiple fields.


 from the example in the other thread
 a[['b', 'c']].view(('f8', 2)).mean(0)


 (from the statsmodels usecase:
 read csv with genfromtxt to get recarray or structured array
 select/index the numeric columns
 view them as standard array
 do whatever we can do with standard numpy  arrays
 )

>>>
>>> Oh ok, I misunderstood. I see your point: a mean over fields is more
>>> difficult than before.
>>>
>>> Or, to phrase it as a question:

 How do we get a standard array with homogeneous dtype from the
 corresponding elements of a structured dtype in numpy 1.14.0?

 Josef
>>>

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread Eric Wieser
Because dtypes were low level with clear memory layout and stayed that way

Dtypes have supported padded and out-of-order fields since at least 2005
(v0.8.4), and I would guess that the memory layout has not changed since.
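
(A small sketch of such a layout -- an out-of-order field at an explicit
offset, plus trailing padding:)

>>> import numpy as np
>>> dt = np.dtype({'names': ['b', 'a'], 'formats': ['i2', 'f8'],
...                'offsets': [8, 0], 'itemsize': 16})
>>> dt.fields['b'], dt.fields['a'], dt.itemsize
((dtype('int16'), 8), (dtype('float64'), 0), 16)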

The house has always been made out of glass; it just didn’t look fragile
until we showed people where the stones were.

On Mon, 29 Jan 2018 at 20:51, josef.p...@gmail.com wrote:

> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane wrote:
>
>> On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:
>>
>>>
>>>
>>> On Mon, Jan 29, 2018 at 5:50 PM, josef.p...@gmail.com wrote:
>>>
>>>
>>>
>>> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote:
>>>
>>> On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
>>> >
>>> >
>>> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
>>> >
>>> > I <3 structured arrays. I love the fact that I can access
>>> data by
>>> > row and then by fieldname, or vice versa. There are times
>>> when I
>>> > need to pass just a column into a function, and there are
>>> times when
>>> > I need to process things row by row. Yes, pandas is nice
>>> if you want
>>> > the specialized indexing features, but it becomes a bear
>>> to deal
>>> > with if all you want is normal indexing, or even the
>>> ability to
>>> > easily loop over the dataset.
>>> >
>>> >
>>> > I don't think there is a doubt that structured arrays, arrays
>>> with
>>> > structured dtypes, are a useful container. The question is
>>> whether they
>>> > should be more or the foundation for more.
>>> >
>>> > For example, computing a mean, or reduce operation, over
>>> numeric element
>>> > ("columns"). Before padded views it was possible to index by
>>> selecting
>>> > the relevant "columns" and view them as standard array. With
>>> padded
>>> > views that breaks and AFAICS, there is no way in numpy 1.14.0
>>> to compute
>>> > a mean of some "columns". (I don't have numpy 1.14 to try or
>>> find a
>>> > workaround, like maybe looping over all relevant columns.)
>>> >
>>> > Josef
>>>
>>> Just to clarify, structured types have always had padding bytes,
>>> that
>>> isn't new.
>>>
>>> What *is* new (which we are pushing to 1.15, I think) is that it
>>> may be
>>> somewhat more common to end up with padding than before, and
>>> only if you
>>> are specifically using multi-field indexing, which is a fairly
>>> specialized case.
>>>
>>> I think recfunctions already account properly for padding bytes.
>>> Except
>>> for the bug in #8100, which we will fix, padding-bytes in
>>> recarrays are
>>> more or less invisible to a non-expert who only cares about
>>> dataframe-like behavior.
>>>
>>> In other words, padding is no obstacle at all to computing a
>>> mean over a
>>> column, and single-field indexes in 1.15 behave identically as
>>> before.
>>> The only thing that will change in 1.15 is multi-field indexing,
>>> and it
>>> has never been possible to compute a mean (or any binary
>>> operation) on
>>> multiple fields.
>>>
>>>
>>> from the example in the other thread
>>> a[['b', 'c']].view(('f8', 2)).mean(0)
>>>
>>>
>>> (from the statsmodels usecase:
>>> read csv with genfromtxt to get recarray or structured array
>>> select/index the numeric columns
>>> view them as standard array
>>> do whatever we can do with standard numpy  arrays
>>> )
>>>
>>
>> Oh ok, I misunderstood. I see your point: a mean over fields is more
>> difficult than before.
>>
>> Or, to phrase it as a question:
>>>
>>> How do we get a standard array with homogeneous dtype from the
>>> corresponding elements of a structured dtype in numpy 1.14.0?
>>>
>>> Josef
>>>
>>
>> The answer may be that "numpy has never had a way to do that",
>> even if in a few special cases you might hack a workaround using views.
>>
>> That's what your example seems like to me. It uses an explicit view,
>> which is an "expert" feature since views depend on the exact memory layout
>> and binary representation of the array. Your example only works if the two
>> fields have exactly the same dtype as each other and as the final dtype,
>> and evidently breaks if there is byte padding for any reason.
>>
>> Pandas can do row means without these problems:
>>
>> >>> pd.DataFrame(np.ones(10, dtype='i8,f