Re: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python

2016-08-31 Thread Jason Newton
Hey Ian - I hope I gave Cython a fair comment, but I should add the
disclaimer that your ability to understand/implement those
solutions/workarounds in that project is greatly enhanced by your knowing
the innards of Cython from being a core developer on the Cython project.
This doesn't detract from DyND's accomplishments (if anything it means
Cython users should look there for how to use C++ with Cython, the
workarounds used, and the shortcomings), but I would not expect everyone to
want to jump through those hoops to get things working without a firm
understanding of Cython's edges.  All this potential for special/hack
adaptation code is still something to keep in mind when comparing against
something more straightforward and easier to understand coming from a purer
C/C++ side, where things are a bit more dangerous and fairly more verbose
but where interplay with the language and environment is first-class (like
Boost.Python/pybind).  Since this thread is a survey of the current state
and options, my intent is just to make sure readers have the current
pros/cons of the approaches in mind.

-Jason

On Wed, Aug 31, 2016 at 2:17 PM, Ian Henriksen <
insertinterestingnameh...@gmail.com> wrote:

> We use Cython very heavily in DyND's Python bindings. It has worked well
> for us
> even when working with some very modern C++. That said, a lot depends on
> exactly which C++ features you want to expose as a part of the interface.
> Interfaces that require things like non-type template parameters or
> variadic templates will often require some extra C++ code to work them
> into something that Cython can understand. In my experience, those
> particular limitations haven't been that hard to work with.
> Best,
> Ian Henriksen
>
>
> On Wed, Aug 31, 2016 at 12:20 PM Jason Newton <nev...@gmail.com> wrote:
>
>> I just wanted to follow up on the C++ side of the OP's email - Cython has
>> quite a few difficulties working with C++ code at the moment.  It's really
>> more of a C solution most of the time, and you must split things up into a
>> mostly C call interface (that is, the C code Cython can call) and limit
>> exposure/complications with templates and complex C++11+ constructs.  This
>> may change in the longer term, but in the near term that is the state.
>>
>> I used to use Boost.Python, but I'm getting my feet wet with pybind (which
>> is basically the same API but works more as you expect it to with its
>> signature/type plumbing (including std::shared_ptr handling), has some
>> other C++11-based improvements, and is header-only + submodule friendly!).
>> I also remembered ndarray thanks to Neal's post, but I haven't figured out
>> how to leverage it better than pybind at the moment.  I'd be interested to
>> see ndarray gain support for pybind interoperability...
>>
>> -Jason
>>
>> On Wed, Aug 31, 2016 at 1:08 PM, David Morris <otha...@othalan.net>
>> wrote:
>>
>>> On Wed, Aug 31, 2016 at 2:28 PM, Michael Bieri <mibi...@gmail.com>
>>> wrote:
>>>
>>>> Hi all
>>>>
>>>> There are several ways to use C/C++ code from Python with NumPy,
>>>> as given in http://docs.scipy.org/doc/numpy/user/c-info.html .
>>>> Furthermore, there's at least pybind11.
>>>>
>>>> I'm not quite sure which approach is state-of-the-art as of 2016. How
>>>> would you do it if you had to make a C/C++ library available in Python
>>>> right now?
>>>>
>>>> In my case, I have a C library with some scientific functions on
>>>> matrices and vectors. You will typically call a few functions to configure
>>>> the computation, then hand over some pointers to existing buffers
>>>> containing vector data, then start the computation, and finally read back
>>>> the data. The library can also use MPI to parallelize.
>>>>
>>>
>>> I have been delighted with Cython for this purpose.  Great integration
>>> with NumPy (you can access numpy arrays directly as C arrays), very
>>> Python-like syntax, and amazing performance.
>>>
>>> Good luck,
>>>
>>> David
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python

2016-08-31 Thread Jason Newton
I just wanted to follow up on the C++ side of the OP's email - Cython has
quite a few difficulties working with C++ code at the moment.  It's really
more of a C solution most of the time, and you must split things up into a
mostly C call interface (that is, the C code Cython can call) and limit
exposure/complications with templates and complex C++11+ constructs.  This
may change in the longer term, but in the near term that is the state.

I used to use Boost.Python, but I'm getting my feet wet with pybind (which
is basically the same API but works more as you expect it to with its
signature/type plumbing (including std::shared_ptr handling), has some
other C++11-based improvements, and is header-only + submodule friendly!).
I also remembered ndarray thanks to Neal's post, but I haven't figured out
how to leverage it better than pybind at the moment.  I'd be interested to
see ndarray gain support for pybind interoperability...

-Jason

On Wed, Aug 31, 2016 at 1:08 PM, David Morris  wrote:

> On Wed, Aug 31, 2016 at 2:28 PM, Michael Bieri  wrote:
>
>> Hi all
>>
>> There are several ways to use C/C++ code from Python with NumPy,
>> as given in http://docs.scipy.org/doc/numpy/user/c-info.html .
>> Furthermore, there's at least pybind11.
>>
>> I'm not quite sure which approach is state-of-the-art as of 2016. How
>> would you do it if you had to make a C/C++ library available in Python
>> right now?
>>
>> In my case, I have a C library with some scientific functions on matrices
>> and vectors. You will typically call a few functions to configure the
>> computation, then hand over some pointers to existing buffers containing
>> vector data, then start the computation, and finally read back the data.
>> The library can also use MPI to parallelize.
>>
>
> I have been delighted with Cython for this purpose.  Great integration
> with NumPy (you can access numpy arrays directly as C arrays), very
> Python-like syntax, and amazing performance.
>
> Good luck,
>
> David
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] deterministic, reproducible matmul / __matmult_

2016-07-18 Thread Jason Newton
On Mon, Jul 11, 2016 at 3:27 PM, Pauli Virtanen <p...@iki.fi> wrote:
> Mon, 11 Jul 2016 13:01:49 -0400, Jason Newton wrote:
>> Does the ML have any ideas on how one could get a matmul that will not
>> allow any funny business in the evaluation of the products?  Funny
>> business here is something like changing the evaluation order of the
>> additions of terms. I want strict IEEE 754 compliance - no 80-bit
>> registers, and perhaps control of the rounding mode, and no unsafe math
>> optimizations.
>
> If you link Numpy with BLAS and LAPACK libraries that have been
> compiled for this purpose, and turn on the compiler flags that enforce
> strict IEEE (and disable SSE) when compiling Numpy, you probably will get
> reproducible results. Numpy itself just offloads the dot computations to
> BLAS, so if your BLAS is reproducible, things should mainly be OK.

The matrix multiplication in the reference BLAS is not hard to follow, so
that's good.  The generalized inverse is a little more difficult.  I've
actually had problems building numpy with the reference BLAS under
Windows... does anyone know how to take a ready-made CBLAS DLL and get an
existing numpy installation (e.g. Anaconda) to use it?

>
> You may also need to turn off the SSE optimizations in Numpy, because
> these can make results depend on memory alignment --- not in dot
> products, but in other computations.
>
> Out of curiosity, what is the application where this is necessary?
> Maybe there is a numerically stable formulation?

This can come up in recursive processes or anywhere parallelism
(vectorization or other kinds) meets the need for reproducible code
(there are many reasons you'd want this; see the slides below).  As
I mentioned, I want reproducible calculations when alternative
implementations are involved - things like C modules, MPI, FPGAs, and
GPUs coming into the picture.  You can usually figure out a strategy
for this when you write everything yourself, but I'd love to write
things in numpy and have it just choose the simple/straightforward
implementations that are usually what everything boils down to on
these other devices, at least in meaningful pieces.  There may be
other cases more complicated than what I mention here, but this is
still very useful as a building block for those cases (which often
involve multiple accumulators/partitioning and tree-like reduction
ordering).
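
To make the ordering issue concrete, here's a small illustration in plain
numpy (nothing to do with ReproBLAS itself; made-up array size) of how the
evaluation order alone changes a floating-point reduction:

import numpy as np

np.random.seed(0)
x = np.random.rand(100000)

# Floating-point addition is not associative, so a strict left-to-right
# accumulation and numpy's pairwise reduction give (slightly) different bits.
s_serial = 0.0
for v in x:
    s_serial += v            # one fixed evaluation order

s_pairwise = np.sum(x)       # numpy's own (pairwise) reduction order

print(s_serial == s_pairwise)        # typically False
print(abs(s_serial - s_pairwise))    # tiny, but enough to break bit-exact
                                     # comparison across implementations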

I looked into ReproBLAS further and also asked the authors to add a CBLAS
wrapper, in the hope that I might sometime have numpy working on top of it.
I read a lot of their research recently and it's pretty cool - they get
better accuracy than most implementations and the performance is pretty
good.  Check out the slides here
http://people.eecs.berkeley.edu/~hdnguyen/public/talks/ARITH21_Fast_Sum.pdf
(skip to slide 24 for accuracy) - the takeaway is that you actually do
quite a bit better in precision/accuracy with their sum method than with
the naive and alternative implementations.  The CPU performance is worth
the trade and really not bad at all for most operations and purposes -
especially since they handle scalability very well.

They also just posted a monster of a technical report here
https://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-121.html -
this details recent performance using AVX and SSE, if anyone is
interested.

I'd love to have a library like this that I could just tell numpy to
switch out to at runtime - at the start of a script.

-Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] deterministic, reproducible matmul / __matmult_

2016-07-11 Thread Jason Newton
Hello

I'm a long-time user of numpy - but an issue I've had with it is
making sure I can reproduce the results of a floating-point matrix
multiplication in other languages/modules (like C or on the GPU), or
across installations.  I take great pains in doing this type of
work because it allows me both to prototype with python/numpy and to
use it, in a fairly strong/useful capacity, as a strict reference
implementation.  For me, small differences accumulate such that
allclose-type thinking starts failing within a few iterations of an
algorithm, and things diverge.
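
As a hedged illustration of what I mean (made-up sizes; whether the last
bits actually differ depends on the linked BLAS and the platform):

import numpy as np

np.random.seed(1)
A = np.random.rand(200, 300)
B = np.random.rand(300, 100)

C_blas = A.dot(B)                        # goes through the linked BLAS gemm
C_einsum = np.einsum('ij,jk->ik', A, B)  # einsum's own loops / ordering

# Mathematically identical, but the accumulation order differs, so the
# two results are usually not bit-identical.
print(np.array_equal(C_blas, C_einsum))  # often False
print(np.abs(C_blas - C_einsum).max())   # small, but nonzero differences
                                         # compound over iterations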

I've had success with einsum before for some cases (by chance) where
no difference was ever observed between it and Eigen (C++), but I'm
not sure if I should use this any longer.  The new @ operator is very
tempting to use too, in prototyping.

Does the ML have any ideas on how one could get a matmul that will not
allow any funny business in the evaluation of the products?  Funny
business here is something like changing the evaluation order of the
additions of terms. I want strict IEEE 754 compliance - no 80-bit
registers, and perhaps control of the rounding mode, and no unsafe math
optimizations.

I'm definitely willing to sacrifice performance (especially
multithreading-based enhancements, which already cause problems in
reduction ordering) in order to get these guarantees.  I was looking around
and found a few BLAS libraries that might be worth a mention; comments on
these would also be welcome:

http://bebop.cs.berkeley.edu/reproblas/
https://exblas.lip6.fr/


-Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] correct sizeof for ndarray

2015-10-19 Thread Jason Newton
Hi folks,

I noticed an unexpected behavior of itemsize for structures with offsets
that are larger than those of a packed structure in memory.  This matters
when parsing in-memory structures from C and some other sources (recently an
HDF5/h5py detail got me for a bit).

So what is the correct way to get the "sizeof" of a structure?  AFAIK this
is the size of the last item + its offset.  If this doesn't exist...
shouldn't it?
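
For concreteness, a small sketch of what I mean (hypothetical two-field
dtype with explicit offsets, as if mirroring a padded C struct):

import numpy as np

# int8 at offset 0, float64 at offset 8 - a C compiler would typically
# report sizeof() == 16 for the corresponding struct.
dt = np.dtype({'names':   ['flag', 'value'],
               'formats': [np.int8, np.float64],
               'offsets': [0, 8]})

print(dt.itemsize)   # what numpy reports as the element size

# "sizeof" in the sense above: the largest (offset + field size) over fields.
sizeof = max(off + fdt.itemsize
             for fdt, off in (dt.fields[name][:2] for name in dt.names))
print(sizeof)        # 16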

Thanks,
Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Python needs goto

2015-09-24 Thread Jason Newton
In my experience, it's also come up with finite state machines where
there are lots of loops.  You might consider something like a long-lived
client loop on some socket, with states like try-connect, connected, and
while-connected-and-everything's-ok, where each can have its own
never-ending loop and each loop jumps to the others' sections.  It actually
is more straightforward than adding unneeded layers like callbacks, which
force you to design and pass around data and use auxiliary classes/structs -
and it's easier to shoot yourself in the foot, with extra burden and context
loss, that way.

-Jason

On Thu, Sep 24, 2015 at 1:54 PM, Alexander Eberspächer <
alex.eberspaec...@gmail.com> wrote:

> On 24.09.2015 13:25, Christophe Bal wrote:
>
> > Can you give an example where GOTO is useful ?
>
> I think those pieces of code are best understood with some humour..
>
> However, basically I can think of two main causes for using goto:
>
> 1. Stop whatever your code is doing and jump towards the end of the
> program. However, this is mainly something useful for languages without
> exception handling and garbage collection.
>
> 2. Get out of something deeply nested. Also, this probably isn't very
> useful in Python as there's exception handling.
>
> Best
>
> Alex
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-02 Thread Jason Newton
Just chiming in with my 2 cents, in direct response to your points...

   - Image-oriented processing is most typically done with row-major
   storage layout, from hardware up to general software implementations.
   - Really, think of it as [slice,] row, column (logical)... you don't
   actually have to be concerned about the layout unless you want higher
   performance - in which case, for a better access pattern, you process a
   fundamental image line at a time (see the small sketch after this list).
   I also find it helps me avoid bugs with xyz semantics by working with
   rows and columns only and remembering x=col, y=row.
   - I'm most familiar with having slice first, as above.
   - ITK is actually stored row-major, but its index type has dimensions
   specified as column, row, slice.  MATLAB does a lot of things in column
   order and thus acts differently from other implementations, which can
   result in different outputs, but MATLAB seems perfectly happy living on
   an island where it's the only implementation providing a specific answer
   given a specific input.
   - Numpy is 0-based...?
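
The sketch referenced in the list above - a hedged example (made-up image
size) of line-at-a-time processing with (row, column) indexing, remembering
x=col, y=row:

import numpy as np

# Hypothetical 8-bit image, indexed as (row, col) in C (row-major) order.
rows, cols = 480, 640
img = np.zeros((rows, cols), dtype=np.uint8)

for r in range(rows):
    line = img[r]        # one image line: a contiguous view in row-major
    line[:] = r % 256    # process a whole line at a time (good access pattern)

# x = col, y = row: the pixel at x=10, y=3
y, x = 3, 10
print(img[y, x])         # 3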

Good luck keeping it all sane though,

-Jason

On Sun, Aug 2, 2015 at 7:16 PM, Kang Wang kwan...@wisc.edu wrote:

 Thank you all for replying and providing useful insights and suggestions.

 The reasons I really want to use column-major are:

- I am an image-oriented user (not matrix-oriented, as explained in

 http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues
)
- I am so used to reading/writing I(x, y, z) in textbooks and code that it
is very likely that if the environment (a row-major environment) forces me to
write I(z, y, x), I will write a bug if I am not 100% focused. When this
happens, it is difficult to debug, because everything compiles and builds
fine; you will only see a run-time error. Depending on the environment, you
may get a useful error message (i.e. index out of range), but sometimes you
just get bad image results.
- It actually does not have much to do with the actual data layout in
memory. In image processing, especially medical imaging where I work, if
you have a 3D image, everyone will agree that in memory the X index is the
fastest-changing index, and the Z dimension (we often call it the slice
dimension) has the largest stride in memory. So, if the data layout is like
this in memory, and image-oriented users are so used to reading/writing
I(x,y,z), the only storage order that makes sense is
column-major
- I also write code in MATLAB and C/C++. In MATLAB, a matrix is a
column-major array. In C/C++, we often use ITK, which is also column-major
(http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really prefer
to always read/write column-major code to minimize coding bugs related to
storage order.
- I also prefer indexing to be 0-based; however, there is nothing I can
do about it for MATLAB (which is 1-based).

 I can see that my original thought about modifying the NumPy source and
 recompiling is probably a bad idea. The suggestion of using
 fortran_zeros = partial(np.zeros, order='F') is probably the best way so
 far, in my opinion, and I am going to give it a try.
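
 A minimal sketch of that suggestion (just wrap the constructors you use
 most; the names here are only illustrative):

 import numpy as np
 from functools import partial

 fortran_zeros = partial(np.zeros, order='F')
 fortran_empty = partial(np.empty, order='F')

 I = fortran_zeros((64, 64, 32), dtype=np.float32)
 print(I.flags['F_CONTIGUOUS'])   # True
 print(I.strides)                 # the first index has the smallest stride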

 Again, thank you all for replying.

 Kang

 On 08/02/15, Nathaniel Smith n...@pobox.com wrote:

 On Aug 2, 2015 7:30 AM, Sturla Molden sturla.mol...@gmail.com wrote:
 
  On 02/08/15 15:55, Kang Wang wrote:
 
   Can anyone provide any insight/help?
 
  There is no default order. There was before, but now all operators
  control the order of their return arrays from the order of their input
  array.

 This is... overoptimistic. I would not rely on this in code that I wrote.

 It's true that many numpy operations do preserve the input order. But
 there are also many operations that don't. And which are which often
 changes between releases. (Not on purpose usually, but it's an easy bug to
 introduce. And sometimes it is done intentionally, e.g. to make functions
 faster. It sucks to have to make a function slower for everyone because
 someone somewhere is depending on memory layout default details.) And there
 are operations where it's not even clear what preserving order means
 (indexing a C array with a Fortran array, add(C, fortran), ...), and even
 lots of operations that intrinsically break contiguity/ordering (transpose,
 diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings
 one way or another in any non-trivial program.

 Instead, it's better to explicitly specify order= just at the places where
 you care. That way your code is *locally* correct (you can check it will
 work by just reading the one function). The alternative is to try and
 enforce a *global* property on your program (everyone everywhere is very
 careful to only use contiguity-preserving operations, where everyone
 includes third party libraries like numpy and others). In software design,
 local invariants are always better than global ones.
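
 A tiny illustration of that point (made-up shapes; ordinary operations
 silently change layout, so enforce the order only where it matters):

 import numpy as np

 a = np.zeros((3, 4), order='F')
 print(a.flags['F_CONTIGUOUS'])    # True
 print(a.T.flags['F_CONTIGUOUS'])  # False - the transpose view is C-ordered
 b = a[:, ::2]                     # slicing breaks contiguity entirely
 print(b.flags['F_CONTIGUOUS'], b.flags['C_CONTIGUOUS'])  # False False
 c = np.asfortranarray(b)          # enforce the order locally, where you care
 print(c.flags['F_CONTIGUOUS'])    # True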

Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-07-31 Thread Jason Newton
On Fri, Jul 31, 2015 at 5:19 PM, Nick Papior nickpap...@gmail.com wrote:

 --

 Kind regards Nick Papior
 On 31 Jul 2015 17:53, Chris Barker chris.bar...@noaa.gov wrote:
 
  On Thu, Jul 30, 2015 at 11:24 PM, Jason Newton nev...@gmail.com wrote:
 
   This really needs changing though.  Scientific researchers don't catch
  this subtlety and expect it to be just like the C and MATLAB types they
  know a little about.
 
 
   well, C types are a %$ nightmare as well! In fact, one of the biggest
  issues comes from CPython's use of a C long for an integer -- which is
  not clearly defined. If you are writing code that needs any kind of binary
  compatibility, cross-platform compatibility, and particularly if you want
  to be able to distribute pre-compiled binaries of extensions, etc., then
  you'd better use well-defined types.

There was some truth to this, but if you, like the majority of scientific
researchers, only produce code for x86 or x86_64 on Windows and Linux... as
long as you aren't treating pointers as ints, everything behaves in
accordance with general expectations.   The standards did and still do allow
for a bit of flux, but things like OpenCL [
https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/scalarDataTypes.html
] made this really strict, so we could stop writing ifdefs to deal with
varying bit widths and just implement the algorithms - which is typically a
researcher's top priority.

I'd say I use the strongly-defined types (e.g. int32/float32) whenever doing
protocol or communications work - it makes complete sense there. But often
for computation, especially when interfacing with C extensions, it makes
more sense for the developer to use types/typenames that ought to match 1:1
with C in every case.
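
For example (nothing exotic, just the aliases I mean; the printed widths are
the usual ones on x86_64 Linux):

import numpy as np

# Fixed-width names - what I reach for in protocol/serialization work:
print(np.dtype(np.int32).itemsize)    # 4
print(np.dtype(np.float32).itemsize)  # 4

# C-named aliases - these track the platform's C types, which is what you
# want when the other side of the interface is literally `int` or `double`:
print(np.dtype(np.intc))     # C int    (int32 on common platforms)
print(np.dtype(np.int_))     # C long   (int64 on 64-bit Linux, int32 on Windows)
print(np.dtype(np.double))   # C double (float64)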

-Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-07-31 Thread Jason Newton
I've been using numpy in its various forms since like 2005.  Burned on int,
int_ just today with Boost.Python / ndarray conversions, and a number of
times before that.  intc being C's int!?  Didn't even know it existed till
today.  This isn't the first time, especially with float.  Bool actually
behaves as I expect, and I'd prefer it stay 1 byte for storage efficiency -
I'll use a long if I want it machine-word wide.

This really needs changing though.  Scientific researchers don't catch this
subtlety and expect it to be just like the C and MATLAB types they know a
little about.  I can't even keep it straight in all circumstances; how can
I expect them to?  This makes all the newcomers face the same pain and
introduce more bugs into otherwise good code.
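
A concrete instance of the trap (behavior of the aliases in current numpy on
x86_64 Linux; exactly the kind of thing newcomers trip over):

import numpy as np

print(np.int is int)         # True - np.int is just the Python builtin
print(np.dtype(np.int_))     # int64 - C long
print(np.dtype(np.intc))     # int32 - C int
print(np.dtype(np.float))    # float64 - np.float is the Python builtin (C double)
print(np.dtype(np.bool_))    # bool, 1 byte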

+1  Change it now, like ripping off a bandaid.  Match C11/C++11 types and
solve much pain past, present, and future in exchange for a few lashings for
the remainder of the year.  Thankfully stdint-like types have existed for
quite some time, so protocol descriptions have been correct most of the
time.

-Jason

On Fri, Jul 24, 2015 at 8:51 AM, Julian Taylor 
jtaylor.deb...@googlemail.com wrote:

 On 07/23/2015 04:29 AM, Nathaniel Smith wrote:
  Hi all,
 
  So one of the things exposed in the numpy namespace are objects called
 np.int
 np.float
 np.bool
  etc.
 
  These are commonly used -- in fact, just yesterday on another project
  I saw a senior person reviewing a pull request instruct a more junior
  person that they should use np.float instead of float or np.float64.
  But AFAICT everyone who is actually using them is doing this based on
  a very easy-to-fall-for misconception, i.e., that these objects have
  something to do with numpy.

 I don't see the issue. They are just aliases, so how is np.float worse
 than just float?
 To me this does not seem worth the bother of deprecation.
 An argument could be made for deprecating the creation of dtypes from
 Python builtin types, as they are ambiguous (C float != Python float) and
 platform dependent. E.g. dtype=int is just an endless source of bugs.
 But this is also so invasive that the deprecation would never be
 completed and would just be a bother to everyone.

 So -1 from me.


 
  P.S.: using metamodule.py also gives us the option of making
  np.testing lazily imported, which last time this came up was
  benchmarked to improve numpy's import speed by ~35% [1] -- not too bad
  given that most production code will never touch np.testing. But this
  is just a teaser postscript; I'm not proposing that we actually do
  this at this time :-).
 
  [1]
 http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063147.html
 

 I doubt these numbers from 2012 are still correct. When the import was
 last profiled, last year, there were two main offenders: add_docs and
 np.polynomial. Both have been fixed in 1.9. I don't recall np.testing even
 showing up.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] constructing record dtypes from the c-api

2015-07-24 Thread Jason Newton
After drilling through the sources a second time, I found that
numpy/core/src/multiarray/descriptor.c is the file to consult, with the
primary routine being PyArray_DescrConverter and the _convert_from_*
functions being the most interesting to read in order to glean its
capabilities.

So in particular it looks like the _convert_from_dict-based path is the
way to go to allow custom field offsets, so that C POD structs can be
mapped safely and transparently with offset information generated at
compile time - hopefully keeping dtypes in perfect sync with the C sources
rather than declaring them on the Python side.  I plan on building a
helper class to generate the dictionaries for this routine, since
something akin to the list dtype specification is more user-friendly
(even to me).
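
As a sketch of what that dictionary form looks like from the Python side
(using the `datum` struct from the original question, quoted below; the
offsets/itemsize are what a typical x86_64 compiler would produce and would
really be filled in from offsetof()/sizeof() on the C++ side rather than
hard-coded):

import numpy as np

datum_dtype = np.dtype({
    'names':    ['position', 'velocity', 'd0', 'time_of_receipt'],
    'formats':  [(np.float64, (3,)), (np.float32, (3,)), np.int32, np.uint64],
    'offsets':  [0, 24, 36, 40],
    'itemsize': 48,                      # sizeof(datum) including padding
})

print(datum_dtype.itemsize)              # 48
print(datum_dtype.fields['velocity'])    # (sub-dtype, 24) - field type and offset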

-Jason

On Thu, Jul 23, 2015 at 7:55 PM, Jason Newton nev...@gmail.com wrote:
 Hi folks,

 The moderator for the ML approved my subscription so I can now post
 this back in the numpy list rather than scipy.  Apologies for the
 duplicate/cross posting.


 I was trying to figure out how to make a dtype for a C struct on the
 C side and store that in some Boost.Python libraries I'm making.

 Imagine the following c-struct, greatly simplified of course from the
 real ones I need to expose:

 struct datum{
 double position[3];
 float velocity[3];
 int32_t d0;
 uint64_t time_of_receipt;
 };


 How would you make the dtype/PyArray_Descr for  this?

 I have, as a point of reference, compound types in HDF5 for similar
 struct descriptions (h5py makes these nearly 1:1 and converts back and
 forth between dtypes and HDF5 types; it uses Cython to accomplish this) -
 but I don't want to bring in HDF5 for this task, and I'm not sure how well
 the offsets would survive that translation through h5py either.

 Proper/best solutions would make use of offsetof as we insert entries
 into the dtype/PyArray_Descr.  It's fine if you do this in straight C -
 I just need to figure out how to accomplish this in a practical way.

 The language I'm working in is C++11.  The end goal is probably going
 to be to create a helper infrastructure to allow this to be generated
 automatically-ish, provided an implementation of a [static] visitor
 pattern in C++.  The idea is to make numpy-compatible C++ POD types
 rather than use Boost.Python-wrapped proxies for every object, which
 will cut down on some complicated and time-consuming code (for both
 computer and developer) when ndarrays are what's called for.

 Related - how would one work with record dtypes passed to C?  How would
 one look up fields and offsets within a structure?

 Thanks for any advisement!

 -Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] constructing record dtypes from the c-api

2015-07-23 Thread Jason Newton
Hi folks,

The moderator for the ML approved my subscription so I can now post
this back in the numpy list rather than scipy.  Apologies for the
duplicate/cross posting.


I was trying to figure out how to make a dtype for a C struct on the
C side and store that in some Boost.Python libraries I'm making.

Imagine the following c-struct, greatly simplified of course from the
real ones I need to expose:

struct datum{
double position[3];
float velocity[3];
int32_t d0;
uint64_t time_of_receipt;
};


How would you make the dtype/PyArray_Descr for  this?

I have, as a point of reference, compound types in HDF5 for similar
struct descriptions (h5py makes these nearly 1:1 and converts back and
forth between dtypes and HDF5 types; it uses Cython to accomplish this) -
but I don't want to bring in HDF5 for this task, and I'm not sure how well
the offsets would survive that translation through h5py either.

Proper/best solutions would make use of offsetof as we insert entries
into the dtype/PyArray_Descr.  It's fine if you do this in straight C -
I just need to figure out how to accomplish this in a practical way.

The language I'm working in is C++11.  The end goal is probably going
to be to create a helper infrastructure to allow this to be generated
automatically-ish, provided an implementation of a [static] visitor
pattern in C++.  The idea is to make numpy-compatible C++ POD types
rather than use Boost.Python-wrapped proxies for every object, which
will cut down on some complicated and time-consuming code (for both
computer and developer) when ndarrays are what's called for.

Related - how would one work with record dtypes passed to C?  How would
one look up fields and offsets within a structure?

Thanks for any advisement!

-Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion