Re: [Numpy-discussion] Starting work on ufunc rewrite

2016-03-31 Thread Irwin Zaid
Hey guys,

I figured I'd just chime in here.

Over in DyND-town, we've spent a lot of time figuring out how to structure
DyND callables, which are actually more general than NumPy gufuncs. We've
just recently got them to a place where we are very happy, and are able to
represent a wide range of computations.

Our callables use a two-fold approach to evaluation. The first pass is a
resolution pass, where a callable can specialize what it is doing based on
the input types. It is able to deduce the return type, multidispatch, or
even perform some sort of recursive analysis in the form of computations
that call themselves. The second pass is construction of a kernel object
that is exactly specialized to the metadata (e.g., strides, contiguity,
...) of the array.

The callable itself can store arbitrary data, as can each pass of the
evaluation. Either (or both) of these passes can be done ahead of time,
allowing one to have a callable exactly specialized for your array.

If NumPy is looking to change its ufunc design, we'd be happy to share our
experiences with this.

Irwin

On Thu, Mar 31, 2016 at 4:00 PM, Jaime Fernández del Río
wrote:
> I have started discussing with Nathaniel the implementation of the ufunc ABI
> break that he proposed in a draft NEP a few months ago:
>
> http://thread.gmane.org/gmane.comp.python.numeric.general/61270
>
> His original proposal was to make the public portion of PyUFuncObject be:
>
> typedef struct {
> PyObject_HEAD
> int nin, nout, nargs;
> } PyUFuncObject;
>
> Of course the idea is that internally we would use a much larger struct that
> we could change at will, as long as its first few entries matched those of
> PyUFuncObject. My problem with this, and I may very well be missing
> something, is that in PyUFunc_Type we need to set the tp_basicsize to the
> size of the extended struct, so we would end up having to expose its
> contents. This is somewhat similar to what now happens with PyArrayObject:
> anyone can #include "ndarraytypes.h", cast PyArrayObject* to
> PyArrayObjectFields*, and access the guts of the struct without using the
> supplied API inline functions. Not the end of the world, but if you want to
> make something private, you might as well make it truly private.
>
> I think it would be better to have something similar to what NpyIter does::
>
> typedef struct {
> PyObject_HEAD
> NpyUFunc *ufunc;
> } PyUFuncObject;
>
> where NpyUFunc would, at this level, be an opaque type of which nothing
> would be known. We could have some of the NpyUFunc attributes cached on the
> PyUFuncObject struct for easier access, as is done in NewNpyArrayIterObject.
> This would also give us more liberty in making NpyUFunc be whatever we want
> it to be, including a variable-sized memory chunk that we could use and
> access at will. NpyIter is again a good example, where rather than storing
> pointers to strides and dimensions arrays, these are made part of the
> NpyIter memory chunk, effectively being equivalent to having variable sized
> arrays as part of the struct. And I think we will probably no longer trigger
> the Cython warnings about size changes either.
>
> Any thoughts on this approach? Is there anything fundamentally wrong with
> what I'm proposing here?
>
> Also, this is probably going to end up being a rewrite of a pretty large and
> complex codebase. I am not sure that working on this on my own and
> eventually sending a humongous PR is the best approach. Any thoughts on how
> best to handle turning this into a collaborative, incremental effort? Anyone
> who would like to join in the fun?
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Rabbit. Copy Rabbit into your signature and help him with
> his plans for world domination.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 1.0.0 RC2 is out!

2016-03-31 Thread Francesc Alted
==
Announcing bcolz 1.0.0 RC2
==

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
features (just some optimizations and bug fixes), but bcolz is already 6
years old and implements most of the capabilities it was designed for, so
I decided to release 1.0.0, meaning that the format is declared stable:
people can be assured that future bcolz releases will be able to read
bcolz 1.0 data files (and probably much earlier ones too) for a long
while.  The format is fully described at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that the bcolz 1.x series will be based on
the C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  Once C-Blosc
2.x (https://github.com/Blosc/c-blosc2) is out, a new bcolz 2.x is
expected that takes advantage of the shiny new features of C-Blosc2 (more
compressors, more filters, native variable-length support and the concept
of super-chunks), which should be very beneficial for the next bcolz
generation.

Important: this is a Release Candidate, so please test it as much as you
can.  If no issues appear in a week or so, I will proceed to tag and
release 1.0.0 final.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally to accelerate many vector and query
operations (although it can use pure NumPy for doing so too).  numexpr
optimizes memory usage and uses several cores for doing the computations,
so it is blazing fast.  Moreover, since the carray/ctable containers can
be disk-based, it is possible to use them for seamlessly performing
out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel), which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Starting work on ufunc rewrite

2016-03-31 Thread Joseph Fox-Rabinovitz
There is certainly good precedent for the approach you suggest.
Shortly after Nathaniel mentioned the rewrite to me, I looked up
d-pointers as a possible technique: https://wiki.qt.io/D-Pointer.

If we allow arbitrary kwargs for the new functions, is that something
you would want to note in the public structure? I was thinking
something along the lines of adding a hook to process additional
kwargs and return a void * that would then be passed to the loop.

To do this incrementally, perhaps opening a special development branch
on the main repository is in order?

I would love to join in the fun as time permits. Unfortunately, it is
not especially permissive right about now. I will at least throw in
some ideas that I have been mulling over.

-Joe


On Thu, Mar 31, 2016 at 4:00 PM, Jaime Fernández del Río
 wrote:
> I have started discussing with Nathaniel the implementation of the ufunc ABI
> break that he proposed in a draft NEP a few months ago:
>
> http://thread.gmane.org/gmane.comp.python.numeric.general/61270
>
> His original proposal was to make the public portion of PyUFuncObject be:
>
> typedef struct {
> PyObject_HEAD
> int nin, nout, nargs;
> } PyUFuncObject;
>
> Of course the idea is that internally we would use a much larger struct that
> we could change at will, as long as its first few entries matched those of
> PyUFuncObject. My problem with this, and I may very well be missing
> something, is that in PyUFunc_Type we need to set the tp_basicsize to the
> size of the extended struct, so we would end up having to expose its
> contents. This is somewhat similar to what now happens with PyArrayObject:
> anyone can #include "ndarraytypes.h", cast PyArrayObject* to
> PyArrayObjectFields*, and access the guts of the struct without using the
> supplied API inline functions. Not the end of the world, but if you want to
> make something private, you might as well make it truly private.
>
> I think it would be better to have something similar to what NpyIter does::
>
> typedef struct {
> PyObject_HEAD
> NpyUFunc *ufunc;
> } PyUFuncObject;
>
> where NpyUFunc would, at this level, be an opaque type of which nothing
> would be known. We could have some of the NpyUFunc attributes cached on the
> PyUFuncObject struct for easier access, as is done in NewNpyArrayIterObject.
> This would also give us more liberty in making NpyUFunc be whatever we want
> it to be, including a variable-sized memory chunk that we could use and
> access at will. NpyIter is again a good example, where rather than storing
> pointers to strides and dimensions arrays, these are made part of the
> NpyIter memory chunk, effectively being equivalent to having variable sized
> arrays as part of the struct. And I think we will probably no longer trigger
> the Cython warnings about size changes either.
>
> Any thoughts on this approach? Is there anything fundamentally wrong with
> what I'm proposing here?
>
> Also, this is probably going to end up being a rewrite of a pretty large and
> complex codebase. I am not sure that working on this on my own and
> eventually sending a humongous PR is the best approach. Any thoughts on how
> best to handle turning this into a collaborative, incremental effort? Anyone
> who would like to join in the fun?
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Rabbit. Copy Rabbit into your signature and help him with
> his plans for world domination.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Starting work on ufunc rewrite

2016-03-31 Thread Jaime Fernández del Río
I have started discussing with Nathaniel the implementation of the ufunc
ABI break that he proposed in a draft NEP a few months ago:

http://thread.gmane.org/gmane.comp.python.numeric.general/61270

His original proposal was to make the public portion of PyUFuncObject be:

typedef struct {
PyObject_HEAD
int nin, nout, nargs;
} PyUFuncObject;

Of course the idea is that internally we would use a much larger struct
that we could change at will, as long as its first few entries matched
those of PyUFuncObject. My problem with this, and I may very well be
missing something, is that in PyUFunc_Type we need to set the tp_basicsize
to the size of the extended struct, so we would end up having to expose its
contents. This is somewhat similar to what now happens with PyArrayObject:
anyone can #include "ndarraytypes.h", cast PyArrayObject* to
PyArrayObjectFields*, and access the guts of the struct without using the
supplied API inline functions. Not the end of the world, but if you want to
make something private, you might as well make it truly private.

I think it would be better to have something similar to what NpyIter does::

typedef struct {
PyObject_HEAD
NpyUFunc *ufunc;
} PyUFuncObject;

where NpyUFunc would, at this level, be an opaque type of which nothing
would be known. We could have some of the NpyUFunc attributes cached on the
PyUFuncObject struct for easier access, as is done in NewNpyArrayIterObject.
This would also give us more liberty in making NpyUFunc be whatever we want
it to be, including a variable-sized memory chunk that we could use and
access at will. NpyIter is again a good example, where rather than storing
pointers to strides and dimensions arrays, these are made part of the
NpyIter memory chunk, effectively being equivalent to having variable sized
arrays as part of the struct. And I think we will probably no longer
trigger the Cython warnings about size changes either.

Any thoughts on this approach? Is there anything fundamentally wrong with
what I'm proposing here?

Also, this is probably going to end up being a rewrite of a pretty large
and complex codebase. I am not sure that working on this on my own and
eventually sending a humongous PR is the best approach. Any thoughts on how
best to handle turning this into a collaborative, incremental effort?
Anyone who would like to join in the fun?

Jaime

-- 
(\__/)
( O.o)
( > <) This is Rabbit. Copy Rabbit into your signature and help him with
his plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] C-API: multidimensional array indexing?

2016-03-31 Thread mpc
Cool!

But I'm having trouble implementing this. Could you provide an example of
how exactly to do this? I'm not sure how to create the appropriate tuple
and how to use it with PyObject_GetItem given a PyArrayObject, unless I've
misunderstood.
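For reference, the tuple route looks roughly like the sketch below: build a tuple of index objects and pass it to PyObject_GetItem, which is the C-level equivalent of ``arr[i, j]``. This is a hedged, untested fragment (the helper name is invented), using only documented C-API calls:

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Equivalent of "arr[i, j]" for a 2-D array; returns a new reference,
 * or NULL with an exception set. */
static PyObject *
get_element(PyArrayObject *arr, npy_intp i, npy_intp j)
{
    /* "(nn)" builds a 2-tuple of Py_ssize_t values. */
    PyObject *idx = Py_BuildValue("(nn)", (Py_ssize_t)i, (Py_ssize_t)j);
    if (idx == NULL) {
        return NULL;
    }
    PyObject *item = PyObject_GetItem((PyObject *)arr, idx);
    Py_DECREF(idx);
    return item;   /* a scalar when all dimensions are indexed */
}
```

Supplying fewer indices than dimensions would return a view instead of a scalar, just as in Python.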

Much appreciated,

Matthew



--
View this message in context: 
http://numpy-discussion.10968.n7.nabble.com/C-API-multidimensional-array-indexing-tp7413p42693.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.5.1 released

2016-03-31 Thread Francesc Alted
=
 Announcing Numexpr 2.5.1
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

Fixed a critical bug that caused wrong evaluations of log10() and
conj().  These functions produced wrong results on non-contiguous data
when numexpr was compiled with Intel's MKL (which is a popular build,
since Anaconda ships it by default).  This is considered a *critical*
bug and upgrading is highly recommended.  Thanks to Arne de Laat and
Tom Kooij for reporting it and providing a unit test.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion