Re: [Numpy-discussion] understanding buffering done when broadcasting

2015-11-27 Thread Eli Bendersky
On Wed, Nov 25, 2015 at 12:21 AM, Sebastian Berg  wrote:

> On Di, 2015-11-24 at 16:49 -0800, Eli Bendersky wrote:
> >
> >
> > On Mon, Nov 23, 2015 at 2:09 PM, Sebastian Berg
> >  wrote:
> > On Mo, 2015-11-23 at 13:31 -0800, Eli Bendersky wrote:
> > > Hello,
> > >
> > >
> > > I'm trying to understand the buffering done by the Numpy
> > iterator
> > > interface (the new post 1.6-one) when running ufuncs on
> > arrays that
> > > require broadcasting. Consider this simple case:
> > >
> > > In [35]: m = np.arange(16).reshape(4,4)
> > > In [37]: n = np.arange(4)
> > > In [39]: m + n
> > > Out[39]:
> > > array([[ 0,  2,  4,  6],
> > >        [ 4,  6,  8, 10],
> > >        [ 8, 10, 12, 14],
> > >        [12, 14, 16, 18]])
> > >
> > >
> >
> >
> > On first sight this seems true. However, there is one other point to
> > consider here. The inner ufunc loop can only handle a single stride. The
> > contiguous array `n` has to be iterated as if it had the strides
> > `(0, 8)`, which are not the strides of the contiguous array `m`, which
> > can be "unrolled" to 1-D. Those effective strides are thus not
> > contiguous for the inner ufunc loop, and the operation cannot be
> > unrolled into a single ufunc call!
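The effective strides described here can be checked directly; a minimal sketch (assumes NumPy >= 1.10 for np.broadcast_to and a 64-bit build where the default integer takes 8 bytes):

```
import numpy as np

m = np.arange(16).reshape(4, 4)
n = np.arange(4)

# Broadcasting n over m's shape repeats the single row by giving it a
# 0-byte stride along the first axis; the inner axis keeps its 8-byte stride.
nb = np.broadcast_to(n, m.shape)
print(nb.strides)  # (0, 8)  -> cannot be unrolled into a single 1-D loop
print(m.strides)   # (32, 8) -> contiguous, could be unrolled to 1-D
```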
> >
> > The optimization (which might kick in a bit more broadly, maybe) is thus
> > that the number of inner loop calls is minimized. Whether that is worth
> > it, I am not sure; it may well be that there is some worthwhile
> > optimization possible here.
> > Note, however, that this does not occur for large inner loop sizes
> > (though I think you can find some "bad" sizes):
> >
> > ```
> > In [18]: n = np.arange(4)
> >
> > In [19]: m = np.arange(16).reshape(4,4)
> >
> > In [20]: o = m + n
> > Iterator: Checking casting for operand 0
> > op: dtype('int64'), iter: dtype('int64')
> > Iterator: Checking casting for operand 1
> > op: dtype('int64'), iter: dtype('int64')
> > Iterator: Checking casting for operand 2
> > op: , iter: dtype('int64')
> > Iterator: Setting allocated stride 1 for iterator dimension 0 to 8
> > Iterator: Setting allocated stride 0 for iterator dimension 1 to 32
> > Iterator: Copying inputs to buffers
> > Iterator: Expanding inner loop size from 8192 to 4 since buffering wasn't needed
> > Any buffering needed: 0
> > Iterator: Finished copying inputs to buffers (buffered size is 4)
> > ```
> >
> >
> > The heuristic in the code says that if we can use a single stride and
> > that's larger than the buffer size (which I assume is the default
> > buffer size, and can be changed), then it's "is_onestride" and no
> > buffering is done.
> >
> >
> > So this led me to explore around this threshold (8192 items by default
> > on my machine), and indeed we can notice funny behavior:
> >
> > In [51]: %%timeit n = np.arange(8192); m = np.arange(8192*40).reshape(40,8192)
> >    ....: o = m + n
> >    ....:
> > 1000 loops, best of 3: 274 µs per loop
> >
> > In [52]: %%timeit n = np.arange(8292); m = np.arange(8292*40).reshape(40,8292)
> >    ....: o = m + n
> >    ....:
> > 1000 loops, best of 3: 229 µs per loop
> >
> >
> > So, given this, it's not very clear why the "optimization" kicks in.
> > Buffering for small sizes seems like a mistake.
> >
>
> I am pretty sure it is not generally a mistake. Consider the case of an
> Nx3 array with large N (note that shrinking the buffer can be a big
> advantage, though I am not sure whether that is usually done).
> If you have (N, 3) + (3,) arrays, then the ufunc outer loop would have
> N times the call overhead. Doing the buffering (which I believe has some
> extra code to make it fast) will lower this to a few ufunc inner loop
> calls.
> I have not timed it, but I would be a bit surprised if it was not faster
> in this case at least. Even calling a C function (and looping) for an
> inner loop of 3 elements should incur quite a bit of overhead, and my
> guess is more overhead than the buffering costs.
>

Yes, that's a good point for arrays shaped like this. I guess all this
leaves us with is the realization that the heuristic *could* be tuned
somewhat for arrays where the inner dimension is large - as in the case I
demonstrated above, it's nonsensical for a computation to get 20% faster
when the array size grows past an arbitrary threshold.

I'll see if I can find some time to dig more into this and figure out where
the knobs to tweak the heuristic are.
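One knob already exposed at the Python level is the ufunc buffer size, via np.getbufsize() / np.setbufsize(); a minimal sketch (whether this feeds the iterator's buffering decision on every code path is my assumption, not something verified here):

```
In [1]: np.getbufsize()        # default ufunc buffer size; matches the 8192-item threshold above
Out[1]: 8192

In [2]: np.setbufsize(16384)   # e.g. double it and re-run the timings to see whether the threshold moves
```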

Thanks for the enlightening explanation.

Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-11-27 Thread Alan G Isaac

On 11/27/2015 5:37 AM, Stephan Sahm wrote:

I would like to request generator/iterator support for np.array(...), to the
extent that list(...) supports it.



http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
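For example, a minimal sketch (the dtype argument is required; count is an optional hint that lets fromiter preallocate the result):

```
import numpy as np

# Consume a generator directly into a 1-D array, without building a list first
squares = np.fromiter((i * i for i in range(10)), dtype=np.int64, count=10)
print(squares)   # [ 0  1  4  9 16 25 36 49 64 81]
```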

hth,
Alan Isaac


[Numpy-discussion] ENH: added vector operators: divergence, curl and laplacian #6727

2015-11-27 Thread Antonio Lara
Hello, I have corrected the errors in my previous pull request that
includes the new functions divergence, curl and laplacian.

https://github.com/numpy/numpy/pull/6727

Thank you,
Antonio


[Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-11-27 Thread Stephan Sahm
[ this request/discussion refers to numpy issue #5863:
https://github.com/numpy/numpy/pull/5863#issuecomment-159738368 ]


Dear all,

As far as I can tell, the expected functionality of np.array(...) would be
np.array(list(...)), or something even nicer.
Therefore, I would like to request generator/iterator support for
np.array(...), to the extent that list(...) supports it.
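For concreteness, a minimal sketch of the current behaviour versus the requested one (as observed on NumPy 1.10-era releases):

```
import numpy as np

gen = (i * i for i in range(4))
a = np.array(gen)         # today: a 0-d object array that merely wraps the generator
print(a.shape, a.dtype)   # () object

gen = (i * i for i in range(4))
b = np.array(list(gen))   # the requested behaviour, spelled out by hand
print(b)                  # [0 1 4 9]
```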


More detailed reasoning follows.


In general it seems possible to identify iterators/generators as needed for
this purpose:

- someone actually implemented this feature already (see #5863)
- there are ``types.GeneratorType`` and ``collections.abc.Iterator`` for an
  ``isinstance(...)`` check
- numpy can already distinguish them from all other types, which are
  translated into a numpy array just fine

Given this, I think the general argument goes roughly as follows:

PROS (affecting maybe 10% of numpy users or more):

- more intuitive overall behaviour: array(...) = array(list(...)), roughly
- python3 compatibility (see e.g. #5951)
- compatibility with the analogous ``__builtin__`` functions (see e.g. #5756)
- all of the above make numpy easier to use in an interactive style
  (e.g. ipython --pylab), where coding time matters more than computation time

CONS (affecting less than 0.1% of numpy users, I would guess):

- might break existing code

All in all, at least for me at this stage, this speaks in favour of merging
the already existing feature branch (see #5863), or something similar, into
numpy master.

Discussion, please!

cheers,
Stephan