Re: [Numpy-discussion] Numpy Development Queries

2017-02-22 Thread Matthew Harrigan
Ashwin, I don't know your background but perhaps it is similar to mine.  I
use numpy extensively in my day job and starting contributing to numpy a
few months ago.  From using numpy, I found some things that I thought
should be added/improved.  I researched them and the associated numpy code
enough to be confident I would be capable of making the change (some of the
C code is intense).  Before I got too far along, I posted an issue and got
feedback from the experts.  Then I did it.

On Tue, Feb 21, 2017 at 12:01 AM, Ralf Gommers 
wrote:

>
>
> On Tue, Feb 21, 2017 at 7:32 AM, ashwin.pathak <
> ashwin.pat...@students.iiit.ac.in> wrote:
>
>> Hello all,
>> I am new to this organization and wanted to start with some easy-fix
>> issues to get some knowledge about the source code. However, the issues
>> under easy-fix labels have already been solved or someone is at it. Can
>> someone help me find such issues?
>>
>
> Hi Ashwin, welcome. I don't want to seem discouraging, but I do want to
> explain that NumPy is significantly harder to get started on than SciPy
> (which you've started on already) as a newcomer to the scientific Python
> ecosystem. So I'd encourage you to spend some more time on the SciPy issues
> - there are more easy-fix ones there, and the process of contributing (pull
> requests, reviews, finding your way around the codebase) is similar for the
> two projects.
>
> Cheers,
> Ralf
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


Re: [Numpy-discussion] Fortran order in recarray.

2017-02-22 Thread Matthew Harrigan
Alex,

Can you please post some code showing exactly what you are trying to do and
any issues you are having, particularly the "irritating problems with its
row indexing and some other problems" you quote above?

On Wed, Feb 22, 2017 at 10:34 AM, Robert McLeod 
wrote:

> Just as a note, Appveyor supports uploading modules to "public websites":
>
> https://packaging.python.org/appveyor/
>
> The main issue I would see from this, is the PyPi has my password stored
> on my machine in a plain text file.   I'm not sure whether there's a way to
> provide Appveyor with a SSH key instead.
>
> On Wed, Feb 22, 2017 at 4:23 PM, Alex Rogozhnikov <
> alex.rogozhni...@yandex.ru> wrote:
>
>> Hi Francesc,
>> thanks a lot for you reply and for your impressive job on bcolz!
>>
>> Bcolz seems to put its emphasis on compression, which is not of much
>> interest to me, but the *ctable* and chunked operations look very
>> appropriate to me now. (Of course, I'll need to test it thoroughly before
>> I can say this for sure; that's my current impression.)
>>
>> My strongest concern with bcolz so far is that it seems to be completely
>> non-trivial to install on Windows systems, while pip provides numpy
>> binaries for most (or all?) operating systems.
>> I haven't built pip binary wheels myself, but is it hard / impossible to
>> cook up pip-installable binaries?
>>
>> ​You can change shapes of numpy arrays, but that usually involves copies
>> of the whole container.
>>
>> sure, but this is OK for me, as I plan to organize column editing in
>> 'batches', so copying should be rare.
>> It would be nice to see an example to understand how deep I need to go
>> inside numpy.
>>
>> Cheers,
>> Alex.
>>
>>
>>
>>
>> On 22 Feb 2017, at 17:03, Francesc Alted wrote:
>>
>> Hi Alex,
>>
>> 2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov :
>>
>>> Hi Nathaniel,
>>>
>>>
>>> pandas
>>>
>>>
>>> yup, the idea was to have minimal pandas.DataFrame-like storage (which I
>>> was using for a long time),
>>> but without irritating problems with its row indexing and some other
>>> problems like interaction with matplotlib.
>>>
>>> A dict of arrays?
>>>
>>>
>>> that's what I started from and implemented, but at some point I
>>> decided that I was reinventing the wheel and numpy had something already. In
>>> principle, I can ignore this 'column-oriented' storage requirement, but
>>> potentially it may turn out to be quite slow-ish if dtype's size is large.
>>>
>>> Suggestions are welcome.
>>>
>>
>> ​You may want to try bcolz:
>>
>> https://github.com/Blosc/bcolz
>>
>> bcolz is a columnar storage, basically as you require, but data is
>> compressed by default even when stored in-memory (although you can disable
>> compression if you want to).​
>>
>>
>>
>>>
>>> Another strange question:
>>> in general, it is assumed that once a numpy array is created, its
>>> shape does not change.
>>> But if I want to keep the same recarray and change its dtype and/or
>>> shape, is there a way to do this?
>>>
>>
>> ​You can change shapes of numpy arrays, but that usually involves copies
>> of the whole container.  With bcolz you can change length and add/del
>> columns without copies.​  If your containers are large, it is better to
>> inform bcolz on its final estimated size.  See:
>>
>> http://bcolz.blosc.org/en/latest/opt-tips.html
>>
>> ​Francesc​
>>
>>
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>>
>>> On 22 Feb 2017, at 3:53, Nathaniel Smith wrote:
>>>
>>> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" 
>>> wrote:
>>>
>>> Ah, got it. Thanks, Chris!
>>> I thought recarray could only be one-dimensional (like tables with named
>>> columns).
>>>
>>> Maybe it's better to ask directly what I was looking for:
>>> something that works like a table with named columns (but no labelling
>>> for rows), and keeps data (of different dtypes) in a column-by-column way
>>> (and this is numpy, not pandas).
>>>
>>> Is there such a magic thing?
>>>
>>>
>>> Well, that's what pandas is for...
>>>
>>> A dict of arrays?
>>>
>>> -n
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Francesc Alted
>>
>>
>>
>>
>>
>
>
> --
> Robert McLeod, Ph.D.
> Center for Cellular Imaging and Nano Analytics (C-CINA)
> Biozentrum der Universität Basel
> Mattenstrasse 26, 4058 Basel
> Work: +41.061.387.3225
> robert.mcl...@unibas.ch
>

Re: [Numpy-discussion] From Python to Numpy

2017-01-13 Thread Matthew Harrigan
I coded up an all_equal gufunc.  Benchmark results
are also in that repo.  For the specific problem in the book which started
this, it's 40x faster than the optimized code in the book.  For large arrays
which have an early non-equal element, it's dramatically faster (1000x)
than the current alternative.  For large arrays which are all equal, it's
~10% faster due to eliminating the intermediate boolean array.  For tiny
arrays it's much faster due to a single function call instead of at least
two, but it's debatable how relevant speed is for tiny problems.
Disclaimer: this is the first ufunc I have ever written.
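For reference, the short-circuit idea (minus the gufunc machinery) can be modeled in plain NumPy by comparing fixed-size chunks and stopping at the first mismatch; this is a sketch of the concept, not the gufunc itself:

```python
import numpy as np

def all_equal_chunked(a, b, chunk=4096):
    """Compare in fixed-size chunks and stop at the first mismatch,
    avoiding a full-size boolean temporary."""
    a, b = np.asarray(a).ravel(), np.asarray(b).ravel()
    if a.shape != b.shape:
        return False
    for start in range(0, a.size, chunk):
        if not np.array_equal(a[start:start + chunk], b[start:start + chunk]):
            return False          # short circuit on an early mismatch
    return True

x = np.zeros(10_000)
y = np.zeros(10_000)
y[3] = 1.0                        # mismatch in the first chunk
```

With the mismatch at index 3, only the first chunk is ever compared; the true C gufunc short-circuits per element rather than per chunk.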

On Tue, Jan 10, 2017 at 8:27 PM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:

> > It seems a generalized ufunc "all_equal" with signature (i),(i)->() and
> short circuit logic once the first non equal element is encountered would
> be an important performance improvement.
>
> How does array_equal() perform?
>
> -CHB
>


Re: [Numpy-discussion] From Python to Numpy

2017-01-09 Thread Matthew Harrigan
I also have been stalking this email thread.  First, excellent book!

Regarding the vectorization example mentioned above, one thing to note is
that it increases the order of the algorithm relative to the pure python.
The vectorized approach uses correlate, which requires ~(len(seq) *
len(sub)) FLOPs.  In the case where the first element in sub is not equal
to the vast majority of elements in seq, the basic approach requires
~len(seq) comparisons.  Note that is the case in the SO answer.  One fairly
common thing I have seen in vectorized approaches is that the memory or
operations required scales worse than strictly required.  It may or may not
be an issue, largely depends on the specifics of how its used, but it
usually indicates a better approach exists.  That may be worth mentioning
here.

Given that, I tried to come up with an "ideal" approach.  stride_tricks can
be used to convert seq to a 2D array, and then ideally each row could be
compared to sub.  However I can't think of how to do that with numpy
function calls other than compare each element in the 2D array, requiring
O(n_sub*n_seq) operations again.  array_equal
is an example of that.  Python list equality scales better; for instance, if
x = [0]*n and y = [1]*n, x == y is very fast and the time is independent of
the value of n.

It seems a generalized ufunc "all_equal" with signature (i),(i)->() and
short circuit logic once the first non equal element is encountered would
be an important performance improvement.  In the ideal case it is
dramatically faster, and even if every element must be compared it's
still probably meaningfully faster since the boolean intermediate array
isn't created.  Even better would be to get the axis argument in place for
generalized ufuncs.  Then this problem could be vectorized in one line with
far better performance.  If others think this is a good idea I will post an
issue and attempt a solution.
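The windowed comparison described above can be written without manual stride_tricks using sliding_window_view (available in newer NumPy, 1.20+), though as noted it still does O(n_sub*n_seq) work with no short circuit:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

seq = np.array([1, 2, 3, 1, 2, 4, 1, 2, 3])
sub = np.array([1, 2, 3])

# Each row of `windows` is a zero-copy view of len(sub) consecutive
# elements of seq, so only the boolean comparison allocates memory.
windows = sliding_window_view(seq, len(sub))
matches = np.flatnonzero(np.all(windows == sub, axis=1))
```

Here `matches` gives the start indices of every occurrence of `sub` in `seq`.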


On Sat, Dec 31, 2016 at 5:23 AM, Nicolas P. Rougier <
nicolas.roug...@inria.fr> wrote:

>
> > I’ve seen vectorisation taken to the extreme, with negative consequences
> in terms of both speed and readability, in both Python and MATLAB
> codebases, so I would suggest some discussion / wisdom about when not to
> vectorise.
>
>
> I agree and there is actually a warning in the introduction about
> readability vs speed with an example showing a clever optimization (by
> Jaime Fernández del Río) that is hardly readable for the non-experts
> (including myself).
>
>
> Nicolas
>


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-16 Thread Matthew Harrigan
Thanks for pointing me to that.  I think it's a great match for a 1D case,
like the inner product of two arrays in terms of signature.  I don't see
how it works with higher dimensional arrays, especially with something like
the axis parameter in ufunc.reduce.  With what I proposed, for an array of
shape (M, N) for example, result.shape could be (1,) or (M, 1) or (1, N) for
reducing over the flattened array or either axis.  Can you do something
like that with gufunc or do you need to iterate gufunc calls over
slices/views?  Thanks again.
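For comparison, the two-pass version built on today's binary ufunc reduction already produces each of those result shapes via the axis and keepdims arguments:

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)   # shape (M, N) = (3, 4)
c = x.mean()

# Reduce over the whole array or over either axis, keeping unit dimensions.
whole = np.add.reduce(np.square(x - c), axis=(0, 1), keepdims=True)  # (1, 1)
rows  = np.add.reduce(np.square(x - c), axis=1, keepdims=True)       # (3, 1)
cols  = np.add.reduce(np.square(x - c), axis=0, keepdims=True)       # (1, 4)
```

The open question in the thread is how a gufunc with signature (i),(i)->() would expose the same axis selection in one pass.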

On Mon, Nov 14, 2016 at 8:49 PM, Stephan Hoyer  wrote:

> On Mon, Nov 14, 2016 at 5:40 PM, Matthew Harrigan <
> harrigan.matt...@gmail.com> wrote:
>
>> Essentially it creates a reduce for a function which isn't binary.  I
>> think this would be generally useful.
>>
>
> NumPy already has a generic enough interface for creating such ufuncs. In
> fact, it's called a "generalized ufunc":
> https://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html
>
> I think you could already write "implicit reductions" using gufuncs?
>
>
>
>
>


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-14 Thread Matthew Harrigan
Still slower, and worse, it uses 2x the memory for the intermediate temporary
array.

I propose allowing implicit reductions with ufuncs.  Specifically if out is
provided with shape[axis] = 1, then pass it on to the ufunc with a stride
of 0.  That should allow this to work:

import numpy as np

x = np.arange(10)
def add_square_diff(x1, x2, x3):
    return x1 + (x2 - x3)**2
result = np.zeros(1)
np.frompyfunc(add_square_diff, 3, 1)(result, x, np.mean(x), result)

Essentially it creates a reduce for a function which isn't binary.  I think
this would be generally useful.  For instance, finding the min and max in
one pass would be nice:

def minmax(x1, x2, x3):
    return min(x1, x3), max(x2, x3)
minn = np.array([np.inf])
maxx = np.array([-np.inf])
np.frompyfunc(minmax, 3, 2)(minn, maxx, x, minn, maxx)

Note it also allows for arbitrary initial values or identity to be
specified, possibly determined at run time.  I think this would make ufuncs
even more universal.
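A pure-Python model of the proposed semantics, in which a scalar accumulator plays the role of the zero-stride out argument fed back into the loop; this sketches the intended behaviour, it is not an actual ufunc:

```python
import numpy as np

x = np.arange(10)
c = x.mean()          # runtime constant, analogous to passing np.mean(x)

# One-pass sum of squared differences: `acc` stands in for the proposed
# shape-1, stride-0 `out` array that is both input and output.
acc = 0.0
for xi in x:
    acc = acc + (xi - c) ** 2

# One-pass min and max with arbitrary initial values (the identities).
minn, maxx = np.inf, -np.inf
for xi in x:
    minn, maxx = min(minn, xi), max(maxx, xi)
```

The point of the proposal is to get exactly this accumulation pattern, but executed in the ufunc's C inner loop rather than a Python for loop.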

On Mon, Nov 14, 2016 at 3:38 AM, Jerome Kieffer 
wrote:

> On Fri, 11 Nov 2016 11:25:58 -0500
> Matthew Harrigan  wrote:
>
> > I started a ufunc to compute the sum of square differences here
> > <https://gist.github.com/mattharrigan/6f678b3d6df5efd236fc23bfb59fd3bd>.
> > It is about 4x faster and uses half the memory compared to
> > np.sum(np.square(x-c)).
>
> Hi Matt,
>
> Using *blas* you already win a factor of two (maybe more depending on your
> blas implementation):
>
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))"
> "np.sum(np.square(x-2.))"
> 10 loops, best of 3: 135 msec per loop
>
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))"
> "y=x-2.;np.dot(y,y)"
> 10 loops, best of 3: 70.2 msec per loop
>
>
> Cheers,
> --
> Jérôme Kieffer
>


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-11 Thread Matthew Harrigan
My possibly poorly thought out API would be something like
np.add_sq_diff.set_c(42.0).reduce(x), basically add a setter method which
returned self.  Thanks for explaining what data was intended to do.  All of
this is really just a dance around the requirement that reduce only works
on binary functions.  Would there be any interest in extending reduce to
allow for arbitrary numbers of input and output arguments?
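Short of extending reduce itself, a closure is one way to bind a runtime constant into a binary function today, at Python speed; a sketch under those assumptions, not a proposal for the numpy API:

```python
import numpy as np
from functools import reduce

def make_add_sq_diff(c):
    # Return a binary function closing over the runtime constant c.
    def add_sq_diff(acc, x2):
        return acc + (x2 - c) ** 2
    return add_sq_diff

x = np.arange(10, dtype=float)
c = 42.0
# 0.0 is an explicit identity, sidestepping the first-element cleanup
# that ufunc.reduce needs when no identity is available.
total = reduce(make_add_sq_diff(c), x, 0.0)
```

This gives the set-a-constant-then-reduce semantics of the hypothetical `np.add_sq_diff.set_c(42.0).reduce(x)`, minus the C-speed inner loop.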

On Fri, Nov 11, 2016 at 1:38 PM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

> here is an old unfinished PR adding max_abs which has a similar purpose
> as yours:
> https://github.com/numpy/numpy/pull/5509
>
> As far as I know, data is set at ufunc construction time and is part of the
> ufunc object, so it is not really suitable for passing in constants at
> call time.
> Its intended purpose is to pass in functions to be called in generic loops
> like PyUFunc_d_d.
>
> Having an option to mark arguments of a ufunc as special in reductions
> could be useful, e.g. it would allow a potential
> (fused-)multiply-and-add ufunc to be used to implement a weighted sum.
>
> On 11.11.2016 17:25, Matthew Harrigan wrote:
> > I started a ufunc to compute the sum of square differences here
> > <https://gist.github.com/mattharrigan/6f678b3d6df5efd236fc23bfb59fd3bd>.
> It
> > is about 4x faster and uses half the memory compared to
> > np.sum(np.square(x-c)).  This could significantly speed up common
> > operations like std and var (where c=np.mean(x)).  It's faster because it's
> > a single pass instead of 3, and also because the inner loop is
> > specialized for the common reduce case, which might be an optimization
> > worth considering more generally.
> >
> > I think I have answered my question #2&3 above.
> >
> > Can someone please point me to an example where "data" was used in a
> > ufunc inner loop?  How can that value be set at runtime?  Thanks
> >
> > On Fri, Nov 4, 2016 at 5:33 PM, Sebastian Berg
> > <sebast...@sipsolutions.net> wrote:
> >
> > On Fr, 2016-11-04 at 15:42 -0400, Matthew Harrigan wrote:
> > > I didn't notice identity before.  Seems like frompyfunc always sets
> > > it to None.  If it were zero maybe it would work as desired here.
> > >
> > > In the writing your own ufunc doc, I was wondering if the pointer
> to
> > > data could be used to get a constant at runtime.  If not, what
> could
> > > that be used for?
> > > static void double_logit(char **args, npy_intp *dimensions,
> > > npy_intp* steps, void* data)
> > > Why would the numerical accuracy be any different?  The subtraction
> > > and square operations look identical and I thought np.sum just
> calls
> > > np.add.reduce, so the reduction step uses the same code and would
> > > therefore have the same accuracy.
> > >
> >
> > Sorry, did not read it carefully, I guess `c` is the mean, so you are
> > doing the two pass method.
> >
> > - Sebastian
> >
> >
> > > Thanks
> > >
> > > On Fri, Nov 4, 2016 at 1:56 PM, Sebastian Berg
> > > wrote:
> > > > On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> > > > > I was reading this and got thinking about if a ufunc could
> > > > compute
> > > > > the sum of squared differences in a single pass without a
> > > > temporary
> > > > > array.  The python code below demonstrates a possible approach.
> > > > >
> > > > > import numpy as np
> > > > > x = np.arange(10)
> > > > > c = 1.0
> > > > > def add_square_diff(x1, x2):
> > > > >     return x1 + (x2-c)**2
> > > > > ufunc = np.frompyfunc(add_square_diff, 2, 1)
> > > > > print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> > > > > print(np.sum(np.square(x-c)))
> > > > >
> > > > > I have (at least) 4 questions:
> > > > > 1. Is it possible to pass run time constants to a ufunc written
> > > > in C
> > > > > for use in its inner loop, and if so how?
> > > >
> > > > I don't think it's anticipated, since a ufunc could in most cases
> > > > use a
> > > > third argument, but a 3 arg ufunc can't be reduced. Not sure if
> > > > there
> > > might be some trickery possible.

Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-11 Thread Matthew Harrigan
I started a ufunc to compute the sum of square differences here
<https://gist.github.com/mattharrigan/6f678b3d6df5efd236fc23bfb59fd3bd>.
It is about 4x faster and uses half the memory compared to
np.sum(np.square(x-c)).  This could significantly speed up common
operations like std and var (where c=np.mean(x)).  It's faster because it's a
single pass instead of 3, and also because the inner loop is specialized
for the common reduce case, which might be an optimization worth considering
more generally.

I think I have answered my question #2&3 above.

Can someone please point me to an example where "data" was used in a ufunc
inner loop?  How can that value be set at runtime?  Thanks

On Fri, Nov 4, 2016 at 5:33 PM, Sebastian Berg 
wrote:

> On Fr, 2016-11-04 at 15:42 -0400, Matthew Harrigan wrote:
> > I didn't notice identity before.  Seems like frompyfunc always sets
> > it to None.  If it were zero maybe it would work as desired here.
> >
> > In the writing your own ufunc doc, I was wondering if the pointer to
> > data could be used to get a constant at runtime.  If not, what could
> > that be used for?
> > static void double_logit(char **args, npy_intp *dimensions,
> > npy_intp* steps, void* data)
> > Why would the numerical accuracy be any different?  The subtraction
> > and square operations look identical and I thought np.sum just calls
> > np.add.reduce, so the reduction step uses the same code and would
> > therefore have the same accuracy.
> >
>
> Sorry, did not read it carefully, I guess `c` is the mean, so you are
> doing the two pass method.
>
> - Sebastian
>
>
> > Thanks
> >
> > On Fri, Nov 4, 2016 at 1:56 PM, Sebastian Berg wrote:
> > > On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> > > > I was reading this and got thinking about if a ufunc could
> > > compute
> > > > the sum of squared differences in a single pass without a
> > > temporary
> > > > array.  The python code below demonstrates a possible approach.
> > > >
> > > > import numpy as np
> > > > x = np.arange(10)
> > > > c = 1.0
> > > > def add_square_diff(x1, x2):
> > > >     return x1 + (x2-c)**2
> > > > ufunc = np.frompyfunc(add_square_diff, 2, 1)
> > > > print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> > > > print(np.sum(np.square(x-c)))
> > > >
> > > > I have (at least) 4 questions:
> > > > 1. Is it possible to pass run time constants to a ufunc written
> > > in C
> > > > for use in its inner loop, and if so how?
> > >
> > > I don't think it's anticipated, since a ufunc could in most cases
> > > use a
> > > third argument, but a 3 arg ufunc can't be reduced. Not sure if
> > > there
> > > might be some trickery possible.
> > >
> > > > 2. Is it possible to pass an initial value to reduce to avoid the
> > > > clean up required for the first element?
> > >
> > > This is the identity normally. But the identity can only be 0, 1 or
> > > -1
> > > right now I think. The identity is what the output array gets
> > > initialized with (which effectively makes it the first value passed
> > > into the inner loop).
> > >
> > > > 3. Does that ufunc work, or are there special cases which cause
> > > it to
> > > > fall apart?
> > > > 4. Would a very specialized ufunc such as this be considered for
> > > > incorporating in numpy since it would help reduce time and memory
> > > of
> > > > functions already in numpy?
> > > >
> > >
> > > Might be mixing up things, however, IIRC the single pass approach
> > > has a
> > > bad numerical accuracy, so that I doubt that it is a good default
> > > algorithm.
> > >
> > > - Sebastian
> > >
> > >
> > > > Thank you,
> > > > Matt
> > >
>
>
>


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-04 Thread Matthew Harrigan
I didn't notice identity before.  Seems like frompyfunc always sets it to
None.  If it were zero maybe it would work as desired here.

In the writing your own ufunc doc, I was wondering if the pointer to data
could be used to get a constant at runtime.  If not, what could that be
used for?

static void double_logit(char **args, npy_intp *dimensions,
                         npy_intp* steps, void* data)

Why would the numerical accuracy be any different?  The subtraction and
square operations look identical and I thought np.sum just calls
np.add.reduce, so the reduction step uses the same code and would therefore
have the same accuracy.

Thanks

On Fri, Nov 4, 2016 at 1:56 PM, Sebastian Berg 
wrote:

> On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> > I was reading this and got thinking about if a ufunc could compute
> > the sum of squared differences in a single pass without a temporary
> > array.  The python code below demonstrates a possible approach.
> >
> > import numpy as np
> > x = np.arange(10)
> > c = 1.0
> > def add_square_diff(x1, x2):
> >     return x1 + (x2-c)**2
> > ufunc = np.frompyfunc(add_square_diff, 2, 1)
> > print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> > print(np.sum(np.square(x-c)))
> >
> > I have (at least) 4 questions:
> > 1. Is it possible to pass run time constants to a ufunc written in C
> > for use in its inner loop, and if so how?
>
> I don't think it's anticipated, since a ufunc could in most cases use a
> third argument, but a 3 arg ufunc can't be reduced. Not sure if there
> might be some trickery possible.
>
> > 2. Is it possible to pass an initial value to reduce to avoid the
> > clean up required for the first element?
>
> This is the identity normally. But the identity can only be 0, 1 or -1
> right now I think. The identity is what the output array gets
> initialized with (which effectively makes it the first value passed
> into the inner loop).
>
> > 3. Does that ufunc work, or are there special cases which cause it to
> > fall apart?
> > 4. Would a very specialized ufunc such as this be considered for
> > incorporating in numpy since it would help reduce time and memory of
> > functions already in numpy?
> >
>
> Might be mixing up things, however, IIRC the single pass approach has a
> bad numerical accuracy, so that I doubt that it is a good default
> algorithm.
>
> - Sebastian
>
>
> > Thank you,
> > Matt
>
>


[Numpy-discussion] ufunc for sum of squared difference

2016-11-04 Thread Matthew Harrigan
I was reading this and got
thinking about if a ufunc could compute the sum of squared differences in a
single pass without a temporary array.  The python code below demonstrates
a possible approach.

import numpy as np
x = np.arange(10)
c = 1.0
def add_square_diff(x1, x2):
    return x1 + (x2-c)**2
ufunc = np.frompyfunc(add_square_diff, 2, 1)
print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
print(np.sum(np.square(x-c)))

I have (at least) 4 questions:
1. Is it possible to pass run time constants to a ufunc written in C for
use in its inner loop, and if so how?
2. Is it possible to pass an initial value to reduce to avoid the clean up
required for the first element?
3. Does that ufunc work, or are there special cases which cause it to fall
apart?
4. Would a very specialized ufunc such as this be considered for
incorporating in numpy since it would help reduce time and memory of
functions already in numpy?

Thank you,
Matt


Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Matthew Harrigan
Would it be preferable to have to_begin='first' as an option under the
existing kwarg, to avoid overlapping keyword arguments?

On Wed, Oct 26, 2016 at 3:35 PM, Peter Creasey <
p.e.creasey...@googlemail.com> wrote:

> > Date: Wed, 26 Oct 2016 09:05:41 -0400
> > From: Matthew Harrigan 
> >
> > np.cumsum(np.diff(x, to_begin=x.take([0], axis=axis), axis=axis),
> axis=axis)
> >
> > That's certainly not going to win any beauty contests.  The 1d case is
> > clean though:
> >
> > np.cumsum(np.diff(x, to_begin=x[0]))
> >
> > I'm not sure if this means the API should change, and if so how.  Higher
> > dimensional arrays seem to just have extra complexity.
> >
> >>
> >> I like the proposal, though I suspect that making it general has
> >> obscured that the most common use-case for padding is to make the
> >> inverse of np.cumsum (at least that's what I frequently need), and now
> >> in the multidimensional case you have the somewhat unwieldy:
> >>
> >> >>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))
> >>
> >> rather than
> >>
> >> >>> np.diff(a, axis=axis, keep_left=True)
> >>
> >> which of course could just be an option upon what you already have.
> >>
>
> So my suggestion was intended that you might want an additional
> keyword argument (keep_left=False) to make the inverse np.cumsum
> use-case easier, i.e. you would have something in your np.diff like:
>
> if keep_left:
>     if to_begin is None:
>         to_begin = np.take(a, [0], axis=axis)
>     else:
>         raise ValueError('np.diff can be used with either keep_left '
>                          'or to_begin, but not both.')
>
> Generally I try to avoid optional keyword argument overlap, but in
> this case it is probably justified.
>
> Peter
>


Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Matthew Harrigan
The inverse of cumsum is actually a little more unwieldy since you can't
drop a dimension with take.  This returns the original array (numerical
caveats aside):

np.cumsum(np.diff(x, to_begin=x.take([0], axis=axis), axis=axis), axis=axis)

That's certainly not going to win any beauty contests.  The 1d case is
clean though:

np.cumsum(np.diff(x, to_begin=x[0]))

I'm not sure if this means the API should change, and if so how.  Higher
dimensional arrays seem to just have extra complexity.
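For the 1-D case, np.ediff1d (which already has the to_begin keyword these options are modeled on) round-trips with cumsum today:

```python
import numpy as np

x = np.array([3.0, 7.0, 7.5, 10.0])

# Keeping the first element makes the difference invertible.
d = np.ediff1d(x, to_begin=x[0])
restored = np.cumsum(d)
```

The multidimensional equivalent is exactly the take/cumsum expression quoted above.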

On Tue, Oct 25, 2016 at 1:26 PM, Peter Creasey <
p.e.creasey...@googlemail.com> wrote:

> > Date: Mon, 24 Oct 2016 08:44:46 -0400
> > From: Matthew Harrigan 
> >
> > I posted a pull request <https://github.com/numpy/numpy/pull/8206> which
> > adds optional padding kwargs "to_begin" and "to_end" to diff.  Those
> > options are based on what's available in ediff1d.  It closes this issue
> > <https://github.com/numpy/numpy/issues/8132>
>
> I like the proposal, though I suspect that making it general has
> obscured that the most common use-case for padding is to make the
> inverse of np.cumsum (at least that’s what I frequently need), and now
> in the multidimensional case you have the somewhat unwieldy:
>
> >>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))
>
> rather than
>
> >>> np.diff(a, axis=axis, keep_left=True)
>
> which of course could just be an option upon what you already have.
>
> Best,
> Peter
>


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Matthew Harrigan
It seems pickle keeps track of references for basic python types.

import pickle

x = [1]
y = [x]
x, y = pickle.loads(pickle.dumps((x, y)))
x.append(2)
print(y)   # [[1, 2]]

Numpy arrays are different: references are forgotten after
pickle/unpickle, and shared objects do not remain shared.  Based on the quote
below, this could be considered a bug with numpy/pickle.

Object sharing (references to the same object in different places): This is
similar to self-referencing objects; pickle stores the object once, and
ensures that all other references point to the master copy. Shared objects
remain shared, which can be very important for mutable objects.


Another example with ndarrays:


import numpy as np

x = np.arange(5)
y = x[::-1]
x, y = pickle.loads(pickle.dumps((x, y)))
x[0] = 9
print(y)   # [4 3 2 1 0] -- the change to x does not propagate to y

In this case the two arrays share the exact same object for the data buffer
(although object might not be the right word here)
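The lost sharing can be checked directly with np.shares_memory (available since NumPy 1.11):

```python
import numpy as np
import pickle

x = np.arange(5)
y = x[::-1]                      # y is a view onto x's buffer
assert np.shares_memory(x, y)    # shared before pickling

x2, y2 = pickle.loads(pickle.dumps((x, y)))
# After the round trip, each array owns an independent copy of the data.
shared_after = np.shares_memory(x2, y2)
```

`shared_after` comes back False, which is exactly the behaviour at issue in this thread.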

On Tue, Oct 25, 2016 at 7:28 PM, Robert Kern  wrote:

> On Tue, Oct 25, 2016 at 3:07 PM, Stephan Hoyer  wrote:
> >
> > On Tue, Oct 25, 2016 at 1:07 PM, Nathaniel Smith  wrote:
> >>
> >> Concretely, what do would you suggest should happen with:
> >>
> >> base = np.zeros(1)
> >> view = base[:10]
> >>
> >> # case 1
> >> pickle.dump(view, file)
> >>
> >> # case 2
> >> pickle.dump(base, file)
> >> pickle.dump(view, file)
> >>
> >> # case 3
> >> pickle.dump(view, file)
> >> pickle.dump(base, file)
> >>
> >> ?
> >
> > I see what you're getting at here. We would need a rule for when to
> include the base in the pickle and when not to. Otherwise,
> pickle.dump(view, file) always contains the data from base, even
> when view is much smaller than base.
> >
> > The safe answer is "only use views in the pickle when base is already
> being pickled", but that isn't possible to check unless all the arrays are
> together in a custom container. So, this isn't really feasible for NumPy.
>
> It would be possible with a custom Pickler/Unpickler since they already
> keep track of objects previously (un)pickled. That would handle [base,
> view] okay but not [view, base], so it's probably not going to be all that
> useful outside of special situations. It would make a neat recipe, but I
> probably would not provide it in numpy itself.
>
> --
> Robert Kern
>
>
>


[Numpy-discussion] padding options for diff

2016-10-24 Thread Matthew Harrigan
I posted a pull request  which
adds optional padding kwargs "to_begin" and "to_end" to diff.  Those
options are based on what's available in ediff1d.  It closes this issue



[Numpy-discussion] add elementwise addition & subtraction to einsum

2016-10-16 Thread Matthew Harrigan
Hello,

This is a follow on for issue 8139
.  I propose adding elementwise
addition and subtraction functionality to einsum.  I love einsum as it
clearly and concisely defines complex linear algebra.  However elementwise
addition is a very common linear algebra operation and einsum does not
currently support it.  The Einstein field equations
,
what the notation was originally developed to document, contain that
functionality.  It is fairly common in stress analysis (my background), for
example see these lectures notes

.

Specifically I propose adding "+" and "-" characters which separate current
einsum statements which are then combined elementwise.  An example is A =
np.einsum('ij,jk+ij,jk', B, C, D, E), which is A = B * C + D * E.  I wrote
a crude function

to demonstrate the functionality.

I believe the functionality is useful, in keeping with the spirit of a
clean concise API, and doesn't break the existing API, which could warrant
acceptance.

Additionally I believe it opens the possibility of many interesting
performance optimizations.  For instance, many of the optimizations in this
NEP 
could be done internally to the einsum function, which may be easier to
accomplish given the narrower scope (but I am ignorant of all the low level
C internals of numpy).  The example in the beginning could become A =
np.einsum('...+...+...', B, C, D).
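The proposed '+' syntax can be emulated today by splitting the subscript string and summing independent einsum calls; `einsum_sum` below is a hypothetical helper sketching that idea, not an existing numpy function (and it forgoes any of the shared-intermediate optimizations the proposal would enable):

```python
import numpy as np

def einsum_sum(subscripts, *operands):
    """Hypothetical helper: evaluate '+'-separated einsum terms and add
    the results elementwise."""
    total = None
    ops = iter(operands)
    for term in subscripts.split('+'):
        # Number of operands this term consumes = commas + 1.
        n = len(term.split('->')[0].split(','))
        part = np.einsum(term, *(next(ops) for _ in range(n)))
        total = part if total is None else total + part
    return total

B, C = np.eye(2), np.ones((2, 2))
D, E = np.eye(2), np.eye(2)
A = einsum_sum('ij,jk+ij,jk', B, C, D, E)   # A = B @ C + D @ E
```

Each term uses einsum's implicit output convention, so 'ij,jk' sums over the repeated index j and yields the matrix product.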

Thank you for your time and consideration.
Matt