Re: [Numpy-discussion] Should ndarray subclasses support the keepdims arg?

2015-05-05 Thread Neil Girdhar
Maybe they should have written their code with a **kwargs parameter that
consumes all extra keyword arguments, rather than assuming that no keyword
arguments would ever be added?  The problem with this approach in general is
that it makes the code unnecessarily convoluted.
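
(For concreteness, the defensive version would look roughly like this -- just
a sketch, not the actual matrix implementation:)

import numpy as np

class MyArray(np.ndarray):
    def sum(self, axis=None, dtype=None, out=None, **kwargs):
        # Forward any keyword numpy grows later (keepdims, etc.) instead of
        # assuming the signature is frozen; unknown keywords still fail loudly.
        return np.ndarray.sum(self, axis=axis, dtype=dtype, out=out, **kwargs)

a = np.arange(6).reshape(2, 3).view(MyArray)
a.sum(axis=0, keepdims=True)    # keepdims reaches ndarray.sum via **kwargs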

On Tue, May 5, 2015 at 1:55 PM, Nathaniel Smith n...@pobox.com wrote:

 AFAICT the only real solution here is for np.sum and friends to propagate
 the keepdims argument if and only if it was explicitly passed to them (or
 maybe the slightly different rule: if and only if it has a non-default value).
 If we just started requiring code to handle it and passing it
 unconditionally, then as soon as someone upgraded numpy all their existing
 code might break for no good reason.
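 (Roughly, with a private not-passed sentinel -- just a sketch of the rule
 being proposed, ignoring the non-ndarray dispatch details:)

 _NoValue = object()   # stand-in for a private sentinel meaning "not passed"

 def sum(a, axis=None, dtype=None, out=None, keepdims=_NoValue):
     kwargs = {}
     if keepdims is not _NoValue:
         kwargs['keepdims'] = keepdims   # forward only if the caller gave it
     return a.sum(axis=axis, dtype=dtype, out=out, **kwargs)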
 On May 5, 2015 8:13 AM, Allan Haldane allanhald...@gmail.com wrote:

 Hello all,

 A question:

 Many ndarray methods (eg sum, mean, any, min) have a keepdims keyword
 argument, but ndarray subclass methods sometimes don't. The 'matrix'
 subclass doesn't, and numpy functions like 'np.sum' intentionally
 drop/ignore the keepdims argument when called with an ndarray subclass
 as first argument.

 This means you can't always use ndarray subclasses as 'drop-in'
 replacements for ndarrays if the code uses keepdims (even indirectly),
 and it means code that deals with keepdims (eg np.sum and more) has to
 detect ndarray subclasses and drop keepdims even if the subclass
 supports it (since there is no good way to detect support). It seems to
 me that if we are going to use inheritance, subclass methods should keep
 the signature of the parent class methods. What does the list think?

  Details: 

 This problem comes up in a PR I'm working on (#5706) to add the keepdims
 arg to masked array methods. In order to support masked matrices (which
 a lot of unit tests check), I would have to detect and drop the keepdims
 arg to avoid an exception. This would be solved if the matrix class
 supported keepdims (plus an update to np.sum). Similarly,
 `np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it
 could work if all subclasses supported keepdims.

 I do not foresee immediate problems with adding keepdims to the matrix
 methods, except that it would be an unused argument. Modifying `np.sum`
 to always pass on the keepdims arg is trickier, since it would break any
 code that tried to np.sum a subclass that doesn't support keepdims, eg
 pandas.DataFrame. **kwargs tricks might work. But if it's permissible I
 think it would be better to require subclasses to support all the
 keyword args ndarray supports.

 Allan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-29 Thread Neil Girdhar
Sorry for the late reply.   I will definitely consider submitting a pull
request to numexpr if it's the direction I decide to go.  Right now I'm
still evaluating all of the many options for my project.

I am implementing a machine learning algorithm as part of my thesis work.
I'm in the "make it work" stage, but quickly approaching the "make it fast" part.

With research, you usually want to iterate quickly, and so whatever
solution I choose has to be automated.  I can't be coding things in an
intuitive, natural way, and then porting it to a different implementation
to make it fast.  What I want is for that conversion to be automated.  I'm
still evaluating how to best achieve that.

On Tue, Apr 28, 2015 at 6:08 AM, Francesc Alted fal...@gmail.com wrote:

 2015-04-28 4:59 GMT+02:00 Neil Girdhar mistersh...@gmail.com:

 I don't think I'm asking for so much.  Somewhere inside numexpr it builds
 an AST of its own, which it converts into the optimized code.   It would be
 more useful to me if that AST were in the same format as the one returned
 by Python's ast module.  This way, I could glue in the bits of numexpr that
 I like with my code.  For my purpose, this would have been the more ideal
 design.


 I don't think implementing this for numexpr would be that complex. So for
 example, one could add a new numexpr.eval_ast(ast_expr) function.  Pull
 requests are welcome.

 At any rate, which is your use case?  I am curious.

 --
 Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-27 Thread Neil Girdhar
I've always wondered why numexpr accepts strings rather than looking at a
function's source code, using ast to parse it, and then transforming the
AST.  I just looked at another project, pyautodiff, which does that.  And I
think numba does that for llvm code generation.  Wouldn't it be nicer to
just apply a decorator to a function than to write the function as a Python
string?
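
(Sketch of the sort of decorator I mean -- made-up name, trivial
single-return case only, subject to the source-availability caveats of
inspect.getsource, and ast.get_source_segment is a newer-stdlib helper:)

import ast
import functools
import inspect
import numexpr as ne

def numexpr_jit(func):
    # Hypothetical decorator: pull out the text of the function's single
    # `return <expr>` statement and evaluate it with numexpr at call time.
    src = inspect.getsource(func)          # fails for interactively defined code
    ret = ast.parse(src).body[0].body[-1]
    assert isinstance(ret, ast.Return)     # assume the body is one return
    expr_text = ast.get_source_segment(src, ret.value)   # Python 3.8+ helper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()
        return ne.evaluate(expr_text, local_dict=dict(bound.arguments))
    return wrapper

@numexpr_jit
def axpby(a, b):
    return 3*a + 4*b      # written as ordinary numpy, executed by numexpr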


On Mon, Apr 27, 2015 at 11:50 AM, Francesc Alted fal...@gmail.com wrote:

  Announcing Numexpr 2.4.3
 ========================

 Numexpr is a fast numerical expression evaluator for NumPy.  With it,
 expressions that operate on arrays (like 3*a+4*b) are accelerated
 and use less memory than doing the same calculation in Python.

 It comes with multi-threaded capabilities, as well as support for Intel's
 MKL (Math Kernel Library), which allows an extremely fast evaluation
 of transcendental functions (sin, cos, tan, exp, log...)  while
 squeezing the last drop of performance out of your multi-core
 processors.  Look here for some benchmarks of numexpr using MKL:

 https://github.com/pydata/numexpr/wiki/NumexprMKL

 Its only dependency is NumPy (MKL is optional), so it works well as an
 easy-to-deploy, easy-to-use, computational engine for projects that
 don't want to adopt other solutions requiring more heavy dependencies.

 What's new
 ==========

 This is a maintenance release to cope with an old bug affecting
 comparisons with empty strings.  Fixes #121 and PyTables #184.

 In case you want to know more in detail what has changed in this
 version, see:

 https://github.com/pydata/numexpr/wiki/Release-Notes

 or have a look at RELEASE_NOTES.txt in the tarball.

 Where can I find Numexpr?
 =========================

 The project is hosted on GitHub at:

 https://github.com/pydata/numexpr

 You can get the packages from PyPI as well (but not for RC releases):

 http://pypi.python.org/pypi/numexpr

 Share your experience
 =====================

 Let us know of any bugs, suggestions, gripes, kudos, etc. you may
 have.


 Enjoy data!

 --
 Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-27 Thread Neil Girdhar
Also, FYI: http://numba.pydata.org/numba-doc/0.6/doc/modules/transforms.html

It appears that numba does get the ast, similar to pyautodiff, and only gets
the ast from source code as a fallback?

On Mon, Apr 27, 2015 at 7:23 PM, Neil Girdhar mistersh...@gmail.com wrote:

 I was told that numba did similar ast parsing, but maybe that's not true.
 Regarding the ast, I don't know about reliability, but take a look at
 get_ast in pyautodiff:
 https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py
 It looks up the __file__ attribute and passes that through compile to get
 the ast.  Of course that won't work when you don't have source code (a .pyc
 only module, or when else?)

 Since I'm looking into this kind of solution for the future of my code,
 I'm curious if you think that's too unreliable for some reason?  From a
 usability standpoint, I do think that's better than feeding in strings,
 which:
 * are not syntax highlighted, and
 * require porting code from regular numpy expressions to numexpr strings
 (applying a decorator is so much easier).

 Best,

 Neil

 On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith n...@pobox.com wrote:

 On Apr 27, 2015 1:44 PM, Neil Girdhar mistersh...@gmail.com wrote:
 
  I've always wondered why numexpr accepts strings rather than looking at a
 function's source code, using ast to parse it, and then transforming the
 AST.  I just looked at another project, pyautodiff, which does that.  And I
 think numba does that for llvm code generation.  Wouldn't it be nicer to
 just apply a decorator to a function than to write the function as a Python
 string?

 Numba works from byte code, not the ast. There's no way to access the ast
 reliably at runtime in python -- it gets thrown away during compilation.

 -n

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-27 Thread Neil Girdhar
On Mon, Apr 27, 2015 at 7:42 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Apr 27, 2015 at 4:23 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  I was told that numba did similar ast parsing, but maybe that's not true.
  Regarding the ast, I don't know about reliability, but take a look at
  get_ast in pyautodiff:
 
 https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py
  It looks up the __file__ attribute and passes that through compile to get
  the ast.  Of course that won't work when you don't have source code (a
 .pyc
  only module, or when else?)
 
  Since I'm looking into this kind of solution for the future of my code,
 I'm
  curious if you think that's too unreliable for some reason?

 I'd certainly hesitate to rely on it for anything I cared about or
 would be used by a lot of people... it's just intrinsically pretty
 hacky. No guarantee that the source code you find via __file__ will
 match what was used to compile the function, doesn't work when working
 interactively or from the ipython notebook, etc. Or else you have to
 trust a decompiler, which is a pretty serious complex chunk of code
 just to avoid typing quote marks.


Those are all good points.  However, it's more than just typing quote
marks.  The code might have non-numpy things mixed in.  It might have
context managers and function calls and so on.  More comments below.



From a
  usability standpoint, I do think that's better than feeding in strings,
  which:
  * are not syntax highlighted, and
  * require porting code from regular numpy expressions to numexpr strings
  (applying a decorator is so much easier).

 Yes, but then you have to write a program that knows how to port code
 from numpy expressions to numexpr strings :-). numexpr only knows a
 tiny restricted subset of Python...

 The general approach I'd take to solve these kinds of problems would
 be similar to that used by Theano or dask -- use regular python source
 code that generates an expression graph in memory. E.g. this could
 look like

 def do_stuff(arr1, arr2):
     arr1 = deferred(arr1)
     arr2 = deferred(arr2)
     arr3 = np.sum(arr1 + (arr2 ** 2))
     return force(arr3 / np.sum(arr3))

 -n


Right, there are three basic approaches:  string processing, AST
processing, and compile-time expression graphs.

The big advantage to AST processing over the other two is that you can
write and test your code as regular numpy code along with regular tests.
Then, with the application of a decorator, you get the speedup you're
looking for.  The problem with porting the numpy code to numexpr strings or
Theano-like expression-graphs is that porting can introduce bugs, and even
if you're careful, every time you make a change to the numpy version of the
code, you have to port it again.
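
(For reference, the deferred/force style Nathaniel sketches above boils down
to roughly this -- toy code, not Theano's or dask's actual API:)

import numpy as np

class Deferred:
    # Toy expression node: it records operations instead of executing them.
    def __init__(self, payload=None, op=None, args=()):
        self.payload, self.op, self.args = payload, op, args

    def _eval(self):
        if self.op is None:
            return self.payload
        return self.op(*[a._eval() if isinstance(a, Deferred) else a
                         for a in self.args])

    def __add__(self, other):
        return Deferred(op=np.add, args=(self, other))

    def __pow__(self, other):
        return Deferred(op=np.power, args=(self, other))

    def __truediv__(self, other):
        return Deferred(op=np.divide, args=(self, other))

def deferred(arr):
    return Deferred(payload=np.asarray(arr))

def force(node):
    # A real implementation would inspect and optimize the graph first.
    return node._eval()

a = deferred([0.0, 1.0, 2.0, 3.0])
b = deferred([1.0, 1.0, 1.0, 1.0])
expr = (a + b ** 2) / 2.0     # builds a graph; nothing is computed yet
force(expr)                   # array([0.5, 1. , 1.5, 2. ])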

Also, I personally want to do more than just AST transformations of the
numpy code.  For example, I have some methods that call super.  The super
calls can be collapsed since the mro is known at compile time.

Best,

Neil




 --
 Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-27 Thread Neil Girdhar
I was told that numba did similar ast parsing, but maybe that's not true.
Regarding the ast, I don't know about reliability, but take a look at
get_ast in pyautodiff:
https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py
It looks up the __file__ attribute and passes that through compile to get
the ast.  Of course that won't work when you don't have the source code (a
.pyc-only module, or in what other cases?)

Since I'm looking into this kind of solution for the future of my code, I'm
curious if you think that's too unreliable for some reason?  From a
usability standpoint, I do think that's better than feeding in strings,
which:
* are not syntax highlighted, and
* require porting code from regular numpy expressions to numexpr strings
(applying a decorator is so much easier).

Best,

Neil

On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith n...@pobox.com wrote:

 On Apr 27, 2015 1:44 PM, Neil Girdhar mistersh...@gmail.com wrote:
 
  I've always wondered why numexpr accepts strings rather than looking at a
 function's source code, using ast to parse it, and then transforming the
 AST.  I just looked at another project, pyautodiff, which does that.  And I
 think numba does that for llvm code generation.  Wouldn't it be nicer to
 just apply a decorator to a function than to write the function as a Python
 string?

 Numba works from byte code, not the ast. There's no way to access the ast
 reliably at runtime in python -- it gets thrown away during compilation.

 -n

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-27 Thread Neil Girdhar
Wow, cool!  Are there any users of this package?

On Mon, Apr 27, 2015 at 9:07 PM, Alexander Belopolsky ndar...@mac.com
wrote:


 On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith n...@pobox.com wrote:

 There's no way to access the ast reliably at runtime in python -- it gets
 thrown away during compilation.


 The meta package supports bytecode to ast translation.  See 
 http://meta.readthedocs.org/en/latest/api/decompile.html.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-27 Thread Neil Girdhar
I don't think I'm asking for that much.  Somewhere inside numexpr it builds
an AST of its own, which it converts into the optimized code.  It would be
more useful to me if that AST were in the same format as the one returned
by Python's ast module.  That way, I could glue the bits of numexpr that
I like into my own code.  For my purposes, this would have been a more ideal
design.
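
(Concretely, something along these lines -- eval_ast is hypothetical, it does
not exist in numexpr today:)

import ast
import numpy as np
import numexpr as ne

a = np.arange(1e6)
b = np.arange(1e6)

# The stdlib AST I mean:
tree = ast.parse("3*a + 4*b", mode="eval")
print(ast.dump(tree.body))        # BinOp(left=BinOp(... 'a' ...), op=Add(), ...)

# Today the only entry point is the string form:
res = ne.evaluate("3*a + 4*b")

# Hypothetical API being discussed (does not exist):
# res = ne.eval_ast(tree)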


On Mon, Apr 27, 2015 at 10:47 PM, Nathaniel Smith n...@pobox.com wrote:

 On Apr 27, 2015 5:30 PM, Neil Girdhar mistersh...@gmail.com wrote:
 
 
 
  On Mon, Apr 27, 2015 at 7:42 PM, Nathaniel Smith n...@pobox.com wrote:
 
  On Mon, Apr 27, 2015 at 4:23 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
   I was told that numba did similar ast parsing, but maybe that's not
 true.
   Regarding the ast, I don't know about reliability, but take a look at
   get_ast in pyautodiff:
  
 https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py
   It looks up the __file__ attribute and passes that through compile to
 get
   the ast.  Of course that won't work when you don't have source code
 (a .pyc
   only module, or when else?)
  
   Since I'm looking into this kind of solution for the future of my
 code, I'm
   curious if you think that's too unreliable for some reason?
 
  I'd certainly hesitate to rely on it for anything I cared about or
  would be used by a lot of people... it's just intrinsically pretty
  hacky. No guarantee that the source code you find via __file__ will
  match what was used to compile the function, doesn't work when working
  interactively or from the ipython notebook, etc. Or else you have to
  trust a decompiler, which is a pretty serious complex chunk of code
  just to avoid typing quote marks.
 
 
  Those are all good points.  However, it's more than just typing quote
 marks.  The code might have non-numpy things mixed in.  It might have
 context managers and function calls and so on.  More comments below.
 
 
 
 From a
   usability standpoint, I do think that's better than feeding in
 strings,
   which:
   * are not syntax highlighted, and
   * require porting code from regular numpy expressions to numexpr
 strings
   (applying a decorator is so much easier).
 
  Yes, but then you have to write a program that knows how to port code
  from numpy expressions to numexpr strings :-). numexpr only knows a
  tiny restricted subset of Python...
 
  The general approach I'd take to solve these kinds of problems would
  be similar to that used by Theano or dask -- use regular python source
  code that generates an expression graph in memory. E.g. this could
  look like
 
  def do_stuff(arr1, arr2):
      arr1 = deferred(arr1)
      arr2 = deferred(arr2)
      arr3 = np.sum(arr1 + (arr2 ** 2))
      return force(arr3 / np.sum(arr3))
 
  -n
 
 
  Right, there are three basic approaches:  string processing, AST
 processing, and compile-time expression graphs.
 
  The big advantage to AST processing over the other two is that you can
 write and test your code as regular numpy code along with regular tests.
 Then, with the application of a decorator, you get the speedup you're
 looking for.  The problem with porting the numpy code to numexpr strings or
 Theano-like expression-graphs is that porting can introduce bugs, and even
 if you're careful, every time you make a change to the numpy version of the
 code, you have to port it again.
 
  Also, I personally want to do more than just AST transformations of the
 numpy code.  For example, I have some methods that call super.  The super
 calls can be collapsed since the mro is known at compile time.

 If you want something that handles arbitrary python code ('with' etc.),
 and produces results identical to cpython (so tests are reliable), except
 in cases where it violates the semantics for speed (super), then yeah, you
 want a full replacement python implementation, and I agree that the proper
 input to a python implementation is .py files :-). That's getting a bit far
 afield from numexpr's goals though...

 -n

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread Neil Girdhar
On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:

 On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
  Hi,
 
  snip
 
  So, how about a slight modification of your proposal?
 
  1) Raise deprecation warning for np.outer for non 1D arrays for a few
   versions, with deprecation in favor of np.multiply.outer, then
  2) Raise error for np.outer on non 1D arrays
 
 
  I think that was Neil's proposal a bit earlier, too. +1 for it in any
  case, since at least for the moment I doubt outer is used a lot for non
  1-d arrays. Possible step 3) make it work on higher dims after a long
  period.

 sounds ok to me

 Some random comments of what I remember or guess in terms of usage

 I think there are at most very few np.outer usages with 2d or higher
 dimension.
 (statsmodels has two models that switch between 2d and 1d
 parameterization where we don't use outer but it has similar
 characteristics. However, we need to control the ravel order, which
 IIRC is Fortran)

 The current behavior of 0-D scalars in the initial post might be
 useful if a numpy function returns a scalar instead of a 1-D array of
 size 1. np.diag, which is a common case, doesn't return a scalar (in my
 version of numpy).

 I don't know any use case where I would ever want to have the 2d
 behavior of np.multiply.outer.


My use case is pretty simple.  Given an input vector x, a weight matrix
W, and a model y=Wx, I calculate the gradient of the loss L with respect to
W.  It is the outer product of x with the vector of gradients dL/dy.  So
the code is simply:

W -= outer(x, dL_by_dy)

Sometimes, I have some x_indices and y_indices.  Now I want to do:

W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])

Unfortunately, if x_indices or y_indices are an int or a slice in some way
that removes a dimension, the left side will have fewer dimensions than the
right.  np.multiply.outer does the right thing without the ugly cases:

if isinstance(x_indices, int): … # ugly hacks follow.
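
(Toy numbers to illustrate the shape problem:)

import numpy as np

x = np.array([1., 2., 3.])
dL_by_dy = np.array([4., 5.])
W = np.zeros((3, 2))
i = 0                                      # an int index drops a dimension

W[i, :].shape                              # (2,)
np.outer(x[i], dL_by_dy).shape             # (1, 2): the 0-d input is flattened to 1-d
np.multiply.outer(x[i], dL_by_dy).shape    # (2,): broadcasting keeps the 0-d shape

W[i, :] -= np.multiply.outer(x[i], dL_by_dy)   # fine
W[i, :] -= np.outer(x[i], dL_by_dy)            # ValueError: (1, 2) doesn't fit in (2,)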

I guess we will or would have applications for outer along an axis,
 for example if x.shape = (100, 10), then we have
 x[:,None, :] * x[:, :, None] (I guess)
 Something like this shows up reasonably often in econometrics as
 Outer Product. However in most cases we can avoid constructing this
 matrix and get the final results in a more memory efficient or faster
 way.
 (example an array of covariance matrices)


Not sure I see this.  outer(a, b) should return something that has shape:
(a.shape + b.shape).  If you're doing it along an axis, you mean you're
reshuffling the resulting shape vector?


 Josef




 
  - Sebastian
 
 
  Best,
 
  Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread Neil Girdhar
On Fri, Apr 17, 2015 at 12:09 PM, josef.p...@gmail.com wrote:

 On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
 
 
  On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:
 
  On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
   On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
   Hi,
  
   snip
  
   So, how about a slight modification of your proposal?
  
   1) Raise deprecation warning for np.outer for non 1D arrays for a few
   versions, with deprecation in favor of np.multiply.outer, then
   2) Raise error for np.outer on non 1D arrays
  
  
   I think that was Neil's proposal a bit earlier, too. +1 for it in any
   case, since at least for the moment I doubt outer is used a lot for
 non
   1-d arrays. Possible step 3) make it work on higher dims after a long
   period.
 
  sounds ok to me
 
  Some random comments of what I remember or guess in terms of usage
 
  I think there are at most very few np.outer usages with 2d or higher
  dimension.
  (statsmodels has two models that switch between 2d and 1d
  parameterization where we don't use outer but it has similar
  characteristics. However, we need to control the ravel order, which
  IIRC is Fortran)
 
  The current behavior of 0-D scalars in the initial post might be
  useful if a numpy function returns a scalar instead of a 1-D array in
  size=1. np.diag which is a common case, doesn't return a scalar (in my
  version of numpy).
 
  I don't know any use case where I would ever want to have the 2d
  behavior of np.multiply.outer.
 

 I only understand part of your example, but it looks similar to what
 we are doing in statsmodels.

 
  My use case is pretty simple.  Given an input vector x, and a weight
 matrix
  W, and a model y=Wx, I calculate the gradient of the loss L with respect to
 W.
  It is the outer product of x with the vector of gradients dL/dy.  So the
  code is simply:
 
  W -= outer(x, dL_by_dy)

 if you sum/subtract over all the values, isn't this the same as
 np.dot(x, dL_by_dy)


 
  Sometimes, I have some x_indices and y_indices.  Now I want to do:
 
  W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])
 
  Unfortunately, if x_indices or y_indices are int or slice in some way
 that
  removes a dimension, the left side will have fewer dimensions than the
   right.  np.multiply.outer does the right thing without the ugly cases:
 
  if isinstance(x_indices, int): … # ugly hacks follow.

  My usual hacks are either to use np.atleast_1d or np.atleast_2d or
 np.squeeze if there is shape mismatch in some cases.

 
  I guess we will or would have applications for outer along an axis,
  for example if x.shape = (100, 10), then we have
  x[:,None, :] * x[:, :, None] (I guess)
  Something like this shows up reasonably often in econometrics as
  Outer Product. However in most cases we can avoid constructing this
  matrix and get the final results in a more memory efficient or faster
  way.
  (example an array of covariance matrices)
 
 
  Not sure I see this.  outer(a, b) should return something that has shape:
  (a.shape + b.shape).  If you're doing it along an axis, you mean you're
  reshuffling the resulting shape vector?

 No I'm not reshaping the full tensor product.

 It's a vectorized version of looping over independent outer products

 np.array([outer(xi, yi) for xi,yi in zip(x, y)])
 (which I would never use with outer)

 but I have code that works similar for a reduce (or reduce_at) loop over
 this.


Hmmm… I see what you're writing.   This doesn't really have a geometrical
meaning as far as I can tell.  You're interpreting the first index of x, y,
and your result, as if it were a list — as if x and y are lists of vectors,
and you want a list of matrices.   That really should be written as a loop
in my opinion.



 Josef


 
 
  Josef
 
 
 
 
  
   - Sebastian
  
  
   Best,
  
   Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread Neil Girdhar
On Fri, Apr 17, 2015 at 12:09 PM, josef.p...@gmail.com wrote:

 On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
 
 
  On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:
 
  On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
   On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
   Hi,
  
   snip
  
   So, how about a slight modification of your proposal?
  
   1) Raise deprecation warning for np.outer for non 1D arrays for a few
   versions, with deprecation in favor of np.multiply.outer, then
   2) Raise error for np.outer on non 1D arrays
  
  
   I think that was Neil's proposal a bit earlier, too. +1 for it in any
   case, since at least for the moment I doubt outer is used a lot for
 non
   1-d arrays. Possible step 3) make it work on higher dims after a long
   period.
 
  sounds ok to me
 
  Some random comments of what I remember or guess in terms of usage
 
  I think there are at most very few np.outer usages with 2d or higher
  dimension.
  (statsmodels has two models that switch between 2d and 1d
  parameterization where we don't use outer but it has similar
  characteristics. However, we need to control the ravel order, which
  IIRC is Fortran)
 
  The current behavior of 0-D scalars in the initial post might be
  useful if a numpy function returns a scalar instead of a 1-D array in
  size=1. np.diag which is a common case, doesn't return a scalar (in my
  version of numpy).
 
  I don't know any use case where I would ever want to have the 2d
  behavior of np.multiply.outer.
 

 I only understand part of your example, but it looks similar to what
 we are doing in statsmodels.

 
  My use case is pretty simple.  Given an input vector x, and a weight
 matrix
  W, and a model y=Wx, I calculate the gradient of the loss L with respect to
 W.
  It is the outer product of x with the vector of gradients dL/dy.  So the
  code is simply:
 
  W -= outer(x, dL_by_dy)

 if you sum/subtract over all the values, isn't this the same as
 np.dot(x, dL_by_dy)


What?  Matrix subtraction is element-wise:

In [1]: x = np.array([2,3,4])

In [2]: dL_by_dy = np.array([7,9])

In [5]: W = np.zeros((3, 2))

In [6]: W -= np.outer(x, dL_by_dy)

In [7]: W
Out[7]:
array([[-14., -18.],
   [-21., -27.],
   [-28., -36.]])


  Sometimes, I have some x_indices and y_indices.  Now I want to do:
 
  W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])
 
  Unfortunately, if x_indices or y_indices are int or slice in some way
 that
  removes a dimension, the left side will have fewer dimensions than the
   right.  np.multiply.outer does the right thing without the ugly cases:
 
  if isinstance(x_indices, int): … # ugly hacks follow.

  My usual hacks are either to use np.atleast_1d or np.atleast_2d or
 np.squeeze if there is shape mismatch in some cases.


Yes, but in this case, the left side is the problem, which has too few
dimensions.  So atleast_1d doesn't work.  I was conditionally squeezing,
but that is extremely ugly.  Especially if you're conditionally squeezing
based on both x_indices and y_indices.



 
  I guess we will or would have applications for outer along an axis,
  for example if x.shape = (100, 10), then we have
  x[:,None, :] * x[:, :, None] (I guess)
  Something like this shows up reasonably often in econometrics as
  Outer Product. However in most cases we can avoid constructing this
  matrix and get the final results in a more memory efficient or faster
  way.
  (example an array of covariance matrices)
 
 
  Not sure I see this.  outer(a, b) should return something that has shape:
  (a.shape + b.shape).  If you're doing it along an axis, you mean you're
  reshuffling the resulting shape vector?

 No I'm not reshaping the full tensor product.

 It's a vectorized version of looping over independent outer products

 np.array([outer(xi, yi) for xi,yi in zip(x, y)])
 (which I would never use with outer)

 but I have code that works similar for a reduce (or reduce_at) loop over
 this.

 Josef


 
 
  Josef
 
 
 
 
  
   - Sebastian
  
  
   Best,
  
   Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread Neil Girdhar
This relationship between outer and dot only holds for vectors.  For
tensors, and other kinds of vector spaces, I'm not sure if outer products
and dot products have anything to do with each other.
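
(For 1-D inputs the equivalence is just:)

import numpy as np

a = np.array([1., 2., 3.])
b = np.array([4., 5.])

np.allclose(np.outer(a, b), np.dot(a[:, None], b[None, :]))   # True
np.allclose(np.outer(a, b), np.multiply.outer(a, b))          # True, for 1-D inputs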

On Fri, Apr 17, 2015 at 11:11 AM, josef.p...@gmail.com wrote:

 On Fri, Apr 17, 2015 at 10:59 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  On Fr, 2015-04-17 at 10:47 -0400, josef.p...@gmail.com wrote:
  On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
   On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
   Hi,
  
   snip
  
   So, how about a slight modification of your proposal?
  
   1) Raise deprecation warning for np.outer for non 1D arrays for a few
    versions, with deprecation in favor of np.multiply.outer, then
   2) Raise error for np.outer on non 1D arrays
  
  
   I think that was Neil's proposal a bit earlier, too. +1 for it in any
   case, since at least for the moment I doubt outer is used a lot for
 non
   1-d arrays. Possible step 3) make it work on higher dims after a long
   period.
 
  sounds ok to me
 
  Some random comments of what I remember or guess in terms of usage
 
  I think there are at most very few np.outer usages with 2d or higher
 dimension.
  (statsmodels has two models that switch between 2d and 1d
  parameterization where we don't use outer but it has similar
  characteristics. However, we need to control the ravel order, which
  IIRC is Fortran)
 
  The current behavior of 0-D scalars in the initial post might be
  useful if a numpy function returns a scalar instead of a 1-D array in
  size=1. np.diag which is a common case, doesn't return a scalar (in my
  version of numpy).
 
  I don't know any use case where I would ever want to have the 2d
  behavior of np.multiply.outer.
  I guess we will or would have applications for outer along an axis,
  for example if x.shape = (100, 10), then we have
  x[:,None, :] * x[:, :, None] (I guess)
  Something like this shows up reasonably often in econometrics as
  Outer Product. However in most cases we can avoid constructing this
  matrix and get the final results in a more memory efficient or faster
  way.
  (example an array of covariance matrices)
 
 
  So basically outer product of stacked vectors (fitting basically into
  how np.linalg functions now work). I think that might be a good idea,
  but even then we first need to do the deprecation and it would be a long
  term project. Or you add np.linalg.outer or such sooner and in the
   longer run it will be an alias to that instead of np.multiply.outer.


 Essentially yes, but I don't have an opinion about location or
 implementation in numpy, nor do I know enough.

 I always considered np.outer conceptually as belonging to linalg that
 provides a more convenient interface than np.dot if both arrays are
 1-D.  (no need to add extra axis and transpose)

 Josef

 
 
  Josef
 
 
 
 
  
   - Sebastian
  
  
   Best,
  
   Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-16 Thread Neil Girdhar
Would it be possible to deprecate np.outer's usage on non one-dimensional
vectors for a few versions, and then reintroduce it with definition
np.outer == np.multiply.outer?
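
(For reference, the difference on higher-dimensional input:)

import numpy as np

a = np.arange(6.).reshape(2, 3)
b = np.arange(4.)

np.outer(a, b).shape            # (6, 4)  -- np.outer flattens its inputs first
np.multiply.outer(a, b).shape   # (2, 3, 4)  -- shapes are concatenated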

On Wed, Apr 15, 2015 at 8:02 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith n...@pobox.com wrote:
  On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
  On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Does it work for you to set
 
  outer = np.multiply.outer
 
  ?
 
  It's actually faster on my machine.
 
   I assume it does because np.corrcoef uses it, and it's the same type
  of use cases.
  However, I'm not using it very often (I prefer broadcasting), but I've
  seen it often enough when reviewing code.
 
  This is mainly to point out that it could be a popular function (that
  maybe shouldn't be deprecated)
 
  https://github.com/search?utf8=%E2%9C%93q=np.outer
  416914
 
  For future reference, that's not the number -- you have to click
  through to Code and then look at a single-language result to get
  anything remotely meaningful. In this case b/c they're different by an
  order of magnitude, and in general because sometimes the top line
  number is completely made up (like it has no relation to the
  per-language numbers on the left and then changes around randomly if
  you simply reload the page).
 
  (So 29,397 is what you want in this case.)
 
  Also that count then tends to have tons of duplicates (e.g. b/c there
  are hundreds of copies of numpy itself on github), so you need a big
  grain of salt when looking at the absolute number, but it can be
  useful, esp. for relative comparisons.

 My mistake, rushing too much.
 github shows only 25 code references in numpy itself.

 in quotes, python only  (namespace conscious packages on github)
 (I think github counts modules not instances)

 np.cumsum 11,022
 np.cumprod 1,290
 np.outer 6,838

 statsmodels
 np.cumsum 21
 np.cumprod  2
 np.outer 15

 Josef

 
  -n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-16 Thread Neil Girdhar
Right.

On Thu, Apr 16, 2015 at 6:44 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Apr 16, 2015 at 6:37 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  I can always put np.outer = np.multiply.outer at the start of my code to
 get
  what I want.  Or could that break things?

 Please don't do this. It means that if there are any calls to np.outer in
 libraries you are using (or other libraries that are also used by
 anyone who is using your code), they will silently get
 np.multiply.outer instead of np.outer. And then if this breaks things
 we end up getting extremely confusing bug reports from angry users who
 think we broke np.outer.

 Just do 'outer = np.multiply.outer' and leave the np namespace alone :-)

 -n

 --
 Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-16 Thread Neil Girdhar
That sounds good to me.

I can always put np.outer = np.multiply.outer at the start of my code to
get what I want.  Or could that break things?

On Thu, Apr 16, 2015 at 6:28 PM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Thu, Apr 16, 2015 at 3:19 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Actually, looking at the docs, numpy.outer is *only* defined for 1-d
  vectors.  Should anyone who used it with multi-dimensional arrays have an
  expectation that it will keep working in the same way?
 
  On Thu, Apr 16, 2015 at 10:53 AM, Neil Girdhar mistersh...@gmail.com
  wrote:
 
  Would it be possible to deprecate np.outer's usage on non
 one-dimensional
  vectors for a few versions, and then reintroduce it with definition
 np.outer
  == np.multiply.outer?

 I think the general idea is that

 a) people often miss deprecation warnings
 b) there is lots of legacy code out there, and
 c) it's very bad if legacy code silently gives different answers in
 newer numpy versions
 d) it's not so bad if newer numpy gives an intelligible error for code
 that used to work.

 So, how about a slight modification of your proposal?

 1) Raise deprecation warning for np.outer for non 1D arrays for a few
  versions, with deprecation in favor of np.multiply.outer, then
 2) Raise error for np.outer on non 1D arrays

 Best,

 Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-16 Thread Neil Girdhar
On Thu, Apr 16, 2015 at 6:32 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Apr 16, 2015 at 6:19 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Actually, looking at the docs, numpy.outer is *only* defined for 1-d
  vectors.  Should anyone who used it with multi-dimensional arrays have an
  expectation that it will keep working in the same way?

 Yes. Generally what we do is more important than what we say we do.
 Changing behaviour can break code. Changing docs can change whose
 fault this is, but broken code is still broken code. And if you put
 on your user hat, what do you do when numpy acts weird -- shake your
 fist at the heavens and give up, or sigh and update your code to
 match? It's pretty common for even undocumented behaviour to still be
 depended on.

 Also FWIW, np.outer's docstring says "Input is flattened if not
 already 1-dimensional", so we actually did document this.


Ah, yeah, somehow I missed that!


 -n

 --
 Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-16 Thread Neil Girdhar
Actually, looking at the docs, numpy.outer is *only* defined for 1-d
vectors.  Should anyone who used it with multi-dimensional arrays have an
expectation that it will keep working in the same way?

On Thu, Apr 16, 2015 at 10:53 AM, Neil Girdhar mistersh...@gmail.com
wrote:

 Would it be possible to deprecate np.outer's usage on non one-dimensional
 vectors for a few versions, and then reintroduce it with definition
 np.outer == np.multiply.outer?

 On Wed, Apr 15, 2015 at 8:02 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith n...@pobox.com wrote:
  On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
  On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Does it work for you to set
 
  outer = np.multiply.outer
 
  ?
 
  It's actually faster on my machine.
 
   I assume it does because np.corrcoef uses it, and it's the same type
  of use cases.
  However, I'm not using it very often (I prefer broadcasting), but I've
  seen it often enough when reviewing code.
 
  This is mainly to point out that it could be a popular function (that
  maybe shouldn't be deprecated)
 
  https://github.com/search?utf8=%E2%9C%93q=np.outer
  416914
 
  For future reference, that's not the number -- you have to click
  through to Code and then look at a single-language result to get
  anything remotely meaningful. In this case b/c they're different by an
  order of magnitude, and in general because sometimes the top line
  number is completely made up (like it has no relation to the
  per-language numbers on the left and then changes around randomly if
  you simply reload the page).
 
  (So 29,397 is what you want in this case.)
 
  Also that count then tends to have tons of duplicates (e.g. b/c there
  are hundreds of copies of numpy itself on github), so you need a big
  grain of salt when looking at the absolute number, but it can be
  useful, esp. for relative comparisons.

 My mistake, rushing too much.
  github shows only 25 code references in numpy itself.

 in quotes, python only  (namespace conscious packages on github)
 (I think github counts modules not instances)

 np.cumsum 11,022
 np.cumprod 1,290
 np.outer 6,838

 statsmodels
 np.cumsum 21
 np.cumprod  2
 np.outer 15

 Josef

 
  -n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
You got it.  I remember this from when I worked at Google and we would
process (many many) logs.  With enough bins, the approximation is still
really close.  It's great if you want to make an automatic plot of data.
Calling numpy.partition a hundred times is probably slower than calling P^2
with n=100 bins.  I don't think it does O(n) computations per point.  I
think it's more like O(log(n)).

Best,

Neil

On Wed, Apr 15, 2015 at 10:02 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar mistersh...@gmail.com
 wrote:

 Yeah, I'm not arguing, I'm just curious about your reasoning.  That
 explains why not C++.  Why would you want to do this in C and not Python?


 Well, the algorithm has to iterate over all the inputs, updating the
 estimated percentile positions at every iteration. Because the estimated
 percentiles may change in every iteration, I don't think there is an easy
 way of vectorizing the calculation with numpy. So I think it would be very
 slow if done in Python.

 Looking at this in some more details, how is this typically used? Because
 it gives you approximate values that should split your sample into
 similarly filled bins, but because the values are approximate, to compute a
 proper histogram you would still need to do the binning to get the exact
 results, right? Even with this drawback P-2 does have an algorithmic
 advantage, so for huge inputs and many bins it should come ahead. But for
 many medium sized problems it may be faster to simply use np.partition,
 which gives you the whole thing in a single go. And it would be much
 simpler to implement.

 Jaime

 --
 (\__/)
 ( O.o)
 (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
 de dominación mundial.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread Neil Girdhar
I don't understand.  Are you at pycon by any chance?

On Wed, Apr 15, 2015 at 6:12 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
  On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Does it work for you to set
 
  outer = np.multiply.outer
 
  ?
 
  It's actually faster on my machine.
 
   I assume it does because np.corrcoef uses it, and it's the same type
  of use cases.
  However, I'm not using it very often (I prefer broadcasting), but I've
  seen it often enough when reviewing code.
 
  This is mainly to point out that it could be a popular function (that
  maybe shouldn't be deprecated)
 
  https://github.com/search?utf8=%E2%9C%93q=np.outer
  416914

 After thinking another minute:

 I think it should not be deprecated, it's like toeplitz. We can use it
 also to normalize 2d arrays where columns and rows are different, not
 symmetric as in the corrcoef case.

 Josef


 
  Josef
 
 
 
  On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:
 
  On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com
  wrote:
   Yes, I totally agree.  If I get started on the PR to deprecate
 np.outer,
   maybe I can do it as part of the same PR?
  
   On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg
   sebast...@sipsolutions.net
   wrote:
  
   Just a general thing, if someone has a few minutes, I think it would
   make sense to add the ufunc.reduce thing to all of these functions
 at
   least in the See Also or Notes section in the documentation.
  
   These special attributes are not that well known, and I think that
   might
   be a nice way to make it easier to find.
  
   - Sebastian
  
   On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
I am, yes.
   
On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com
wrote:
Ok, I didn't know that.  Are you at pycon by any chance?
   
On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
n...@pobox.com wrote:
On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
mistersh...@gmail.com wrote:
 Yes, I totally agree with you regarding np.sum
 and
np.product, which is why
 I didn't suggest np.add.reduce,
 np.multiply.reduce.
I wasn't sure whether
 cumsum and cumprod might be on the line in your
judgment.
   
Ah, I see. I think we should treat them the same
 for
now -- all the
comments I made apply to a lesser or greater
 extent
(in particular,
cumsum and cumprod both do the thing where they
dispatch to .cumsum()
.cumprod() method).
   
-n
   
--
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
   
http://mail.scipy.org/mailman/listinfo/numpy-discussion
   
   
   
   
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
   
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
 
 
  I'm just looking at this thread.
 
  I see outer used quite often
 
  corrcoef = cov / np.outer(std, std)
 
  (even I use it sometimes instead of
  cov / std[:,None] / std
 
  Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
Cool, thanks for looking at this.  P^2 might still be better even if the
whole dataset is in memory because of cache misses.  Partition, which I
guess is based on quickselect, is going to run over all of the data roughly
as many times as there are bins, whereas P^2 only runs over it once.  From a
cache-miss standpoint, I think P^2 is better?  Anyway, it might be worth
coding it up to verify any performance advantages.  Not sure if it should
be in numpy or not, since it really should accept an iterable rather than a
numpy vector, right?
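
(For what it's worth, the partition-based baseline Jaime describes could look
roughly like this -- a sketch with a made-up helper name, not what
np.histogram actually does:)

import numpy as np

def equal_count_edges(data, nbins):
    # Exact edges such that each bin holds ~len(data)/nbins samples,
    # taken from order statistics via a single np.partition call.
    data = np.asarray(data).ravel()
    k = np.arange(1, nbins) * len(data) // nbins      # ranks of the interior edges
    interior = np.partition(data, k)[k]
    return np.concatenate(([data.min()], interior, [data.max()]))

data = np.random.randn(100000)
edges = equal_count_edges(data, 10)
counts, _ = np.histogram(data, bins=edges)            # roughly 10000 per bin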

Best,

Neil

On Wed, Apr 15, 2015 at 12:40 PM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar mistersh...@gmail.com
 wrote:

 You got it.  I remember this from when I worked at Google and we would
 process (many many) logs.  With enough bins, the approximation is still
 really close.  It's great if you want to make an automatic plot of data.
 Calling numpy.partition a hundred times is probably slower than calling P^2
 with n=100 bins.  I don't think it does O(n) computations per point.  I
 think it's more like O(log(n)).


 Looking at it again, it probably is O(n) after all: it does a binary
 search, which is O(log n), but it then goes on to update all the n bin
 counters and estimations, so O(n) I'm afraid. So there is no algorithmic
 advantage over partition/percentile: if there are m samples and n bins, P^2
 does that O(n) work m times, while partition does its O(m) work n times, so
 both end up being O(m n). It seems to me that the big thing of P^2 is not
 having to hold the
 full dataset in memory. Online statistics (is that the name for this?),
 even if only estimations, is a cool thing, but I am not sure numpy is the
 place for them. That's not to say that we couldn't eventually have P^2
 implemented for histogram, but I would start off with a partition based one.

 Would SciPy have a place for online statistics? Perhaps there's room for
 yet another scikit?

 Jaime

 --
 (\__/)
 ( O.o)
 (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
 de dominación mundial.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread Neil Girdhar
Does it work for you to set

outer = np.multiply.outer

?

It's actually faster on my machine.

On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
  maybe I can do it as part of the same PR?
 
  On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg 
 sebast...@sipsolutions.net
  wrote:
 
  Just a general thing, if someone has a few minutes, I think it would
  make sense to add the ufunc.reduce thing to all of these functions at
  least in the See Also or Notes section in the documentation.
 
  These special attributes are not that well known, and I think that might
  be a nice way to make it easier to find.
 
  - Sebastian
 
  On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
   I am, yes.
  
   On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
   Ok, I didn't know that.  Are you at pycon by any chance?
  
   On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
   n...@pobox.com wrote:
   On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
   mistersh...@gmail.com wrote:
Yes, I totally agree with you regarding np.sum and
   np.product, which is why
I didn't suggest np.add.reduce, np.multiply.reduce.
   I wasn't sure whether
cumsum and cumprod might be on the line in your
   judgment.
  
   Ah, I see. I think we should treat them the same for
   now -- all the
   comments I made apply to a lesser or greater extent
   (in particular,
   cumsum and cumprod both do the thing where they
   dispatch to .cumsum()
   .cumprod() method).
  
   -n
  
   --
   Nathaniel J. Smith -- http://vorpus.org
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
  
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


 I'm just looking at this thread.

 I see outer used quite often

 corrcoef = cov / np.outer(std, std)

 (even I use it sometimes instead of
 cov / std[:,None] / std)

 Josef
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread Neil Girdhar
Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
maybe I can do it as part of the same PR?

On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 Just a general thing, if someone has a few minutes, I think it would
 make sense to add the ufunc.reduce thing to all of these functions at
 least in the See Also or Notes section in the documentation.

 These special attributes are not that well known, and I think that might
 be a nice way to make it easier to find.

 - Sebastian

 On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
  I am, yes.
 
  On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com wrote:
  Ok, I didn't know that.  Are you at pycon by any chance?
 
  On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
  n...@pobox.com wrote:
  On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
  mistersh...@gmail.com wrote:
   Yes, I totally agree with you regarding np.sum and
  np.product, which is why
   I didn't suggest np.add.reduce, np.multiply.reduce.
  I wasn't sure whether
   cumsum and cumprod might be on the line in your
  judgment.
 
  Ah, I see. I think we should treat them the same for
  now -- all the
  comments I made apply to a lesser or greater extent
  (in particular,
  cumsum and cumprod both do the thing where they
  dispatch to .cumsum()
  .cumprod() method).
 
  -n
 
  --
  Nathaniel J. Smith -- http://vorpus.org
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
Yeah, I'm not arguing, I'm just curious about your reasoning.  That
explains why not C++.  Why would you want to do this in C and not Python?

On Wed, Apr 15, 2015 at 1:48 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar mistersh...@gmail.com
 wrote:

 If you're going to C, is there a reason not to go to C++ and include the
 already-written Boost code?  Otherwise, why not use Python?


 I think we have an explicit rule against C++, although I may be wrong. Not
 sure how much of boost we would have to make part of numpy to use that, the
 whole accumulators lib I'm guessing? Seems like an awful lot given what we
 are after.

 Jaime

 --
 (\__/)
 ( O.o)
 (  ) This is Rabbit. Copy Rabbit into your signature and help him with his
 plans for world domination.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-14 Thread Neil Girdhar
It also appears that cumsum has a lot of unnecessary overhead over
add.accumulate:

In [51]: %timeit np.add.accumulate(a)
The slowest run took 46.31 times longer than the fastest. This could mean
that an intermediate result is being cached
100 loops, best of 3: 372 ns per loop

In [52]: %timeit np.cum
np.cumprod np.cumproduct  np.cumsum

In [52]: %timeit np.cumsum(a)
The slowest run took 18.44 times longer than the fastest. This could mean
that an intermediate result is being cached
100 loops, best of 3: 912 ns per loop

In [53]: %timeit np.add.accumulate(a.flatten())
The slowest run took 25.59 times longer than the fastest. This could mean
that an intermediate result is being cached
100 loops, best of 3: 834 ns per loop


On Tue, Apr 14, 2015 at 7:42 AM, Neil Girdhar mistersh...@gmail.com wrote:

 Okay, but by the same token, why do we have cumsum?  Isn't it identical to

 np.add.accumulate

 — or if you're passing in multidimensional data —

 np.add.accumulate(a.flatten())

 ?

 add.accumulate feels more generic, would make the other ufunc things more
 discoverable, and is self-documenting.

 Similarly, cumprod is just np.multiply.accumulate.

 Best,

 Neil


 On Sat, Apr 11, 2015 at 12:49 PM, Nathaniel Smith n...@pobox.com wrote:

 Documentation and a call to warnings.warn(DeprecationWarning(...)), I
 guess.

 On Sat, Apr 11, 2015 at 12:39 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  I would be happy to, but I'm not sure what that involves?  It's just a
  documentation changelist?
 
  On Sat, Apr 11, 2015 at 12:29 PM, Nathaniel Smith n...@pobox.com
 wrote:
 
  On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar mistersh...@gmail.com
  wrote:
   On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar mistersh...@gmail.com
 
   wrote:
Numpy's outer product works fine with vectors. However, I seem to
always
want len(outer(a, b).shape) to be equal to len(a.shape) +
len(b.shape).
Wolfram-alpha seems to agree
https://reference.wolfram.com/language/ref/Outer.html with
 respect to
matrix
outer products.
   You're probably right that this is the correct definition of the
 outer
   product in an n-dimensional world. But this seems to go beyond being
   just a bug in handling 0-d arrays (which is the kind of corner case
   we've fixed in the past); np.outer is documented to always ravel its
   inputs to 1d.
   In fact the implementation is literally just:
   a = asarray(a)
   b = asarray(b)
   return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out)
   Sebastian's np.multiply.outer is much more generic and effective.
   Maybe we should just deprecate np.outer? I don't see what use it
   serves. (When and whether it actually got removed after being
   deprecated would depend on how much use it actually gets in real
 code,
   which I certainly don't know while typing a quick email. But we
 could
   start telling people not to use it any time.)
  
  
   +1 with everything you said.
 
  Want to write a PR? :-)
 
  --
  Nathaniel J. Smith -- http://vorpus.org
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 



 --
 Nathaniel J. Smith -- http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-14 Thread Neil Girdhar
Ok, I didn't know that.  Are you at pycon by any chance?

On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Yes, I totally agree with you regarding np.sum and np.product, which is
 why
  I didn't suggest np.add.reduce, np.multiply.reduce.  I wasn't sure
 whether
  cumsum and cumprod might be on the line in your judgment.

 Ah, I see. I think we should treat them the same for now -- all the
 comments I made apply to a lesser or greater extent (in particular,
 cumsum and cumprod both do the thing where they dispatch to .cumsum()
 .cumprod() method).

 -n

 --
 Nathaniel J. Smith -- http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Neil Girdhar
If you're going to C, is there a reason not to go to C++ and include the
already-written Boost code?  Otherwise, why not use Python?

On Tue, Apr 14, 2015 at 7:24 PM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Can I suggest that we instead add the P-square algorithm for the dynamic
  calculation of histograms?
  (
 http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
 )
 
  This is already implemented in C++'s boost library
  (
 http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
 )
 
  I implemented it in Boost Python as a module, which I'm happy to share.
  This is much better than fixed-width histograms in practice.  Rather
 than
  adjusting the number of bins, it adjusts what you really want, which is
 the
  resolution of the bins throughout the domain.

 This definitely sounds like a useful thing to have in numpy or scipy
 (though if it's possible to do without using Boost/C++ that would be
 nice). But yeah, we should leave the existing histogram alone (in this
 regard) and add a new name for this like adaptive_histogram or
 something. Then you can set about convincing matplotlib and friends to
 use it by default :-)


 Would having a negative number of bins mean this many, but with optimized
 boundaries be too clever an interface?

 I have taken a look at the paper linked, and the P-2 algorithm would not
 be too complicated to implement from scratch, although it would require
 writing some C code I'm afraid.

 Jaime

 --
 (\__/)
 ( O.o)
 (  ) This is Rabbit. Copy Rabbit into your signature and help him with his
 plans for world domination.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Neil Girdhar
By the way, the p^2 algorithm still needs to know how many bins you want.
It just adapts the endpoints of the bins.  I like adaptive=True.  However,
you will have to find a way to return both the bins and their
calculated endpoints.

The P^2 algorithm can also give approximate answers to numpy.percentile,
numpy.median.  How approximate they are depends on the number of bins you
let it keep track of.  I believe the authors bound the error as a function
of number of points and bins.
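
For what it's worth, np.histogram already returns the computed edges next to
the counts, so a hypothetical adaptive=True mode could reuse that convention
unchanged (a minimal illustration; adaptive is not an existing keyword):

import numpy as np

x = np.random.standard_normal(1000)
counts, edges = np.histogram(x, bins=10)
# edges has len(counts) + 1 entries and is already computed from the data's
# range, so adaptively estimated endpoints could come back through the same
# second return value.
print(counts.shape, edges.shape)   # (10,) (11,)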

On Tue, Apr 14, 2015 at 10:00 PM, Paul Hobson pmhob...@gmail.com wrote:



 On Tue, Apr 14, 2015 at 4:24 PM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Can I suggest that we instead add the P-square algorithm for the
 dynamic
  calculation of histograms?
  (
 http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
 )
 
  This is already implemented in C++'s boost library
  (
 http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
 )
 
  I implemented it in Boost Python as a module, which I'm happy to share.
  This is much better than fixed-width histograms in practice.  Rather
 than
  adjusting the number of bins, it adjusts what you really want, which
 is the
  resolution of the bins throughout the domain.

 This definitely sounds like a useful thing to have in numpy or scipy
 (though if it's possible to do without using Boost/C++ that would be
 nice). But yeah, we should leave the existing histogram alone (in this
 regard) and add a new name for this like adaptive_histogram or
 something. Then you can set about convincing matplotlib and friends to
 use it by default :-)


 Would having a negative number of bins mean this many, but with
 optimized boundaries be too clever an interface?


 As a user, I think so. Wouldn't np.histogram(..., adaptive=True) do well
 enough?
 -p

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-14 Thread Neil Girdhar
Okay, but by the same token, why do we have cumsum?  Isn't it identical to

np.add.accumulate

— or if you're passing in multidimensional data —

np.add.accumulate(a.flatten())

?

add.accumulate feels more generic, would make the other ufunc things more
discoverable, and is self-documenting.

Similarly, cumprod is just np.multiply.accumulate.
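
A quick check of the claimed equivalences (a minimal sketch; the one
behavioural difference is that cumsum/cumprod flatten n-d input by default,
whereas accumulate works along axis 0 unless told otherwise):

import numpy as np

a = np.arange(12).reshape(3, 4)

np.array_equal(np.cumsum(a), np.add.accumulate(a.flatten()))            # True
np.array_equal(np.cumsum(a, axis=0), np.add.accumulate(a, axis=0))      # True
np.array_equal(np.cumprod(a + 1),
               np.multiply.accumulate((a + 1).flatten()))               # True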

Best,

Neil


On Sat, Apr 11, 2015 at 12:49 PM, Nathaniel Smith n...@pobox.com wrote:

 Documentation and a call to warnings.warn(DeprecationWarning(...)), I
 guess.

 On Sat, Apr 11, 2015 at 12:39 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  I would be happy to, but I'm not sure what that involves?  It's just a
  documentation changelist?
 
  On Sat, Apr 11, 2015 at 12:29 PM, Nathaniel Smith n...@pobox.com wrote:
 
  On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar mistersh...@gmail.com
  wrote:
   On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar mistersh...@gmail.com
   wrote:
Numpy's outer product works fine with vectors. However, I seem to
always
want len(outer(a, b).shape) to be equal to len(a.shape) +
len(b.shape).
Wolfram-alpha seems to agree
https://reference.wolfram.com/language/ref/Outer.html with
 respect to
matrix
outer products.
   You're probably right that this is the correct definition of the
 outer
   product in an n-dimensional world. But this seems to go beyond being
   just a bug in handling 0-d arrays (which is the kind of corner case
   we've fixed in the past); np.outer is documented to always ravel its
   inputs to 1d.
   In fact the implementation is literally just:
   a = asarray(a)
   b = asarray(b)
   return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out)
   Sebastian's np.multiply.outer is much more generic and effective.
   Maybe we should just deprecate np.outer? I don't see what use it
   serves. (When and whether it actually got removed after being
   deprecated would depend on how much use it actually gets in real
 code,
   which I certainly don't know while typing a quick email. But we could
   start telling people not to use it any time.)
  
  
   +1 with everything you said.
 
  Want to write a PR? :-)
 
  --
  Nathaniel J. Smith -- http://vorpus.org
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 



 --
 Nathaniel J. Smith -- http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-14 Thread Neil Girdhar
Yes, I totally agree with you regarding np.sum and np.product, which is why
I didn't suggest np.add.reduce, np.multiply.reduce.  I wasn't sure whether
cumsum and cumprod might be on the line in your judgment.

Best,

Neil

On Tue, Apr 14, 2015 at 3:37 PM, Nathaniel Smith n...@pobox.com wrote:

 On Apr 14, 2015 2:48 PM, Neil Girdhar mistersh...@gmail.com wrote:
 
  Okay, but by the same token, why do we have cumsum?  Isn't it identical
 to
 
  np.add.accumulate
 
  — or if you're passing in multidimensional data —
 
  np.add.accumulate(a.flatten())
 
  ?
 
  add.accumulate feels more generic, would make the other ufunc things
 more discoverable, and is self-documenting.
 
  Similarly, cumprod is just np.multiply.accumulate.

 Yeah, but these do have several differences than np.outer:

 - they get used much more
 - their definitions are less obviously broken (cumsum has no obvious
 definition for an n-d array so you have to pick one; outer does have an
 obvious definition and np.outer got it wrong)
 - they're more familiar from other systems (R, MATLAB)
 - they allow for special dispatch rules (e.g. np.sum(a) will try calling
 a.sum() before it tries coercing a to an ndarray, so e.g. on np.ma
 objects np.sum works and np.add.accumulate doesn't. Eventually this will
 perhaps be obviated by __numpy_ufunc__, but that is still some ways off.)

 So the situation is much less clear cut.

 -n

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Neil Girdhar
Can I suggest that we instead add the P-square algorithm for the dynamic
calculation of histograms?  (
http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
)

This is already implemented in C++'s boost library (
http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
)

I implemented it in Boost Python as a module, which I'm happy to share.
This is much better than fixed-width histograms in practice.  Rather than
adjusting the number of bins, it adjusts what you really want, which is the
resolution of the bins throughout the domain.
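
For the curious, here is a rough Python sketch of the single-quantile P^2
estimator from the paper (the Boost extended_p_square version tracks several
quantiles at once); it is only meant to illustrate the idea, not to be
numpy-quality code:

import numpy as np

def p2_quantile(stream, p):
    # Streaming estimate of the p-quantile with five markers (Jain & Chlamtac,
    # 1985).  Illustration only -- no error handling for short streams.
    stream = iter(stream)
    q = sorted(next(stream) for _ in range(5))   # marker heights
    n = [0, 1, 2, 3, 4]                          # marker positions (0-based)
    nd = [0.0, 2 * p, 4 * p, 2 + 2 * p, 4.0]     # desired positions
    dnd = [0.0, p / 2, p, (1 + p) / 2, 1.0]      # per-observation increments

    def parabolic(i, d):
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]))

    def linear(i, d):
        return q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i])

    for x in stream:
        # Locate the cell the new observation falls into, updating the extremes.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = max(i for i in range(4) if q[i] <= x)
        # Every marker above that cell shifts one position to the right.
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            nd[i] += dnd[i]
        # Nudge the three interior markers towards their desired positions,
        # using the parabolic (P^2) formula when it keeps the heights ordered.
        for i in (1, 2, 3):
            d = nd[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1 if d > 0 else -1
                new_q = parabolic(i, d)
                if not q[i - 1] < new_q < q[i + 1]:
                    new_q = linear(i, d)
                q[i] = new_q
                n[i] += d
    return q[2]   # the middle marker tracks the p-quantile

x = np.random.standard_normal(100000)
print(p2_quantile(x, 0.5), np.median(x))   # the two should be close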

Best,

Neil

On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers ralf.gomm...@gmail.com
wrote:



 On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote:


 http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

 Long story short, histogram visualisations that depend on numpy (such as
 matplotlib, or  nearly all of them) have poor default behaviour as I
 have to
 constantly play around with  the number of bins to get a good idea of
 what I'm
 looking at. The bins=10 works ok for  up to 1000 points or very normal
 data,
 but has poor performance for anything else, and  doesn't account for
 variability either. I don't have a method easily available to scale the
 number
 of bins given the data.

 R doesn't suffer from these problems and provides methods for use with its
 hist method. I would like to provide similar functionality for
 matplotlib, to
 at least provide  some kind of good starting point, as histograms are
 very
 useful for initial data discovery.

 The notebook above provides an explanation of the problem as well as some
 proposed  alternatives. Use different datasets (type and size) to see the
 performance of the  suggestions. All of the methods proposed exist in R
 and
 literature.

 I've put together an implementation to add this new functionality, but am
 hesitant to  make a pull request as I would like some feedback from a
 maintainer before doing so.


 +1 on the PR.


 +1 as well.

 Unfortunately we can't change the default of 10, but a number of string
 methods, with a bins=auto or some such name prominently recommended in
 the docstring, would be very good to have.

 Ralf

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Neil Girdhar
Yes, you're right.  Although in practice, people almost always want
adaptive bins.

On Tue, Apr 14, 2015 at 5:08 PM, Chris Barker chris.bar...@noaa.gov wrote:

 On Mon, Apr 13, 2015 at 5:02 AM, Neil Girdhar mistersh...@gmail.com
 wrote:

 Can I suggest that we instead add the P-square algorithm for the dynamic
 calculation of histograms?  (
 http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
 )


 This look slike a great thing to have in numpy. However, I suspect that a
 lot of the downstream code that uses histogram expects equally-spaced bins.

 So this should probably be a in addition to, rather than an instead of

 -CHB




 This is already implemented in C++'s boost library (
 http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
 )

 I implemented it in Boost Python as a module, which I'm happy to share.
 This is much better than fixed-width histograms in practice.  Rather than
 adjusting the number of bins, it adjusts what you really want, which is the
 resolution of the bins throughout the domain.

 Best,

 Neil

 On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:



 On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote:


 http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

 Long story short, histogram visualisations that depend on numpy (such
 as
 matplotlib, or  nearly all of them) have poor default behaviour as I
 have to
 constantly play around with  the number of bins to get a good idea of
 what I'm
 looking at. The bins=10 works ok for  up to 1000 points or very normal
 data,
 but has poor performance for anything else, and  doesn't account for
 variability either. I don't have a method easily available to scale
 the number
 of bins given the data.

 R doesn't suffer from these problems and provides methods for use with its
 hist method. I would like to provide similar functionality for
 matplotlib, to
 at least provide  some kind of good starting point, as histograms are
 very
 useful for initial data discovery.

 The notebook above provides an explanation of the problem as well as
 some
 proposed  alternatives. Use different datasets (type and size) to see
 the
 performance of the  suggestions. All of the methods proposed exist in
 R and
 literature.

 I've put together an implementation to add this new functionality, but
 am
 hesitant to  make a pull request as I would like some feedback from a
 maintainer before doing so.


 +1 on the PR.


 +1 as well.

 Unfortunately we can't change the default of 10, but a number of string
 methods, with a bins=auto or some such name prominently recommended in
 the docstring, would be very good to have.

 Ralf

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




 --

 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Bug in 1.9?

2014-10-22 Thread Neil Girdhar
Hello,

Is this desired behaviour or a regression or a bug?

http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray

Thanks,

Neil
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] A context manager for print options

2013-10-27 Thread Neil Girdhar
Why not replace get_printoptions/set_printoptions with a context manager
accessed using numpy.printoptions in the same way that numpy.errstate
exposes a context manager to seterr/geterr?  This would make the set method
redundant.

Also, the context manager returned by numpy.errstate, numpy.printoptions,
etc. could expose the dictionary directly.  This would make the get methods
redundant.
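
A minimal sketch of what such a context manager could look like, built on top
of the existing get/set functions (the name printoptions is hypothetical here,
just mirroring errstate):

import contextlib
import numpy as np

@contextlib.contextmanager
def printoptions(**kwargs):
    # Temporarily override the print options, restoring them on exit,
    # in the same spirit as np.errstate; yields the previous options dict.
    old = np.get_printoptions()
    np.set_printoptions(**kwargs)
    try:
        yield old
    finally:
        np.set_printoptions(**old)

with printoptions(precision=3, suppress=True):
    print(np.array([1.0 / 3, 1e-8]))
# the original options are back in effect here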

Best,

Neil
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Testing

2013-10-27 Thread Neil Girdhar
How do I test a patch that I've made locally?  I can't seem to import numpy
locally:

Error importing numpy: you should not try to import numpy from
its source directory; please exit the numpy source tree, and
relaunch
your python interpreter from there.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Testing

2013-10-27 Thread Neil Girdhar
Ah, sorry, didn't see that I can do that from runtests!!  Thanks!!


On Sun, Oct 27, 2013 at 7:13 PM, Neil Girdhar mistersh...@gmail.com wrote:

 Since I am trying to add a printoptions context manager, I would like to
 test it.  Should I add tests, or can I somehow use it from an ipython shell?


 On Sun, Oct 27, 2013 at 7:12 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:




 On Sun, Oct 27, 2013 at 4:59 PM, Neil Girdhar mistersh...@gmail.comwrote:

 How do I test a patch that I've made locally?  I can't seem to import
 numpy locally:

 Error importing numpy: you should not try to import numpy from
 its source directory; please exit the numpy source tree, and
 relaunch
 your python interpreter from there.



 If you are running current master do

 python runtests.py --help

 Chuck



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Testing

2013-10-27 Thread Neil Girdhar
Since I am trying to add a printoptions context manager, I would like to
test it.  Should I add tests, or can I somehow use it from an ipython shell?


On Sun, Oct 27, 2013 at 7:12 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:




 On Sun, Oct 27, 2013 at 4:59 PM, Neil Girdhar mistersh...@gmail.comwrote:

 How do I test a patch that I've made locally?  I can't seem to import
 numpy locally:

 Error importing numpy: you should not try to import numpy from
 its source directory; please exit the numpy source tree, and
 relaunch
 your python interpreter from there.



 If you are running current master do

 python runtests.py --help

 Chuck



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Code review request: PrintOptions

2013-10-27 Thread Neil Girdhar
This is my first code review request, so I may have done some things wrong.
 I think the following URL should work?
https://github.com/MisterSheik/numpy/compare

Best,

Neil
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Code review request: PrintOptions

2013-10-27 Thread Neil Girdhar
Yeah, I realized that I missed that and figured it wouldn't matter since it
was my own master and I don't plan on making other changes to numpy.  If
you don't mind, how do I move my changelist into a branch?  I'm really
worried I'm going to lose my changes.


On Sun, Oct 27, 2013 at 9:38 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:




 On Sun, Oct 27, 2013 at 7:23 PM, Neil Girdhar mistersh...@gmail.comwrote:

 This is my first code review request, so I may have done some things
 wrong.  I think the following URL should work?
 https://github.com/MisterSheik/numpy/compare

 The first thing to do is make a new branch for your work. Probably the
 easiest way from where you are is to make the branch, which will have your
 changes in it, then go back to master and git reset --hard to the last
 commit before your work. Working in master is a big no-no. See
 `doc/source/dev/gitwash/development_workflow.rst`. When you are ready, make
 a PR for that branch. The code will get reviewed at that point.
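
Concretely, assuming the remote is called origin and your commits sit on top
of it, something along these lines (the branch name is just an example):

git branch print-options        # new branch keeps your current commits
git reset --hard origin/master  # move master back to the upstream state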

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Code review request: PrintOptions

2013-10-27 Thread Neil Girdhar
Is this what I want?  https://github.com/numpy/numpy/pull/3987


On Sun, Oct 27, 2013 at 9:42 PM, Neil Girdhar mistersh...@gmail.com wrote:

 Yeah, I realized that I missed that and figured it wouldn't matter since
 it was my own master and I don't plan on making other changes to numpy.  If
 you don't mind, how do I move my changelist into a branch?  I'm really
 worried I'm going to lose my changes.


 On Sun, Oct 27, 2013 at 9:38 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:




 On Sun, Oct 27, 2013 at 7:23 PM, Neil Girdhar mistersh...@gmail.comwrote:

 This is my first code review request, so I may have done some things
 wrong.  I think the following URL should work?
 https://github.com/MisterSheik/numpy/compare

 The first thing to do is make a new branch for your work. Probably the
 easiest way from where you are is to make the branch, which will have your
 changes in it, then go back to master and git reset --hard to the last
 commit before your work. Working in master is a big no-no. See
 `doc/source/dev/gitwash/development_workflow.rst`. When you are ready, make
 a PR for that branch. The code will get reviewed at that point.

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion