Re: [Numpy-discussion] Should ndarray subclasses support the keepdims arg?
Maybe they should have written their code with **kwargs that consumes all keyword arguments, rather than assuming that no keyword arguments would be added? The problem with this approach in general is that it makes writing code unnecessarily convoluted.

On Tue, May 5, 2015 at 1:55 PM, Nathaniel Smith n...@pobox.com wrote:

AFAICT the only real solution here is for np.sum and friends to propagate the keepdims argument if and only if it was explicitly passed to them (or maybe the slightly different: if and only if it has a non-default value). If we just started requiring code to handle it and passing it unconditionally, then as soon as someone upgraded numpy all their existing code might break for no good reason.

On May 5, 2015 8:13 AM, Allan Haldane allanhald...@gmail.com wrote:

Hello all,

A question: Many ndarray methods (e.g. sum, mean, any, min) have a keepdims keyword argument, but ndarray subclass methods sometimes don't. The 'matrix' subclass doesn't, and numpy functions like 'np.sum' intentionally drop/ignore the keepdims argument when called with an ndarray subclass as first argument.

This means you can't always use ndarray subclasses as 'drop in' replacements for ndarrays if the code uses keepdims (even indirectly), and it means code that deals with keepdims (e.g. np.sum and more) has to detect ndarray subclasses and drop keepdims even if the subclass supports it (since there is no good way to detect support). It seems to me that if we are going to use inheritance, subclass methods should keep the signature of the parent class methods. What does the list think?

Details: This problem comes up in a PR I'm working on (#5706) to add the keepdims arg to masked array methods. In order to support masked matrices (which a lot of unit tests check), I would have to detect and drop the keepdims arg to avoid an exception. This would be solved if the matrix class supported keepdims (plus an update to np.sum).
Similarly, `np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it could work if all subclasses supported keepdims. I do not foresee immediate problems with adding keepdims to the matrix methods, except that it would be an unused argument. Modifying `np.sum` to always pass on the keepdims arg is trickier, since it would break any code that tried to np.sum a subclass that doesn't support keepdims, e.g. pandas.DataFrame. **kwargs tricks might work, but if it's permissible I think it would be better to require subclasses to support all the keyword args ndarray supports.

Allan

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
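[Editor's note: a minimal sketch of the "propagate keepdims only if explicitly passed" idea Nathaniel describes, using a sentinel default. The names `_NoValue` and `my_sum` are illustrative, not NumPy's actual implementation at the time of this thread.]

```python
import numpy as np

# Hypothetical sentinel meaning "keepdims was not passed by the caller".
_NoValue = object()

def my_sum(a, axis=None, keepdims=_NoValue):
    """Forward keepdims to a.sum() only when the caller passed it explicitly."""
    kwargs = {}
    if keepdims is not _NoValue:
        kwargs['keepdims'] = keepdims
    # Subclasses whose .sum() lacks keepdims keep working, as long as the
    # caller never passes it; callers who do pass it get an error only for
    # subclasses that genuinely can't support it.
    return a.sum(axis=axis, **kwargs)

a = np.arange(6).reshape(2, 3)
print(my_sum(a, axis=1))                 # shape (2,)
print(my_sum(a, axis=1, keepdims=True))  # shape (2, 1)
```

This avoids the backward-compatibility break described above: existing subclasses only see the new keyword when a user explicitly asks for it.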
Re: [Numpy-discussion] ANN: numexpr 2.4.3 released
Sorry for the late reply. I will definitely consider submitting a pull request to numexpr if that's the direction I decide to go. Right now I'm still evaluating all of the many options for my project.

I am implementing a machine learning algorithm as part of my thesis work. I'm in the "make it work" stage, but quickly approaching the "make it fast" stage. With research, you usually want to iterate quickly, and so whatever solution I choose has to be automated. I can't be coding things in an intuitive, natural way and then manually porting them to a different implementation to make them fast. What I want is for that conversion to be automated. I'm still evaluating how best to achieve that.

On Tue, Apr 28, 2015 at 6:08 AM, Francesc Alted fal...@gmail.com wrote:

2015-04-28 4:59 GMT+02:00 Neil Girdhar mistersh...@gmail.com:

I don't think I'm asking for so much. Somewhere inside numexpr it builds an AST of its own, which it converts into the optimized code. It would be more useful to me if that AST were in the same format as the one returned by Python's ast module. This way, I could glue the bits of numexpr that I like into my code. For my purposes, this would have been the more ideal design. I don't think implementing this for numexpr would be that complex. So for example, one could add a new numexpr.eval_ast(ast_expr) function.

Pull requests are welcome. At any rate, what is your use case? I am curious.

-- Francesc Alted
Re: [Numpy-discussion] ANN: numexpr 2.4.3 released
I've always wondered why numexpr accepts strings rather than looking at a function's source code, using ast to parse it, and then transforming the AST. I just looked at another project, pyautodiff, which does that. And I think numba does that for LLVM code generation. Wouldn't it be nicer to just apply a decorator to a function than to write the function as a Python string?

On Mon, Apr 27, 2015 at 11:50 AM, Francesc Alted fal...@gmail.com wrote:

Announcing Numexpr 2.4.3
========================

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like 3*a+4*b) are accelerated and use less memory than doing the same calculation in Python. It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==========

This is a maintenance release to cope with an old bug affecting comparisons with empty strings. Fixes #121 and PyTables #184. In case you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=========================

The project is hosted at GitHub:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!
-- Francesc Alted
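[Editor's note: for readers unfamiliar with the string-based API under discussion, numexpr's entry point is `numexpr.evaluate`, which compiles an expression string to numexpr's internal bytecode and runs it in a multi-threaded virtual machine; `evaluate` looks up `a` and `b` in the calling frame.]

```python
import numpy as np
import numexpr as ne

a = np.arange(10.0)
b = np.arange(10.0)

# The expression is evaluated without materializing Python-level
# temporaries for 3*a and 4*b, which is where the speedup comes from.
result = ne.evaluate("3*a + 4*b")

assert np.array_equal(result, 3*a + 4*b)
```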
Re: [Numpy-discussion] ANN: numexpr 2.4.3 released
Also, FYI: http://numba.pydata.org/numba-doc/0.6/doc/modules/transforms.html

It appears that numba does get the AST, similar to pyautodiff, and only gets the AST from source code as a fallback?

On Mon, Apr 27, 2015 at 7:23 PM, Neil Girdhar mistersh...@gmail.com wrote:

I was told that numba did similar AST parsing, but maybe that's not true. Regarding the AST, I don't know about reliability, but take a look at get_ast in pyautodiff:

https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py

It looks up the __file__ attribute and passes that through compile to get the AST. Of course that won't work when you don't have source code (a .pyc-only module, or when else?). Since I'm looking into this kind of solution for the future of my code, I'm curious if you think that's too unreliable for some reason?

From a usability standpoint, I do think that's better than feeding in strings, which:
* are not syntax highlighted, and
* require porting code from regular numpy expressions to numexpr strings (applying a decorator is so much easier).

Best, Neil

On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith n...@pobox.com wrote:

On Apr 27, 2015 1:44 PM, Neil Girdhar mistersh...@gmail.com wrote:

I've always wondered why numexpr accepts strings rather than looking at a function's source code, using ast to parse it, and then transforming the AST. I just looked at another project, pyautodiff, which does that. And I think numba does that for LLVM code generation. Wouldn't it be nicer to just apply a decorator to a function than to write the function as a Python string?

Numba works from bytecode, not the AST. There's no way to access the AST reliably at runtime in Python -- it gets thrown away during compilation.

-n
Re: [Numpy-discussion] ANN: numexpr 2.4.3 released
On Mon, Apr 27, 2015 at 7:42 PM, Nathaniel Smith n...@pobox.com wrote:

On Mon, Apr 27, 2015 at 4:23 PM, Neil Girdhar mistersh...@gmail.com wrote:

I was told that numba did similar AST parsing, but maybe that's not true. Regarding the AST, I don't know about reliability, but take a look at get_ast in pyautodiff: https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py It looks up the __file__ attribute and passes that through compile to get the AST. Of course that won't work when you don't have source code (a .pyc-only module, or when else?). Since I'm looking into this kind of solution for the future of my code, I'm curious if you think that's too unreliable for some reason?

I'd certainly hesitate to rely on it for anything I cared about or that would be used by a lot of people... it's just intrinsically pretty hacky. No guarantee that the source code you find via __file__ will match what was used to compile the function, doesn't work when working interactively or from the IPython notebook, etc. Or else you have to trust a decompiler, which is a pretty serious, complex chunk of code just to avoid typing quote marks.

Those are all good points. However, it's more than just typing quote marks. The code might have non-numpy things mixed in. It might have context managers and function calls and so on. More comments below.

From a usability standpoint, I do think that's better than feeding in strings, which:
* are not syntax highlighted, and
* require porting code from regular numpy expressions to numexpr strings (applying a decorator is so much easier).

Yes, but then you have to write a program that knows how to port code from numpy expressions to numexpr strings :-). numexpr only knows a tiny restricted subset of Python... The general approach I'd take to solve these kinds of problems would be similar to that used by Theano or dask -- use regular python source code that generates an expression graph in memory. E.g.
this could look like:

    def do_stuff(arr1, arr2):
        arr1 = deferred(arr1)
        arr2 = deferred(arr2)
        arr3 = np.sum(arr1 + (arr2 ** 2))
        return force(arr3 / np.sum(arr3))

-n

Right, there are three basic approaches: string processing, AST processing, and compile-time expression graphs. The big advantage of AST processing over the other two is that you can write and test your code as regular numpy code along with regular tests. Then, with the application of a decorator, you get the speedup you're looking for. The problem with porting the numpy code to numexpr strings or Theano-like expression graphs is that porting can introduce bugs, and even if you're careful, every time you make a change to the numpy version of the code, you have to port it again.

Also, I personally want to do more than just AST transformations of the numpy code. For example, I have some methods that call super. The super calls can be collapsed since the MRO is known at compile time.

Best, Neil

-- Nathaniel J. Smith -- http://vorpus.org
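[Editor's note: a toy, self-contained version of the deferred/force idea sketched above. Everything here (`Deferred`, `deferred`, `force`) is illustrative, not a real library; a real system like Theano or dask would optimize the recorded graph before evaluating it, and would cache shared subexpressions instead of recomputing them.]

```python
import numpy as np

class Deferred:
    """Toy lazy-array node: records operations instead of executing them."""
    def __init__(self, compute):
        self._compute = compute  # zero-argument function producing a value

    def __add__(self, other):
        return Deferred(lambda: force(self) + force(other))

    def __pow__(self, n):
        return Deferred(lambda: force(self) ** n)

    def __truediv__(self, other):
        return Deferred(lambda: force(self) / force(other))

    # np.sum on a non-ndarray dispatches to the object's .sum method,
    # so this hooks the np.sum calls in do_stuff below.
    def sum(self, axis=None, **kwargs):
        return Deferred(lambda: np.sum(force(self), axis=axis))

def deferred(arr):
    return Deferred(lambda: np.asarray(arr))

def force(x):
    """Evaluate a deferred graph (naively; shared nodes are recomputed)."""
    return x._compute() if isinstance(x, Deferred) else x

def do_stuff(arr1, arr2):
    arr1, arr2 = deferred(arr1), deferred(arr2)
    arr3 = np.sum(arr1 + arr2 ** 2)
    return force(arr3 / np.sum(arr3))
```

For example, `do_stuff(np.array([1.0, 2.0]), np.array([3.0, 4.0]))` builds the whole expression graph first and only computes when `force` is called.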
Re: [Numpy-discussion] ANN: numexpr 2.4.3 released
I was told that numba did similar AST parsing, but maybe that's not true. Regarding the AST, I don't know about reliability, but take a look at get_ast in pyautodiff:

https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py

It looks up the __file__ attribute and passes that through compile to get the AST. Of course that won't work when you don't have source code (a .pyc-only module, or when else?). Since I'm looking into this kind of solution for the future of my code, I'm curious if you think that's too unreliable for some reason?

From a usability standpoint, I do think that's better than feeding in strings, which:
* are not syntax highlighted, and
* require porting code from regular numpy expressions to numexpr strings (applying a decorator is so much easier).

Best, Neil

On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith n...@pobox.com wrote:

On Apr 27, 2015 1:44 PM, Neil Girdhar mistersh...@gmail.com wrote:

I've always wondered why numexpr accepts strings rather than looking at a function's source code, using ast to parse it, and then transforming the AST. I just looked at another project, pyautodiff, which does that. And I think numba does that for LLVM code generation. Wouldn't it be nicer to just apply a decorator to a function than to write the function as a Python string?

Numba works from bytecode, not the AST. There's no way to access the AST reliably at runtime in Python -- it gets thrown away during compilation.

-n
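[Editor's note: once the source has been retrieved (however fragile that retrieval is, per the discussion above), the parse-and-transform step is plain stdlib `ast`. This toy transform is illustrative only, not pyautodiff's actual code: it doubles every numeric literal and recompiles the function.]

```python
import ast

# Stand-in for the source a pyautodiff-style get_ast would retrieve
# via func.__code__/__file__ in the real thing.
source = """
def f(x):
    return 3 * x + 4
"""

tree = ast.parse(source)
func_def = tree.body[0]
assert isinstance(func_def, ast.FunctionDef) and func_def.name == "f"

class DoubleConstants(ast.NodeTransformer):
    """Toy AST transform: replace every numeric literal c with 2*c."""
    def visit_Constant(self, node):
        if isinstance(node.value, (int, float)):
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node

new_tree = ast.fix_missing_locations(DoubleConstants().visit(tree))
namespace = {}
exec(compile(new_tree, "<ast>", "exec"), namespace)

# The recompiled f computes (3*2)*x + (4*2).
assert namespace["f"](1) == 14
```

A numexpr-style decorator built this way would replace numpy expressions in the AST with calls into an optimized evaluator rather than doubling constants, but the parse/transform/recompile skeleton is the same.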
Re: [Numpy-discussion] ANN: numexpr 2.4.3 released
Wow, cool! Are there any users of this package?

On Mon, Apr 27, 2015 at 9:07 PM, Alexander Belopolsky ndar...@mac.com wrote:

On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith n...@pobox.com wrote:

There's no way to access the AST reliably at runtime in Python -- it gets thrown away during compilation.

The meta package supports bytecode-to-AST translation. See http://meta.readthedocs.org/en/latest/api/decompile.html.
Re: [Numpy-discussion] ANN: numexpr 2.4.3 released
I don't think I'm asking for so much. Somewhere inside numexpr it builds an AST of its own, which it converts into the optimized code. It would be more useful to me if that AST were in the same format as the one returned by Python's ast module. This way, I could glue the bits of numexpr that I like into my code. For my purposes, this would have been the more ideal design.

On Mon, Apr 27, 2015 at 10:47 PM, Nathaniel Smith n...@pobox.com wrote:

On Apr 27, 2015 5:30 PM, Neil Girdhar mistersh...@gmail.com wrote:

On Mon, Apr 27, 2015 at 7:42 PM, Nathaniel Smith n...@pobox.com wrote:

On Mon, Apr 27, 2015 at 4:23 PM, Neil Girdhar mistersh...@gmail.com wrote:

I was told that numba did similar AST parsing, but maybe that's not true. Regarding the AST, I don't know about reliability, but take a look at get_ast in pyautodiff: https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py It looks up the __file__ attribute and passes that through compile to get the AST. Of course that won't work when you don't have source code (a .pyc-only module, or when else?). Since I'm looking into this kind of solution for the future of my code, I'm curious if you think that's too unreliable for some reason?

I'd certainly hesitate to rely on it for anything I cared about or that would be used by a lot of people... it's just intrinsically pretty hacky. No guarantee that the source code you find via __file__ will match what was used to compile the function, doesn't work when working interactively or from the IPython notebook, etc. Or else you have to trust a decompiler, which is a pretty serious, complex chunk of code just to avoid typing quote marks.

Those are all good points. However, it's more than just typing quote marks. The code might have non-numpy things mixed in. It might have context managers and function calls and so on. More comments below.
From a usability standpoint, I do think that's better than feeding in strings, which:
* are not syntax highlighted, and
* require porting code from regular numpy expressions to numexpr strings (applying a decorator is so much easier).

Yes, but then you have to write a program that knows how to port code from numpy expressions to numexpr strings :-). numexpr only knows a tiny restricted subset of Python... The general approach I'd take to solve these kinds of problems would be similar to that used by Theano or dask -- use regular python source code that generates an expression graph in memory. E.g. this could look like:

    def do_stuff(arr1, arr2):
        arr1 = deferred(arr1)
        arr2 = deferred(arr2)
        arr3 = np.sum(arr1 + (arr2 ** 2))
        return force(arr3 / np.sum(arr3))

-n

Right, there are three basic approaches: string processing, AST processing, and compile-time expression graphs. The big advantage of AST processing over the other two is that you can write and test your code as regular numpy code along with regular tests. Then, with the application of a decorator, you get the speedup you're looking for. The problem with porting the numpy code to numexpr strings or Theano-like expression graphs is that porting can introduce bugs, and even if you're careful, every time you make a change to the numpy version of the code, you have to port it again.

Also, I personally want to do more than just AST transformations of the numpy code. For example, I have some methods that call super. The super calls can be collapsed since the MRO is known at compile time.

If you want something that handles arbitrary python code ('with' etc.), and produces results identical to CPython (so tests are reliable), except in cases where it violates the semantics for speed (super), then yeah, you want a full replacement python implementation, and I agree that the proper input to a python implementation is .py files :-). That's getting a bit far afield from numexpr's goals though...
-n
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg sebast...@sipsolutions.net wrote:

On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:

Hi,

snip

So, how about a slight modification of your proposal?

1) Raise a deprecation warning for np.outer on non-1-D arrays for a few versions, with deprecation in favor of np.multiply.outer, then
2) Raise an error for np.outer on non-1-D arrays

I think that was Neil's proposal a bit earlier, too. +1 for it in any case, since at least for the moment I doubt outer is used a lot for non-1-D arrays.

Possible step 3) make it work on higher dims after a long period.

Sounds ok to me. Some random comments of what I remember or guess in terms of usage:

I think there are at most very few np.outer usages with 2d or higher dimension. (statsmodels has two models that switch between 2d and 1d parameterization where we don't use outer, but it has similar characteristics. However, we need to control the ravel order, which IIRC is Fortran.)

The current behavior of 0-D scalars in the initial post might be useful if a numpy function returns a scalar instead of a 1-D array of size 1. np.diag, which is a common case, doesn't return a scalar (in my version of numpy).

I don't know any use case where I would ever want the 2d behavior of np.multiply.outer.

My use case is pretty simple. Given an input vector x, a weight matrix W, and a model y = Wx, I calculate the gradient of the loss L with respect to W. It is the outer product of x with the vector of gradients dL/dy. So the code is simply:

    W -= outer(x, dL_by_dy)

Sometimes, I have some x_indices and y_indices. Now I want to do:

    W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])

Unfortunately, if x_indices or y_indices are int or slice in some way that removes a dimension, the left side will have fewer dimensions than the right.
np.multiply.outer does the right thing without the ugly cases:

    if isinstance(x_indices, int): … # ugly hacks follow.

I guess we will or would have applications for outer along an axis, for example if x.shape = (100, 10), then we have x[:, None, :] * x[:, :, None] (I guess). Something like this shows up reasonably often in econometrics as "Outer Product". However, in most cases we can avoid constructing this matrix and get the final results in a more memory-efficient or faster way. (example: an array of covariance matrices)

Not sure I see this. outer(a, b) should return something that has shape (a.shape + b.shape). If you're doing it along an axis, you mean you're reshuffling the resulting shape vector?

Josef

- Sebastian

Best, Matthew
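[Editor's note: the dimension mismatch described above is easy to reproduce. np.outer ravels its inputs (and promotes 0-d input to 1-d), while np.multiply.outer returns an array of shape `a.shape + b.shape`, which is what the indexed-update use case needs.]

```python
import numpy as np

x = np.array([2.0, 3.0, 4.0])
g = np.array([7.0, 9.0])  # stands in for dL_by_dy

# For 1-D inputs the two functions agree.
assert np.array_equal(np.outer(x, g), np.multiply.outer(x, g))

# With a 0-d operand (e.g. x[i] after integer indexing) they diverge:
# np.outer always ravels to 1-D and returns a 2-D result, while
# np.multiply.outer returns shape () + (2,) == (2,).
assert np.outer(x[0], g).shape == (1, 2)
assert np.multiply.outer(x[0], g).shape == (2,)
```

The `(1, 2)` result is exactly what breaks `W[i, y_indices] -= outer(...)`: the left-hand side has shape `(2,)`, so the assignment fails, whereas the ufunc's `(2,)` result drops in cleanly.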
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
On Fri, Apr 17, 2015 at 12:09 PM, josef.p...@gmail.com wrote:

On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar mistersh...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg sebast...@sipsolutions.net wrote:

On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:

Hi,

snip

So, how about a slight modification of your proposal? 1) Raise a deprecation warning for np.outer on non-1-D arrays for a few versions, with deprecation in favor of np.multiply.outer, then 2) raise an error for np.outer on non-1-D arrays.

I think that was Neil's proposal a bit earlier, too. +1 for it in any case, since at least for the moment I doubt outer is used a lot for non-1-D arrays. Possible step 3) make it work on higher dims after a long period.

Sounds ok to me. Some random comments of what I remember or guess in terms of usage: I think there are at most very few np.outer usages with 2d or higher dimension. (statsmodels has two models that switch between 2d and 1d parameterization where we don't use outer, but it has similar characteristics. However, we need to control the ravel order, which IIRC is Fortran.) The current behavior of 0-D scalars in the initial post might be useful if a numpy function returns a scalar instead of a 1-D array of size 1. np.diag, which is a common case, doesn't return a scalar (in my version of numpy). I don't know any use case where I would ever want the 2d behavior of np.multiply.outer.

I only understand part of your example, but it looks similar to what we are doing in statsmodels.

My use case is pretty simple. Given an input vector x, a weight matrix W, and a model y = Wx, I calculate the gradient of the loss L with respect to W. It is the outer product of x with the vector of gradients dL/dy. So the code is simply:

    W -= outer(x, dL_by_dy)

if you sum/subtract over all the values, isn't this the same as np.dot(x, dL_by_dy)

Sometimes, I have some x_indices and y_indices.
Now I want to do:

    W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])

Unfortunately, if x_indices or y_indices are int or slice in some way that removes a dimension, the left side will have fewer dimensions than the right. np.multiply.outer does the right thing without the ugly cases:

    if isinstance(x_indices, int): … # ugly hacks follow.

My usual hacks are either to use np.atleast_1d or np.squeeze if there is a shape mismatch in some cases.

I guess we will or would have applications for outer along an axis, for example if x.shape = (100, 10), then we have x[:, None, :] * x[:, :, None] (I guess). Something like this shows up reasonably often in econometrics as "Outer Product". However, in most cases we can avoid constructing this matrix and get the final results in a more memory-efficient or faster way. (example: an array of covariance matrices)

Not sure I see this. outer(a, b) should return something that has shape (a.shape + b.shape). If you're doing it along an axis, you mean you're reshuffling the resulting shape vector?

No, I'm not reshaping the full tensor product. It's a vectorized version of looping over independent outer products

    np.array([outer(xi, yi) for xi, yi in zip(x, y)])

(which I would never use with outer), but I have code that works similarly for a reduce (or reduce_at) loop over this.

Hmmm… I see what you're writing. This doesn't really have a geometrical meaning as far as I can tell. You're interpreting the first index of x, y, and your result as if it were a list — as if x and y are lists of vectors, and you want a list of matrices. That really should be written as a loop in my opinion.
Josef

Josef

- Sebastian

Best, Matthew
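[Editor's note: the "vectorized loop over independent outer products" discussed above has a direct broadcasting equivalent, and an einsum spelling, both of which avoid the Python-level loop. The shapes here are just for illustration.]

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 10))  # 100 stacked vectors of length 10
y = rng.standard_normal((100, 4))   # 100 stacked vectors of length 4

# Loop version from the discussion: one outer product per row.
looped = np.array([np.outer(xi, yi) for xi, yi in zip(x, y)])

# Vectorized equivalents: broadcasting, or einsum with a batch index n.
broadcast = x[:, :, None] * y[:, None, :]
einsummed = np.einsum('ni,nj->nij', x, y)

assert looped.shape == (100, 10, 4)
assert np.allclose(looped, broadcast)
assert np.allclose(looped, einsummed)
```

This is the "outer along an axis" shape Josef sketches with `x[:, None, :] * x[:, :, None]`, generalized to two different stacks of vectors.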
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
On Fri, Apr 17, 2015 at 12:09 PM, josef.p...@gmail.com wrote:

On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar mistersh...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg sebast...@sipsolutions.net wrote:

On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:

Hi,

snip

So, how about a slight modification of your proposal? 1) Raise a deprecation warning for np.outer on non-1-D arrays for a few versions, with deprecation in favor of np.multiply.outer, then 2) raise an error for np.outer on non-1-D arrays.

I think that was Neil's proposal a bit earlier, too. +1 for it in any case, since at least for the moment I doubt outer is used a lot for non-1-D arrays. Possible step 3) make it work on higher dims after a long period.

Sounds ok to me. Some random comments of what I remember or guess in terms of usage: I think there are at most very few np.outer usages with 2d or higher dimension. (statsmodels has two models that switch between 2d and 1d parameterization where we don't use outer, but it has similar characteristics. However, we need to control the ravel order, which IIRC is Fortran.) The current behavior of 0-D scalars in the initial post might be useful if a numpy function returns a scalar instead of a 1-D array of size 1. np.diag, which is a common case, doesn't return a scalar (in my version of numpy). I don't know any use case where I would ever want the 2d behavior of np.multiply.outer.

I only understand part of your example, but it looks similar to what we are doing in statsmodels.

My use case is pretty simple. Given an input vector x, a weight matrix W, and a model y = Wx, I calculate the gradient of the loss L with respect to W. It is the outer product of x with the vector of gradients dL/dy. So the code is simply:

    W -= outer(x, dL_by_dy)

if you sum/subtract over all the values, isn't this the same as np.dot(x, dL_by_dy)

What?
Matrix subtraction is element-wise:

    In [1]: x = np.array([2, 3, 4])
    In [2]: dL_by_dy = np.array([7, 9])
    In [5]: W = np.zeros((3, 2))
    In [6]: W -= np.outer(x, dL_by_dy)
    In [7]: W
    Out[7]:
    array([[-14., -18.],
           [-21., -27.],
           [-28., -36.]])

Sometimes, I have some x_indices and y_indices. Now I want to do:

    W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])

Unfortunately, if x_indices or y_indices are int or slice in some way that removes a dimension, the left side will have fewer dimensions than the right. np.multiply.outer does the right thing without the ugly cases:

    if isinstance(x_indices, int): … # ugly hacks follow.

My usual hacks are either to use np.atleast_1d or np.squeeze if there is a shape mismatch in some cases.

Yes, but in this case, the left side is the problem, which has too few dimensions, so atleast_1d doesn't work. I was conditionally squeezing, but that is extremely ugly. Especially if you're conditionally squeezing based on both x_indices and y_indices.

I guess we will or would have applications for outer along an axis, for example if x.shape = (100, 10), then we have x[:, None, :] * x[:, :, None] (I guess). Something like this shows up reasonably often in econometrics as "Outer Product". However, in most cases we can avoid constructing this matrix and get the final results in a more memory-efficient or faster way. (example: an array of covariance matrices)

Not sure I see this. outer(a, b) should return something that has shape (a.shape + b.shape). If you're doing it along an axis, you mean you're reshuffling the resulting shape vector?

No, I'm not reshaping the full tensor product. It's a vectorized version of looping over independent outer products

    np.array([outer(xi, yi) for xi, yi in zip(x, y)])

(which I would never use with outer), but I have code that works similarly for a reduce (or reduce_at) loop over this.
Josef

Josef

- Sebastian

Best, Matthew
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
This relationship between outer and dot only holds for vectors. For tensors, and other kinds of vector spaces, I'm not sure if outer products and dot products have anything to do with each other.

On Fri, Apr 17, 2015 at 11:11 AM, josef.p...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:59 AM, Sebastian Berg sebast...@sipsolutions.net wrote:

On Fr, 2015-04-17 at 10:47 -0400, josef.p...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg sebast...@sipsolutions.net wrote:

On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:

Hi,

snip

So, how about a slight modification of your proposal? 1) Raise a deprecation warning for np.outer on non-1-D arrays for a few versions, with deprecation in favor of np.multiply.outer, then 2) raise an error for np.outer on non-1-D arrays.

I think that was Neil's proposal a bit earlier, too. +1 for it in any case, since at least for the moment I doubt outer is used a lot for non-1-D arrays. Possible step 3) make it work on higher dims after a long period.

Sounds ok to me. Some random comments of what I remember or guess in terms of usage: I think there are at most very few np.outer usages with 2d or higher dimension. (statsmodels has two models that switch between 2d and 1d parameterization where we don't use outer, but it has similar characteristics. However, we need to control the ravel order, which IIRC is Fortran.) The current behavior of 0-D scalars in the initial post might be useful if a numpy function returns a scalar instead of a 1-D array of size 1. np.diag, which is a common case, doesn't return a scalar (in my version of numpy). I don't know any use case where I would ever want the 2d behavior of np.multiply.outer.

I guess we will or would have applications for outer along an axis, for example if x.shape = (100, 10), then we have x[:, None, :] * x[:, :, None] (I guess). Something like this shows up reasonably often in econometrics as "Outer Product".
However, in most cases we can avoid constructing this matrix and get the final results in a more memory-efficient or faster way (for example, an array of covariance matrices). So basically outer products of stacked vectors (fitting basically into how np.linalg functions now work). I think that might be a good idea, but even then we first need to do the deprecation, and it would be a long-term project. Or you add np.linalg.outer or such sooner, and in the longer run it will be an alias to that instead of np.multiply.outer.

Essentially yes, but I don't have an opinion about location or implementation in numpy, nor do I know enough. I always considered np.outer conceptually as belonging to linalg, providing a more convenient interface than np.dot if both arrays are 1-D (no need to add an extra axis and transpose).

Josef

Josef - Sebastian Best, Matthew

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
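josef's "outer along an axis" case can be sketched directly (shapes shrunk for illustration; the variable names are ours, not from the thread):

```python
import numpy as np

# Stacked outer products: for each row x[i], form the outer product
# x[i] (x) x[i], all in one broadcasting expression.
x = np.arange(20.0).reshape(4, 5)

batched = x[:, :, None] * x[:, None, :]   # shape (4, 5, 5)

# Row i of `batched` is the ordinary outer product of x[i] with itself.
expected = np.array([np.outer(row, row) for row in x])
assert batched.shape == (4, 5, 5)
assert np.allclose(batched, expected)
```

This is the "outer product of stacked vectors" shape convention (leading batch axes, trailing matrix axes) that the later np.linalg remark refers to.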
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
Would it be possible to deprecate np.outer's usage on non one-dimensional vectors for a few versions, and then reintroduce it with definition np.outer == np.multiply.outer?

On Wed, Apr 15, 2015 at 8:02 PM, josef.p...@gmail.com wrote:

On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith n...@pobox.com wrote:

On Wed, Apr 15, 2015 at 6:08 PM, josef.p...@gmail.com wrote:

On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote: Does it work for you to set outer = np.multiply.outer? It's actually faster on my machine.

I assume it does, because np.corrcoef uses it, and it's the same type of use case. However, I'm not using it very often (I prefer broadcasting), but I've seen it often enough when reviewing code. This is mainly to point out that it could be a popular function (that maybe shouldn't be deprecated): https://github.com/search?utf8=%E2%9C%93q=np.outer 416914

For future reference, that's not the number -- you have to click through to Code and then look at a single-language result to get anything remotely meaningful. In this case b/c they're different by an order of magnitude, and in general because sometimes the top-line number is completely made up (like it has no relation to the per-language numbers on the left, and then changes around randomly if you simply reload the page). (So 29,397 is what you want in this case.) Also, that count tends to have tons of duplicates (e.g. b/c there are hundreds of copies of numpy itself on github), so you need a big grain of salt when looking at the absolute number, but it can be useful, esp. for relative comparisons.

My mistake, rushing too much. GitHub shows only 25 code references in numpy itself.
in quotes, python only (namespace-conscious packages on github) (I think github counts modules, not instances):

np.cumsum 11,022
np.cumprod 1,290
np.outer 6,838

statsmodels:
np.cumsum 21
np.cumprod 2
np.outer 15

Josef

-n
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
Right.

On Thu, Apr 16, 2015 at 6:44 PM, Nathaniel Smith n...@pobox.com wrote:

On Thu, Apr 16, 2015 at 6:37 PM, Neil Girdhar mistersh...@gmail.com wrote: I can always put np.outer = np.multiply.outer at the start of my code to get what I want. Or could that break things?

Please don't do this. It means that if there are any calls to np.outer in libraries you are using (or other libraries that are also used by anyone who is using your code), they will silently get np.multiply.outer instead of np.outer. And then if this breaks things, we end up getting extremely confusing bug reports from angry users who think we broke np.outer. Just do 'outer = np.multiply.outer' and leave the np namespace alone :-)

-n -- Nathaniel J. Smith -- http://vorpus.org
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
That sounds good to me. I can always put np.outer = np.multiply.outer at the start of my code to get what I want. Or could that break things?

On Thu, Apr 16, 2015 at 6:28 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi,

On Thu, Apr 16, 2015 at 3:19 PM, Neil Girdhar mistersh...@gmail.com wrote: Actually, looking at the docs, numpy.outer is *only* defined for 1-d vectors. Should anyone who used it with multi-dimensional arrays have an expectation that it will keep working in the same way?

On Thu, Apr 16, 2015 at 10:53 AM, Neil Girdhar mistersh...@gmail.com wrote: Would it be possible to deprecate np.outer's usage on non one-dimensional vectors for a few versions, and then reintroduce it with definition np.outer == np.multiply.outer?

I think the general idea is that a) people often miss deprecation warnings, b) there is lots of legacy code out there, c) it's very bad if legacy code silently gives different answers in newer numpy versions, and d) it's not so bad if newer numpy gives an intelligible error for code that used to work. So, how about a slight modification of your proposal? 1) Raise a deprecation warning for np.outer on non 1-D arrays for a few versions, with deprecation in favor of np.multiply.outer, then 2) raise an error for np.outer on non 1-D arrays.

Best, Matthew
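A minimal sketch of what step 1 of Matthew's proposal could look like -- warn on non 1-D input, then fall through to the current ravel behavior. The wrapper name `outer_deprecating` is ours, purely for illustration; it is not an actual numpy change:

```python
import warnings
import numpy as np

def outer_deprecating(a, b, out=None):
    """Hypothetical step 1: warn before np.outer ravels >1-d input."""
    a, b = np.asarray(a), np.asarray(b)
    if a.ndim != 1 or b.ndim != 1:
        warnings.warn(
            "np.outer on non 1-d arrays is deprecated; "
            "use np.multiply.outer instead",
            DeprecationWarning, stacklevel=2)
    return np.outer(a, b, out=out)

x = np.ones((2, 3))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = outer_deprecating(x, x)   # warns, but still ravels to (6, 6)
assert result.shape == (6, 6)
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```

Step 2 would simply replace the warning with a raised error after the deprecation period.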
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
On Thu, Apr 16, 2015 at 6:32 PM, Nathaniel Smith n...@pobox.com wrote:

On Thu, Apr 16, 2015 at 6:19 PM, Neil Girdhar mistersh...@gmail.com wrote: Actually, looking at the docs, numpy.outer is *only* defined for 1-d vectors. Should anyone who used it with multi-dimensional arrays have an expectation that it will keep working in the same way?

Yes. Generally what we do is more important than what we say we do. Changing behaviour can break code. Changing docs can change whose fault this is, but broken code is still broken code. And if you put on your user hat, what do you do when numpy acts weird -- shake your fist at the heavens and give up, or sigh and update your code to match? It's pretty common for even undocumented behaviour to still be depended on. Also FWIW, np.outer's docstring says "Input is flattened if not already 1-dimensional", so we actually did document this.

Ah, yeah, somehow I missed that! -n -- Nathaniel J. Smith -- http://vorpus.org
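The documented flattening behavior is easy to check directly, and it is exactly where np.outer and np.multiply.outer diverge:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(4.0).reshape(2, 2)

# np.outer ravels both inputs first, as its docstring says:
assert np.outer(a, b).shape == (6, 4)

# np.multiply.outer instead combines the full shapes:
assert np.multiply.outer(a, b).shape == (2, 3, 2, 2)

# On 1-d inputs, the only documented case for np.outer, the two agree:
v, w = np.arange(3.0), np.arange(4.0)
assert np.allclose(np.outer(v, w), np.multiply.outer(v, w))
```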
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
Actually, looking at the docs, numpy.outer is *only* defined for 1-d vectors. Should anyone who used it with multi-dimensional arrays have an expectation that it will keep working in the same way?

On Thu, Apr 16, 2015 at 10:53 AM, Neil Girdhar mistersh...@gmail.com wrote: Would it be possible to deprecate np.outer's usage on non one-dimensional vectors for a few versions, and then reintroduce it with definition np.outer == np.multiply.outer?
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
You got it. I remember this from when I worked at Google and we would process (many many) logs. With enough bins, the approximation is still really close. It's great if you want to make an automatic plot of data. Calling numpy.partition a hundred times is probably slower than calling P^2 with n=100 bins. I don't think it does O(n) computations per point. I think it's more like O(log(n)). Best, Neil

On Wed, Apr 15, 2015 at 10:02 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote:

On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar mistersh...@gmail.com wrote: Yeah, I'm not arguing, I'm just curious about your reasoning. That explains why not C++. Why would you want to do this in C and not Python?

Well, the algorithm has to iterate over all the inputs, updating the estimated percentile positions at every iteration. Because the estimated percentiles may change in every iteration, I don't think there is an easy way of vectorizing the calculation with numpy. So I think it would be very slow if done in Python. Looking at this in some more detail, how is this typically used? Because it gives you approximate values that should split your sample into similarly filled bins, but because the values are approximate, to compute a proper histogram you would still need to do the binning to get the exact results, right? Even with this drawback P-2 does have an algorithmic advantage, so for huge inputs and many bins it should come out ahead. But for many medium-sized problems it may be faster to simply use np.partition, which gives you the whole thing in a single go. And it would be much simpler to implement.

Jaime -- (\__/) ( O.o) ( ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
I don't understand. Are you at pycon by any chance?

On Wed, Apr 15, 2015 at 6:12 PM, josef.p...@gmail.com wrote:

On Wed, Apr 15, 2015 at 6:08 PM, josef.p...@gmail.com wrote:

On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote: Does it work for you to set outer = np.multiply.outer? It's actually faster on my machine.

I assume it does, because np.corrcoef uses it, and it's the same type of use case. However, I'm not using it very often (I prefer broadcasting), but I've seen it often enough when reviewing code. This is mainly to point out that it could be a popular function (that maybe shouldn't be deprecated): https://github.com/search?utf8=%E2%9C%93q=np.outer 416914

After thinking another minute: I think it should not be deprecated; it's like toeplitz. We can use it also to normalize 2-d arrays where columns and rows are different, not symmetric as in the corrcoef case.

Josef

I'm just looking at this thread. I see outer used quite often: corrcoef = cov / np.outer(std, std) (even I use it sometimes instead of cov / std[:,None] / std).

Josef
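josef's normalization idiom from the thread can be checked: the np.outer spelling and the broadcasting spelling produce the same correlation matrix (the sample data here is ours, for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.randn(5, 200)          # 5 variables, 200 observations

cov = np.cov(data)
std = np.sqrt(np.diag(cov))

# The two equivalent normalizations mentioned in the thread:
corr_outer = cov / np.outer(std, std)
corr_bcast = cov / std[:, None] / std[None, :]

assert np.allclose(corr_outer, corr_bcast)
assert np.allclose(corr_outer, np.corrcoef(data))
```

Since std is 1-D here, this is exactly the documented use of np.outer, which is why the deprecation debate doesn't affect this idiom.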
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
Cool, thanks for looking at this. P2 might still be better even if the whole dataset is in memory because of cache misses. Partition, which I guess is based on quickselect, is going to run over all of the data roughly as many times as there are bins, whereas P2 only runs over it once. From a cache-miss standpoint, I think P2 is better? Anyway, it might be worth coding it up to verify any performance advantages? Not sure if it should be in numpy or not, since it really should accept an iterable rather than a numpy vector, right? Best, Neil

On Wed, Apr 15, 2015 at 12:40 PM, Jaime Fernández del Río jaime.f...@gmail.com wrote:

On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar mistersh...@gmail.com wrote: You got it. I remember this from when I worked at Google and we would process (many many) logs. With enough bins, the approximation is still really close. It's great if you want to make an automatic plot of data. Calling numpy.partition a hundred times is probably slower than calling P^2 with n=100 bins. I don't think it does O(n) computations per point. I think it's more like O(log(n)).

Looking at it again, it probably is O(n) after all: it does a binary search, which is O(log n), but it then goes on to update all the n bin counters and estimations, so O(n) I'm afraid. So there is no algorithmic advantage over partition/percentile: if there are m samples and n bins, P-2 does O(n) work m times, while partition does O(m) work n times, so both end up being O(m n). It seems to me that the big thing of P^2 is not having to hold the full dataset in memory. Online statistics (is that the name for this?), even if only estimations, is a cool thing, but I am not sure numpy is the place for them. That's not to say that we couldn't eventually have P^2 implemented for histogram, but I would start off with a partition-based one.

Jaime -- (\__/) ( O.o) ( ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. Would SciPy have a place for online statistics? Perhaps there's room for yet another scikit?
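The partition-based alternative Jaime describes can be sketched in a few lines: np.partition accepts an array of pivot indices, so one call yields exact equal-count bin edges (the function name `equal_count_edges` is ours, not an actual numpy API):

```python
import numpy as np

def equal_count_edges(x, nbins):
    """Exact bin edges putting ~len(x)/nbins samples per bin, via np.partition.

    Sketch of the in-memory alternative to the streaming P^2 estimator.
    """
    x = np.asarray(x)
    # positions of the interior quantile boundaries
    idx = (np.arange(1, nbins) * len(x)) // nbins
    part = np.partition(x, idx)        # one partition call handles all pivots
    return np.concatenate(([x.min()], part[idx], [x.max()]))

rng = np.random.RandomState(42)
x = rng.randn(10000)
edges = equal_count_edges(x, 10)
counts, _ = np.histogram(x, bins=edges)
assert counts.sum() == len(x)
# each bin holds close to a tenth of the samples
assert abs(counts - len(x) // 10).max() <= 1
```

As Jaime notes, this needs the whole dataset in memory, which is exactly the trade-off P^2 avoids.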
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
Does it work for you to set outer = np.multiply.outer? It's actually faster on my machine.

On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:

On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com wrote: Yes, I totally agree. If I get started on the PR to deprecate np.outer, maybe I can do it as part of the same PR?

I'm just looking at this thread. I see outer used quite often: corrcoef = cov / np.outer(std, std) (even I use it sometimes instead of cov / std[:,None] / std).

Josef
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
Yes, I totally agree. If I get started on the PR to deprecate np.outer, maybe I can do it as part of the same PR?

On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg sebast...@sipsolutions.net wrote: Just a general thing, if someone has a few minutes, I think it would make sense to add the ufunc.reduce thing to all of these functions at least in the See Also or Notes section in the documentation. These special attributes are not that well known, and I think that might be a nice way to make it easier to find. - Sebastian

On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote: I am, yes.

On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com wrote: Ok, I didn't know that. Are you at pycon by any chance?

On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith n...@pobox.com wrote:

On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar mistersh...@gmail.com wrote: Yes, I totally agree with you regarding np.sum and np.product, which is why I didn't suggest np.add.reduce, np.multiply.reduce. I wasn't sure whether cumsum and cumprod might be on the line in your judgment.

Ah, I see. I think we should treat them the same for now -- all the comments I made apply to a lesser or greater extent (in particular, cumsum and cumprod both do the thing where they dispatch to the .cumsum() / .cumprod() method). -n -- Nathaniel J. Smith -- http://vorpus.org
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
Yeah, I'm not arguing, I'm just curious about your reasoning. That explains why not C++. Why would you want to do this in C and not Python?

On Wed, Apr 15, 2015 at 1:48 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote:

On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar mistersh...@gmail.com wrote: If you're going to C, is there a reason not to go to C++ and include the already-written Boost code? Otherwise, why not use Python?

I think we have an explicit rule against C++, although I may be wrong. Not sure how much of boost we would have to make part of numpy to use that, the whole accumulators lib I'm guessing? Seems like an awful lot given what we are after.

Jaime -- (\__/) ( O.o) ( ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
It also appears that cumsum has a lot of unnecessary overhead over add.accumulate:

In [51]: %timeit np.add.accumulate(a)
The slowest run took 46.31 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 372 ns per loop

In [52]: %timeit np.cumsum(a)
The slowest run took 18.44 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 912 ns per loop

In [53]: %timeit np.add.accumulate(a.flatten())
The slowest run took 25.59 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 834 ns per loop

On Tue, Apr 14, 2015 at 7:42 AM, Neil Girdhar mistersh...@gmail.com wrote: Okay, but by the same token, why do we have cumsum? Isn't it identical to np.add.accumulate -- or if you're passing in multidimensional data -- np.add.accumulate(a.flatten())? add.accumulate feels more generic, would make the other ufunc things more discoverable, and is self-documenting. Similarly, cumprod is just np.multiply.accumulate. Best, Neil

On Sat, Apr 11, 2015 at 12:49 PM, Nathaniel Smith n...@pobox.com wrote: Documentation and a call to warnings.warn(DeprecationWarning(...)), I guess.

On Sat, Apr 11, 2015 at 12:39 PM, Neil Girdhar mistersh...@gmail.com wrote: I would be happy to, but I'm not sure what that involves? It's just a documentation changelist?

On Sat, Apr 11, 2015 at 12:29 PM, Nathaniel Smith n...@pobox.com wrote:

On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar mistersh...@gmail.com wrote:

On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar mistersh...@gmail.com wrote: Numpy's outer product works fine with vectors. However, I seem to always want len(outer(a, b).shape) to be equal to len(a.shape) + len(b.shape). Wolfram-alpha seems to agree https://reference.wolfram.com/language/ref/Outer.html with respect to matrix outer products.

You're probably right that this is the correct definition of the outer product in an n-dimensional world. But this seems to go beyond being just a bug in handling 0-d arrays (which is the kind of corner case we've fixed in the past); np.outer is documented to always ravel its inputs to 1d. In fact the implementation is literally just:

    a = asarray(a)
    b = asarray(b)
    return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out)

Sebastian's np.multiply.outer is much more generic and effective. Maybe we should just deprecate np.outer? I don't see what use it serves. (When and whether it actually got removed after being deprecated would depend on how much use it actually gets in real code, which I certainly don't know while typing a quick email. But we could start telling people not to use it any time.)

+1 with everything you said. Want to write a PR? :-) -- Nathaniel J. Smith -- http://vorpus.org
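Neil's equivalence claim is checkable: with no axis argument, np.cumsum flattens first, so it matches np.add.accumulate on the raveled array, and cumprod is the multiply analogue:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# With no axis argument, np.cumsum flattens first, so it matches
# np.add.accumulate on the raveled array:
assert np.array_equal(np.cumsum(a), np.add.accumulate(a.ravel()))

# Given an explicit axis, the two line up directly:
assert np.array_equal(np.cumsum(a, axis=0), np.add.accumulate(a, axis=0))

# Similarly, cumprod is just multiply.accumulate:
assert np.array_equal(np.cumprod(a.ravel()), np.multiply.accumulate(a.ravel()))
```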
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
Ok, I didn't know that. Are you at pycon by any chance?

On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith n...@pobox.com wrote:

On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar mistersh...@gmail.com wrote: Yes, I totally agree with you regarding np.sum and np.product, which is why I didn't suggest np.add.reduce, np.multiply.reduce. I wasn't sure whether cumsum and cumprod might be on the line in your judgment.

Ah, I see. I think we should treat them the same for now -- all the comments I made apply to a lesser or greater extent (in particular, cumsum and cumprod both do the thing where they dispatch to the .cumsum() / .cumprod() method). -n -- Nathaniel J. Smith -- http://vorpus.org
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
If you're going to C, is there a reason not to go to C++ and include the already-written Boost code? Otherwise, why not use Python?

On Tue, Apr 14, 2015 at 7:24 PM, Jaime Fernández del Río jaime.f...@gmail.com wrote:

On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith n...@pobox.com wrote:

On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar mistersh...@gmail.com wrote: Can I suggest that we instead add the P-square algorithm for the dynamic calculation of histograms? ( http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf ) This is already implemented in C++'s boost library ( http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp ) I implemented it in Boost Python as a module, which I'm happy to share. This is much better than fixed-width histograms in practice. Rather than adjusting the number of bins, it adjusts what you really want, which is the resolution of the bins throughout the domain.

This definitely sounds like a useful thing to have in numpy or scipy (though if it's possible to do without using Boost/C++ that would be nice). But yeah, we should leave the existing histogram alone (in this regard) and add a new name for this like adaptive_histogram or something. Then you can set about convincing matplotlib and friends to use it by default :-)

Would having a negative number of bins mean "this many, but with optimized boundaries" be too clever an interface? I have taken a look at the paper linked, and the P-2 algorithm would not be too complicated to implement from scratch, although it would require writing some C code, I'm afraid.

Jaime -- (\__/) ( O.o) ( ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
By the way, the p^2 algorithm still needs to know how many bins you want. It just adapts the endpoints of the bins. I like adaptive=True. However, you will have to find a way to return both the bins and and their calculated endpoints. The P^2 algorithm can also give approximate answers to numpy.percentile, numpy.median. How approximate they are depends on the number of bins you let it keep track of. I believe the authors bound the error as a function of number of points and bins. On Tue, Apr 14, 2015 at 10:00 PM, Paul Hobson pmhob...@gmail.com wrote: On Tue, Apr 14, 2015 at 4:24 PM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar mistersh...@gmail.com wrote: Can I suggest that we instead add the P-square algorithm for the dynamic calculation of histograms? ( http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf ) This is already implemented in C++'s boost library ( http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp ) I implemented it in Boost Python as a module, which I'm happy to share. This is much better than fixed-width histograms in practice. Rather than adjusting the number of bins, it adjusts what you really want, which is the resolution of the bins throughout the domain. This definitely sounds like a useful thing to have in numpy or scipy (though if it's possible to do without using Boost/C++ that would be nice). But yeah, we should leave the existing histogram alone (in this regard) and add a new name for this like adaptive_histogram or something. Then you can set about convincing matplotlib and friends to use it by default :-) Would having a negative number of bins mean this many, but with optimized boundaries be too clever an interface? As a user, I think so. Wouldn't np.histogram(..., adaptive=True) do well enough? 
-p ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
Okay, but by the same token, why do we have cumsum? Isn't it identical to np.add.accumulate — or if you're passing in multidimensional data — np.add.accumulate(a.flatten()) ? add.accumulate feels more generic, would make the other ufunc things more discoverable, and is self-documenting. Similarly, cumprod is just np.multiply.accumulate. Best, Neil On Sat, Apr 11, 2015 at 12:49 PM, Nathaniel Smith n...@pobox.com wrote: Documentation and a call to warnings.warn(DeprecationWarning(...)), I guess. On Sat, Apr 11, 2015 at 12:39 PM, Neil Girdhar mistersh...@gmail.com wrote: I would be happy to, but I'm not sure what that involves? It's just a documentation changelist? On Sat, Apr 11, 2015 at 12:29 PM, Nathaniel Smith n...@pobox.com wrote: On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar mistersh...@gmail.com wrote: On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar mistersh...@gmail.com wrote: Numpy's outer product works fine with vectors. However, I seem to always want len(outer(a, b).shape) to be equal to len(a.shape) + len(b.shape). Wolfram-alpha seems to agree https://reference.wolfram.com/language/ref/Outer.html with respect to matrix outer products. You're probably right that this is the correct definition of the outer product in an n-dimensional world. But this seems to go beyond being just a bug in handling 0-d arrays (which is the kind of corner case we've fixed in the past); np.outer is documented to always ravel its inputs to 1d. In fact the implementation is literally just: a = asarray(a) b = asarray(b) return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out) Sebastian's np.multiply.outer is much more generic and effective. Maybe we should just deprecate np.outer? I don't see what use it serves. (When and whether it actually got removed after being deprecated would depend on how much use it actually gets in real code, which I certainly don't know while typing a quick email. But we could start telling people not to use it any time.) 
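The shape discrepancy under discussion is easy to demonstrate; a small sketch:

```python
import numpy as np

a = np.ones((2, 3))
b = np.ones((4, 5))

# np.outer ravels both inputs to 1-d, so the input dimensionality is lost:
assert np.outer(a, b).shape == (6, 20)

# np.multiply.outer is the generic outer product described above:
# len(result.shape) == len(a.shape) + len(b.shape)
assert np.multiply.outer(a, b).shape == (2, 3, 4, 5)
```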
+1 with everything you said. Want to write a PR? :-) -- Nathaniel J. Smith -- http://vorpus.org
Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors
Yes, I totally agree with you regarding np.sum and np.product, which is why I didn't suggest np.add.reduce, np.multiply.reduce. I wasn't sure whether cumsum and cumprod might be on the line in your judgment. Best, Neil On Tue, Apr 14, 2015 at 3:37 PM, Nathaniel Smith n...@pobox.com wrote: On Apr 14, 2015 2:48 PM, Neil Girdhar mistersh...@gmail.com wrote: Okay, but by the same token, why do we have cumsum? Isn't it identical to np.add.accumulate — or if you're passing in multidimensional data — np.add.accumulate(a.flatten()) ? add.accumulate feels more generic, would make the other ufunc things more discoverable, and is self-documenting. Similarly, cumprod is just np.multiply.accumulate. Yeah, but these do have several differences from np.outer: - they get used much more - their definitions are less obviously broken (cumsum has no obvious definition for an n-d array so you have to pick one; outer does have an obvious definition and np.outer got it wrong) - they're more familiar from other systems (R, MATLAB) - they allow for special dispatch rules (e.g. np.sum(a) will try calling a.sum() before it tries coercing a to an ndarray, so e.g. on np.ma objects np.sum works and np.add.accumulate doesn't. Eventually this will perhaps be obviated by __numpy_ufunc__, but that is still some ways off.) So the situation is much less clear cut. -n
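The special-dispatch point is the practical one: np.sum(a) tries a.sum() before coercing, so subclasses like masked arrays keep working. A sketch (ufunc-method behavior on masked arrays has varied across numpy versions, so only the np.sum path is asserted):

```python
import numpy as np
import numpy.ma as ma

x = ma.masked_array([1, 2, 3], mask=[False, True, False])

# np.sum defers to x.sum(), which knows about the mask:
assert np.sum(x) == 4

# np.add.reduce goes straight through the ufunc machinery; whether the
# mask is honored has depended on the numpy version, which is exactly
# the dispatch gap described above.
print(np.add.reduce(x))
```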
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
Can I suggest that we instead add the P-square algorithm for the dynamic calculation of histograms? ( http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf ) This is already implemented in C++'s boost library ( http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp ) I implemented it in Boost Python as a module, which I'm happy to share. This is much better than fixed-width histograms in practice. Rather than adjusting the number of bins, it adjusts what you really want, which is the resolution of the bins throughout the domain. Best, Neil On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote: http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb Long story short, histogram visualisations that depend on numpy (such as matplotlib, or nearly all of them) have poor default behaviour as I have to constantly play around with the number of bins to get a good idea of what I'm looking at. The bins=10 works ok for up to 1000 points or very normal data, but has poor performance for anything else, and doesn't account for variability either. I don't have a method easily available to scale the number of bins given the data. R doesn't suffer from these problems and provides methods for use with its hist method. I would like to provide similar functionality for matplotlib, to at least provide some kind of good starting point, as histograms are very useful for initial data discovery. The notebook above provides an explanation of the problem as well as some proposed alternatives.
Use different datasets (type and size) to see the performance of the suggestions. All of the methods proposed exist in R and literature. I've put together an implementation to add this new functionality, but am hesitant to make a pull request as I would like some feedback from a maintainer before doing so. +1 on the PR. +1 as well. Unfortunately we can't change the default of 10, but a number of string methods, with a bins=auto or some such name prominently recommended in the docstring, would be very good to have. Ralf
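The string-method API proposed here did eventually land in numpy as np.histogram's bins argument; a usage sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(500)

# 'auto' picks between estimators from the literature (Sturges and
# Freedman-Diaconis); the default of bins=10 was left unchanged, as
# discussed above.
counts, edges = np.histogram(data, bins='auto')
assert counts.sum() == 500
```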
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
Yes, you're right. Although in practice, people almost always want adaptive bins. On Tue, Apr 14, 2015 at 5:08 PM, Chris Barker chris.bar...@noaa.gov wrote: On Mon, Apr 13, 2015 at 5:02 AM, Neil Girdhar mistersh...@gmail.com wrote: Can I suggest that we instead add the P-square algorithm for the dynamic calculation of histograms? ( http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf ) This looks like a great thing to have in numpy. However, I suspect that a lot of the downstream code that uses histogram expects equally-spaced bins. So this should probably be in addition to, rather than instead of. -CHB This is already implemented in C++'s boost library ( http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp ) I implemented it in Boost Python as a module, which I'm happy to share. This is much better than fixed-width histograms in practice. Rather than adjusting the number of bins, it adjusts what you really want, which is the resolution of the bins throughout the domain. Best, Neil On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote: http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb Long story short, histogram visualisations that depend on numpy (such as matplotlib, or nearly all of them) have poor default behaviour as I have to constantly play around with the number of bins to get a good idea of what I'm looking at. The bins=10 works ok for up to 1000 points or very normal data, but has poor performance for anything else, and doesn't account for variability either.
I don't have a method easily available to scale the number of bins given the data. R doesn't suffer from these problems and provides methods for use with its hist method. I would like to provide similar functionality for matplotlib, to at least provide some kind of good starting point, as histograms are very useful for initial data discovery. The notebook above provides an explanation of the problem as well as some proposed alternatives. Use different datasets (type and size) to see the performance of the suggestions. All of the methods proposed exist in R and literature. I've put together an implementation to add this new functionality, but am hesitant to make a pull request as I would like some feedback from a maintainer before doing so. +1 on the PR. +1 as well. Unfortunately we can't change the default of 10, but a number of string methods, with a bins=auto or some such name prominently recommended in the docstring, would be very good to have. Ralf -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
[Numpy-discussion] Bug in 1.9?
Hello, Is this desired behaviour or a regression or a bug? http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray Thanks, Neil
[Numpy-discussion] A context manager for print options
Why not replace get_printoptions/set_printoptions with a context manager accessed using numpy.printoptions in the same way that numpy.errstate exposes a context manager to seterr/geterr? This would make the set method redundant. Also, the context manager returned by numpy.errstate, numpy.printoptions, etc. could expose the dictionary directly. This would make the get methods redundant. Best, Neil
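A minimal sketch of the proposed manager, wrapping the existing get/set pair (numpy later shipped essentially this as np.printoptions):

```python
import contextlib
import numpy as np

@contextlib.contextmanager
def printoptions(**kwargs):
    # Save, set, and always restore the global print options.
    old = np.get_printoptions()
    np.set_printoptions(**kwargs)
    try:
        yield old  # expose the previous options dict, per the suggestion above
    finally:
        np.set_printoptions(**old)

with printoptions(precision=2):
    inside = np.get_printoptions()['precision']  # 2 within the block
after = np.get_printoptions()['precision']       # restored on exit
```

The try/finally mirrors numpy.errstate: options are restored even if the body raises.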
[Numpy-discussion] Testing
How do I test a patch that I've made locally? I can't seem to import numpy locally: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there.
Re: [Numpy-discussion] Testing
Ah, sorry, didn't see that I can do that from runtests!! Thanks!! On Sun, Oct 27, 2013 at 7:13 PM, Neil Girdhar mistersh...@gmail.com wrote: Since I am trying to add a printoptions context manager, I would like to test it. Should I add tests, or can I somehow use it from an ipython shell? On Sun, Oct 27, 2013 at 7:12 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Oct 27, 2013 at 4:59 PM, Neil Girdhar mistersh...@gmail.com wrote: How do I test a patch that I've made locally? I can't seem to import numpy locally: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there. If you are running current master do python runtests.py --help Chuck
Re: [Numpy-discussion] Testing
Since I am trying to add a printoptions context manager, I would like to test it. Should I add tests, or can I somehow use it from an ipython shell? On Sun, Oct 27, 2013 at 7:12 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Oct 27, 2013 at 4:59 PM, Neil Girdhar mistersh...@gmail.com wrote: How do I test a patch that I've made locally? I can't seem to import numpy locally: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there. If you are running current master do python runtests.py --help Chuck
[Numpy-discussion] Code review request: PrintOptions
This is my first code review request, so I may have done some things wrong. I think the following URL should work? https://github.com/MisterSheik/numpy/compare Best, Neil
Re: [Numpy-discussion] Code review request: PrintOptions
Yeah, I realized that I missed that and figured it wouldn't matter since it was my own master and I don't plan on making other changes to numpy. If you don't mind, how do I move my changelist into a branch? I'm really worried I'm going to lose my changes. On Sun, Oct 27, 2013 at 9:38 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Oct 27, 2013 at 7:23 PM, Neil Girdhar mistersh...@gmail.com wrote: This is my first code review request, so I may have done some things wrong. I think the following URL should work? https://github.com/MisterSheik/numpy/compare The first thing to do is make a new branch for your work. Probably the easiest way from where you are is to make the branch, which will have your changes in it, then go back to master and git reset --hard to the last commit before your work. Working in master is a big no-no. See `doc/source/dev/gitwash/development_workflow.rst`. When you are ready, make a PR for that branch. The code will get reviewed at that point. Chuck
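Chuck's rescue recipe, as a self-contained sketch (the repo setup, author identity, and branch name are hypothetical, invented for illustration):

```shell
set -e
cd "$(mktemp -d)"
git init -q work && cd work
git -c user.name=me -c user.email=me@example.com commit -q --allow-empty -m "upstream state"
git -c user.name=me -c user.email=me@example.com commit -q --allow-empty -m "my printoptions work"

# 1. Make a branch; it points at the current commit, so nothing is lost.
git branch printoptions

# 2. Move the current branch back to the commit before the work began.
git reset -q --hard HEAD~1

# The work is safe on the new branch, ready for a PR.
git log --oneline printoptions
```

Because a branch is just a pointer to a commit, step 1 makes the later hard reset safe: the commits stay reachable from `printoptions`.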
Re: [Numpy-discussion] Code review request: PrintOptions
Is this what I want? https://github.com/numpy/numpy/pull/3987 On Sun, Oct 27, 2013 at 9:42 PM, Neil Girdhar mistersh...@gmail.com wrote: Yeah, I realized that I missed that and figured it wouldn't matter since it was my own master and I don't plan on making other changes to numpy. If you don't mind, how do I move my changelist into a branch? I'm really worried I'm going to lose my changes. On Sun, Oct 27, 2013 at 9:38 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Oct 27, 2013 at 7:23 PM, Neil Girdhar mistersh...@gmail.com wrote: This is my first code review request, so I may have done some things wrong. I think the following URL should work? https://github.com/MisterSheik/numpy/compare The first thing to do is make a new branch for your work. Probably the easiest way from where you are is to make the branch, which will have your changes in it, then go back to master and git reset --hard to the last commit before your work. Working in master is a big no-no. See `doc/source/dev/gitwash/development_workflow.rst`. When you are ready, make a PR for that branch. The code will get reviewed at that point. Chuck