Re: [Numpy-discussion] ANN: NumPy 1.7.1 release
Hi,

A big thanks for that release.

I also think it would be useful to do a release candidate for this. This release changed the behavior related to Python long and broke a test in Theano. Nothing important, but we could have fixed this before the release.

The NumPy change is that a Python long that doesn't fit in an int64, but does fit in a uint64, used to throw an overflow exception. Now it returns a uint64.

Thanks again!

Fred

On Sun, Apr 7, 2013 at 4:09 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> Hi,
>
> I'm pleased to announce the availability of the final NumPy 1.7.1 release.
>
> Sources and binary installers can be found at
> https://sourceforge.net/projects/numpy/files/NumPy/1.7.1/
>
> Only three simple bugs were fixed since 1.7.1rc1 (#3166, #3179, #3187).
>
> I would like to thank everybody who contributed patches since 1.7.1rc1:
> Eric Fode, Nathaniel J. Smith and Charles Harris.
>
> Cheers,
> Ondrej
>
> P.S. I'll create the Mac binary installers in a few days. Pypi is updated.
>
> =========================
> NumPy 1.7.1 Release Notes
> =========================
>
> This is a bugfix only release in the 1.7.x series.
>
> Issues fixed
>
> gh-2973 Fix `1` is printed during numpy.test()
> gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34.
> gh-3007 Backport gh-3006
> gh-2984 Backport fix complex polynomial fit
> gh-2982 BUG: Make nansum work with booleans.
> gh-2985 Backport large sort fixes
> gh-3039 Backport object take
> gh-3105 Backport nditer fix op axes initialization
> gh-3108 BUG: npy-pkg-config ini files were missing after Bento build.
> gh-3124 BUG: PyArray_LexSort allocates too much temporary memory.
> gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py modules.
> gh-3117 Backport gh-2992
> gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference
> gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not 'n').
> gh-3136 Backport #3128
>
> Checksums
> =========
>
> 9e369a96b94b107bf3fab7e07fef8557 release/installers/numpy-1.7.1-win32-superpack-python2.6.exe
> 0ab72b3b83528a7ae79c6df9042d61c6 release/installers/numpy-1.7.1.tar.gz
> bb0d30de007d649757a2d6d2e1c59c9a release/installers/numpy-1.7.1-win32-superpack-python3.2.exe
> 9a72db3cad7a6286c0d22ee43ad9bc6c release/installers/numpy-1.7.1.zip
> 0842258fad82060800b8d1f0896cb83b release/installers/numpy-1.7.1-win32-superpack-python3.1.exe
> 1b8f29b1fa89a801f83f551adc13aaf5 release/installers/numpy-1.7.1-win32-superpack-python2.7.exe
> 9ca22df942e5d5362cf7154217cb4b69 release/installers/numpy-1.7.1-win32-superpack-python2.5.exe
> 2fd475b893d8427e26153e03ad7d5b69 release/installers/numpy-1.7.1-win32-superpack-python3.3.exe

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
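
The integer-handling change Fred describes can be checked with a short sketch. It assumes Python 3 (where `int` plays the role of Python 2's `long`) and a NumPy release that includes the change; the variable names are illustrative only and do not come from the thread.

```python
import numpy as np

# Largest value that still fits in a signed 64-bit integer.
fits_int64 = 2**63 - 1
# One past the int64 range, but still representable as an unsigned 64-bit integer.
needs_uint64 = 2**63

a = np.array(fits_int64)
b = np.array(needs_uint64)

print(a.dtype)  # signed 64-bit integer
print(b.dtype)  # unsigned 64-bit integer rather than an overflow error
```

Code that relied on the old overflow exception as a range check, as the Theano test apparently did, would need an explicit dtype check instead.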
[Numpy-discussion] ANN: pandas 0.11.0 released!
hi all,

We've released pandas 0.11.0, a big release that spans 3 months of continuous development, led primarily by the intrepid Jeff Reback and y-p. The release brings many new features, performance and API improvements, bug fixes, and other goodies.

Some highlights:

- New precision indexing fields loc, iloc, at, and iat, to reduce occasional ambiguity in the hitherto catch-all ix method.
- Expanded support for NumPy data types in DataFrame
- NumExpr integration to accelerate various operator evaluation
- New Cookbook and 10 minutes to pandas pages in the documentation by Jeff Reback
- Improved DataFrame to CSV exporting performance
- Experimental rplot branch with faceted plots with matplotlib merged and open for community hacking

Source archives and Windows installers are on PyPI.

Thanks to all who contributed to this release, especially Jeff and y-p.

What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html
Installers: http://pypi.python.org/pypi/pandas

$ git log v0.10.1..v0.11.0 --pretty=format:%aN | sort | uniq -c | sort -rn
    308 y-p
    279 jreback
     85 Vytautas Jancauskas
     74 Wes McKinney
     25 Stephen Lin
     22 Andy Hayden
     19 Chang She
     13 Wouter Overmeire
      8 Spencer Lyon
      6 Phillip Cloud
      6 Nicholaus E. Halecky
      5 Thierry Moisan
      5 Skipper Seabold
      4 waitingkuo
      4 Loïc Estève
      4 Jeff Reback
      4 Garrett Drapala
      4 Alvaro Tejero-Cantero
      3 lexual
      3 Dražen Lučanin
      3 dieterv77
      3 dengemann
      3 Dan Birken
      3 Adam Greenhall
      2 Will Furnass
      2 Vytautas Jančauskas
      2 Robert Gieseke
      2 Peter Prettenhofer
      2 Jonathan Chambers
      2 Dieter Vandenbussche
      2 Damien Garaud
      2 Christopher Whelan
      2 Chapman Siu
      2 Brad Buran
      1 vytas
      1 Tim Akinbo
      1 Thomas Kluyver
      1 thauck
      1 stephenwlin
      1 K.-Michael Aye
      1 Karmel Allison
      1 Jeremy Wagner
      1 James Casbon
      1 Illia Polosukhin
      1 Dražen Lučanin
      1 davidjameshumphreys
      1 Dan Davison
      1 Chris Withers
      1 Christian Geier
      1 anomrake

Happy data hacking!

- Wes

What is it
==========

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with relational, time series, or any other kind of labeled data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.

Links
=====

Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst
Documentation: http://pandas.pydata.org
Installers: http://pypi.python.org/pypi/pandas
Code Repository: http://github.com/pydata/pandas
Mailing List: http://groups.google.com/group/pydata
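
As a rough illustration of the four indexers named in the highlights, the sketch below contrasts label-based and position-based access. The DataFrame, its column names, and its index labels are made up for the example and are not taken from the announcement.

```python
import pandas as pd

# Hypothetical data, used only to show the indexers side by side.
df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "price"]     # label-based lookup
by_position = df.iloc[1, 0]         # integer-position lookup
scalar_label = df.at["c", "qty"]    # scalar access by label
scalar_position = df.iat[2, 1]      # scalar access by position

rows_by_label = df.loc["a":"b"]     # label slices include both endpoints
rows_by_position = df.iloc[0:2]     # position slices exclude the stop

print(by_label, by_position, scalar_label, scalar_position)
```

Keeping the label and position paths separate is what removes the ambiguity that a single catch-all indexer has when the index itself contains integers.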
Re: [Numpy-discussion] MapIter api
Hi,

this is currently used in Theano! In fact, it was John S. who implemented it in NumPy to allow a fast gradient of advanced indexing in Theano. It allows code like:

    matrix1[vector1, vector2] += matrix2

where there are duplicate indices in the vectors.

Looking at the code, I saw that it uses at least these parts of the interface:

    PyArrayMapIterObject
    PyArray_MapIterNext
    PyArray_ITER_NEXT
    PyArray_MapIterSwapAxes
    PyArray_BroadcastToShape

I lost track of the end of this discussion, but I think this is not possible in NumPy, as there was no agreement to include it. But I remember a few other users on this list asking for this (and they were Theano users, to my knowledge).

So I would prefer that you don't remove the parts that we use for the next 1.8 release.

thanks

Frédéric

On Tue, Apr 16, 2013 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:
> On Mon, Apr 15, 2013 at 5:29 PM, Sebastian Berg sebast...@sipsolutions.net wrote:
>> Hey,
>>
>> the MapIter API has only been made public in master, right? So it is no problem at all to change at least the mapiter struct, right?
>>
>> I got annoyed at all those special cases that make it difficult to get an idea of where to put things, e.g. to fix the boolean array-like stuff. So I actually started rewriting it (and I already have one big function that does all index preparation -- OK, it is untested, but it's basically there).
>>
>> I would guess it is not really a big problem even if it was public for longer, since you probably shouldn't do that kind of direct struct access? But just checking.
>
> Why don't we just make the struct opaque, i.e., just declare it in the public header file and move the actual definition to an internal header file?
>
> If it's too annoying I guess we could even make it non-public, at least in 1.8 -- IIRC it's only there so we can use it in umath, and IIRC the patch to use it hasn't landed yet.
>
> Or we could just merge umath and multiarray into a single .so, that would save a *lot* of annoying fiddling with the public API that doesn't actually serve any purpose.
>
> -n
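
The duplicate-index behavior that motivates this request can be sketched in Python. The example uses `np.add.at`, which exists in later NumPy releases and is not named in this message, so treat it as an assumption chosen for illustration rather than the mechanism Theano uses; the array names are hypothetical.

```python
import numpy as np

rows = np.array([0, 0, 2])   # element (0, 1) is addressed twice
cols = np.array([1, 1, 2])
updates = np.array([1.0, 2.0, 3.0])

# Plain in-place fancy indexing: the right-hand side is computed once and
# written back, so only one update per repeated index survives.
buffered = np.zeros((3, 3))
buffered[rows, cols] += updates

# Unbuffered accumulation: every occurrence of a repeated index contributes.
accumulated = np.zeros((3, 3))
np.add.at(accumulated, (rows, cols), updates)

print(buffered[0, 1], accumulated[0, 1])
```

A gradient of advanced indexing needs the accumulating form, because each time an element is read in the forward pass it must receive its own contribution in the backward pass.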
Re: [Numpy-discussion] MapIter api
On Tue, 2013-04-23 at 17:08 -0400, Frédéric Bastien wrote:
> Hi,
>
> this is currently used in Theano! In fact, it was John S. who implemented it in NumPy to allow a fast gradient of advanced indexing in Theano. It allows code like:
>
>     matrix1[vector1, vector2] += matrix2

Yes, I had missed that and thought maybe nobody actually used it yet. In the original pull request [1] I gave some reasons why I think there should be some changes. Mostly I think it would make sense (also a lot for Theano) to rewrite it with the new iterators and expose the subspace more directly. That would give vast speedups for mixed fancy/non-fancy indices. But if this is useful to you, I guess one can also just create a new one if someone finds time, leaving the old MapIter deprecated and unmaintained.

[1] https://github.com/numpy/numpy/pull/377

> where there are duplicate indices in the vectors.
>
> Looking at the code, I saw that it uses at least these parts of the interface:
>
>     PyArrayMapIterObject
>     PyArray_MapIterNext
>     PyArray_ITER_NEXT
>     PyArray_MapIterSwapAxes
>     PyArray_BroadcastToShape

There is likely no reason for changing these, but improving MapIter would likely break binary compatibility because of struct access.

- Sebastian

> I lost track of the end of this discussion, but I think this is not possible in NumPy, as there was no agreement to include it. But I remember a few other users on this list asking for this (and they were Theano users, to my knowledge).
>
> So I would prefer that you don't remove the parts that we use for the next 1.8 release.
>
> thanks
>
> Frédéric
>
> On Tue, Apr 16, 2013 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:
>> On Mon, Apr 15, 2013 at 5:29 PM, Sebastian Berg sebast...@sipsolutions.net wrote:
>>> Hey,
>>>
>>> the MapIter API has only been made public in master, right? So it is no problem at all to change at least the mapiter struct, right?
>>>
>>> I got annoyed at all those special cases that make it difficult to get an idea of where to put things, e.g. to fix the boolean array-like stuff. So I actually started rewriting it (and I already have one big function that does all index preparation -- OK, it is untested, but it's basically there).
>>>
>>> I would guess it is not really a big problem even if it was public for longer, since you probably shouldn't do that kind of direct struct access? But just checking.
>>
>> Why don't we just make the struct opaque, i.e., just declare it in the public header file and move the actual definition to an internal header file?
>>
>> If it's too annoying I guess we could even make it non-public, at least in 1.8 -- IIRC it's only there so we can use it in umath, and IIRC the patch to use it hasn't landed yet.
>>
>> Or we could just merge umath and multiarray into a single .so, that would save a *lot* of annoying fiddling with the public API that doesn't actually serve any purpose.
>>
>> -n
Re: [Numpy-discussion] Vectorized percentile function in Numpy (PR #2970)
On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
> Back in December it was pointed out on the scipy-user list [1] that numpy has a percentile function which has similar functionality to scipy's stats.scoreatpercentile. I've been trying to harmonize these two functions into a single version which has the features of both.
>
> Scipy PR 374 [2] introduced a version which took the parameters from both the scipy and numpy percentile functions and was accepted into Scipy with the plan that it would be deprecated when a similar function was introduced into Numpy. Then I moved on to enhancing the Numpy version with Pull Request 2970 [3]. With some input from Sebastian Berg the percentile function was rewritten with further vectorization, but neither of us felt fully comfortable with the final product. Can someone look at the implementation in the PR and suggest what should be done from here?

Thanks! For me the main question is the vectorized usage when both haystack (`a`) and needle (`q`) are vectorized. What I mean is for:

    np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)

I would probably expect an output shape of (n1, n2, 3), but currently you will get the needle dimensions first, because it is roughly the same as

    [np.percentile(np.random.randn(n1, n2, N), q, axis=-1) for q in [25., 50., 75.]]

so for the (probably rare) vectorization of both `a` and `q`, would it be preferable to do some kind of long term behaviour change, or just put the dimensions in `q` first, which should be compatible with the current list?

Regards,

Sebastian

> Cheers,
>
> - Jonathan Helmus
>
> [1] http://thread.gmane.org/gmane.comp.python.scientific.user/1
> [2] https://github.com/scipy/scipy/pull/374
> [3] https://github.com/numpy/numpy/pull/2970
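
The two candidate output layouts can be compared directly. This is a minimal sketch that assumes a NumPy release in which `np.percentile` accepts a sequence for `q` together with `axis`, and in which `np.moveaxis` is available; the sizes `n1`, `n2`, `N` are arbitrary values picked for the example.

```python
import numpy as np

n1, n2, N = 2, 4, 100
a = np.random.randn(n1, n2, N)
q = [25.0, 50.0, 75.0]

# Vectorized call over both the data and the percentiles.
res = np.percentile(a, q, axis=-1)

# The list-comprehension form from the message, stacked into one array.
stacked = np.array([np.percentile(a, qi, axis=-1) for qi in q])

# Moving the percentile axis to the end gives the (n1, n2, 3) layout.
q_last = np.moveaxis(res, 0, -1)

print(res.shape, stacked.shape, q_last.shape)
```

Either layout can be obtained from the other with a single axis move, so the question in the thread is about which default is least surprising, not about what is computable.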
Re: [Numpy-discussion] MapIter api
On Tue, Apr 23, 2013 at 4:06 PM, Sebastian Berg sebast...@sipsolutions.net wrote:
> On Tue, 2013-04-23 at 17:08 -0400, Frédéric Bastien wrote:
>> Hi,
>>
>> this is currently used in Theano! In fact, it was John S. who implemented it in NumPy to allow a fast gradient of advanced indexing in Theano. It allows code like:
>>
>>     matrix1[vector1, vector2] += matrix2
>
> Yes, I had missed that and thought maybe nobody actually used it yet. In the original pull request [1] I gave some reasons why I think there should be some changes. Mostly I think it would make sense (also a lot for Theano) to rewrite it with the new iterators and expose the subspace more directly. That would give vast speedups for mixed fancy/non-fancy indices. But if this is useful to you, I guess one can also just create a new one if someone finds time, leaving the old MapIter deprecated and unmaintained.
>
> [1] https://github.com/numpy/numpy/pull/377
>
>> where there are duplicate indices in the vectors.
>>
>> Looking at the code, I saw that it uses at least these parts of the interface:
>>
>>     PyArrayMapIterObject
>>     PyArray_MapIterNext
>>     PyArray_ITER_NEXT
>>     PyArray_MapIterSwapAxes
>>     PyArray_BroadcastToShape
>
> There is likely no reason for changing these, but improving MapIter would likely break binary compatibility because of struct access.
>
> - Sebastian
>
>> I lost track of the end of this discussion, but I think this is not possible in NumPy, as there was no agreement to include it. But I remember a few other users on this list asking for this (and they were Theano users, to my knowledge).
>>
>> So I would prefer that you don't remove the parts that we use for the next 1.8 release.
>>
>> thanks
>>
>> Frédéric
>>
>> On Tue, Apr 16, 2013 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:
>>> On Mon, Apr 15, 2013 at 5:29 PM, Sebastian Berg sebast...@sipsolutions.net wrote:
>>>> Hey,
>>>>
>>>> the MapIter API has only been made public in master, right? So it is no problem at all to change at least the mapiter struct, right?
>>>>
>>>> I got annoyed at all those special cases that make it difficult to get an idea of where to put things, e.g. to fix the boolean array-like stuff. So I actually started rewriting it (and I already have one big function that does all index preparation -- OK, it is untested, but it's basically there).
>>>>
>>>> I would guess it is not really a big problem even if it was public for longer, since you probably shouldn't do that kind of direct struct access? But just checking.
>>>
>>> Why don't we just make the struct opaque, i.e., just declare it in the public header file and move the actual definition to an internal header file?
>>>
>>> If it's too annoying I guess we could even make it non-public, at least in 1.8 -- IIRC it's only there so we can use it in umath, and IIRC the patch to use it hasn't landed yet.
>>>
>>> Or we could just merge umath and multiarray into a single .so, that would save a *lot* of annoying fiddling with the public API that doesn't actually serve any purpose.

Does this have any overlap with https://github.com/numpy/numpy/pull/2821 ?

Chuck
Re: [Numpy-discussion] Vectorized percentile function in Numpy (PR #2970)
On Tue, Apr 23, 2013 at 6:16 PM, Sebastian Berg sebast...@sipsolutions.net wrote:
> On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
>> Back in December it was pointed out on the scipy-user list [1] that numpy has a percentile function which has similar functionality to scipy's stats.scoreatpercentile. I've been trying to harmonize these two functions into a single version which has the features of both.
>>
>> Scipy PR 374 [2] introduced a version which took the parameters from both the scipy and numpy percentile functions and was accepted into Scipy with the plan that it would be deprecated when a similar function was introduced into Numpy. Then I moved on to enhancing the Numpy version with Pull Request 2970 [3]. With some input from Sebastian Berg the percentile function was rewritten with further vectorization, but neither of us felt fully comfortable with the final product. Can someone look at the implementation in the PR and suggest what should be done from here?
>
> Thanks! For me the main question is the vectorized usage when both haystack (`a`) and needle (`q`) are vectorized. What I mean is for:
>
>     np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)
>
> I would probably expect an output shape of (n1, n2, 3), but currently you will get the needle dimensions first, because it is roughly the same as
>
>     [np.percentile(np.random.randn(n1, n2, N), q, axis=-1) for q in [25., 50., 75.]]
>
> so for the (probably rare) vectorization of both `a` and `q`, would it be preferable to do some kind of long term behaviour change, or just put the dimensions in `q` first, which should be compatible with the current list?

I don't have much of a preference either way, but I'm glad this is going into numpy. We can work with it either way.

In stats, the most common case will be axis=0, and then the two are the same, aren't they?

What I like about the second version is unrolling (with 2 or 3 quantiles), which I think will work:

    u, l = np.random.randn(2,5)

or

    res = np.percentile(...)
    func(*res)

The first case will be nicer when there are lots of percentiles, but I guess I won't need it much except for axis=0.

Actually, I would prefer the second version, because it might be a bit more cumbersome to get the individual percentiles out if the axis is somewhere in the middle; however, I don't think I have a case like that.

The first version would be consistent with reduceat, and that would be more numpythonic. I would go for that in numpy.

my 2.5c

Josef

> Regards,
>
> Sebastian
>
>> Cheers,
>>
>> - Jonathan Helmus
>>
>> [1] http://thread.gmane.org/gmane.comp.python.scientific.user/1
>> [2] https://github.com/scipy/scipy/pull/374
>> [3] https://github.com/numpy/numpy/pull/2970
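
The unrolling Josef mentions can be sketched as follows. It assumes a NumPy release in which `np.percentile` returns the percentile dimension first when `q` is a sequence; `spread` is a hypothetical helper introduced only to show argument unpacking and is not part of NumPy or SciPy.

```python
import numpy as np

a = np.random.randn(50, 4)
q = [25.0, 50.0, 75.0]

# With axis=0 the reduced axis is the leading one, so the result has shape
# (3, 4) and unpacks into one array per percentile.
lower, median, upper = np.percentile(a, q, axis=0)


def spread(lo, mid, hi):
    """Hypothetical helper: interquartile range for each column."""
    return hi - lo


# Passing the percentiles on as separate positional arguments.
iqr = spread(*np.percentile(a, q, axis=0))

print(lower.shape, iqr.shape)
```

With the percentile dimension last, the same unpacking would need an explicit axis move first whenever the reduced axis is not the leading one.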