Re: [Numpy-discussion] ANN: NumPy 1.7.1 release

2013-04-23 Thread Frédéric Bastien
Hi,

A big thanks for that release.

I also think it would be useful to do a release candidate for this. This
release changed the behavior related to Python long and broke a test in
Theano. Nothing important, but we could have fixed this before the release.

The NumPy change is that a Python long that doesn't fit in an int64, but fits
in a uint64, used to raise an overflow exception. Now it returns a uint64.
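
A minimal sketch of the behaviour described here, assuming a NumPy version at or after 1.7.1. The variable names are illustrative only and are not taken from the Theano test that broke:

```python
import numpy as np

# 2**63 is one past the largest int64, but still fits in a uint64.
value = 2**63

# Older releases reportedly raised an overflow error at this point;
# with the change described above the result is a uint64 array.
arr = np.array(value)
print(arr.dtype)
```

Code that relied on catching the overflow would need to check the resulting dtype instead.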

thanks again!

Fred


On Sun, Apr 7, 2013 at 4:09 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 Hi,

 I'm pleased to announce the availability of the final NumPy 1.7.1 release.

 Sources and binary installers can be found at
 https://sourceforge.net/projects/numpy/files/NumPy/1.7.1/

 Only three simple bugs were fixed since 1.7.1rc1 (#3166, #3179, #3187).

 I would like to thank everybody who contributed patches since 1.7.1rc1:
 Eric Fode, Nathaniel J. Smith and Charles Harris.

 Cheers,
 Ondrej

 P.S. I'll create the Mac binary installers in a few days. Pypi is updated.


 =========================
 NumPy 1.7.1 Release Notes
 =========================

 This is a bugfix only release in the 1.7.x series.


 Issues fixed
 ------------

 gh-2973   Fix `1` is printed during numpy.test()
 gh-2983   BUG: gh-2969: Backport memory leak fix 80b3a34.
 gh-3007   Backport gh-3006
 gh-2984   Backport fix complex polynomial fit
 gh-2982   BUG: Make nansum work with booleans.
 gh-2985   Backport large sort fixes
 gh-3039   Backport object take
 gh-3105   Backport nditer fix op axes initialization
 gh-3108   BUG: npy-pkg-config ini files were missing after Bento build.
 gh-3124   BUG: PyArray_LexSort allocates too much temporary memory.
 gh-3131   BUG: Exported f2py_size symbol prevents linking multiple f2py modules.
 gh-3117   Backport gh-2992
 gh-3135   DOC: Add mention of PyArray_SetBaseObject stealing a reference
 gh-3134   DOC: Fix typo in fft docs (the indexing variable is 'm', not 'n').
 gh-3136   Backport #3128

 Checksums
 =========

 9e369a96b94b107bf3fab7e07fef8557  release/installers/numpy-1.7.1-win32-superpack-python2.6.exe
 0ab72b3b83528a7ae79c6df9042d61c6  release/installers/numpy-1.7.1.tar.gz
 bb0d30de007d649757a2d6d2e1c59c9a  release/installers/numpy-1.7.1-win32-superpack-python3.2.exe
 9a72db3cad7a6286c0d22ee43ad9bc6c  release/installers/numpy-1.7.1.zip
 0842258fad82060800b8d1f0896cb83b  release/installers/numpy-1.7.1-win32-superpack-python3.1.exe
 1b8f29b1fa89a801f83f551adc13aaf5  release/installers/numpy-1.7.1-win32-superpack-python2.7.exe
 9ca22df942e5d5362cf7154217cb4b69  release/installers/numpy-1.7.1-win32-superpack-python2.5.exe
 2fd475b893d8427e26153e03ad7d5b69  release/installers/numpy-1.7.1-win32-superpack-python3.3.exe
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



[Numpy-discussion] ANN: pandas 0.11.0 released!

2013-04-23 Thread Wes McKinney
hi all,

We've released pandas 0.11.0, a big release that spans 3 months of
continuous development, led primarily by the intrepid Jeff Reback
and y-p. The release brings many new features, performance and
API improvements, bug fixes, and other goodies.

Some highlights:

- New precise indexing fields loc, iloc, at, and iat, to reduce
  occasional ambiguity in the hitherto catch-all ix method.
- Expanded support for NumPy data types in DataFrame
- NumExpr integration to accelerate evaluation of various operators
- New Cookbook and 10 minutes to pandas pages in the documentation by
  Jeff Reback
- Improved DataFrame to CSV exporting performance
- Experimental rplot branch with faceted plots with matplotlib
  merged and open for community hacking
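
To illustrate the first highlight, here is a small sketch of how the four indexers differ. The frame, its labels and the variable names are made up for this example. The integer index is deliberately out of order so that label-based and position-based lookups give different answers:

```python
import pandas as pd

# Integer labels in reverse order: label 0 is the *last* row.
df = pd.DataFrame({"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]}, index=[2, 1, 0])

by_label = df.loc[0, "a"]       # label-based lookup: row labelled 0
by_position = df.iloc[0, 0]     # position-based lookup: first row, first column
scalar_label = df.at[1, "b"]    # scalar access by label
scalar_pos = df.iat[2, 1]       # scalar access by position

print(by_label, by_position, scalar_label, scalar_pos)
```

With an integer index like this one, a catch-all indexer has to guess whether 0 means a label or a position; the dedicated fields make the intent explicit.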

Source archives and Windows installers are on PyPI. Thanks to all
who contributed to this release, especially Jeff and y-p.

What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html
Installers: http://pypi.python.org/pypi/pandas

$ git log v0.10.1..v0.11.0 --pretty=format:%aN | sort | uniq -c | sort -rn
308 y-p
279 jreback
 85 Vytautas Jancauskas
 74 Wes McKinney
 25 Stephen Lin
 22 Andy Hayden
 19 Chang She
 13 Wouter Overmeire
  8 Spencer Lyon
  6 Phillip Cloud
  6 Nicholaus E. Halecky
  5 Thierry Moisan
  5 Skipper Seabold
  4 waitingkuo
  4 Loïc Estève
  4 Jeff Reback
  4 Garrett Drapala
  4 Alvaro Tejero-Cantero
  3 lexual
  3 Dražen Lučanin
  3 dieterv77
  3 dengemann
  3 Dan Birken
  3 Adam Greenhall
  2 Will Furnass
  2 Vytautas Jančauskas
  2 Robert Gieseke
  2 Peter Prettenhofer
  2 Jonathan Chambers
  2 Dieter Vandenbussche
  2 Damien Garaud
  2 Christopher Whelan
  2 Chapman Siu
  2 Brad Buran
  1 vytas
  1 Tim Akinbo
  1 Thomas Kluyver
  1 thauck
  1 stephenwlin
  1 K.-Michael Aye
  1 Karmel Allison
  1 Jeremy Wagner
  1 James Casbon
  1 Illia Polosukhin
  1 Dražen Lučanin
  1 davidjameshumphreys
  1 Dan Davison
  1 Chris Withers
  1 Christian Geier
  1 anomrake

Happy data hacking!

- Wes

What is it
==========
pandas is a Python package providing fast, flexible, and
expressive data structures designed to make working with
relational, time series, or any other kind of labeled data both
easy and intuitive. It aims to be the fundamental high-level
building block for doing practical, real world data analysis in
Python.

Links
=====
Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst
Documentation: http://pandas.pydata.org
Installers: http://pypi.python.org/pypi/pandas
Code Repository: http://github.com/pydata/pandas
Mailing List: http://groups.google.com/group/pydata


Re: [Numpy-discussion] MapIter api

2013-04-23 Thread Frédéric Bastien
Hi,

this is currently used in Theano! In fact, it was John S. who implemented
it in NumPy to allow a fast gradient of advanced indexing in Theano. It
allows code like:

matrix1[vector1, vector2] += matrix2

where there are duplicate indices in the vectors.
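
For readers unfamiliar with the duplicate-index issue, here is a rough Python-level sketch. It is not the C-level MapIter code that Theano uses. It assumes a NumPy version that provides `np.add.at` (documented as new in 1.8), and the array names are invented for the example:

```python
import numpy as np

rows = np.array([0, 0, 1])          # (0, 1) is addressed twice
cols = np.array([1, 1, 0])
updates = np.array([1.0, 2.0, 3.0])

# Plain fancy-index "+=" is buffered: for the duplicated position only
# one of the two updates survives, so the contributions are not summed.
buffered = np.zeros((2, 2))
buffered[rows, cols] += updates

# Unbuffered accumulation adds every contribution, which is what a
# gradient of advanced indexing needs.
accumulated = np.zeros((2, 2))
np.add.at(accumulated, (rows, cols), updates)

print(buffered[0, 1], accumulated[0, 1])
```

The accumulated result holds 1.0 + 2.0 at the duplicated position, while the buffered version keeps a single update there.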

In looking at the code, I saw that it uses at least these parts of the interface:

PyArrayMapIterObject
PyArray_MapIterNext
PyArray_ITER_NEXT
PyArray_MapIterSwapAxes
PyArray_BroadcastToShape

I lost track of the end of that discussion, but I think this is not possible in
NumPy itself, as there was no agreement to include it. I do remember a few
other users on this list asking for it (and they were Theano users, to my
knowledge).

So I would prefer that you don't remove the part that we use for the next
1.8 release.

thanks

Frédéric


On Tue, Apr 16, 2013 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Apr 15, 2013 at 5:29 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  Hey,
 
  the MapIter API has only been made public in master, right? So it is no
  problem at all to change at least the mapiter struct, right?
 
  I got annoyed at all those special cases that make it difficult to
  work out where to put things, e.g. the fix for the boolean array-like
  stuff. So I actually started rewriting it (and I already have one big
  function that does all index preparation -- OK, it is untested, but
  it's basically there).
 
  I would guess it is not really a big problem even if it had been public
  for longer, since you probably shouldn't do direct struct access anyway?
  But just checking.

 Why don't we just make the struct opaque, i.e., just declare it in the
 public header file and move the actual definition to an internal
 header file?

 If it's too annoying I guess we could even make it non-public, at
 least in 1.8 -- IIRC it's only there so we can use it in umath, and
 IIRC the patch to use it hasn't landed yet. Or we could just merge
 umath and multiarray into a single .so, that would save a *lot* of
 annoying fiddling with the public API that doesn't actually serve any
 purpose.

 -n



Re: [Numpy-discussion] MapIter api

2013-04-23 Thread Sebastian Berg
On Tue, 2013-04-23 at 17:08 -0400, Frédéric Bastien wrote:
 Hi,
 
 this is currently used in Theano! In fact, it is a John S. that
 implemented it in NumPy to allow fast gradient of the advanced
 indexing in Theano. It allow code like:
 
 
 matrix1[vector1, vector2] += matrix2
 
Yes, I had missed that and thought maybe nobody actually used it yet. I
gave some points why I think there should be some changes in the
original pull request [1]. Mostly I think it would make sense (also a
lot for theano) to rewrite it with the new iterators and expose the
subspace more directly. That would give vast speedups for mixed
fancy/non-fancy indices.

But if this is useful to you, I guess one can also just create a new one
if someone finds time, leaving the old MapIter deprecated and
unmaintained.

[1] https://github.com/numpy/numpy/pull/377

 where there is duplicate indices in the vector
 
 In looking at the code, I saw it use at least those part of the
 interface.
 
 PyArrayMapIterObject
 PyArray_MapIterNext
 PyArray_ITER_NEXT
 PyArray_MapIterSwapAxes
 PyArray_BroadcastToShape
 

There is likely no reason for changing these, but improving MapIter
would likely break binary compatibility because of struct access.

- Sebastian
 




Re: [Numpy-discussion] Vectorized percentile function in Numpy (PR #2970)

2013-04-23 Thread Sebastian Berg
On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
  Back in December it was pointed out on the scipy-user list[1] that 
 numpy has a percentile function which has similar functionality to 
 scipy's stats.scoreatpercentile.  I've been trying to harmonize these 
 two functions into a single version which has the features of both.
  Scipy PR 374[2] introduced a version which took the parameters from
 both the scipy and numpy percentile functions and was accepted into Scipy
 with the plan that it would be deprecated when a similar function was
 introduced into Numpy.  Then I moved to enhancing the Numpy version with
 Pull Request 2970 [3].  With some input from Sebastian Berg the
 percentile function was rewritten with further vectorization, but
 neither of us felt fully comfortable with the final product.  Can
 someone look at the implementation in the PR and suggest what should be
 done from here?
 

Thanks! For me the main question is the vectorized usage when both
haystack (`a`) and needle (`q`) are vectorized. What I mean is for:

np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)

I would probably expect an output shape of (n1, n2, 3), but currently
you will get the needle dimensions first, because it is roughly the same
as

[np.percentile(np.random.randn(n1, n2, N), q, axis=-1) for q in [25., 50., 75.]]

so for the (probably rare) vectorization of both `a` and `q`, would it
be preferable to do some kind of long-term behaviour change, or just put
the dimensions of `q` first, which should be compatible with the current
list?
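
A small sketch of the two layouts in question. The sizes are arbitrary. It assumes a NumPy version whose `np.percentile` puts the `q` axis first (as documented in current releases) and which provides `np.moveaxis`:

```python
import numpy as np

n1, n2, N = 4, 5, 100
rng = np.random.RandomState(0)
a = rng.randn(n1, n2, N)
q = [25., 50., 75.]

# Vectorized call: the percentile ("needle") axis comes first.
vectorized = np.percentile(a, q, axis=-1)

# Roughly equivalent list-comprehension form from the text.
looped = np.array([np.percentile(a, qi, axis=-1) for qi in q])

# The alternative layout, with the percentile axis last.
q_last = np.moveaxis(vectorized, 0, -1)

print(vectorized.shape, looped.shape, q_last.shape)
```

Converting between the two layouts is a cheap axis move, so the choice is mostly about which default is less surprising.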

Regards,

Sebastian

   Cheers,
 
  - Jonathan Helmus
 
 
 [1] http://thread.gmane.org/gmane.comp.python.scientific.user/1
 [2] https://github.com/scipy/scipy/pull/374
 [3] https://github.com/numpy/numpy/pull/2970
 




Re: [Numpy-discussion] MapIter api

2013-04-23 Thread Charles R Harris
On Tue, Apr 23, 2013 at 4:06 PM, Sebastian Berg
sebast...@sipsolutions.net wrote:

 On Tue, 2013-04-23 at 17:08 -0400, Frédéric Bastien wrote:
  Hi,
 
  this is currently used in Theano! In fact, it is a John S. that
  implemented it in NumPy to allow fast gradient of the advanced
  indexing in Theano. It allow code like:
 
 
  matrix1[vector1, vector2] += matrix2
 
 Yes, I had missed that and thought maybe nobody actually used it yet. I
 gave some points why I think there should be some changes in the
 original pull request [1]. Mostly I think it would make sense (also a
 lot for theano) to rewrite it with the new iterators and expose the
 subspace more directly. That would give vast speedups for mixed
 fancy/non-fancy indices.

 But if this is useful to you, I guess one can also just create a new one
 if someone finds time, leaving the old MapIter deprecated and
 unmaintained.

 [1] https://github.com/numpy/numpy/pull/377

  where there is duplicate indices in the vector
 
  In looking at the code, I saw it use at least those part of the
  interface.
 
  PyArrayMapIterObject
  PyArray_MapIterNext
  PyArray_ITER_NEXT
  PyArray_MapIterSwapAxes
  PyArray_BroadcastToShape
 

 There is likely no reason for changing these, but improving MapIter
 would likely break binary compatibility because of struct access.

 - Sebastian
 
 


Does this have any overlap with https://github.com/numpy/numpy/pull/2821 ?

Chuck


Re: [Numpy-discussion] Vectorized percentile function in Numpy (PR #2970)

2013-04-23 Thread josef . pktd
On Tue, Apr 23, 2013 at 6:16 PM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
  Back in December it was pointed out on the scipy-user list[1] that
 numpy has a percentile function which has similar functionality to
 scipy's stats.scoreatpercentile.  I've been trying to harmonize these
 two functions into a single version which has the features of both.
  Scipy PR 374[2] introduced a version which took the parameters from
 both the scipy and numpy percentile functions and was accepted into Scipy
 with the plan that it would be deprecated when a similar function was
 introduced into Numpy.  Then I moved to enhancing the Numpy version with
 Pull Request 2970 [3].  With some input from Sebastian Berg the
 percentile function was rewritten with further vectorization, but
 neither of us felt fully comfortable with the final product.  Can
 someone look at the implementation in the PR and suggest what should be
 done from here?


 Thanks! For me the main question is the vectorized usage when both
 haystack (`a`) and needle (`q`) are vectorized. What I mean is for:

 np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)

 I would probably expect an output shape of (n1, n2, 3), but currently
 you will get the needle dimensions first, because it is roughly the same
 as

 [np.percentile(np.random.randn(n1, n2, N), q, axis=-1) for q in [25., 50., 75.]]

 so for the (probably rare) vectorization of both `a` and `q`, would it
 be preferable to do some kind of long term behaviour change, or just put
 the dimensions in `q` first, which should be compatible to the current
 list?

I don't have much of a preference either way, but I'm glad this is
going into numpy.
We can work with it either way.

In stats, the most common case will be axis=0, and then the two are
the same, aren't they?

What I like about the second version is unrolling (with 2 or 3
quantiles), which I think will work

u, l = np.random.randn(2,5)
or
res = np.percentile(...)
func(*res)
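
A sketch of the unrolling idea, assuming the percentile axis comes first in the result. The data, the 2.5/97.5 levels and the `width` helper are hypothetical stand-ins (`width` plays the role of `func` above):

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.randn(200, 5)   # 200 observations of 5 variables

# With the percentile axis first, the result unpacks like a tuple.
lower, upper = np.percentile(data, [2.5, 97.5], axis=0)


def width(lo, hi):
    """Hypothetical consumer standing in for `func` in the text."""
    return hi - lo


# The same result can also be star-unpacked straight into a function call.
res = np.percentile(data, [2.5, 97.5], axis=0)
spread = width(*res)

print(lower.shape, upper.shape, spread.shape)
```

If the percentile axis were last instead, the same unpacking would need an explicit transpose or axis move first.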

The first case will be nicer when there are lots of percentiles, but I
guess I won't need it much except for axis=0.

Actually, I would prefer the second version, because it might be a bit
more cumbersome to get the individual percentiles out if the axis is
somewhere in the middle. However, I don't think I have a case like
that.

The first version would be consistent with reduceat, and that would be
more numpythonic. I would go for that in numpy.

my 2.5c

Josef


 Regards,

 Sebastian

   Cheers,

  - Jonathan Helmus


 [1] http://thread.gmane.org/gmane.comp.python.scientific.user/1
 [2] https://github.com/scipy/scipy/pull/374
 [3] https://github.com/numpy/numpy/pull/2970


