[Numpy-discussion] ANN: pandas v0.17.0rc1 - RELEASE CANDIDATE

2015-09-11 Thread Jeff Reback
Hi,

I'm pleased to announce the availability of the first release candidate of
Pandas 0.17.0.
Please try this RC and report any issues here: Pandas Issues

We will be releasing officially in 1-2 weeks or so.

**RELEASE CANDIDATE 1**

This is a major release from 0.16.2 and includes a small number of API
changes, several new features, enhancements, and performance improvements
along with a large number of bug fixes. We recommend that all users upgrade
to this version.

Highlights include:


   - Release the Global Interpreter Lock (GIL) on some cython operations,
     see here
   - Plotting methods are now available as attributes of the .plot
     accessor, see here
   - The sorting API has been revamped to remove some long-time
     inconsistencies, see here
   - Support for a datetime64[ns] with timezones as a first-class dtype,
     see here
   - The default for to_datetime will now be to raise when presented with
     unparseable formats, previously this would return the original input,
     see here
   - The default for dropna in HDFStore has changed to False, to store by
     default all rows even if they are all NaN, see here
   - Support for Series.dt.strftime to generate formatted strings for
     datetime-likes, see here
   - Development installed versions of pandas will now have PEP440
     compliant version strings GH9518
   - Development support for benchmarking with the Air Speed Velocity
     library GH8316
   - Support for reading SAS xport files, see here
   - Removal of the automatic TimeSeries broadcasting, deprecated since
     0.8.0, see here


See the Whatsnew for much more information.
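
A few of the highlighted changes can be tried out with a short script. This is only a rough sketch, assuming the RC (or any later pandas) is installed; the DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Illustrative data only; the column names are not from the release notes.
df = pd.DataFrame({"when": ["2015-09-11", "2015-01-02"], "n": [2, 1]})

# The revamped sorting API separates sorting by data from sorting by labels.
by_value = df.sort_values(by="n")
by_label = by_value.sort_index()

# Unparseable input raises by default ...
try:
    pd.to_datetime("not a date")
    raised = False
except ValueError:
    raised = True

# ... unless the caller asks for coercion, which yields NaT instead.
coerced = pd.to_datetime(["2015-09-11", "not a date"], errors="coerce")

# Series.dt.strftime formats datetime-likes as strings.
labels = pd.to_datetime(df["when"]).dt.strftime("%Y/%m/%d")
```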

The best way to get this is to install via conda from our development
channel. Builds for osx-64, linux-64, and win-64 for Python 2.7 and
Python 3.4 are all available.

conda install pandas -c pandas

Thanks to all who made this release happen. It is a very large release!

Jeff
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] feature request - increment counter on write check

2015-09-11 Thread Daniel
Nathaniel Smith wrote
> Actually, now that I think about it there is a much worse general version
> of this problem, which probably kills the idea dead. Via the buffer
> interface (among others), it is totally legal and encouraged to create
> non-numpy views of numpy arrays. They'll check the flags once at creation
> time, but after that they're free to write directly to the underlying
> buffer whenever they please:
> 
> a = np.ones(10)
> a2 = memoryview(a)
> a2[0] = 0
> # now what?

You could say that when you create a memoryview you mark a as
non-mutablehash-able.  Ideally you would want a C++ RAII-like/Python
with-block style implementation, so that when the memoryview is destroyed the
mutablehash-able-ness is reinstated (although I guess you would need a ref
count of memoryviews to do that safely).  I don't know how common memoryview
is within numpy itself, or whether it's only part of the extension API.  If
it's relatively rare, then given that this hashing thing could easily be
designed to fail gracefully, it would seem a shame not to implement it simply
because of this issue...in my opinion of course!
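
To make the ref-count idea concrete, here is a minimal pure-Python sketch. Everything in it is hypothetical (the GuardedBuffer class, raw_view, and the tuple returned by mutablehash); it uses a bytearray as a stand-in for an ndarray buffer and is not how numpy works today.

```python
from contextlib import contextmanager


class GuardedBuffer:
    """Hypothetical wrapper that refuses to hash while raw views exist."""

    def __init__(self, data):
        self._data = bytearray(data)
        self._version = 1        # would be bumped on every checked write
        self._open_views = 0     # ref count of outstanding raw views

    @contextmanager
    def raw_view(self):
        # Hand out a memoryview and count it until the with-block ends.
        self._open_views += 1
        view = memoryview(self._data)
        try:
            yield view
        finally:
            view.release()
            self._open_views -= 1
            self._version += 1   # assume the view may have written

    def mutablehash(self):
        if self._open_views:
            return None          # fail gracefully: contents cannot be vouched for
        return (id(self), self._version)


buf = GuardedBuffer(b"\x01" * 10)
before = buf.mutablehash()
with buf.raw_view() as view:
    view[0] = 0                  # a write that bypasses any write check
    during = buf.mutablehash()
after = buf.mutablehash()
```

While the view is open the hash is None, and once it is released the hash differs from the earlier one, so a cache keyed on it would be refreshed.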


P.S. I had no idea about the view-local write-lock...that's rather a
surprise.



--
View this message in context: 
http://numpy-discussion.10968.n7.nabble.com/feature-request-increment-counter-on-write-check-tp41015p41023.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.


Re: [Numpy-discussion] feature request - increment counter on write check

2015-09-11 Thread Daniel
Anne Archibald-3 wrote
> ... You'd also have to make sure that all code that tries to write to an
> array really checks the writable flag.

If there is code out there which modifies arrays without checking the write
flag then it's already in buggy territory.  In terms of values vs functions,
surely adding an extra pair of empty parens is not going to be a big deal,
although yes, it would be a breaking change.  The advantage of piggybacking
on the write-check is that it doesn't really matter how liberal the original
author was with the write checks: you can increment the counter as many
times as you like.  Removing unnecessary increments would be vaguely
helpful if this does end up being implemented as I suggest, and I suppose you
could also offer an optional keep_it_clean flag in the is_writeable call if
you really wanted.


> Rather than making this happen for all arrays, does it make sense to use
> an
> array subclass with a "dirty flag", maybe even if this requires manual
> setting in some cases?

I have no idea what the reality is of trying to subclass ndarray, either at
the C level or at the Python level, but perhaps this wouldn't be too bad.
I just thought that you could do such a neat job down at the C level that it
would be nice to make it happen.  Also, regarding "dirty flags", it's harder
to imagine that working with the examples I gave, since there may be more
than one "consumer" of the hash and thus an array could be dirty from the
point of view of one consumer, but clean from the point of view of another.
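
The multiple-consumer point can be sketched in pure Python. The CountedList and Consumer classes below are hypothetical stand-ins (a list with a write counter instead of an ndarray); the sketch only illustrates why a counter lets each consumer keep its own notion of "changed", which a single shared dirty bit cannot do.

```python
class CountedList:
    """Hypothetical container whose writes bump a version counter."""

    def __init__(self, values):
        self._values = list(values)
        self.version = 0

    def __setitem__(self, index, value):
        self._values[index] = value
        self.version += 1

    def __getitem__(self, index):
        return self._values[index]


class Consumer:
    """Remembers the last version it saw for each object."""

    def __init__(self):
        self._seen = {}

    def has_changed(self, obj):
        last = self._seen.get(id(obj))
        self._seen[id(obj)] = obj.version
        return last != obj.version


x = CountedList([1, 2, 3])
plotter, stats = Consumer(), Consumer()
plotter.has_changed(x)                      # first sight counts as changed
stats.has_changed(x)

x[0] = 10
plotter_first = plotter.has_changed(x)      # plotter notices the write
plotter_second = plotter.has_changed(x)     # and is now up to date
stats_first = stats.has_changed(x)          # stats still sees it as changed
```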


Sebastian wrote
> Another issue, which may be a non-issue, is that this design is
> inherently not quite thread safe.

In terms of thread safety, I guess it's a bit complicated...I'd hoped that
it would be covered by the same logic as the simple write check - although
now I think about it again it isn't that simple. It's not really something I
know anything about in the context of numpy, so I guess I shouldn't try and
make any further wild assertions!



--
View this message in context: 
http://numpy-discussion.10968.n7.nabble.com/feature-request-increment-counter-on-write-check-tp41015p41020.html


Re: [Numpy-discussion] feature request - increment counter on write check

2015-09-11 Thread Anne Archibald
On Fri, Sep 11, 2015 at 3:20 PM Sebastian Berg wrote:

> On Fr, 2015-09-11 at 13:10 +, Daniel Manson wrote:
> > Originally posted as issue 6301 on github.
> >
> >
> > Presumably any block of code that modifies an ndarray's buffer is
> > wrapped in a (thread safe?) check of the writable flag. Would it be
> > possible to hold a counter rather than a simple bool flag and then
> > increment the counter whenever you test the flag? Hopefully this would
> > introduce only a tiny additional overhead, but would permit
> > pseudo-hashing to test whether a mutable array has changed since you
> > last encountered it.
> >
>
> Just a quick note. This is a bit more complex than it might appear. The
> reason being that when a view of the array is changed, you would have to
> "notify" the array itself that it has changed. So propagation from top
> to bottom does not seem straightforward to me. (the other way is fine,
> since on check you could check all parents, but you cannot check all
> children).
>

Actually not so much. Like the writable flag, you'd make the counter be a
per-buffer piece of information. Each array already has a pointer to the
array object that "owns" the buffer, so you'd just go there in one hop.
This does mean that modifying one view would affect the modified flag on
all views sharing the same buffer, whether there's data overlap or not, but
for caching purposes that's not so bad.
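
As a rough Python-level illustration of the one-hop idea: the helper names and the side table below are hypothetical, and a real implementation would live in C next to the flags. The loop stops at the last ndarray in the base chain, so it does not address bases that are not numpy arrays.

```python
import numpy as np

_write_counts = {}  # hypothetical side table: id(owner) -> write counter


def owning_array(arr):
    """Follow .base until reaching the ndarray that owns the buffer."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr


def checked_write(arr, index, value):
    """Write through any view and bump the counter stored for the owner."""
    arr[index] = value
    key = id(owning_array(arr))
    _write_counts[key] = _write_counts.get(key, 0) + 1


def write_count(arr):
    return _write_counts.get(id(owning_array(arr)), 0)


a = np.zeros(10)
left, right = a[:5], a[5:]       # two non-overlapping views of one buffer
checked_write(left, 0, 1.0)
```

After the write through left, write_count(right) is also 1 even though the data does not overlap, which is the harmless false positive described above.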

I think a more serious concern is that it may be customary to simply check
the writable flag by hand rather than calling an is_writable function, so
to make this idea work you'd have to change all code that checks the
writable flag, including user code. You'd also have to make sure that all
code that tries to write to an array really checks the writable flag.

Rather than making this happen for all arrays, does it make sense to use an
array subclass with a "dirty flag", maybe even if this requires manual
setting in some cases?
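
A minimal sketch of what such a subclass might look like, assuming only the standard ndarray subclassing hooks; the DirtyArray name and the mark_clean method are made up for illustration. It only catches item assignment, which is why manual setting is still needed in some cases.

```python
import numpy as np


class DirtyArray(np.ndarray):
    """Hypothetical ndarray subclass that remembers whether it was written to."""

    def __new__(cls, data):
        # View the input as this subclass; __array_finalize__ sets the flag.
        return np.asarray(data).view(cls)

    def __array_finalize__(self, obj):
        # Called for every new instance, including views and copies.
        self.dirty = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.dirty = True

    def mark_clean(self):
        self.dirty = False


d = DirtyArray([1.0, 2.0, 3.0])
d[0] = 5.0               # caught automatically
was_dirty = d.dirty
d.mark_clean()
np.add(d, 1, out=d)      # an in-place ufunc does not go through __setitem__ ...
missed = not d.dirty
d.dirty = True           # ... so this case needs manual setting
```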

Anne


Re: [Numpy-discussion] feature request - increment counter on write check

2015-09-11 Thread Daniel
>...you would have to "notify" the array itself that it has changed..

I don't get what you mean..or maybe I do...? Do you mean that if there are
two arrays with non-overlapping views of the same data then modifying one
will change the mutablehash of both?  That's true I guess, and I hadn't
considered it, but I don't think it matters hugely because inequality of
mutablehashes is not supposed to imply inequality of data.



--
View this message in context: 
http://numpy-discussion.10968.n7.nabble.com/feature-request-increment-counter-on-write-check-tp41015p41018.html


Re: [Numpy-discussion] feature request - increment counter on write check

2015-09-11 Thread Sebastian Berg
On Fr, 2015-09-11 at 06:00 -0700, Daniel wrote:
> >...you would have to "notify" the array itself that it has changed..
> 
> I don't get what you mean..or maybe I do...? Do you mean that if there are
> two arrays with non-overlapping views of the same data then modifying one
> will change the mutablehash of both?  That's true I guess, and I hadn't
> considered it, but I don't think it matters hugely because inequality of
> mutablehashes is not supposed to imply inequality of data.
> 
> 

I just meant you need to propagate the "changed" flag to the parent,
which yes, means that you also propagate it down again (when testing if
a change occurred) to other non-overlapping children (which is not a
problem). As Anne said, propagating to the parents usually means only
the ultimate base (there are some exceptions for subclasses IIRC), so it
might be fine. Though there will be failure cases if the ultimate base
is not a numpy array.

As Anne suggested, I think the writable flag is actually a flag, meaning
there is no room for a counter there. There should be room for a "dirty"
flag.

Another issue, which may be a non-issue, is that this design is
inherently not quite thread safe.

Anyway, I do not know whether or not it is doable or makes sense in
numpy right now, just some things to think about.

- Sebastian




Re: [Numpy-discussion] review of #6247

2015-09-11 Thread pizza
On Sun, 6 Sep 2015 10:03:16 PM pi...@netspace.net.au wrote:
> On Sat, 5 Sep 2015 10:28:26 PM pi...@netspace.net.au wrote:
> > Hi all, first attempt at this:
> > small doco patch to
> > address https://github.com/numpy/numpy/issues/6247
> > https://github.com/numpy/numpy/compare/master...pizzathief:issue6247?diff=unified=1=issue6247
> 
> Fixed up next sentence as well.
> 
> https://github.com/numpy/numpy/compare/master...pizzathief:issue6247?diff=unified=1=issue6247

It was pointed out to me that numpy/lib/npyio.py needed updating as well.

https://github.com/numpy/numpy/compare/master...pizzathief:issue6247?diff=unified=1=issue6247
(I'm just repeating the same url, aren't I, sigh)


Re: [Numpy-discussion] review of #6247

2015-09-11 Thread Nathaniel Smith
Please submit this as a pull request as well :-)
On Sep 11, 2015 21:56,  wrote:

> On Sun, 6 Sep 2015 10:03:16 PM pi...@netspace.net.au wrote:
> > On Sat, 5 Sep 2015 10:28:26 PM pi...@netspace.net.au wrote:
> > > Hi all, first attempt at this:
> > > small doco patch to
> > > address https://github.com/numpy/numpy/issues/6247
> > >
> > > https://github.com/numpy/numpy/compare/master...pizzathief:issue6247?diff=unified=1=issue6247
> >
> > Fixed up next sentence as well.
> >
> > https://github.com/numpy/numpy/compare/master...pizzathief:issue6247?diff=unified=1=issue6247
>
> it was pointed out to me that numpy/lib/npyio.py needed updating as well.
>
>
> https://github.com/numpy/numpy/compare/master...pizzathief:issue6247?diff=unified=1=issue6247
> (I'm just repeating the same url, aren't I, sigh)


[Numpy-discussion] review of #6191

2015-09-11 Thread jason
doco patch to address https://github.com/numpy/numpy/issues/6191

https://github.com/numpy/numpy/compare/master...pizzathief:issue6191



Re: [Numpy-discussion] review of #6191

2015-09-11 Thread Nathaniel Smith
Can you submit this as a pull request? That's how we normally review
patches.
On Sep 11, 2015 21:59, "jason"  wrote:

> doco patch to address https://github.com/numpy/numpy/issues/6191
>
> https://github.com/numpy/numpy/compare/master...pizzathief:issue6191
>


[Numpy-discussion] feature request - increment counter on write check

2015-09-11 Thread Daniel Manson
Originally posted as issue 6301 on GitHub.

Presumably any block of code that modifies an ndarray's buffer is wrapped
in a (thread safe?) check of the writable flag. Would it be possible to
hold a counter rather than a simple bool flag and then increment the
counter whenever you test the flag? Hopefully this would introduce only a
tiny additional overhead, but would permit pseudo-hashing to test whether a
mutable array has changed since you last encountered it.

Ideally this should be exposed a bit like Python's __hash__ method,
let's say __mutablehash__, meaning a hash is returned but be warned that the
object is mutable.  For an ndarray, X, containing objects that themselves
have a __mutablehash__ method (e.g. other ndarrays, or some user object),
the X.__mutablehash__ method will need to do the full check over all
constituent objects, or simply return None.  Defining an API of this sort
would make it possible to - for example - let pandas DataFrames also
implement this interface.

In terms of use cases, the one I was motivated by was imagining
improvements to the "variable explorer" in Spyder - roughly speaking, this
widget's job is to always display an up-to-date summary of variables in
current scope, e.g. currently it can show max/min and shape, but you could
imagine also showing graphical summaries of the contents of an ndarray.  If
the widget could cache summaries and check which arrays have really changed
it should be much faster/offer more features/be simpler internally.  Note
that pandas DataFrames are relevant here as an example of complex objects,
containing ndarrays, which would benefit from being able to have their
summaries cached.

A more common/general use case would be as a check in some kind of
memoization process...

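import numpy as np
from numpy.fft import fft, ifft

# simple example...
# (memoize_please is a hypothetical decorator keyed on the proposed mutablehash)
@memoize_please
def hasnans(x):
    return np.any(np.isnan(x))

# more complex example...
# (mutablehash is the proposed function; it does not exist in numpy)
def convolve_fft(a, b, _cache={}):
    a_hash = mutablehash(a)
    b_hash = mutablehash(b)
    if a_hash not in _cache:
        _cache[a_hash] = fft(a)
    if b_hash not in _cache:
        _cache[b_hash] = fft(b)
    return ifft(_cache[a_hash] * _cache[b_hash])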
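
For anyone wanting to play with the memoization idea without touching numpy internals, here is a self-contained sketch in pure Python. The Versioned class is a hypothetical stand-in for an array whose checked writes bump a counter, and mutablehash and memoize_please are illustrative implementations of the names used above, not existing functions.

```python
import functools


class Versioned:
    """Hypothetical stand-in for an array whose writes bump a counter."""

    def __init__(self, values):
        self._values = list(values)
        self.version = 0

    def __setitem__(self, index, value):
        self._values[index] = value
        self.version += 1

    def __iter__(self):
        return iter(self._values)


def mutablehash(obj):
    # Identity plus version: equal keys mean "same object, not written since".
    return (id(obj), obj.version)


def memoize_please(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(x):
        key = mutablehash(x)
        if key not in cache:
            wrapper.misses += 1
            cache[key] = func(x)
        return cache[key]

    wrapper.misses = 0
    return wrapper


@memoize_please
def hasnans(x):
    return any(v != v for v in x)   # NaN is the only float unequal to itself


x = Versioned([1.0, 2.0])
first = hasnans(x)
second = hasnans(x)        # served from the cache
x[1] = float("nan")
third = hasnans(x)         # version changed, so it is recomputed
```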


A quick thought on an implementation detail...

I'm not sure exactly how to deal with the counter overflowing: perhaps if
you treated counter==0 to mean not-writable (i.e. that would be the new
version of the old write flag) then you might get some uint-wraparound
checking for free (because when it wraps back around to zero the buffer
ends up becoming locked)?  Alternatively you could just say that no
guarantee is given of wraparound being caught..though that might seriously
impact on the range of possible uses.
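
The wraparound behaviour can be simulated in pure Python. This is only a sketch under assumptions: the WriteCounter class and check_writable name are hypothetical, and the counter is deliberately 8 bits wide so the wrap is easy to reach.

```python
COUNTER_BITS = 8                          # deliberately tiny for the demo
COUNTER_MASK = (1 << COUNTER_BITS) - 1


class WriteCounter:
    """Hypothetical counter that doubles as the writable flag."""

    def __init__(self):
        self.value = 1                    # any non-zero value means "writable"

    def check_writable(self):
        if self.value == 0:
            return False                  # zero means locked / not writable
        # Emulate unsigned overflow: 255 + 1 wraps back to 0.
        self.value = (self.value + 1) & COUNTER_MASK
        return True


counter = WriteCounter()
allowed = sum(counter.check_writable() for _ in range(300))
```

Starting from 1, the first 255 checks succeed, the last of them wraps the counter to zero, and every later check is refused, so a stale hash can never silently match after a wrap.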

In summary...
Hopefully the stuff needed to make __mutablehash__ work could be
implemented simply by adding a single extra operation to the write-check
(and maybe changing the footprint of the ndarray slightly to accommodate a
counter).  But I suspect someone will tell me that life is never that
simple!


Re: [Numpy-discussion] feature request - increment counter on write check

2015-09-11 Thread Sebastian Berg
On Fr, 2015-09-11 at 13:10 +, Daniel Manson wrote:
> Originally posted as issue 6301 on github.
> 
> 
> Presumably any block of code that modifies an ndarray's buffer is
> wrapped in a (thread safe?) check of the writable flag. Would it be
> possible to hold a counter rather than a simple bool flag and then
> increment the counter whenever you test the flag? Hopefully this would
> introduce only a tiny additional overhead, but would permit
> pseudo-hashing to test whether a mutable array has changed since you
> last encountered it.
>   

Just a quick note. This is a bit more complex than it might appear. The
reason being that when a view of the array is changed, you would have to
"notify" the array itself that it has changed. So propagation from top
to bottom does not seem straightforward to me. (the other way is fine,
since on check you could check all parents, but you cannot check all
children).

- Sebastian


> Ideally this should be exposed a bit like Python's __hash__ method,
> let's say __mutablehash__, meaning a hash is returned but be warned
> that the object is mutable.  For an ndarray, X, containing objects
> that themselves have a __mutablehash__ method (e.g. other ndarrays, or
> some user object), the X.__mutablehash__ method will need to do the
> full check over all constituent objects, or simply return None.
> Defining an API of this sort would make it possible to - for example
> - let pandas DataFrames also implement this interface.
> 
> 
> In terms of usage cases, the one I was motivated by was imagining
> improvements to the "variable explorer" in Spyder - roughly speaking,
> this widget's job is to always display an up-to-date summary of
> variables in current scope, e.g. currently it can show max/min and
> shape, but you could imagine also showing graphical summaries of the
> contents of an ndarray.  If the widget could cache summaries and check
> which arrays have really changed it should be much faster/offer more
> features/be simpler internally.  Note that pandas DataFrames are
> relevant here as an example of complex objects, containing ndarrays,
> which would benefit from being able to have their summaries cached.
> 
> 
> A more common/general usage case would be as a check in some kind of
> memoization process...
> 
> 
> # simple example...
> @memoize_please
> def hasnans(x):
>     return np.any(np.isnan(x))
>
>
> # more complex example...
> def convolve_fft(a, b, _cache={}):
>     a_hash = mutablehash(a)
>     b_hash = mutablehash(b)
>     if a_hash not in _cache:
>         _cache[a_hash] = fft(a)
>     if b_hash not in _cache:
>         _cache[b_hash] = fft(b)
>     return ifft(_cache[a_hash] * _cache[b_hash])
> 
> 
> 
> 
> A quick thought on an implementation detail...
>  
> I'm not sure exactly how to deal with the counter overflowing: perhaps
> if you treated counter==0 to mean not-writable (i.e. that would be the
> new version of the old write flag) then you might get some
> uint-wraparound checking for free (because when it wraps back around
> to zero the buffer ends up becoming locked)?  Alternatively you could
> just say that no guarantee is given of wraparound being caught..though
> that might seriously impact on the range of possible uses.
> 
> 
> In summary...
> Hopefully the stuff needed to make __mutablehash__ work could be
> implemented simply by adding a single extra operation to the
> write-check (and maybe changing the footprint of the ndarray slightly
> to accommodate a counter).  But I suspect someone will tell me that
> life is never that simple!
> 
> 