Re: [Numpy-discussion] Datetime again

2015-01-28 Thread Chris Barker
Sorry not to notice this for a while -- I've been distracted by
python-ideas. (Nathaniel knows what I'm talking about ;-) )

I do like the idea of prototyping some DateTime stuff -- it really isn't
clear what's needed or how to do it at this point. Though we did more or
less settle on a reasonable minimum set last summer at SciPy (shame on me
for not getting that written up properly!)

Chuck -- what have you got in mind for new functionality here? I tend to
agree with Nathaniel that an ndarray subclass is less than ideal -- they
tend to get ugly fast. But maybe that is the only way to do anything in
Python, short of a major refactor to be able to write a dtype in Python --
which would be great, but sure sounds like a major project to me.

And as for "The 64 bits of long long really isn't enough and leads to all
sorts of compromises" -- not long enough for what? I've always thought that
what we need is the ability to set the epoch. Does anyone ever need
picoseconds since 100 years ago? And if they did, we'd be in a heck of a
mess with leap seconds and all that anyway.
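
A rough sketch of the arithmetic, assuming times are stored as a signed
64-bit count of ticks from a user-chosen epoch (the helper name and
constants are hypothetical, not NumPy API). Picosecond ticks cannot reach
back 100 years, but they fit easily if the epoch sits at the start of the
data:

```python
INT64_MAX = 2**63 - 1
PS_PER_SECOND = 10**12
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def fits_in_int64(seconds_from_epoch, ticks_per_second):
    """True if this offset from the epoch is representable as int64 ticks."""
    return abs(round(seconds_from_epoch * ticks_per_second)) <= INT64_MAX


# Epoch 100 years before the measurement, picosecond ticks.
print(fits_in_int64(100 * SECONDS_PER_YEAR, PS_PER_SECOND))

# Epoch at the start of a one-hour run, picosecond ticks.
print(fits_in_int64(3600, PS_PER_SECOND))
```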

Or is there a use-case I'm not thinking of?

-Chris


On Thu, Jan 22, 2015 at 12:58 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jan 22, 2015 at 3:18 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
 
  On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith n...@pobox.com wrote:
 
  On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
   Hi All,
  
   I'm playing with the idea of building a simplified datetime class on
   top of the current numpy implementation. I believe Pandas does
   something like this, and blaze will (does?) have a simplified version.
   The reason for the new class would be to have an easier, and hopefully
   more portable, API that can be implemented in Python, and maybe pushed
   down into C when things settle.
 
  When you say datetime class what do you mean? A dtype? An ndarray
  subclass? A python class representing a scalar datetime that you can
  put in an object array? ...?
 
 
  I was thinking an ndarray subclass that is based on a single datetime
  type, but part of the reason for this post is to elicit ideas. I'm
  influenced by Mark's discussion apropos blaze. I thought it easier to
  start such a project in python, as it is far easier for people
  interested in the problem to work with.
 
 
  And if I had my druthers, it would use quad precision floating point at
  its heart. The 64 bits of long long really isn't enough and leads to all
  sorts of compromises. But that is probably a pipe dream.

 I guess there are lots of options -- e.g. 32-bit day + 64-bit
 time-of-day (I think that would give 11.8 million years at
 10-femtosecond precision?). Figuring out which clock this is on
 matters a lot more though (e.g. how to handle leap-seconds in absolute
 and relative times -- is adding 1 day always the same as adding 24 *
 60 * 60 seconds?).

 At a very general level, I feel like numpy-qua-numpy's role here
 shouldn't be to try and add special code to handle any one specific
 datetime implementation: that hasn't worked out terribly well
 historically, and as referenced above there's a *ton* of plausible
 ways of approaching datetime handling that people might want, so we
 don't want to be in the position of having to pick the-one-and-only
 implementation. Telling people who want to tweak datetime handling
 that they have to start mucking around in umath.so is terrible.

 Instead, we should be trying to evolve numpy to add generic
 functionality, so that it's prepared to handle multiple third-party
 approaches to date-time handling (among other things).

 Implementing prototypes built on top of numpy could be an excellent
 way to generate ideas for appropriate changes to the numpy core.

 As far as this specific prototype, I should say that I'm dubious that
 subclassing ndarray is actually a *good* long-term solution. I really
 think that the *right* way to solve this would be to improve the dtype
 system so we could define useful date/time types that worked with
 plain vanilla ndarrays. But that approach requires a lot more up-front
 C coding; it's harder to throw together a quick prototype. OTOOH if
 your goal is the moon then you don't want to waste time investing in
 ladder technology... so I dunno.

 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Numpy-discussion] Datetime again

2015-01-28 Thread Stephan Hoyer
On Wed, Jan 28, 2015 at 5:13 PM, Chris Barker chris.bar...@noaa.gov wrote:

 I tend to agree with Nathaniel that a ndarray subclass is less than ideal
 -- they tend to get ugly fast. But maybe that is the only way to do
 anything in Python, short of a major refactor to be able to write a dtype
 in Python -- which would be great, but sure sounds like a major project to
 me.


My vote would be for using composition rather than inheritance. So
DatetimeArray should contain but not be an ndarray, making use of
appropriate APIs like __array__, __array_wrap__ and __numpy_ufunc__.
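
A minimal sketch of what I mean, assuming int64 ticks as the underlying
storage. The class body, the `shift` method and the default unit are
hypothetical illustrations, and only the `__array__` hook is shown:

```python
import numpy as np


class DatetimeArray:
    """Hypothetical wrapper: holds an int64 ndarray of ticks, is not an ndarray."""

    def __init__(self, ticks, unit="us"):
        self._ticks = np.asarray(ticks, dtype=np.int64)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(obj) see the data as datetime64 in the stored unit.
        out = self._ticks.view(f"datetime64[{self.unit}]")
        return out if dtype is None else out.astype(dtype)

    def shift(self, ticks):
        """Return a new DatetimeArray moved by an integer number of ticks."""
        return DatetimeArray(self._ticks + ticks, self.unit)


times = DatetimeArray([0, 1_000_000], unit="us")
as_numpy = np.asarray(times.shift(1_000_000))
print(as_numpy)
```

Because the wrapper owns a plain int64 array, the arithmetic in `shift` is
ordinary integer math that could later be handed to compiled code.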

 And as for "The 64 bits of long long really isn't enough and leads to all
 sorts of compromises" -- not long enough for what? I've always thought that
 what we need is the ability to set the epoch. Does anyone ever need
 picoseconds since 100 years ago? And if they did, we'd be in a heck of a
 mess with leap seconds and all that anyway.


I agree pretty strongly with the Blaze docs with respect to time units. I
think fixed precision int64 is probably OK (simplifying things quite a
bit), but the ns precision chosen by pandas was probably a mistake (not a
big enough range). The main advantage of using a single array for the
underlying data is that it's very straightforward to drop in a Cython or
Numba or whatever for performance critical steps.
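
To put a number on the range problem, here is a small sketch using only the
standard library, assuming an int64 count of nanoseconds from the 1970 Unix
epoch. The variable names are just for illustration:

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)

# datetime only resolves microseconds, so drop the last three digits.
span = timedelta(microseconds=INT64_MAX // 1000)
earliest = EPOCH - span
latest = EPOCH + span

print(earliest.isoformat(), latest.isoformat())
print(f"about {span.days / 365.25:.0f} years either side of the epoch")
```

That works out to roughly 292 years in each direction, from 1677 to 2262.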

In my mind, the main advantage of using floating point math is that NaT
(not a time) becomes much easier to represent and work with -- you can
map it to NaN. Handling NaT is a major source of complexity for the
datetime operations in pandas.
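
A small illustration of the difference, assuming current numpy datetime64
behaviour for the integer case and plain float64 seconds for the floating
point case (the arrays are made-up examples):

```python
import numpy as np

stamps = np.array(["2015-01-28", "NaT"], dtype="datetime64[s]")

# datetime64 stores NaT as a reserved integer: the smallest int64.
raw = stamps.view(np.int64)
print(raw[1] == np.iinfo(np.int64).min)

# With floats, NaN plays the same role and propagates through arithmetic
# without any special-casing.
seconds = np.array([1.5, np.nan])
shifted = seconds + 1.0
print(shifted, np.isnan(shifted))
```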

The other thing to consider is how much progress has been made on the
datetime dtype in DyND, which is where the numpy replacement part of Blaze
has ended up. I know some sort of datetime object *has* been implemented,
though from my tests it does not really appear to be in fully working
condition at this point (e.g., there does not appear to be a corresponding
timedelta type):
https://github.com/libdynd/dynd-python

Stephan


Re: [Numpy-discussion] Datetime again

2015-01-28 Thread Charles R Harris
On Wed, Jan 28, 2015 at 6:13 PM, Chris Barker chris.bar...@noaa.gov wrote:

 Sorry not to notice this for a while -- I've been distracted by
 python-ideas. (Nathaniel knows what I'm talking about ;-) )

 I do like the idea of prototyping some DateTime stuff -- it really  isn't
 clear what's needed or how to do it at this point. Though we did more or
 less settle on a reasonable minimum set last summer at SciPy (shame on me
 for not getting that written up properly!)

 Chuck -- what have you got in mind for new functionality here? I tend to
 agree with Nathaniel that a ndarray subclass is less than ideal -- they
 tend to get ugly fast. But maybe that is the only way to do anything in
 Python, short of a major refactor to be able to write a dtype in Python --
 which would be great, but sure sounds like a major project to me.


I was mostly thinking of implementing a Blaze compatible API without having
to rewrite the numpy datetime stuff. But also, I thought it might be an
easy way to solve some of our problems, or at least experiment.



 And as for "The 64 bits of long long really isn't enough and leads to
 all sorts of compromises" -- not long enough for what? I've always thought
 that what we need is the ability to set the epoch. Does anyone ever need
 picoseconds since 100 years ago? And if they did, we'd be in a heck of a
 mess with leap seconds and all that anyway.


I was thinking elapsed time. Nanoseconds can be rather crude for that
depending on the measurement. Of course, such short times aren't going to
come from the system clock, but data collected in other ways, interference
between light pulses over microscopic distances for instance. Such data is
likely acquired as, or computed from, simple numbers with a unit, which
gets us back to the numpy version. But that complicates the heck out of
things when you want to start adding times in different units.
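
For reference, a short sketch of how the existing numpy timedelta64 handles
mixed units: the result is promoted to the finer unit. The values are
arbitrary examples:

```python
import numpy as np

one_second = np.timedelta64(1, "s")
one_millisecond = np.timedelta64(1, "ms")

# The sum is expressed in the finer of the two units.
total = one_second + one_millisecond
print(total, total.dtype)
```

Promotion to the finer unit also shrinks the representable range, which is
part of what makes mixed-unit arithmetic awkward.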

<snip>

Chuck





Re: [Numpy-discussion] Datetime again

2015-01-28 Thread Eric Firing
On 2015/01/28 6:29 PM, Charles R Harris wrote:


 And as for "The 64 bits of long long really isn't enough and leads
 to all sorts of compromises" -- not long enough for what? I've always
 thought that what we need is the ability to set the epoch. Does
 anyone ever need picoseconds since 100 years ago? And if they did,
 we'd be in a heck of a mess with leap seconds and all that anyway.


 I was thinking elapsed time. Nanoseconds can be rather crude for that
 depending on the measurement. Of course, such short times aren't going
 to come from the system clock, but data collected in other ways,
 interference between light pulses over microscopic distances for
 instance. Such data is likely acquired as, or computed, from simple
 numbers with a unit, which gets us back to the numpy version. But that
 complicates the heck out of things when you want to start adding times
 in different units.

Chuck,

For any kind of data like that, I fail to see why any special numpy time 
type is needed at all.  Wouldn't the user just keep elapsed time as a 
count, or floating point number, in whatever units the instrument spits 
out?  Why does it need to be treated in a different way from any other 
numeric data?  We don't have special types for length. It seems to me 
that numpy's present experimental datetime64 type has already fallen 
into the trap of overengineering--trying to be too many things to too 
many people.  The main reason for having a special datetime type is to 
deal with the calendar mess, and conventional hours-minutes-seconds 
time.  For very short time intervals, all that is irrelevant.

Eric


[Numpy-discussion] Datetime again

2015-01-22 Thread Charles R Harris
Hi All,

I'm playing with the idea of building a simplified datetime class on top of
the current numpy implementation. I believe Pandas does something like
this, and blaze will (does?) have a simplified version. The reason for the
new class would be to have an easier, and hopefully more portable, API that
can be implemented in Python, and maybe pushed down into C when things
settle.

Thoughts?

Chuck


Re: [Numpy-discussion] Datetime again

2015-01-22 Thread Charles R Harris
On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  Hi All,
 
  I'm playing with the idea of building a simplified datetime class on top
  of the current numpy implementation. I believe Pandas does something like
  this, and blaze will (does?) have a simplified version. The reason for
  the new class would be to have an easier, and hopefully more portable,
  API that can be implemented in Python, and maybe pushed down into C when
  things settle.

 When you say datetime class what do you mean? A dtype? An ndarray
 subclass? A python class representing a scalar datetime that you can
 put in an object array? ...?


I was thinking an ndarray subclass that is based on a single datetime type,
but part of the reason for this post is to elicit ideas. I'm influenced by
Mark's discussion apropos blaze
https://github.com/ContinuumIO/blaze/blob/master/docs/design/blaze-datetime.md.
I thought it easier to start such a project in python, as it is far easier
for people interested in the problem to work with.

Chuck


Re: [Numpy-discussion] Datetime again

2015-01-22 Thread Nathaniel Smith
On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Hi All,

 I'm playing with the idea of building a simplified datetime class on top of
 the current numpy implementation. I believe Pandas does something like this,
 and blaze will (does?) have a simplified version. The reason for the new
 class would be to have an easier, and hopefully more portable, API that can
 be implemented in Python, and maybe pushed down into C when things settle.

When you say "datetime class" what do you mean? A dtype? An ndarray
subclass? A python class representing a scalar datetime that you can
put in an object array? ...?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Datetime again

2015-01-22 Thread Charles R Harris
On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  Hi All,
 
  I'm playing with the idea of building a simplified datetime class on top
  of the current numpy implementation. I believe Pandas does something like
  this, and blaze will (does?) have a simplified version. The reason for
  the new class would be to have an easier, and hopefully more portable,
  API that can be implemented in Python, and maybe pushed down into C when
  things settle.

 When you say datetime class what do you mean? A dtype? An ndarray
 subclass? A python class representing a scalar datetime that you can
 put in an object array? ...?


 I was thinking an ndarray subclass that is based on a single datetime
 type, but part of the reason for this post is to elicit ideas. I'm
 influenced by Mark's  discussion apropos blaze
 https://github.com/ContinuumIO/blaze/blob/master/docs/design/blaze-datetime.md.
 I thought it easier to start such a project in python, as it is far easier
 for people interested in the problem to work with.


And if I had my druthers, it would use quad precision floating point at
its heart. The 64 bits of long long really isn't enough and leads to all
sorts of compromises. But that is probably a pipe dream.

Chuck


Re: [Numpy-discussion] Datetime again

2015-01-22 Thread Nathaniel Smith
On Thu, Jan 22, 2015 at 3:18 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:



 On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  Hi All,
 
  I'm playing with the idea of building a simplified datetime class on top
  of the current numpy implementation. I believe Pandas does something like
  this, and blaze will (does?) have a simplified version. The reason for
  the new class would be to have an easier, and hopefully more portable,
  API that can be implemented in Python, and maybe pushed down into C when
  things settle.

 When you say datetime class what do you mean? A dtype? An ndarray
 subclass? A python class representing a scalar datetime that you can
 put in an object array? ...?


 I was thinking an ndarray subclass that is based on a single datetime
 type, but part of the reason for this post is to elicit ideas. I'm
 influenced by Mark's  discussion apropos blaze.  I thought it easier to
 start such a project in python, as it is far easier for people interested in
 the problem to work with.


 And if I had my druthers, it would use quad precision floating point at its
 heart. The 64 bits of long long really isn't enough and leads to all sorts
 of compromises. But that is probably a pipe dream.

I guess there are lots of options -- e.g. 32-bit day + 64-bit
time-of-day (I think that would give 11.8 million years at
10-femtosecond precision?). Figuring out which clock this is on
matters a lot more though (e.g. how to handle leap-seconds in absolute
and relative times -- is adding 1 day always the same as adding 24 *
60 * 60 seconds?).
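
A quick check of that estimate, assuming an unsigned 32-bit day count, a
365.25-day year, and 64 bits spread evenly across one 86400-second day (the
variable names are just for this sketch):

```python
DAYS_PER_YEAR = 365.25
SECONDS_PER_DAY = 24 * 60 * 60

# Range covered by a 32-bit count of days.
span_years = 2**32 / DAYS_PER_YEAR

# Smallest step if 64 bits subdivide a single day.
tick_seconds = SECONDS_PER_DAY / 2**64

print(f"span: {span_years / 1e6:.1f} million years")
print(f"finest tick: {tick_seconds * 1e15:.1f} fs")
```

The finest tick comes out a little under 5 femtoseconds, so 10-femtosecond
precision fits with room to spare.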

At a very general level, I feel like numpy-qua-numpy's role here
shouldn't be to try and add special code to handle any one specific
datetime implementation: that hasn't worked out terribly well
historically, and as referenced above there's a *ton* of plausible
ways of approaching datetime handling that people might want, so we
don't want to be in the position of having to pick the-one-and-only
implementation. Telling people who want to tweak datetime handling
that they have to start mucking around in umath.so is terrible.

Instead, we should be trying to evolve numpy to add generic
functionality, so that it's prepared to handle multiple third-party
approaches to date-time handling (among other things).

Implementing prototypes built on top of numpy could be an excellent
way to generate ideas for appropriate changes to the numpy core.

As far as this specific prototype, I should say that I'm dubious that
subclassing ndarray is actually a *good* long-term solution. I really
think that the *right* way to solve this would be to improve the dtype
system so we could define useful date/time types that worked with
plain vanilla ndarrays. But that approach requires a lot more up-front
C coding; it's harder to throw together a quick prototype. OTOOH if
your goal is the moon then you don't want to waste time investing in
ladder technology... so I dunno.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org