Re: [Numpy-discussion] Datetime again
Sorry not to notice this for a while -- I've been distracted by python-ideas. (Nathaniel knows what I'm talking about ;-) )

I do like the idea of prototyping some DateTime stuff -- it really isn't clear what's needed or how to do it at this point. Though we did more or less settle on a reasonable minimum set last summer at SciPy (shame on me for not getting that written up properly!) Chuck -- what have you got in mind for new functionality here?

I tend to agree with Nathaniel that an ndarray subclass is less than ideal -- they tend to get ugly fast. But maybe that is the only way to do anything in Python, short of a major refactor to be able to write a dtype in Python -- which would be great, but sure sounds like a major project to me.

And as for "The 64 bits of long long really isn't enough and leads to all sorts of compromises": not long enough for what? I've always thought that what we need is the ability to set the epoch. Does anyone ever need picoseconds since 100 years ago? And if they did, we'd be in a heck of a mess with leap seconds and all that anyway. Or is there a use case I'm not thinking of?

-Chris

On Thu, Jan 22, 2015 at 12:58 PM, Nathaniel Smith n...@pobox.com wrote:
> On Thu, Jan 22, 2015 at 3:18 PM, Charles R Harris charlesr.har...@gmail.com wrote:
>> On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris charlesr.har...@gmail.com wrote:
>>> On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith n...@pobox.com wrote:
>>>> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris charlesr.har...@gmail.com wrote:
>>>>> Hi All, I'm playing with the idea of building a simplified datetime class on top of the current numpy implementation. I believe Pandas does something like this, and blaze will (does?) have a simplified version. The reason for the new class would be to have an easier, and hopefully more portable, API that can be implemented in Python, and maybe pushed down into C when things settle.
>>>>
>>>> When you say datetime class what do you mean? A dtype? An ndarray subclass? A python class representing a scalar datetime that you can put in an object array? ...?
>>>
>>> I was thinking an ndarray subclass that is based on a single datetime type, but part of the reason for this post is to elicit ideas. I'm influenced by Mark's discussion apropos blaze. I thought it easier to start such a project in python, as it is far easier for people interested in the problem to work with.
>>
>> And if I had my druthers, it would use quad precision floating point at its heart. The 64 bits of long long really isn't enough and leads to all sorts of compromises. But that is probably a pipe dream.
>
> I guess there are lots of options -- e.g. 32-bit day + 64-bit time-of-day (I think that would give 11.8 million years at 10-femtosecond precision?). Figuring out which clock this is on matters a lot more though (e.g. how to handle leap-seconds in absolute and relative times -- is adding 1 day always the same as adding 24 * 60 * 60 seconds?).
>
> At a very general level, I feel like numpy-qua-numpy's role here shouldn't be to try and add special code to handle any one specific datetime implementation: that hasn't worked out terribly well historically, and as referenced above there's a *ton* of plausible ways of approaching datetime handling that people might want, so we don't want to be in the position of having to pick the-one-and-only implementation. Telling people who want to tweak datetime handling that they have to start mucking around in umath.so is terrible.
>
> Instead, we should be trying to evolve numpy to add generic functionality, so that it's prepared to handle multiple third-party approaches to date-time handling (among other things). Implementing prototypes built on top of numpy could be an excellent way to generate ideas for appropriate changes to the numpy core.
>
> As far as this specific prototype, I should say that I'm dubious that subclassing ndarray is actually a *good* long-term solution. I really think that the *right* way to solve this would be to improve the dtype system so we could define useful date/time types that worked with plain vanilla ndarrays. But that approach requires a lot more up-front C coding; it's harder to throw together a quick prototype. OTOH if your goal is the moon then you don't want to waste time investing in ladder technology... so I dunno.
>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE  (206) 526-6329   fax
Seattle, WA  98115      (206) 526-6317   main reception

chris.bar...@noaa.gov
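Chris's suggestion of a settable epoch can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an existing NumPy API: `EpochTimes` and its methods are hypothetical names, and picosecond ticks are an assumed choice. The point it shows is that an int64 count of picoseconds reaches only about 106 days in either direction, which is plenty if the count starts at an experiment-specific epoch rather than at 1970.

```python
from dataclasses import dataclass

import numpy as np

PS_PER_NS = 1_000
PS_PER_S = 10**12


@dataclass
class EpochTimes:
    """Hypothetical container: int64 picosecond offsets from a user-chosen epoch."""

    epoch: np.datetime64     # coarse absolute anchor, e.g. a datetime64[s] value
    offsets_ps: np.ndarray   # int64 picoseconds elapsed since `epoch`

    def elapsed(self, i, j):
        """Exact interval from sample i to sample j, as a timedelta64[ps]."""
        return np.timedelta64(int(self.offsets_ps[j] - self.offsets_ps[i]), "ps")

    def to_datetime64_ns(self):
        """Absolute times as datetime64[ns], rounding offsets down to whole ns."""
        ns = self.offsets_ps // PS_PER_NS
        return self.epoch.astype("datetime64[ns]") + ns.astype("timedelta64[ns]")


# One-sided reach of an int64 picosecond counter, in whole days.
MAX_SPAN_DAYS = np.iinfo(np.int64).max // PS_PER_S // 86_400

t = EpochTimes(
    epoch=np.datetime64("2015-01-28T17:13:00", "s"),
    offsets_ps=np.array([0, 1, 2_500_000], dtype=np.int64),
)
print(t.elapsed(0, 1))            # one picosecond, kept exactly
print(t.to_datetime64_ns()[2])    # the epoch plus 2.5 microseconds
print(MAX_SPAN_DAYS)
```

Keeping the calendar in the single `epoch` value and the fine-grained ticks in a plain integer array means interval arithmetic never touches calendar rules; only the conversion back to absolute time does.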
Re: [Numpy-discussion] Datetime again
On Wed, Jan 28, 2015 at 5:13 PM, Chris Barker chris.bar...@noaa.gov wrote:
> I tend to agree with Nathaniel that an ndarray subclass is less than ideal -- they tend to get ugly fast. But maybe that is the only way to do anything in Python, short of a major refactor to be able to write a dtype in Python -- which would be great, but sure sounds like a major project to me.

My vote would be for using composition rather than inheritance. So DatetimeArray should contain but not be an ndarray, making use of appropriate APIs like __array__, __array_wrap__ and __numpy_ufunc__.

> And as for "The 64 bits of long long really isn't enough and leads to all sorts of compromises": not long enough for what? I've always thought that what we need is the ability to set the epoch. Does anyone ever need picoseconds since 100 years ago? And if they did, we'd be in a heck of a mess with leap seconds and all that anyway.

I agree pretty strongly with the Blaze docs with respect to time units. I think fixed-precision int64 is probably OK (simplifying things quite a bit), but the ns precision chosen by pandas was probably a mistake (not a big enough range). The main advantage of using a single array for the underlying data is that it's very straightforward to drop in Cython or Numba or whatever for performance-critical steps.

In my mind, the main advantage of using floating point math is that NaT (not a time) becomes much easier to represent and work with -- you can simply map it to NaN. Handling NaT is a major source of complexity for the datetime operations in pandas.

The other thing to consider is how much progress has been made on the datetime dtype in DyND, which is where the numpy replacement part of Blaze has ended up.
I know some sort of datetime object *has* been implemented, though from my tests it does not really appear to be in fully working condition at this point (e.g., there does not appear to be a corresponding timedelta type): https://github.com/libdynd/dynd-python

Stephan
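Stephan's composition suggestion can be sketched as a small wrapper class. `DatetimeArray` here is a hypothetical class, not an existing NumPy or pandas one, and pinning it to datetime64[ns] is an assumption. The sketch uses `__array__` plus `__array_ufunc__`, which is the hook NumPy 1.13 eventually shipped in place of the `__numpy_ufunc__` name discussed in this thread, so it needs NumPy 1.13 or later; with `__array_ufunc__` in place, `__array_wrap__` is not needed.

```python
import numpy as np


class DatetimeArray:
    """Hypothetical wrapper that holds, but is not, a datetime64[ns] ndarray."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="datetime64[ns]")

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray() and friends see the underlying ndarray.
        arr = self._data if dtype is None else self._data.astype(dtype)
        return arr.copy() if copy else arr

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain ufunc calls without out= are handled in this sketch.
        if method != "__call__" or "out" in kwargs:
            return NotImplemented
        # Unwrap DatetimeArray operands and run the ufunc on plain ndarrays.
        raw = [x._data if isinstance(x, DatetimeArray) else x for x in inputs]
        result = ufunc(*raw, **kwargs)
        # Re-wrap only results that are still datetimes (dtype kind "M").
        if isinstance(result, np.ndarray) and result.dtype.kind == "M":
            return DatetimeArray(result)
        return result

    def __add__(self, other):
        return np.add(self, other)

    def __sub__(self, other):
        return np.subtract(self, other)

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        return f"DatetimeArray({self._data!r})"


d = DatetimeArray(["2015-01-22T14:51", "2015-01-28T17:13"])
shifted = d + np.timedelta64(1, "D")   # still a DatetimeArray
gaps = d - d                           # a plain timedelta64[ns] ndarray
print(shifted, gaps.dtype)
```

Because the wrapper decides which results to re-wrap, a datetime minus a datetime comes back as an ordinary timedelta array instead of being mislabeled as the datetime class.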
Re: [Numpy-discussion] Datetime again
On Wed, Jan 28, 2015 at 6:13 PM, Chris Barker chris.bar...@noaa.gov wrote:
> Sorry not to notice this for a while -- I've been distracted by python-ideas. (Nathaniel knows what I'm talking about ;-) ) I do like the idea of prototyping some DateTime stuff -- it really isn't clear what's needed or how to do it at this point. Though we did more or less settle on a reasonable minimum set last summer at SciPy (shame on me for not getting that written up properly!) Chuck -- what have you got in mind for new functionality here? I tend to agree with Nathaniel that an ndarray subclass is less than ideal -- they tend to get ugly fast. But maybe that is the only way to do anything in Python, short of a major refactor to be able to write a dtype in Python -- which would be great, but sure sounds like a major project to me.

I was mostly thinking of implementing a Blaze-compatible API without having to rewrite the numpy datetime stuff. But also, I thought it might be an easy way to solve some of our problems, or at least experiment.

> And as for "The 64 bits of long long really isn't enough and leads to all sorts of compromises": not long enough for what? I've always thought that what we need is the ability to set the epoch. Does anyone ever need picoseconds since 100 years ago? And if they did, we'd be in a heck of a mess with leap seconds and all that anyway.

I was thinking elapsed time. Nanoseconds can be rather crude for that, depending on the measurement. Of course, such short times aren't going to come from the system clock, but from data collected in other ways, interference between light pulses over microscopic distances, for instance. Such data is likely acquired as, or computed from, simple numbers with a unit, which gets us back to the numpy version. But that complicates the heck out of things when you want to start adding times in different units.

snip

Chuck
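Chuck's trade-off between resolution and range can be made concrete. With a fixed int64 tick count, each step to a finer SI unit divides the reachable span by 1000, and NumPy's timedelta64 converts mixed-unit sums to the finer unit. This sketch assumes 86,400-second days; `span_days` is a hypothetical helper, and only standard NumPy scalar types are used.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max
SECONDS_PER_TICK = {"us": 1e-6, "ns": 1e-9, "ps": 1e-12, "fs": 1e-15, "as": 1e-18}


def span_days(unit):
    """Approximate one-sided span, in days, of an int64 count of `unit` ticks."""
    return INT64_MAX * SECONDS_PER_TICK[unit] / 86_400


for unit in SECONDS_PER_TICK:
    print(f"{unit}: +/- {span_days(unit):.4g} days")

# Adding mixed units: NumPy converts both operands to the finer unit (ps here),
# so the result inherits the picosecond range limit of roughly +/- 106 days.
total = np.timedelta64(1, "s") + np.timedelta64(1, "ps")
print(repr(total))
```

The nanosecond row works out to about 292 years each way, which is the range limit Stephan attributes to pandas' choice of ns precision.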
Re: [Numpy-discussion] Datetime again
On 2015/01/28 6:29 PM, Charles R Harris wrote:
>> And as for "The 64 bits of long long really isn't enough and leads to all sorts of compromises": not long enough for what? I've always thought that what we need is the ability to set the epoch. Does anyone ever need picoseconds since 100 years ago? And if they did, we'd be in a heck of a mess with leap seconds and all that anyway.
>
> I was thinking elapsed time. Nanoseconds can be rather crude for that, depending on the measurement. Of course, such short times aren't going to come from the system clock, but from data collected in other ways, interference between light pulses over microscopic distances, for instance. Such data is likely acquired as, or computed from, simple numbers with a unit, which gets us back to the numpy version. But that complicates the heck out of things when you want to start adding times in different units.

Chuck,

For any kind of data like that, I fail to see why any special numpy time type is needed at all. Wouldn't the user just keep elapsed time as a count, or floating point number, in whatever units the instrument spits out? Why does it need to be treated in a different way from any other numeric data? We don't have special types for length.

It seems to me that numpy's present experimental datetime64 type has already fallen into the trap of overengineering -- trying to be too many things to too many people. The main reason for having a special datetime type is to deal with the calendar mess, and conventional hours-minutes-seconds time. For very short time intervals, all that is irrelevant.

Eric
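Eric's distinction can be illustrated with a short sketch: instrument-style elapsed times need nothing beyond a float64 array in a known unit, whereas calendar units are where a dedicated type earns its keep, because a month has no fixed length. The femtosecond readings below are made up for illustration.

```python
import numpy as np

# Elapsed times straight from an instrument: plain float64 in a known unit.
elapsed_fs = np.array([0.0, 12.5, 25.1, 37.4])   # made-up femtosecond readings
intervals_fs = np.diff(elapsed_fs)               # ordinary numeric arithmetic

# Calendar arithmetic: month steps work between month-resolution values...
next_month = np.datetime64("2015-01", "M") + np.timedelta64(1, "M")

# ...but months cannot be mixed with days, since a month is not a fixed
# number of days; NumPy raises TypeError instead of guessing.
try:
    np.datetime64("2015-01-31") + np.timedelta64(1, "M")
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(intervals_fs, next_month, mixing_allowed)
```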
[Numpy-discussion] Datetime again
Hi All,

I'm playing with the idea of building a simplified datetime class on top of the current numpy implementation. I believe Pandas does something like this, and blaze will (does?) have a simplified version. The reason for the new class would be to have an easier, and hopefully more portable, API that can be implemented in Python, and maybe pushed down into C when things settle.

Thoughts?

Chuck
Re: [Numpy-discussion] Datetime again
On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith n...@pobox.com wrote:
> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris charlesr.har...@gmail.com wrote:
>> Hi All, I'm playing with the idea of building a simplified datetime class on top of the current numpy implementation. I believe Pandas does something like this, and blaze will (does?) have a simplified version. The reason for the new class would be to have an easier, and hopefully more portable, API that can be implemented in Python, and maybe pushed down into C when things settle.
>
> When you say datetime class what do you mean? A dtype? An ndarray subclass? A python class representing a scalar datetime that you can put in an object array? ...?

I was thinking an ndarray subclass that is based on a single datetime type, but part of the reason for this post is to elicit ideas. I'm influenced by Mark's discussion apropos blaze: https://github.com/ContinuumIO/blaze/blob/master/docs/design/blaze-datetime.md. I thought it easier to start such a project in python, as it is far easier for people interested in the problem to work with.

Chuck
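For comparison with the composition approach suggested later in the thread, a minimal version of the subclass Chuck describes, pinned to one datetime unit, might look like the following. `NSDatetimeArray` is a hypothetical name and datetime64[ns] an assumed unit. The last lines show one way such subclasses can get awkward: ufunc results keep the subclass even when the dtype is no longer a datetime.

```python
import numpy as np


class NSDatetimeArray(np.ndarray):
    """Hypothetical ndarray subclass pinned to a single unit, datetime64[ns]."""

    def __new__(cls, values):
        # Coerce the input to the one supported dtype, then view it as the subclass.
        return np.asarray(values, dtype="datetime64[ns]").view(cls)

    def __array_finalize__(self, obj):
        # Runs for views, slices, and ufunc outputs; a fuller version would copy
        # metadata (such as a time zone) from `obj` here.
        pass


a = NSDatetimeArray(["2015-01-22T14:51", "2015-01-22T15:18"])
head = a[:1]        # slicing preserves the subclass
gaps = a - a[0]     # ...but so does subtraction, despite the timedelta dtype
print(type(head).__name__, type(gaps).__name__, gaps.dtype)
```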
Re: [Numpy-discussion] Datetime again
On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris charlesr.har...@gmail.com wrote:
> Hi All, I'm playing with the idea of building a simplified datetime class on top of the current numpy implementation. I believe Pandas does something like this, and blaze will (does?) have a simplified version. The reason for the new class would be to have an easier, and hopefully more portable, API that can be implemented in Python, and maybe pushed down into C when things settle.

When you say datetime class what do you mean? A dtype? An ndarray subclass? A python class representing a scalar datetime that you can put in an object array? ...?

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
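Two of the options Nathaniel lists already exist in NumPy and can be compared directly: Python `datetime` objects stored in an object array, versus the built-in `datetime64` dtype. The timestamps are simply two posting times from this thread, and the microsecond unit is an assumed choice.

```python
import datetime

import numpy as np

stamps = [datetime.datetime(2015, 1, 22, 14, 51), datetime.datetime(2015, 1, 22, 15, 18)]

# Option 1: Python scalar objects in an object array. Each element is a full
# Python object, so arithmetic falls back to per-element Python calls.
as_objects = np.array(stamps, dtype=object)

# Option 2: a real dtype. Each element is a fixed-width int64 tick count
# (here microseconds since 1970-01-01), so ufuncs work on raw integers.
as_dtype = np.array(stamps, dtype="datetime64[us]")

print(as_objects.dtype, as_objects[1] - as_objects[0])
print(as_dtype.dtype, as_dtype[1] - as_dtype[0])
# The underlying storage of the dtype version is just int64.
print(as_dtype.view(np.int64))
```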
Re: [Numpy-discussion] Datetime again
On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith n...@pobox.com wrote:
>> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris charlesr.har...@gmail.com wrote:
>>> Hi All, I'm playing with the idea of building a simplified datetime class on top of the current numpy implementation. I believe Pandas does something like this, and blaze will (does?) have a simplified version. The reason for the new class would be to have an easier, and hopefully more portable, API that can be implemented in Python, and maybe pushed down into C when things settle.
>>
>> When you say datetime class what do you mean? A dtype? An ndarray subclass? A python class representing a scalar datetime that you can put in an object array? ...?
>
> I was thinking an ndarray subclass that is based on a single datetime type, but part of the reason for this post is to elicit ideas. I'm influenced by Mark's discussion apropos blaze: https://github.com/ContinuumIO/blaze/blob/master/docs/design/blaze-datetime.md. I thought it easier to start such a project in python, as it is far easier for people interested in the problem to work with.

And if I had my druthers, it would use quad precision floating point at its heart. The 64 bits of long long really isn't enough and leads to all sorts of compromises. But that is probably a pipe dream.

Chuck
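Chuck's floating-point idea, and the NaT-as-NaN argument Stephan makes later in the thread, can both be checked quickly. This sketch assumes float64 rather than quad precision (NumPy has no portable quad type; `np.longdouble` varies by platform) and nanoseconds since 1970 as the unit. Near 2015 that count is about 1.42e18, where adjacent float64 values are 256 ns apart; in exchange, NaN is a free "not a time" marker, whereas an int64 sentinel must be masked by hand once the data leaves datetime64, for example inside a Cython or Numba kernel.

```python
import numpy as np

# Nanoseconds since 1970 for an instant in early 2015, held as float64.
t_ns = 1.42e18
step_ns = np.spacing(t_ns)      # distance to the next representable float64

# Missing values: with floats, NaN propagates through arithmetic for free.
f = np.array([1.0, np.nan, 3.0])
f_shifted = f + 1.0

# With int64 ticks, a sentinel (NumPy's own NaT is the most negative int64)
# is just another integer once you leave datetime64, so it must be masked.
NAT = np.iinfo(np.int64).min
i = np.array([1, NAT, 3], dtype=np.int64)
naive = i + 1                              # sentinel silently becomes NAT + 1
masked = np.where(i == NAT, NAT, i + 1)    # explicit NaT handling

print(step_ns, f_shifted, naive[1] == NAT, masked[1] == NAT)
```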
Re: [Numpy-discussion] Datetime again
On Thu, Jan 22, 2015 at 3:18 PM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris charlesr.har...@gmail.com wrote:
>> On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith n...@pobox.com wrote:
>>> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris charlesr.har...@gmail.com wrote:
>>>> Hi All, I'm playing with the idea of building a simplified datetime class on top of the current numpy implementation. I believe Pandas does something like this, and blaze will (does?) have a simplified version. The reason for the new class would be to have an easier, and hopefully more portable, API that can be implemented in Python, and maybe pushed down into C when things settle.
>>>
>>> When you say datetime class what do you mean? A dtype? An ndarray subclass? A python class representing a scalar datetime that you can put in an object array? ...?
>>
>> I was thinking an ndarray subclass that is based on a single datetime type, but part of the reason for this post is to elicit ideas. I'm influenced by Mark's discussion apropos blaze. I thought it easier to start such a project in python, as it is far easier for people interested in the problem to work with.
>
> And if I had my druthers, it would use quad precision floating point at its heart. The 64 bits of long long really isn't enough and leads to all sorts of compromises. But that is probably a pipe dream.

I guess there are lots of options -- e.g. 32-bit day + 64-bit time-of-day (I think that would give 11.8 million years at 10-femtosecond precision?). Figuring out which clock this is on matters a lot more though (e.g. how to handle leap-seconds in absolute and relative times -- is adding 1 day always the same as adding 24 * 60 * 60 seconds?).

At a very general level, I feel like numpy-qua-numpy's role here shouldn't be to try and add special code to handle any one specific datetime implementation: that hasn't worked out terribly well historically, and as referenced above there's a *ton* of plausible ways of approaching datetime handling that people might want, so we don't want to be in the position of having to pick the-one-and-only implementation. Telling people who want to tweak datetime handling that they have to start mucking around in umath.so is terrible.

Instead, we should be trying to evolve numpy to add generic functionality, so that it's prepared to handle multiple third-party approaches to date-time handling (among other things). Implementing prototypes built on top of numpy could be an excellent way to generate ideas for appropriate changes to the numpy core.

As far as this specific prototype, I should say that I'm dubious that subclassing ndarray is actually a *good* long-term solution. I really think that the *right* way to solve this would be to improve the dtype system so we could define useful date/time types that worked with plain vanilla ndarrays. But that approach requires a lot more up-front C coding; it's harder to throw together a quick prototype. OTOH if your goal is the moon then you don't want to waste time investing in ladder technology... so I dunno.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
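Nathaniel's back-of-the-envelope numbers, and his question about whether one day always equals 86,400 seconds, can both be checked. The split layout below (a 32-bit day count plus a 64-bit count of 10-femtosecond ticks within the day) is only his hypothetical, not an existing format, and the mean Gregorian year length is an assumption of the estimate. For the second question the sketch shows NumPy's current answer: datetime64 treats every day as exactly 86,400 s, POSIX-style, so it does not see the leap second inserted at the end of 2015-06-30 UTC.

```python
import numpy as np

# Hypothetical split layout: 32-bit day number + 64-bit ticks within the day.
DAY_BITS, TOD_BITS = 32, 64
TICKS_PER_SECOND = 10**14          # one tick = 10 femtoseconds
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.2425           # mean Gregorian year

years_covered = 2**DAY_BITS / DAYS_PER_YEAR
ticks_per_day = SECONDS_PER_DAY * TICKS_PER_SECOND
fits_10fs = ticks_per_day < 2**TOD_BITS
fits_1fs = ticks_per_day * 10 < 2**TOD_BITS   # would 1 fs ticks still fit?

print(f"{years_covered / 1e6:.1f} million years; 10 fs fits: {fits_10fs}; 1 fs fits: {fits_1fs}")

# NumPy's datetime64 answer to "is 1 day always 24 * 60 * 60 seconds?": yes,
# even across the leap second at the end of 2015-06-30 UTC.
one_day = np.datetime64("2015-07-01T00:00:00") - np.datetime64("2015-06-30T00:00:00")
same = np.datetime64("2015-06-30") + np.timedelta64(1, "D") == (
    np.datetime64("2015-06-30") + np.timedelta64(24 * 60 * 60, "s")
)
print(one_day, same)
```

The first half supports the 11.8-million-year estimate and shows that 10 fs is about the finest decimal tick a 64-bit time-of-day can hold; the second half shows that a leap-second-aware design would have to diverge from datetime64's fixed-length days.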