Noam Yorav-Raphael wrote:
> The solution is simple, and is what datetime64 used to do before the change
> - have a type that just represents a moment in time.  It's not "in UTC" -
> it just stores the number of seconds that passed since an agreed moment in
> time (which is usually 1970-01-01 02:00+0200, which is more commonly
> referred to as 1970-01-01 00:00Z - it's the exact same moment).

I agree with this.  I understand the issue of parsing arbitrary timestamps
with incomplete information; however, it's not clear to me why it has become
more difficult to work with ISO 8601 timestamps.  For example, we use
numpy.genfromtxt to load an array of UTC-offset timestamps such as
`2020-08-19T12:42:57.7903616-04:00`.  Loading this array took 0.0352s when
no conversion was needed; it now takes 0.8615s with the following converter:

>>> from datetime import timezone
>>> import dateutil.parser
>>> converter = lambda x: dateutil.parser.parse(x).astimezone(timezone.utc).replace(tzinfo=None)
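
To make the pattern concrete, here's a toy version of the workflow using the
converter bound to a name above; the two inline sample values stand in for
our actual file, and applying the converter after the load (rather than
through genfromtxt's converters= argument) just keeps the sketch independent
of that plumbing:

>>> import io
>>> import numpy as np
>>> sample = io.StringIO("2020-08-19T12:42:57.7903616-04:00\n"
...                      "2020-08-19T13:05:12.1234567-04:00\n")
>>> raw = np.genfromtxt(sample, dtype=str, encoding="utf-8")  # ISO strings, loaded as-is
>>> utc = np.array([converter(s) for s in raw], dtype="datetime64[us]")

The last line is where all the time goes: every value makes a round trip
through dateutil before it can become a datetime64.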

That's a huge performance hit for something that ought to be a standard
operation, namely loading ISO-compliant data.  There may be more efficient
converters out there, but it seems strange to have to reach for an external
function just to strip the offset from an ISO 8601 value.  As an aside, with
or without the converter, numpy.genfromtxt is consistently faster than
numpy.loadtxt, despite the documentation stating otherwise.
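
For what it's worth, a possibly faster converter could let np.datetime64
parse the naive part of the string and handle the offset arithmetic
separately, skipping the per-value dateutil call entirely.  A rough sketch,
assuming every value carries a numeric +/-HH:MM offset (no trailing 'Z', no
missing offsets), and untested beyond that:

>>> import numpy as np
>>> def to_utc(ts):
...     # e.g. ts = '2020-08-19T12:42:57.7903616-04:00'
...     naive, sign, offset = ts[:-6], ts[-6], ts[-5:]
...     hours, minutes = (int(part) for part in offset.split(':'))
...     delta = np.timedelta64(hours, 'h') + np.timedelta64(minutes, 'm')
...     stamp = np.datetime64(naive)   # ISO parsing in C, offset handled below
...     return stamp + delta if sign == '-' else stamp - delta
>>> to_utc('2020-08-19T12:42:57.7903616-04:00')
numpy.datetime64('2020-08-19T16:42:57.790361600')

Even if something like that recovers the speed, it still amounts to
hand-rolling what the dtype used to do for us.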

I feel there's a lack of guidance in the documentation on this issue.  In
most of the threads I've come across, the first recommendation is to use
pandas.  The most effective way to crack a nut should not be to use a
sledgehammer.  The purpose of introducing standards should be to make these
sorts of operations trivial and efficient.  Perhaps I'm missing the solution
here...


