Re: [Format] Timestamp timezone semantics?

2021-06-04 Thread Wes McKinney
Sorry, I definitely did NOT mean "Python functions treat a naive timestamp as if it were a UTC timestamp." I am referring to the relationship between the behavior of attribute accessors like "hour" or "day" and the representation of the data. datetime.datetime.hour returns the same thing for the

Re: [Format] Timestamp timezone semantics?

2021-06-04 Thread Weston Pace
> We are recommending that the behavior of these functions should consistently have the UTC interpretation of the value rather than using the system locale. This is what Python does with "tz-naive" datetime.datetime objects
This is not quite true, although perhaps my reading is incorrect.
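
The distinction under discussion can be sketched with the Python standard library alone. This is only an illustration, not Arrow's behavior: the sample date and variable names are made up. It shows that field accessors on a naive datetime just read the stored wall-clock fields, while converting to an epoch value is the step where an interpretation (local time for naive objects, per the Python documentation) gets applied.

```python
from datetime import datetime, timezone

# Hypothetical sample value, chosen only for illustration.
naive = datetime(2021, 6, 4, 14, 30)          # no tzinfo attached
as_utc = naive.replace(tzinfo=timezone.utc)   # same fields, labelled UTC

# Accessors return the stored fields; nothing is read from the environment.
naive_hour = naive.hour
utc_hour = as_utc.hour

# Epoch conversion needs an interpretation of the fields.
utc_epoch = as_utc.timestamp()     # fields interpreted as UTC
local_epoch = naive.timestamp()    # fields interpreted as system local time

print(naive_hour, utc_hour)
print(utc_epoch, local_epoch)      # equal only if the system zone is UTC
```

So both readings in the thread have something to them: the accessors behave as if no zone exists, while conversions of a naive value fall back to the system zone.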

Re: [Format] Timestamp timezone semantics?

2021-06-04 Thread Julian Hyde
The learning there is: library software shouldn’t use anything from its environment (time zone, locale, encoding, endianness). Functions that use time zone should always have a time zone parameter. Once you take that step, the functions that work with zoneless timestamps start to look

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Wes McKinney
Arrow's decision was not to permit storage of timestamps with "localized" representation (which is distinct from UTC internal representation with a different time zone set). The problem really comes down to the interpretation of "time zone naive" timestamps on different systems: operations in my

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Julian Hyde
It seems that Arrow’s timestamp type can either have no time zone or be UTC. I think that is a flawed design, because it doesn’t catch user errors. Suppose you want to find the number of milliseconds between two timestamps. If the first has a timezone and the second is implicitly UTC, then you can
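
For comparison, here is how Python's standard library (not Arrow) treats this kind of mix. It is a minimal sketch with made-up values: subtracting a naive datetime from an aware one raises TypeError, so the caller has to state an interpretation for the naive value before a difference in milliseconds can be computed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values for illustration only.
aware = datetime(2021, 6, 3, 12, 0, 2, tzinfo=timezone.utc)
naive = datetime(2021, 6, 3, 12, 0, 0)

# Mixing the two kinds is rejected instead of silently assuming UTC.
try:
    aware - naive
    mixed_error = None
except TypeError as exc:
    mixed_error = str(exc)

# The difference is only defined once the naive value is explicitly labelled.
delta_ms = (aware - naive.replace(tzinfo=timezone.utc)) // timedelta(milliseconds=1)

print(mixed_error)
print(delta_ms)
```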

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Adam Hooper
On Thu, Jun 3, 2021 at 2:02 PM Adam Hooper wrote:
> I understand isAdjustedToUTC=true to mean "timestamp", and isAdjustedToUTC=false to mean, "int64 and I hope somebody attached some docs because

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Adam Hooper
On Thu, Jun 3, 2021 at 1:17 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> That is my understanding as well, a timestamp either has a timezone or it has not. If it does not have a timezone, it should be presented as is and no assumptions can be made about its timezone. In

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Jorge Cardoso Leitão
That is my understanding as well, a timestamp either has a timezone or it has not. If it does not have a timezone, it should be presented as is and no assumptions can be made about its timezone. In particular, but given two fields X and Y, one with a timezone and another without, e.g. it is not

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Julian Hyde
My answer to Antoine’s question would not be “kind of”, it would be “no”. In a system such as Joda-time, which I claim is the only system that Arrow should be considering, a timestamp-without-timezone does not have an implicit time zone of UTC. It has no time zone.
> On Jun 3, 2021, at 8:52

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Micah Kornfield
> Aren't those exactly the same (i.e. no timezone implicitly means UTC, not local time)?
Kind of, the reason we went with this approach is this sentence from the specification: "the data is "time zone naive" and shall be displayed *as is* to the user, not localized to the locale of the

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Hongze Zhang
On Wed, 2021-06-02 at 13:56 -0700, Micah Kornfield wrote:
> > Any SQL interface to Arrow should follow the SQL standard. So, for instance, if a column has TIMESTAMP type, it should behave as a date-time without a time-zone.
> At least in bigquery we do the following mapping:

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Antoine Pitrou
On 02/06/2021 at 22:56, Micah Kornfield wrote:
> > Any SQL interface to Arrow should follow the SQL standard. So, for instance, if a column has TIMESTAMP type, it should behave as a date-time without a time-zone.
> At least in bigquery we do the following mapping:
> SQL TIMESTAMP -> Arrow

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Julian Hyde
> On Jun 2, 2021, at 1:56 PM, Micah Kornfield wrote:
>
> At least in bigquery we do the following mapping:
> SQL TIMESTAMP -> Arrow Timestamp with "UTC" timezone
> SQL DATETIME -> Arrow Timestamp without a time-zone.
BigQuery was one of the systems I had in mind when I said "naming is a

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Micah Kornfield
> Any SQL interface to Arrow should follow the SQL standard. So, for instance, if a column has TIMESTAMP type, it should behave as a date-time without a time-zone.
At least in bigquery we do the following mapping:
SQL TIMESTAMP -> Arrow Timestamp with "UTC" timezone
SQL DATETIME -> Arrow

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Julian Hyde
Good time libraries support all. E.g. Jodatime [1] has
* Instant - an instantaneous point on the time-line
* DateTime - full date and time with time-zone
* LocalDateTime - date-time without a time-zone
The SQL world isn't quite as much of a mess as Adam makes it out to be. The SQL standard

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Rok Mihevc
On Wed, Jun 2, 2021 at 3:23 PM Joris Peeters wrote:
> You could store epoch offsets, but interpret them in the local timezone. E.g. (0, "America/New_York") could mean 1970-01-01 00:00:00 in the New York timezone.
> At least one nasty problem with that is ambiguous times, i.e. when the

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Adam Hooper
On Wed, Jun 2, 2021 at 7:56 AM Antoine Pitrou wrote:
> This seems rather weird to me: timestamps always convey a UTC timestamp value, optionally decorated with a local timezone? What is the motivation for such a representation? It is unlike other systems such as Python
It's standard.

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Joris Peeters
You could store epoch offsets, but interpret them in the local timezone. E.g. (0, "America/New_York") could mean 1970-01-01 00:00:00 in the New York timezone. At least one nasty problem with that is ambiguous times, i.e. when the clock turns back on going from DST to ST, as well as invalid times
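
The ambiguity mentioned here can be sketched without a time zone database. The two fixed offsets below are hard-coded assumptions standing in for New York daylight and standard time around the 2021-11-07 transition; a real implementation would use a tz database rather than fixed offsets. Two instants one hour apart land on the same wall-clock reading, so a stored local reading alone cannot say which instant was meant.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for illustration (not a full time zone definition).
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Two distinct instants, one hour apart, on either side of the fall-back.
first = datetime(2021, 11, 7, 5, 30, tzinfo=timezone.utc).astimezone(EDT)
second = datetime(2021, 11, 7, 6, 30, tzinfo=timezone.utc).astimezone(EST)

# Dropping the offset leaves identical wall-clock readings.
same_wall_clock = first.replace(tzinfo=None) == second.replace(tzinfo=None)
gap = second - first

print(first.isoformat(), second.isoformat())
print(same_wall_clock, gap)
```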

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Antoine Pitrou
On 02/06/2021 at 14:58, Joris Van den Bossche wrote:
> On Wed, 2 Jun 2021 at 13:56, Antoine Pitrou wrote:
> > Hello,
> > For the first time I notice this piece of information about the timestamp type:
> > /// * If the time zone is set to a valid value, values can be displayed as
> > ///

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Joris Van den Bossche
On Wed, 2 Jun 2021 at 13:56, Antoine Pitrou wrote:
> Hello,
>
> For the first time I notice this piece of information about the timestamp type:
>
> > /// * If the time zone is set to a valid value, values can be displayed as
> > /// "localized" to that time zone, even though the

[Format] Timestamp timezone semantics?

2021-06-02 Thread Antoine Pitrou
Hello,
For the first time I notice this piece of information about the timestamp type:
/// * If the time zone is set to a valid value, values can be displayed as
/// "localized" to that time zone, even though the underlying 64-bit
/// integers are identical to the same data
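
The quoted comment can be illustrated with plain Python rather than Arrow itself. This is a rough sketch: the epoch value and the fixed +02:00 offset are arbitrary assumptions. One stored integer is rendered with two different zones; the displayed fields differ, but both views convert back to the same integer.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stored value: seconds since the Unix epoch.
epoch_seconds = 1622635200

# Assumed fixed offset used only for display in this sketch.
plus_two = timezone(timedelta(hours=2), "+02:00")

utc_view = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
local_view = datetime.fromtimestamp(epoch_seconds, tz=plus_two)

# Different displayed hours, same underlying instant and integer.
print(utc_view.isoformat(), local_view.isoformat())
print(int(utc_view.timestamp()) == int(local_view.timestamp()) == epoch_seconds)
```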