Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-11 Thread Martin Maechler
> Ben Bolker 
> on Mon, 10 Oct 2022 16:59:35 -0400 writes:

> Right now as.POSIXlt.Date() is just
> function (x, ...)
> .Internal(Date2POSIXlt(x))

It has been quite a bit different in R-devel for a little
while.  NEWS entries (there are more already, and more are coming
on this broad topic):

* The as.POSIXlt() and as.POSIXct() default
  methods now do obey their tz argument, also in this case.

* as.POSIXlt(<Date>) now does apply a tz (timezone) argument, as
  does as.POSIXct(); partly suggested by Roland Fuss on the R-devel
  mailing list.

and indeed it would have been good had you used (and read) the
R-devel mailing list, which is much more appropriate for the
topic of *changing* base R behavior.
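
For code that has to give the same answer whether or not the installed R
already has that change, one option is to avoid the Date method and go
through the character form, as suggested elsewhere in this thread. A minimal
sketch; the helper name date_to_local_midnight() is made up for illustration
and is not part of base R:

```r
# Hypothetical helper (not part of base R): turn a Date into midnight local
# time in `tz` by going through its character form, so the time zone is
# applied while parsing rather than attached afterwards.
date_to_local_midnight <- function(d, tz) {
  stopifnot(inherits(d, "Date"))
  as.POSIXlt(format(d), tz = tz)
}

foo <- as.Date(33874, origin = "1899-12-30")
bar <- date_to_local_midnight(foo, "Europe/Berlin")

attr(bar, "tzone")[1]   # the zone that was requested
bar$hour                # hour of day in that zone
```

This assumes the time zone database on the machine knows "Europe/Berlin".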




> How expensive would it be to throw a warning when '...' is provided by 
> the user/discarded ??

> Alternately, perhaps the documentation could be amended, although I'm 
> not quite sure what to suggest. (The sentence Liam refers to, "Dates 
> without times are treated as being at midnight UTC." is correct but 
> terse ...)


> On 2022-10-10 4:50 p.m., Alexandre Courtiol wrote:
>> Hi Simon,
>> 
>> Thanks for the clarification.
>> 
>> From a naive developer point of view, we were initially baffled that the
>> generic as.POSIXlt() does very different things on a character and on a
>> Date input:
>> 
>> as.POSIXlt(as.character(foo), "Europe/Berlin")
>> [1] "1992-09-27 CEST"
>> 
>> as.POSIXlt(foo, "Europe/Berlin")
>> [1] "1992-09-27 UTC"
>> 
>> Based on what you said, it does make sense: it is only when creating the
>> date/time that we want to include the time zone and that only happens when
>> we don't already work on a previously created date.
>> That is your subtle but spot-on distinction between "parsing" and
>> "changing" the time zone.
>> 
>> Yet, we do find it dangerous that as.POSIXlt.Date() accepts a time zone but
>> does nothing of it, especially when the help file starts with:
>> 
>> Usage
>> as.POSIXlt(x, tz = "", ...)
>> 
>> The behaviour is documented, as Liam reported it, but still, we will almost
>> certainly not be the last one tripping on this (without even adding the
>> additional issue of as.POSIXct() behaving differently across OS).
>> 
>> Thanks again,
>> 
>> Alex & Liam
>> 
>> 
>> 
>> 
>> On Mon, 10 Oct 2022 at 22:13, Simon Urbanek 
>> wrote:
>> 
>>> Liam,
>>> 
>>> I think I have failed to convey my main point in the last e-mail - which
>>> was that you want to parse the date/time in the timezone that you care
>>> about so in your example that would be
>>> 
>>>> foo <- as.Date(33874, origin = "1899-12-30")
>>>> foo
>>> [1] "1992-09-27"
>>>> as.POSIXlt(as.character(foo), "Europe/Berlin")
>>> [1] "1992-09-27 CEST"
>>> 
>>> I was explicitly saying that you do NOT want to simply change the time
>>> zone on POSIXlt objects as that won't work for reasons I explained - see my
>>> last e-mail.
>>> 
>>> Cheers,
>>> Simon
>>> 
>>> 
>>>> On 11/10/2022, at 6:31 AM, Liam Bailey wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Thanks Simon for the detailed response, that helps us understand a lot
>>>> better what’s going on! However, with your response in mind, we still
>>>> encounter some behaviour that we did not expect.
>>>>
>>>> I’ve included another minimum reproducible example below to expand on
>>>> the situation. In this example, `foo` is a Date object that we generate
>>>> from a numeric input. Following your advice, `bar` is then a POSIXlt object
>>>> where we now explicitly define timezone using argument tz. However, even
>>>> though we are explicit about the timezone the POSIXlt that is generated is
>>>> always in UTC. This then leads to the issues outlined by Alexandre above,
>>>> which we now understand are caused by DST.
>>>>
>>>> ``` r
>>>> #Generate date from numeric
>>>> #Not possible to specify tz at this point
>>>> foo <- as.Date(33874, origin = "1899-12-30")
>>>> dput(foo)
>>>> #> structure(8305, class = "Date")
>>>>
>>>> #Convert to POSIXlt specifying UTC timezone
>>>> bar <- as.POSIXlt(foo, tz = "UTC")
>>>> dput(bar)
>>>> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
>>>> #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
>>>> #> "POSIXt"), tzone = "UTC")
>>>>
>>>> #Convert to POSIXlt specifying Europe/Berlin.
>>>> #Time zone is still UTC
>>>> bar <- as.POSIXlt(foo, tz = "Europe/Berlin")
>>>> dput(bar)
>>>> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
>>>> #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
>>>> #>

Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-10 Thread Jeff Newmiller
I have no idea how to get readxl::read_excel to import a timestamp column in a 
timezone. It is true that Excel has no concept of timezones, but the data one 
finds there usually came from a text file at some point. Importing as character 
is a feasible strategy, but trying to convince an intermediate user to go to 
that much trouble is a headache when the issue is ignored in the help file.

It is evidently possible to specify a locale input to readr::read_csv, but the 
default behaviour guesses timestamp columns and assumes "UTC", and a file may 
contain data from different timezones (UTC and local civil are a common 
combination). Again, character import and manual conversion are needed.
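
To make "character import and manual conversion" concrete, here is a small
base R sketch. It assumes the timestamps were read as character and that the
zone of each row is known from somewhere else; the data frame and its column
names (stamp, zone, instant) are invented for the example:

```r
# Assumed layout: timestamps imported as character, plus a column naming the
# zone each one was recorded in. Both column names are made up for this sketch.
raw <- data.frame(
  stamp = c("2021-10-01 00:00:00", "2021-10-01 00:00:00"),
  zone  = c("UTC", "Europe/Berlin"),
  stringsAsFactors = FALSE
)

# Parse each string in its own zone; the result is seconds since the epoch,
# which is zone-free and therefore safe to store in a single column.
secs <- mapply(
  function(s, z) as.numeric(as.POSIXct(s, tz = z)),
  raw$stamp, raw$zone,
  USE.NAMES = FALSE
)

# Keep one POSIXct column; UTC here only controls how it is displayed.
raw$instant <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
```

The two rows show the same wall-clock time but end up as different instants,
which is the information a single assumed time zone would lose.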

On October 10, 2022 9:40:42 AM PDT, Hadley Wickham  wrote:
>On Sun, Oct 9, 2022 at 9:31 PM Jeff Newmiller  wrote:
>>
>> ... which is why tidyverse functions and Python datetime handling irk me so 
>> much.
>>
>> Is tidyverse time handling intrinsically broken? They have a standard 
>> practice of reading time as UTC and then using force_tz to fix the 
>> "mistake". Same as Python.
>
>Can you point to any docs that lead you to this conclusion so we can
>get them fixed? I strongly encourage people to parse date-times in the
>correct time zone; this is why lubridate::ymd_hms() and friends have a
>tz argument.
>
>Hadley
>

-- 
Sent from my phone. Please excuse my brevity.

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-10 Thread Ben Bolker

  Right now as.POSIXlt.Date() is just

function (x, ...)
.Internal(Date2POSIXlt(x))

How expensive would it be to throw a warning when '...' is provided by 
the user/discarded ??


Alternately, perhaps the documentation could be amended, although I'm 
not quite sure what to suggest. (The sentence Liam refers to, "Dates 
without times are treated as being at midnight UTC." is correct but 
terse ...)
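
To illustrate the kind of check I mean, here is a sketch written as an
ordinary wrapper function, not as a patch to base R. The name
as_POSIXlt_date_strict() is hypothetical:

```r
# Hypothetical wrapper, not the base R method: convert a Date to POSIXlt and
# complain if the caller supplied extra arguments that would go unused.
as_POSIXlt_date_strict <- function(x, ...) {
  stopifnot(inherits(x, "Date"))
  if (...length() > 0L) {
    warning("extra arguments are ignored when converting a Date",
            call. = FALSE)
  }
  as.POSIXlt(x)
}

# A call that passes tz triggers the warning; it is caught and reported here.
res <- withCallingHandlers(
  as_POSIXlt_date_strict(as.Date("1992-09-27"), tz = "Europe/Berlin"),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

The test is a single call to ...length(), which needs R >= 3.5.0.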



On 2022-10-10 4:50 p.m., Alexandre Courtiol wrote:

Hi Simon,

Thanks for the clarification.

 From a naive developer point of view, we were initially baffled that the
generic as.POSIXlt() does very different things on a character and on a
Date input:

as.POSIXlt(as.character(foo), "Europe/Berlin")
[1] "1992-09-27 CEST"

as.POSIXlt(foo, "Europe/Berlin")
[1] "1992-09-27 UTC"

Based on what you said, it does make sense: it is only when creating the
date/time that we want to include the time zone and that only happens when
we don't already work on a previously created date.
That is your subtle but spot-on distinction between "parsing" and
"changing" the time zone.

Yet, we do find it dangerous that as.POSIXlt.Date() accepts a time zone but
does nothing of it, especially when the help file starts with:

Usage
as.POSIXlt(x, tz = "", ...)

The behaviour is documented, as Liam reported it, but still, we will almost
certainly not be the last one tripping on this (without even adding the
additional issue of as.POSIXct() behaving differently across OS).

Thanks again,

Alex & Liam




On Mon, 10 Oct 2022 at 22:13, Simon Urbanek 
wrote:


Liam,

I think I have failed to convey my main point in the last e-mail - which
was that you want to parse the date/time in the timezone that you care
about so in your example that would be


foo <- as.Date(33874, origin = "1899-12-30")
foo

[1] "1992-09-27"

as.POSIXlt(as.character(foo), "Europe/Berlin")

[1] "1992-09-27 CEST"

I was explicitly saying that you do NOT want to simply change the time
zone on POSIXlt objects as that won't work for reasons I explained - see my
last e-mail.

Cheers,
Simon



On 11/10/2022, at 6:31 AM, Liam Bailey wrote:

Hi all,

Thanks Simon for the detailed response, that helps us understand a lot
better what’s going on! However, with your response in mind, we still
encounter some behaviour that we did not expect.

I’ve included another minimum reproducible example below to expand on
the situation. In this example, `foo` is a Date object that we generate
from a numeric input. Following your advice, `bar` is then a POSIXlt object
where we now explicitly define timezone using argument tz. However, even
though we are explicit about the timezone the POSIXlt that is generated is
always in UTC. This then leads to the issues outlined by Alexandre above,
which we now understand are caused by DST.


``` r
#Generate date from numeric
#Not possible to specify tz at this point
foo <- as.Date(33874, origin = "1899-12-30")
dput(foo)
#> structure(8305, class = "Date")

#Convert to POSIXlt specifying UTC timezone
bar <- as.POSIXlt(foo, tz = "UTC")
dput(bar)
#> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
#> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
#> "POSIXt"), tzone = "UTC")

#Convert to POSIXlt specifying Europe/Berlin.
#Time zone is still UTC
bar <- as.POSIXlt(foo, tz = "Europe/Berlin")
dput(bar)
#> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
#> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
#> "POSIXt"), tzone = "UTC")
```


We noticed that this occurs because the tz argument is not passed to
`.Internal(Date2POSIXlt())` inside `as.POSIXlt.Date()`.

Reading through the documentation for `as.POSIX*` we can see that this
behaviour is described:

   > “Dates without times are treated as being at midnight UTC.”

In this case, if we want to convert a Date object to POSIX* and specify
a (non-UTC) timezone would the best strategy be to first coerce our Date
object to character? Alternatively, `lubridate::as_datetime()` does seem to
recognise the tz argument and convert a Date object to POSIX* with non-UTC
time zone (see second example below). But it would be nice to know if there
are subtle differences between these two approaches that we should be aware
of.


``` r
foo <- as.Date(33874, origin = "1899-12-30")
dput(foo)
#> structure(8305, class = "Date")

#Convert to POSIXct specifying UTC timezone
bar <- lubridate::as_datetime(foo, tz = "UTC")
dput(as.POSIXlt(bar))
#> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
#> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
#> "POSIXt"), tzone = "UTC")

#Convert to POSIXct specifying Europe/Berlin
bar <- lubridate::as_datetime(foo, tz = "Europe/Berlin")
dput(as.POSIXlt(bar))
#> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
#> year = 92L, wday = 0L, yday = 270L, isdst = 1L, zone = "CEST",

Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-10 Thread Alexandre Courtiol
Hi Simon,

Thanks for the clarification.

From a naive developer point of view, we were initially baffled that the
generic as.POSIXlt() does very different things on a character and on a
Date input:

as.POSIXlt(as.character(foo), "Europe/Berlin")
[1] "1992-09-27 CEST"

as.POSIXlt(foo, "Europe/Berlin")
[1] "1992-09-27 UTC"

Based on what you said, it does make sense: it is only when creating the
date/time that we want to include the time zone and that only happens when
we don't already work on a previously created date.
That is your subtle but spot-on distinction between "parsing" and
"changing" the time zone.

Yet, we do find it dangerous that as.POSIXlt.Date() accepts a time zone but
does nothing of it, especially when the help file starts with:

Usage
as.POSIXlt(x, tz = "", ...)

The behaviour is documented, as Liam reported it, but still, we will almost
certainly not be the last one tripping on this (without even adding the
additional issue of as.POSIXct() behaving differently across OS).

Thanks again,

Alex & Liam




On Mon, 10 Oct 2022 at 22:13, Simon Urbanek 
wrote:

> Liam,
>
> I think I have failed to convey my main point in the last e-mail - which
> was that you want to parse the date/time in the timezone that you care
> about so in your example that would be
>
> > foo <- as.Date(33874, origin = "1899-12-30")
> > foo
> [1] "1992-09-27"
> > as.POSIXlt(as.character(foo), "Europe/Berlin")
> [1] "1992-09-27 CEST"
>
> I was explicitly saying that you do NOT want to simply change the time
> zone on POSIXlt objects as that won't work for reasons I explained - see my
> last e-mail.
>
> Cheers,
> Simon
>
>
> > On 11/10/2022, at 6:31 AM, Liam Bailey wrote:
> >
> > Hi all,
> >
> > Thanks Simon for the detailed response, that helps us understand a lot
> > better what’s going on! However, with your response in mind, we still
> > encounter some behaviour that we did not expect.
> >
> > I’ve included another minimum reproducible example below to expand on
> > the situation. In this example, `foo` is a Date object that we generate
> > from a numeric input. Following your advice, `bar` is then a POSIXlt object
> > where we now explicitly define timezone using argument tz. However, even
> > though we are explicit about the timezone the POSIXlt that is generated is
> > always in UTC. This then leads to the issues outlined by Alexandre above,
> > which we now understand are caused by DST.
> >
> > ``` r
> > #Generate date from numeric
> > #Not possible to specify tz at this point
> > foo <- as.Date(33874, origin = "1899-12-30")
> > dput(foo)
> > #> structure(8305, class = "Date")
> >
> > #Convert to POSIXlt specifying UTC timezone
> > bar <- as.POSIXlt(foo, tz = "UTC")
> > dput(bar)
> > #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
> > #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
> > #> "POSIXt"), tzone = "UTC")
> >
> > #Convert to POSIXlt specifying Europe/Berlin.
> > #Time zone is still UTC
> > bar <- as.POSIXlt(foo, tz = "Europe/Berlin")
> > dput(bar)
> > #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
> > #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
> > #> "POSIXt"), tzone = "UTC")
> > ```
> >
> >
> > We noticed that this occurs because the tz argument is not passed to
> > `.Internal(Date2POSIXlt())` inside `as.POSIXlt.Date()`.
> >
> > Reading through the documentation for `as.POSIX*` we can see that this
> > behaviour is described:
> >
> >   > “Dates without times are treated as being at midnight UTC.”
> >
> > In this case, if we want to convert a Date object to POSIX* and specify
> > a (non-UTC) timezone would the best strategy be to first coerce our Date
> > object to character? Alternatively, `lubridate::as_datetime()` does seem to
> > recognise the tz argument and convert a Date object to POSIX* with non-UTC
> > time zone (see second example below). But it would be nice to know if there
> > are subtle differences between these two approaches that we should be aware
> > of.
> >
> > ``` r
> > foo <- as.Date(33874, origin = "1899-12-30")
> > dput(foo)
> > #> structure(8305, class = "Date")
> >
> > #Convert to POSIXct specifying UTC timezone
> > bar <- lubridate::as_datetime(foo, tz = "UTC")
> > dput(as.POSIXlt(bar))
> > #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
> > #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt",
> > #> "POSIXt"), tzone = "UTC")
> >
> > #Convert to POSIXct specifying Europe/Berlin
> > bar <- lubridate::as_datetime(foo, tz = "Europe/Berlin")
> > dput(as.POSIXlt(bar))
> > #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L,
> > #> year = 92L, wday = 0L, yday = 270L, isdst = 1L, zone = "CEST",
> > #> gmtoff = 7200L), class = c("POSIXlt", "POSIXt"), tzone = c("Europe/Berlin",
> > #> "CET", "CEST"))
> > ```
> >
> > Thanks again for all your help.
> > Alex & Liam
> >
> >> On 10 Oct 2022, 

Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-10 Thread Simon Urbanek
Liam,

I think I have failed to convey my main point in the last e-mail - which was 
that you want to parse the date/time in the timezone that you care about so in 
your example that would be

> foo <- as.Date(33874, origin = "1899-12-30")
> foo
[1] "1992-09-27"
> as.POSIXlt(as.character(foo), "Europe/Berlin")
[1] "1992-09-27 CEST"

I was explicitly saying that you do NOT want to simply change the time zone on 
POSIXlt objects as that won't work for reasons I explained - see my last e-mail.
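
To see why the zone used for parsing matters, the sketch below parses the
same calendar date as midnight in two zones and compares the results. The
variable names are only for illustration, and it assumes the local time zone
database knows "Europe/Berlin":

```r
foo <- as.Date(33874, origin = "1899-12-30")

# Same calendar date, parsed as midnight in two different zones.
utc_midnight    <- as.POSIXct(format(foo), tz = "UTC")
berlin_midnight <- as.POSIXct(format(foo), tz = "Europe/Berlin")

# The two are different instants ...
gap <- difftime(utc_midnight, berlin_midnight, units = "hours")

# ... and midnight UTC shown on a Berlin clock is no longer midnight.
shown <- format(utc_midnight, tz = "Europe/Berlin", usetz = TRUE)
```

A date that was really recorded in Berlin therefore has to be parsed with
tz = "Europe/Berlin" from the start; relabelling the UTC result afterwards
keeps the wrong instant.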

Cheers,
Simon


> On 11/10/2022, at 6:31 AM, Liam Bailey  wrote:
> 
> Hi all,
> 
> Thanks Simon for the detailed response, that helps us understand a lot better 
> what’s going on! However, with your response in mind, we still encounter some 
> behaviour that we did not expect.
> 
> I’ve included another minimum reproducible example below to expand on the 
> situation. In this example, `foo` is a Date object that we generate from a 
> numeric input. Following your advice, `bar` is then a POSIXlt object where we 
> now explicitly define timezone using argument tz. However, even though we are 
> explicit about the timezone the POSIXlt that is generated is always in UTC. 
> This then leads to the issues outlined by Alexandre above, which we now 
> understand are caused by DST.
> 
> ``` r
> #Generate date from numeric
> #Not possible to specify tz at this point
> foo <- as.Date(33874, origin = "1899-12-30")
> dput(foo)
> #> structure(8305, class = "Date")
> 
> #Convert to POSIXlt specifying UTC timezone
> bar <- as.POSIXlt(foo, tz = "UTC")
> dput(bar)
> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L, 
> #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt", 
> #> "POSIXt"), tzone = "UTC")
> 
> #Convert to POSIXlt specifying Europe/Berlin.
> #Time zone is still UTC
> bar <- as.POSIXlt(foo, tz = "Europe/Berlin")
> dput(bar)
> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L, 
> #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt", 
> #> "POSIXt"), tzone = "UTC")
> ```
> 
> 
> We noticed that this occurs because the tz argument is not passed to 
> `.Internal(Date2POSIXlt())` inside `as.POSIXlt.Date()`.
> 
> Reading through the documentation for `as.POSIX*` we can see that this 
> behaviour is described:
> 
>   > “Dates without times are treated as being at midnight UTC.”
> 
> In this case, if we want to convert a Date object to POSIX* and specify a 
> (non-UTC) timezone would the best strategy be to first coerce our Date object 
> to character? Alternatively, `lubridate::as_datetime()` does seem to 
> recognise the tz argument and convert a Date object to POSIX* with non-UTC 
> time zone (see second example below). But it would be nice to know if there 
> are subtle differences between these two approaches that we should be aware 
> of.
> 
> ``` r
> foo <- as.Date(33874, origin = "1899-12-30")
> dput(foo)
> #> structure(8305, class = "Date")
> 
> #Convert to POSIXct specifying UTC timezone
> bar <- lubridate::as_datetime(foo, tz = "UTC")
> dput(as.POSIXlt(bar))
> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L, 
> #> year = 92L, wday = 0L, yday = 270L, isdst = 0L), class = c("POSIXlt", 
> #> "POSIXt"), tzone = "UTC")
> 
> #Convert to POSIXct specifying Europe/Berlin
> bar <- lubridate::as_datetime(foo, tz = "Europe/Berlin")
> dput(as.POSIXlt(bar))
> #> structure(list(sec = 0, min = 0L, hour = 0L, mday = 27L, mon = 8L, 
> #> year = 92L, wday = 0L, yday = 270L, isdst = 1L, zone = "CEST", 
> #> gmtoff = 7200L), class = c("POSIXlt", "POSIXt"), tzone = c("Europe/Berlin", 
> #> "CET", "CEST"))
> ```
> 
> Thanks again for all your help.
> Alex & Liam
> 
>> On 10 Oct 2022, at 6:40 pm, Hadley Wickham  wrote:
>> 
>> On Sun, Oct 9, 2022 at 9:31 PM Jeff Newmiller  
>> wrote:
>>> 
>>> ... which is why tidyverse functions and Python datetime handling irk me so 
>>> much.
>>> 
>>> Is tidyverse time handling intrinsically broken? They have a standard 
>>> practice of reading time as UTC and then using force_tz to fix the 
>>> "mistake". Same as Python.
>> 
>> Can you point to any docs that lead you to this conclusion so we can
>> get them fixed? I strongly encourage people to parse date-times in the
>> correct time zone; this is why lubridate::ymd_hms() and friends have a
>> tz argument.
>> 
>> Hadley
>> 
>> -- 
>> http://hadley.nz
> 



Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-10 Thread Hadley Wickham
On Sun, Oct 9, 2022 at 9:31 PM Jeff Newmiller  wrote:
>
> ... which is why tidyverse functions and Python datetime handling irk me so 
> much.
>
> Is tidyverse time handling intrinsically broken? They have a standard 
> practice of reading time as UTC and then using force_tz to fix the "mistake". 
> Same as Python.

Can you point to any docs that lead you to this conclusion so we can
get them fixed? I strongly encourage people to parse date-times in the
correct time zone; this is why lubridate::ymd_hms() and friends have a
tz argument.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-09 Thread Jeff Newmiller
... which is why tidyverse functions and Python datetime handling irk me so 
much.

Is tidyverse time handling intrinsically broken? They have a standard practice 
of reading time as UTC and then using force_tz to fix the "mistake". Same as 
Python.

On October 9, 2022 6:57:06 PM PDT, Simon Urbanek  
wrote:
>Alexandre,
>
>it's better to parse the timestamp in correct timezone:
>
>> foo = as.POSIXlt("2021-10-01", "UTC")
>> as.POSIXct(as.character(foo), "Europe/Berlin")
>[1] "2021-10-01 CEST"
>
>The issue stems from the fact that you are pretending that your timestamp is 
>UTC (which it is not) while you want to interpret the same values in a 
>different time zone. The DST flag varies depending on the day (isdst 
>being 0 or 1 depending on the date) and POSIXlt does not have that information 
>since you only attached the time zone without updating it:
>
>> str(unclass(as.POSIXlt(foo, "Europe/Berlin")))
>List of 9
> $ sec  : num 0
> $ min  : int 0
> $ hour : int 0
> $ mday : int 1
> $ mon  : int 9
> $ year : int 121
> $ wday : int 5
> $ yday : int 273
> $ isdst: int 0
> - attr(*, "tzone")= chr "Europe/Berlin"
>
>note that isdst is 0 from the UTC entry (which doesn't have DST) even though 
>that date is actually DST in CEST. Compare that to the correctly parsed 
>POSIXlt:
>
>> str(unclass(as.POSIXlt(as.character(foo), "Europe/Berlin")))
>List of 11
> $ sec   : num 0
> $ min   : int 0
> $ hour  : int 0
> $ mday  : int 1
> $ mon   : int 9
> $ year  : int 121
> $ wday  : int 5
> $ yday  : int 273
> $ isdst : int 1
> $ zone  : chr "CEST"
> $ gmtoff: int NA
> - attr(*, "tzone")= chr "Europe/Berlin"
>
>where isdst is 1 since DST is indeed in effect. The OS difference seems to be 
>that Linux respects the isdst information from POSIXlt while Windows and macOS 
>ignore it. This behavior is documented: 
>
> At all other times ‘isdst’ can be deduced from the
> first six values, but the behaviour if it is set incorrectly is
> platform-dependent.
>
>You can re-set isdst to -1 to make sure R will try to determine it:
>
>> foo$isdst = -1L
>> as.POSIXct(foo, "Europe/Berlin")
>[1] "2021-10-01 CEST"
>
>So, generally, you cannot simply change the time zone in POSIXlt - don't 
>pretend the time is in UTC if it's not, you have to re-parse or re-compute the 
>timestamps for it to be reliable or else the DST flag will be wrong.
>
>Cheers,
>Simon
>
>
>> On 10/10/2022, at 1:14 AM, Alexandre Courtiol  
>> wrote:
>> 
>> Hi R pkg developers,
>> 
>> We are facing a datetime handling issue which manifests itself in a
>> package we are working on.
>> 
>> In context, we noticed that reading datetime info from an excel file
>> resulted in different data depending on the computer we used.
>> 
>> We are aware that timezone and regional settings are general sources
>> of troubles, but the code we are using was trying to circumvent this.
>> We went only as far as figuring out that the issue happens when
>> converting a POSIXlt into a POSIXct.
>> 
>> Please find below, a minimal reproducible example where `foo` is
>> converted to `bar` on two different computers.
>> `foo` is a POSIXlt with a defined time zone and upon conversion to a
>> POSIXct, despite using a set time zone, we end up with `bar` being
>> different on Linux and on a Windows machine.
>> 
>> We noticed that the difference emerges from the system call
>> `.Internal(as.POSIXct())` within `as.POSIXct.POSIXlt()`.
>> We also noticed that the internal function in R actually calls
>> getenv("TZ") within C, which is probably what explains where the
>> difference comes from.
>> 
>> Such a behaviour is probably expected and not a bug, but what would be
>> the strategy to convert a POSIXlt into a POSIXct that would not be
>> machine dependent?
>> 
>> We finally noticed that depending on the datetime used as a starting
>> point and on the time zone used when calling `as.POSIXct()`, we
>> sometimes have a difference between computers and sometimes not...
>> which adds to our puzzlement.
>> 
>> Many thanks.
>> Alex & Liam
>> 
>> 
>> ``` r
>> ## On Linux
>> foo <- structure(list(sec = 0, min = 0L, hour = 0L, mday = 1L, mon =
>> 9L, year = 121L, wday = 5L, yday = 273L, isdst = 0L),
>> class = c("POSIXlt", "POSIXt"), tzone = "UTC")
>> 
>> bar <- as.POSIXct(foo, tz = "Europe/Berlin")
>> 
>> bar
>> #> [1] "2021-10-01 01:00:00 CEST"
>> 
>> dput(bar)
>> #> structure(1633042800, class = c("POSIXct", "POSIXt"), tzone =
>> "Europe/Berlin")
>> ```
>> 
>> ``` r
>> ## On Windows
>> foo <- structure(list(sec = 0, min = 0L, hour = 0L, mday = 1L, mon =
>> 9L, year = 121L, wday = 5L, yday = 273L, isdst = 0L),
>> class = c("POSIXlt", "POSIXt"), tzone = "UTC")
>> 
>> bar <- as.POSIXct(foo, tz = "Europe/Berlin")
>> 
>> bar
>> #> [1] "2021-10-01 CEST"
>> 
>> dput(bar)
>> structure(1633046400, class = c("POSIXct", "POSIXt"), tzone = 
>> "Europe/Berlin")
>> ```
>> 
>> -- 
>> Alexandre Courtiol, www.datazoogang.de
>> 
>> __
>> 

Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-09 Thread Simon Urbanek
Alexandre,

it's better to parse the timestamp in correct timezone:

> foo = as.POSIXlt("2021-10-01", "UTC")
> as.POSIXct(as.character(foo), "Europe/Berlin")
[1] "2021-10-01 CEST"

The issue stems from the fact that you are pretending that your timestamp is 
UTC (which it is not) while you want to interpret the same values in a 
different time zone. The DST flag varies depending on the day (isdst 
being 0 or 1 depending on the date) and POSIXlt does not have that information 
since you only attached the time zone without updating it:

> str(unclass(as.POSIXlt(foo, "Europe/Berlin")))
List of 9
 $ sec  : num 0
 $ min  : int 0
 $ hour : int 0
 $ mday : int 1
 $ mon  : int 9
 $ year : int 121
 $ wday : int 5
 $ yday : int 273
 $ isdst: int 0
 - attr(*, "tzone")= chr "Europe/Berlin"

note that isdst is 0 from the UTC entry (which doesn't have DST) even though 
that date is actually DST in CEST. Compare that to the correctly parsed POSIXlt:

> str(unclass(as.POSIXlt(as.character(foo), "Europe/Berlin")))
List of 11
 $ sec   : num 0
 $ min   : int 0
 $ hour  : int 0
 $ mday  : int 1
 $ mon   : int 9
 $ year  : int 121
 $ wday  : int 5
 $ yday  : int 273
 $ isdst : int 1
 $ zone  : chr "CEST"
 $ gmtoff: int NA
 - attr(*, "tzone")= chr "Europe/Berlin"

where isdst is 1 since DST is indeed in effect. The OS difference seems to be that 
Linux respects the isdst information from POSIXlt while Windows and macOS 
ignore it. This behavior is documented: 

 At all other times ‘isdst’ can be deduced from the
 first six values, but the behaviour if it is set incorrectly is
 platform-dependent.

You can re-set isdst to -1 to make sure R will try to determine it:

> foo$isdst = -1L
> as.POSIXct(foo, "Europe/Berlin")
[1] "2021-10-01 CEST"

So, generally, you cannot simply change the time zone in POSIXlt - don't 
pretend the time is in UTC if it's not, you have to re-parse or re-compute the 
timestamps for it to be reliable or else the DST flag will be wrong.
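
The two routes just described (resetting isdst, or re-parsing) can be put
side by side. This is only a sketch: reinterpret_in_zone() is a made-up
helper name, not a base R function, and it assumes "Europe/Berlin" is known
to the time zone database in use:

```r
# Hypothetical helper: reinterpret the wall-clock fields of a POSIXlt in
# another zone. isdst is set to -1 ("unknown") first so that the stale flag
# inherited from the old zone cannot influence the result.
reinterpret_in_zone <- function(lt, tz) {
  stopifnot(inherits(lt, "POSIXlt"))
  lt$isdst <- -1L
  as.POSIXct(lt, tz = tz)
}

foo <- as.POSIXlt("2021-10-01", tz = "UTC")

# Route 1: keep the fields, let R work out the DST flag again.
via_flag <- reinterpret_in_zone(foo, "Europe/Berlin")

# Route 2: go back to text and parse it in the zone that is really meant.
via_parse <- as.POSIXct(format(foo), tz = "Europe/Berlin")

# Compare the two routes on the machine at hand.
identical(as.numeric(via_flag), as.numeric(via_parse))
```

Running the last line on each platform of interest is a quick way to check
that neither route depends on the operating system.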

Cheers,
Simon


> On 10/10/2022, at 1:14 AM, Alexandre Courtiol  
> wrote:
> 
> Hi R pkg developers,
> 
> We are facing a datetime handling issue which manifests itself in a
> package we are working on.
> 
> In context, we noticed that reading datetime info from an excel file
> resulted in different data depending on the computer we used.
> 
> We are aware that timezone and regional settings are general sources
> of troubles, but the code we are using was trying to circumvent this.
> We went only as far as figuring out that the issue happens when
> converting a POSIXlt into a POSIXct.
> 
> Please find below, a minimal reproducible example where `foo` is
> converted to `bar` on two different computers.
> `foo` is a POSIXlt with a defined time zone and upon conversion to a
> POSIXct, despite using a set time zone, we end up with `bar` being
> different on Linux and on a Windows machine.
> 
> We noticed that the difference emerges from the system call
> `.Internal(as.POSIXct())` within `as.POSIXct.POSIXlt()`.
> We also noticed that the internal function in R actually calls
> getenv("TZ") within C, which is probably what explains where the
> difference comes from.
> 
> Such a behaviour is probably expected and not a bug, but what would be
> the strategy to convert a POSIXlt into a POSIXct that would not be
> machine dependent?
> 
> We finally noticed that depending on the datetime used as a starting
> point and on the time zone used when calling `as.POSIXct()`, we
> sometimes have a difference between computers and sometimes not...
> which adds to our puzzlement.
> 
> Many thanks.
> Alex & Liam
> 
> 
> ``` r
> ## On Linux
> foo <- structure(list(sec = 0, min = 0L, hour = 0L, mday = 1L, mon =
> 9L, year = 121L, wday = 5L, yday = 273L, isdst = 0L),
> class = c("POSIXlt", "POSIXt"), tzone = "UTC")
> 
> bar <- as.POSIXct(foo, tz = "Europe/Berlin")
> 
> bar
> #> [1] "2021-10-01 01:00:00 CEST"
> 
> dput(bar)
> #> structure(1633042800, class = c("POSIXct", "POSIXt"), tzone =
> "Europe/Berlin")
> ```
> 
> ``` r
> ## On Windows
> foo <- structure(list(sec = 0, min = 0L, hour = 0L, mday = 1L, mon =
> 9L, year = 121L, wday = 5L, yday = 273L, isdst = 0L),
> class = c("POSIXlt", "POSIXt"), tzone = "UTC")
> 
> bar <- as.POSIXct(foo, tz = "Europe/Berlin")
> 
> bar
> #> [1] "2021-10-01 CEST"
> 
> dput(bar)
> structure(1633046400, class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin")
> ```
> 
> -- 
> Alexandre Courtiol, www.datazoogang.de
> 
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> 
