What is the best way to read (from a text file) timestamps from the fall time change, where there are two 1:15am's? E.g., here is an extract from a US Geological Survey web site giving data on the river through our county on 2020-11-01, when we changed from PDT to PST, https://nwis.waterdata.usgs.gov/wa/nwis/uv/?cb_00010=on&cb_00060=on&cb_00065=on&format=rdb&site_no=12200500&period=&begin_date=2020-11-01&end_date=2020-11-05 .
The timestamps include the date and time as well as PDT or PST.

river <- c("datetime,tz,discharge,height,temp",
  "2020-11-01 00:00,PDT,20500,16.44,9.3",
  "2020-11-01 00:15,PDT,20500,16.44,9.3",
  "2020-11-01 00:30,PDT,20500,16.43,9.3",
  "2020-11-01 00:45,PDT,20400,16.40,9.3",
  "2020-11-01 01:00,PDT,20400,16.40,9.3",
  "2020-11-01 01:00,PST,20200,16.34,9.2",
  "2020-11-01 01:15,PDT,20400,16.39,9.3",
  "2020-11-01 01:15,PST,20200,16.34,9.2",
  "2020-11-01 01:30,PDT,20300,16.37,9.2",
  "2020-11-01 01:30,PST,20100,16.31,9.2",
  "2020-11-01 01:45,PDT,20300,16.35,9.2",
  "2020-11-01 01:45,PST,20100,16.29,9.2",
  "2020-11-01 02:00,PST,20100,16.29,9.2",
  "2020-11-01 02:15,PST,20000,16.27,9.1",
  "2020-11-01 02:30,PST,20000,16.26,9.1"
)
d <- read.table(text=river, sep=",", header=TRUE)

The entries are obviously not in time order.

Is there a simple way to read the datetime and tz columns together? One
way is to use d$tz to construct an offset that can be read with strptime's
"%z".

> d$POSIXct <- as.POSIXct(paste(d$datetime, ifelse(d$tz=="PDT", "-0700", "-0800")),
+                         format="%Y-%m-%d %H:%M %z")
> d
           datetime  tz discharge height temp             POSIXct
1  2020-11-01 00:00 PDT     20500  16.44  9.3 2020-11-01 00:00:00
2  2020-11-01 00:15 PDT     20500  16.44  9.3 2020-11-01 00:15:00
3  2020-11-01 00:30 PDT     20500  16.43  9.3 2020-11-01 00:30:00
4  2020-11-01 00:45 PDT     20400  16.40  9.3 2020-11-01 00:45:00
5  2020-11-01 01:00 PDT     20400  16.40  9.3 2020-11-01 01:00:00
6  2020-11-01 01:00 PST     20200  16.34  9.2 2020-11-01 01:00:00
7  2020-11-01 01:15 PDT     20400  16.39  9.3 2020-11-01 01:15:00
8  2020-11-01 01:15 PST     20200  16.34  9.2 2020-11-01 01:15:00
9  2020-11-01 01:30 PDT     20300  16.37  9.2 2020-11-01 01:30:00
10 2020-11-01 01:30 PST     20100  16.31  9.2 2020-11-01 01:30:00
11 2020-11-01 01:45 PDT     20300  16.35  9.2 2020-11-01 01:45:00
12 2020-11-01 01:45 PST     20100  16.29  9.2 2020-11-01 01:45:00
13 2020-11-01 02:00 PST     20100  16.29  9.2 2020-11-01 02:00:00
14 2020-11-01 02:15 PST     20000  16.27  9.1 2020-11-01 02:15:00
15 2020-11-01 02:30 PST     20000  16.26  9.1 2020-11-01 02:30:00
> with(d[order(d$POSIXct),], plot(temp)) # monotonic temperature

-Bill

On Thu, Sep 2, 2021 at 12:41 PM Jeff Newmiller <jdnew...@dcn.davis.ca.us>
wrote:

> Regardless of whether you use the lower-level split function, or the
> higher-level aggregate function, or the tidyverse group_by function, the
> key is learning how to create the column that is the same for all records
> corresponding to the time interval of interest.
>
> If you convert the sampdate to POSIXct, the tz IS important, because most
> of us use local timezones that respect daylight savings time, and a naive
> conversion of standard time will run into trouble if R is assuming daylight
> savings time applies. The lubridate package gets around this by always
> assuming UTC and giving you a function to "fix" the timezone after the
> conversion. I prefer to always be specific about timezones, at least by
> using something like
>
>     Sys.setenv( TZ = "Etc/GMT+8" )
>
> which does not respect daylight savings.
>
> Regarding using character data for identifying the month, in order to have
> clean plots of the data I prefer to use the trunc function but it returns a
> POSIXlt so I convert it to POSIXct:
>
>     discharge$sampmonthbegin <- as.POSIXct( trunc( discharge$sampdate,
>                                                    units = "months" ) )
>
> Then any of various ways can be used to aggregate the records by that
> column.
>
> On September 2, 2021 12:10:15 PM PDT, Andrew Simmons <akwsi...@gmail.com>
> wrote:
> >You could use 'split' to create a list of data frames, and then apply a
> >function to each to get the means and sds.
> >
> >
> >cols <- "cfs"  # add more as necessary
> >S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
> >means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
> >sds <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm = TRUE)))
> >
> >On Thu, Sep 2, 2021 at 3:01 PM Rich Shepard <rshep...@appl-ecosys.com>
> >wrote:
> >
> >> On Thu, 2 Sep 2021, Rich Shepard wrote:
> >>
> >> > If I correctly understand the output of as.POSIXlt each date and time
> >> > element is separate, so input such as 2016-03-03 12:00 would now be
> >> > 2016 03 03 12 00 (I've not read how the elements are separated). (The
> >> > TZ is not important because all data are either PST or PDT.)
> >>
> >> Using this script:
> >> discharge <- read.csv('../data/water/discharge.dat', header = TRUE,
> >>                       sep = ',', stringsAsFactors = FALSE)
> >> discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "",
> >>                                  format = '%Y-%m-%d %H:%M',
> >>                                  optional = 'logical')
> >> discharge$cfs <- as.numeric(discharge$cfs, length = 6)
> >>
> >> I get this result:
> >> > head(discharge)
> >>              sampdate    cfs
> >> 1 2016-03-03 12:00:00 149000
> >> 2 2016-03-03 12:10:00 150000
> >> 3 2016-03-03 12:20:00 151000
> >> 4 2016-03-03 12:30:00 156000
> >> 5 2016-03-03 12:40:00 154000
> >> 6 2016-03-03 12:50:00 150000
> >>
> >> I'm completely open to suggestions on using this output to calculate
> >> monthly means and sds.
> >>
> >> If dplyr:summarize() will do so please show me how to modify this
> >> command:
> >> disc_monthly <- ( discharge
> >>         %>% group_by(sampdate)
> >>         %>% summarize(exp_value = mean(cfs, na.rm = TRUE))
> >> because it produces daily means, not monthly means.
> >>
> >> TIA,
> >>
> >> Rich
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
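
For the monthly means and sds asked about in the quoted thread, here is a minimal sketch of Jeff's trunc() suggestion followed by an aggregation step. The small discharge data frame below is made up for illustration (it is not the real discharge.dat). Using aggregate() and merge(), and the column names cfs_mean and cfs_sd, are my own assumptions about one of the "various ways" he mentions, not something posted in the thread.

```r
# Hypothetical stand-in for the 'discharge' data frame; values are invented.
discharge <- data.frame(
  sampdate = c("2016-03-03 12:00", "2016-03-03 12:10", "2016-03-31 23:50",
               "2016-04-01 00:00", "2016-04-01 00:10"),
  cfs = c(149000, 150000, 151000, 156000, 154000),
  stringsAsFactors = FALSE
)

# Parse in a fixed-offset zone (UTC-8, no daylight saving time), along the
# lines of the Etc/GMT+8 suggestion, so no clock time is ambiguous.
discharge$sampdate <- as.POSIXct(discharge$sampdate, tz = "Etc/GMT+8",
                                 format = "%Y-%m-%d %H:%M")

# trunc() returns POSIXlt; convert back to POSIXct to use it as a grouping key.
discharge$sampmonthbegin <- as.POSIXct(trunc(discharge$sampdate,
                                             units = "months"))

# Aggregate by the month-start column, once for the mean and once for the sd,
# then join the two results into one row per month.
means <- aggregate(cfs ~ sampmonthbegin, data = discharge, FUN = mean)
sds   <- aggregate(cfs ~ sampmonthbegin, data = discharge, FUN = sd)
monthly <- merge(means, sds, by = "sampmonthbegin",
                 suffixes = c("_mean", "_sd"))
print(monthly)
```

Because the grouping column is a real POSIXct month start rather than a "%Y-%m" string, it can go straight onto a time axis when plotting.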