Re: [R] Merge with closest (not equal) time stamps
Thanks for your suggestion. I have just returned from a vacation and started catching up on my emails. Rolling join is an elegant and most suitable solution for my tasks. I invested some time in learning data.table package. The vignette on secondary indices and auto indexing refers to another vignette on joins and rolling joins. But the vignette on joins is missing. There is already some discussion on the internet regarding this missing vignette. Here is one more request for this joins vignette, even if in a draft form. Naresh > On Aug 9, 2023, at 8:39 AM, Hadley Wickham wrote: > > It sounds like you might want a rolling join, e.g. > https://dplyr.tidyverse.org/reference/join_by.html#rolling-joins. > > (And data.table has similar functionality which inspired dplyr) > > Hadley > > On Mon, Aug 7, 2023 at 9:32 PM Naresh Gurbuxani > wrote: >> >> >> I have two dataframes, each with a column for timestamp. I want to >> merge the two dataframes such that each row from first dataframe >> is matched with the row in the second dataframe with most recent but >> preceding timestamp. Here is an example. >> >> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", >> "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, >> 1.8)) >> >> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", >> "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", >> "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7)) >> >> stock.trades <- stock.trades[order(stock.trades$timestamp),] >> >> library(plyr) >> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) >> tail(subset(stock.trades, timestamp <= tstamp), 1)) >> names(mystock.prices)[1] <- "stock.timestamp" >> myres <- cbind(option.trades, mystock.prices) >> >> This method works. But for large dataframes, it is very slow. Is there >> a way to speed up the merge? >> >> Thanks, >> Naresh >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > http://hadley.nz __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge with closest (not equal) time stamps
It sounds like you might want a rolling join, e.g. https://dplyr.tidyverse.org/reference/join_by.html#rolling-joins. (And data.table has similar functionality which inspired dplyr) Hadley On Mon, Aug 7, 2023 at 9:32 PM Naresh Gurbuxani wrote: > > > I have two dataframes, each with a column for timestamp. I want to > merge the two dataframes such that each row from first dataframe > is matched with the row in the second dataframe with most recent but > preceding timestamp. Here is an example. > > option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", > "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, > 1.8)) > > stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", > "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", > "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7)) > > stock.trades <- stock.trades[order(stock.trades$timestamp),] > > library(plyr) > mystock.prices <- ldply(option.trades$timestamp, function(tstamp) > tail(subset(stock.trades, timestamp <= tstamp), 1)) > names(mystock.prices)[1] <- "stock.timestamp" > myres <- cbind(option.trades, mystock.prices) > > This method works. But for large dataframes, it is very slow. Is there > a way to speed up the merge? > > Thanks, > Naresh > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- http://hadley.nz __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge with closest (not equal) time stamps
On Mon, 07 Aug 2023, Naresh Gurbuxani writes: > I have two dataframes, each with a column for timestamp. I want to > merge the two dataframes such that each row from first dataframe > is matched with the row in the second dataframe with most recent but > preceding timestamp. Here is an example. > > option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", > "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, > 1.8)) > > stock.trades <- data.frame(timestamp = > as.POSIXct(c("2023-08-07 10:23:21", "2023-08-07 > 10:23:34", "2023-08-07 10:24:57", "2023-08-07 > 10:28:37", "2023-08-07 10:29:01")), stock.price = > c(102.2, 102.9, 103.1, 101.8, 101.7)) > > stock.trades <- stock.trades[order(stock.trades$timestamp),] > > library(plyr) > mystock.prices <- ldply(option.trades$timestamp, function(tstamp) > tail(subset(stock.trades, timestamp <= tstamp), 1)) > names(mystock.prices)[1] <- "stock.timestamp" > myres <- cbind(option.trades, mystock.prices) > > This method works. But for large dataframes, it is very slow. Is there > a way to speed up the merge? > > Thanks, > Naresh > If the timestamps are sorted (or you can sort them), function ?findInterval might be helpful: i <- findInterval(option.trades$timestamp, stock.trades$timestamp) cbind(option.trades, stock.trades[i, ]) ## timestamp option.price timestamp stock.price ## 1 2023-08-07 10:23:22 2.5 2023-08-07 10:23:21 102.2 ## 3 2023-08-07 10:25:33 2.7 2023-08-07 10:24:57 103.1 ## 4 2023-08-07 10:28:41 1.8 2023-08-07 10:28:37 101.8 -- Enrico Schumann Lucerne, Switzerland http://enricoschumann.net __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge with closest (not equal) time stamps
I was able to adapt your solution using packages with which I am more familiar. myres2 <- merge(option.trades, stock.trades, by = "timestamp", all = TRUE) myres2[,"stock.timestamp"] <- ifelse(is.na(myres2$stock.price), NA, myres2$timestamp) myres2$stock.timestamp <- as.POSIXct(myres2$stock.timestamp, origin = "1970-01-01") library(zoo) myres2$stock.price <- na.locf(myres2$stock.price) myres2$stock.timestamp <- na.locf(myres2$stock.timestamp) myres2 <- myres2[!is.na(myres2$option.price),] row.names(myres2) <- NULL all.equal(myres, myres2[,c(1, 2, 4, 3)]) # TRUE This calculation is indeed faster. Thanks for your help, Naresh > On Aug 8, 2023, at 5:39 AM, Eric Berger wrote: > > Hi Naresh, > Perhaps the below is faster than your approach > > library(dplyr) > library(tidyr) > merge(option.trades, stock.trades, by="timestamp", all=TRUE) |> > dplyr::arrange(timestamp) |> > dplyr::mutate(stock.timestamp = > as.POSIXct(ifelse(is.na(option.price), timestamp, NA))) |> > tidyr::fill(stock.price, stock.timestamp) |> > dplyr::filter(!is.na(option.price)) |> > dplyr::select(1,2,4,3) -> > myres2 > > identical(myres, myres2) ## TRUE > > > On Tue, Aug 8, 2023 at 5:32 AM Naresh Gurbuxani > wrote: >> >> >> I have two dataframes, each with a column for timestamp. I want to >> merge the two dataframes such that each row from first dataframe >> is matched with the row in the second dataframe with most recent but >> preceding timestamp. Here is an example. >> >> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", >> "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, >> 1.8)) >> >> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", >> "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", >> "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7)) >> >> stock.trades <- stock.trades[order(stock.trades$timestamp),] >> >> library(plyr) >> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) >> tail(subset(stock.trades, timestamp <= tstamp), 1)) >> names(mystock.prices)[1] <- "stock.timestamp" >> myres <- cbind(option.trades, mystock.prices) >> >> This method works. But for large dataframes, it is very slow. Is there >> a way to speed up the merge? >> >> Thanks, >> Naresh >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge with closest (not equal) time stamps
Hi Naresh, Perhaps the below is faster than your approach library(dplyr) library(tidyr) merge(option.trades, stock.trades, by="timestamp", all=TRUE) |> dplyr::arrange(timestamp) |> dplyr::mutate(stock.timestamp = as.POSIXct(ifelse(is.na(option.price), timestamp, NA))) |> tidyr::fill(stock.price, stock.timestamp) |> dplyr::filter(!is.na(option.price)) |> dplyr::select(1,2,4,3) -> myres2 identical(myres, myres2) ## TRUE On Tue, Aug 8, 2023 at 5:32 AM Naresh Gurbuxani wrote: > > > I have two dataframes, each with a column for timestamp. I want to > merge the two dataframes such that each row from first dataframe > is matched with the row in the second dataframe with most recent but > preceding timestamp. Here is an example. > > option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", > "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, > 1.8)) > > stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", > "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", > "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7)) > > stock.trades <- stock.trades[order(stock.trades$timestamp),] > > library(plyr) > mystock.prices <- ldply(option.trades$timestamp, function(tstamp) > tail(subset(stock.trades, timestamp <= tstamp), 1)) > names(mystock.prices)[1] <- "stock.timestamp" > myres <- cbind(option.trades, mystock.prices) > > This method works. But for large dataframes, it is very slow. Is there > a way to speed up the merge? > > Thanks, > Naresh > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Merge with closest (not equal) time stamps
I have two dataframes, each with a column for timestamp. I want to merge the two dataframes such that each row from first dataframe is matched with the row in the second dataframe with most recent but preceding timestamp. Here is an example. option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 1.8)) stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7)) stock.trades <- stock.trades[order(stock.trades$timestamp),] library(plyr) mystock.prices <- ldply(option.trades$timestamp, function(tstamp) tail(subset(stock.trades, timestamp <= tstamp), 1)) names(mystock.prices)[1] <- "stock.timestamp" myres <- cbind(option.trades, mystock.prices) This method works. But for large dataframes, it is very slow. Is there a way to speed up the merge? Thanks, Naresh __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.