Re: [R] Merge with closest (not equal) time stamps

2023-08-24 Thread Naresh Gurbuxani
Thanks for your suggestion. 

I have just returned from a vacation and started catching up on my emails. 

Rolling join is an elegant and most suitable solution for my tasks.  I invested 
some time in learning data.table package.  The vignette on secondary indices 
and auto indexing refers to another vignette on joins and rolling joins.  But 
the vignette on joins is missing.  There is already some discussion on the 
internet regarding this missing vignette.  

Here is one more request for this joins vignette, even if in a draft form.  

Naresh  

> On Aug 9, 2023, at 8:39 AM, Hadley Wickham  wrote:
> 
> It sounds like you might want a rolling join, e.g.
> https://dplyr.tidyverse.org/reference/join_by.html#rolling-joins.
> 
> (And data.table has similar functionality which inspired dplyr)
> 
> Hadley
> 
> On Mon, Aug 7, 2023 at 9:32 PM Naresh Gurbuxani
>  wrote:
>> 
>> 
>> I have two dataframes, each with a column for timestamp.  I want to
>> merge the two dataframes such that each row from first dataframe
>> is matched with the row in the second dataframe with most recent but
>> preceding timestamp. Here is an example.
>> 
>> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", 
>> "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 
>> 1.8))
>> 
>> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", 
>> "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", 
>> "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))
>> 
>> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>> 
>> library(plyr)
>> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) 
>> tail(subset(stock.trades, timestamp <= tstamp), 1))
>> names(mystock.prices)[1] <- "stock.timestamp"
>> myres <- cbind(option.trades, mystock.prices)
>> 
>> This method works. But for large dataframes, it is very slow.  Is there
>> a way to speed up the merge?
>> 
>> Thanks,
>> Naresh
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> http://hadley.nz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge with closest (not equal) time stamps

2023-08-09 Thread Hadley Wickham
It sounds like you might want a rolling join, e.g.
https://dplyr.tidyverse.org/reference/join_by.html#rolling-joins.

(And data.table has similar functionality which inspired dplyr)

Hadley

On Mon, Aug 7, 2023 at 9:32 PM Naresh Gurbuxani
 wrote:
>
>
> I have two dataframes, each with a column for timestamp.  I want to
> merge the two dataframes such that each row from first dataframe
> is matched with the row in the second dataframe with most recent but
> preceding timestamp. Here is an example.
>
> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", 
> "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 
> 1.8))
>
> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", 
> "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", 
> "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))
>
> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>
> library(plyr)
> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) 
> tail(subset(stock.trades, timestamp <= tstamp), 1))
> names(mystock.prices)[1] <- "stock.timestamp"
> myres <- cbind(option.trades, mystock.prices)
>
> This method works. But for large dataframes, it is very slow.  Is there
> a way to speed up the merge?
>
> Thanks,
> Naresh
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
http://hadley.nz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge with closest (not equal) time stamps

2023-08-08 Thread Enrico Schumann
On Mon, 07 Aug 2023, Naresh Gurbuxani writes:

> I have two dataframes, each with a column for timestamp.  I want to
> merge the two dataframes such that each row from first dataframe
> is matched with the row in the second dataframe with most recent but
> preceding timestamp. Here is an example.
>
> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", 
> "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 
> 1.8))
>
> stock.trades <- data.frame(timestamp =
> as.POSIXct(c("2023-08-07 10:23:21", "2023-08-07
> 10:23:34", "2023-08-07 10:24:57", "2023-08-07
> 10:28:37", "2023-08-07 10:29:01")), stock.price =
> c(102.2, 102.9, 103.1, 101.8, 101.7))
>
> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>
> library(plyr)
> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) 
> tail(subset(stock.trades, timestamp <= tstamp), 1))
> names(mystock.prices)[1] <- "stock.timestamp"
> myres <- cbind(option.trades, mystock.prices)
>
> This method works. But for large dataframes, it is very slow.  Is there
> a way to speed up the merge?
>
> Thanks,
> Naresh
>

If the timestamps are sorted (or you can sort them),
function ?findInterval might be helpful:

i <- findInterval(option.trades$timestamp, stock.trades$timestamp)
cbind(option.trades, stock.trades[i, ])
## timestamp option.price   timestamp stock.price
## 1 2023-08-07 10:23:22  2.5 2023-08-07 10:23:21   102.2
## 3 2023-08-07 10:25:33  2.7 2023-08-07 10:24:57   103.1
## 4 2023-08-07 10:28:41  1.8 2023-08-07 10:28:37   101.8



-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge with closest (not equal) time stamps

2023-08-08 Thread Naresh Gurbuxani
I was able to adapt your solution using packages with which I am more familiar. 

myres2 <- merge(option.trades, stock.trades, by = "timestamp", all =
TRUE)

myres2[,"stock.timestamp"] <- ifelse(is.na(myres2$stock.price), NA,
myres2$timestamp)

myres2$stock.timestamp <- as.POSIXct(myres2$stock.timestamp, origin =
"1970-01-01")

library(zoo)
myres2$stock.price <- na.locf(myres2$stock.price)
myres2$stock.timestamp <- na.locf(myres2$stock.timestamp)
myres2 <- myres2[!is.na(myres2$option.price),]
row.names(myres2) <- NULL

all.equal(myres, myres2[,c(1, 2, 4, 3)]) # TRUE

This calculation is indeed faster.  

Thanks for your help,
Naresh

> On Aug 8, 2023, at 5:39 AM, Eric Berger  wrote:
> 
> Hi Naresh,
> Perhaps the below is faster than your approach
> 
> library(dplyr)
> library(tidyr)
> merge(option.trades, stock.trades, by="timestamp", all=TRUE) |>
>  dplyr::arrange(timestamp) |>
>  dplyr::mutate(stock.timestamp =
> as.POSIXct(ifelse(is.na(option.price), timestamp, NA))) |>
>  tidyr::fill(stock.price, stock.timestamp) |>
>  dplyr::filter(!is.na(option.price)) |>
>  dplyr::select(1,2,4,3) ->
>  myres2
> 
> identical(myres, myres2)  ## TRUE
> 
> 
> On Tue, Aug 8, 2023 at 5:32 AM Naresh Gurbuxani
>  wrote:
>> 
>> 
>> I have two dataframes, each with a column for timestamp.  I want to
>> merge the two dataframes such that each row from first dataframe
>> is matched with the row in the second dataframe with most recent but
>> preceding timestamp. Here is an example.
>> 
>> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", 
>> "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 
>> 1.8))
>> 
>> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", 
>> "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", 
>> "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))
>> 
>> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>> 
>> library(plyr)
>> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) 
>> tail(subset(stock.trades, timestamp <= tstamp), 1))
>> names(mystock.prices)[1] <- "stock.timestamp"
>> myres <- cbind(option.trades, mystock.prices)
>> 
>> This method works. But for large dataframes, it is very slow.  Is there
>> a way to speed up the merge?
>> 
>> Thanks,
>> Naresh
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge with closest (not equal) time stamps

2023-08-08 Thread Eric Berger
Hi Naresh,
Perhaps the below is faster than your approach

library(dplyr)
library(tidyr)
merge(option.trades, stock.trades, by="timestamp", all=TRUE) |>
  dplyr::arrange(timestamp) |>
  dplyr::mutate(stock.timestamp =
as.POSIXct(ifelse(is.na(option.price), timestamp, NA))) |>
  tidyr::fill(stock.price, stock.timestamp) |>
  dplyr::filter(!is.na(option.price)) |>
  dplyr::select(1,2,4,3) ->
  myres2

identical(myres, myres2)  ## TRUE


On Tue, Aug 8, 2023 at 5:32 AM Naresh Gurbuxani
 wrote:
>
>
> I have two dataframes, each with a column for timestamp.  I want to
> merge the two dataframes such that each row from first dataframe
> is matched with the row in the second dataframe with most recent but
> preceding timestamp. Here is an example.
>
> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", 
> "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 
> 1.8))
>
> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", 
> "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", 
> "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))
>
> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>
> library(plyr)
> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) 
> tail(subset(stock.trades, timestamp <= tstamp), 1))
> names(mystock.prices)[1] <- "stock.timestamp"
> myres <- cbind(option.trades, mystock.prices)
>
> This method works. But for large dataframes, it is very slow.  Is there
> a way to speed up the merge?
>
> Thanks,
> Naresh
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Merge with closest (not equal) time stamps

2023-08-07 Thread Naresh Gurbuxani


I have two dataframes, each with a column for timestamp.  I want to
merge the two dataframes such that each row from first dataframe
is matched with the row in the second dataframe with most recent but
preceding timestamp. Here is an example.

option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", 
"2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, 1.8))

stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", 
"2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", 
"2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))

stock.trades <- stock.trades[order(stock.trades$timestamp),]

library(plyr)
mystock.prices <- ldply(option.trades$timestamp, function(tstamp) 
tail(subset(stock.trades, timestamp <= tstamp), 1))
names(mystock.prices)[1] <- "stock.timestamp"
myres <- cbind(option.trades, mystock.prices)

This method works. But for large dataframes, it is very slow.  Is there
a way to speed up the merge?

Thanks,
Naresh

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.