Hello

I'm not sure if this is an appropriate use of this mailing list or not, please 
let me know if it isn't.  I'm struggling to figure out how to merge two data 
tables based on max effective date logic compared to when a payment occurred.  
My dtDistributions DT is a transactional dataset while dtDepartments is a 
domain data set containing all department names and the effective date of when 
department name changes have occurred.  For the Bob example below, there was a 
payment on 2016-01-01 which occurred in H229000.  In 2012, this department was 
named "Modified Name", in 2019 the department will be named "Final Name".  When 
I merge these two tables, I'd like it to pull the transactional data and match 
it up to department name "Modified Name" since that was the active department 
name at the time of that transaction.  I've read documentation on foverlaps, 
but I'm not sure if this problem is considered a range of dates or not.  At the 
bottom of this post is a temporarily solution that is working but it runs for a 
long time due to the amount of data in my actual source.

Here is some sample data to get started:
library(data.table)
dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
                          Department = factor(c("H229000", "H135000", 
"H047800")),
                          Amount = c(5, 34, 87),
                          PaymentDT = as.Date(c("2016-01-01", "2015-01-01", 
"2015-01-01")))

dtDepartments <- data.table(Department = factor(c("H229000", "H229000", 
"H229000", "H135000", "H047800")),
                        EffDT = as.Date(c("2019-01-01", "2012-01-01", 
"1901-01-01", "1901-01-01", "1901-01-01")),
                        Descr = c("Final Name","Modified Name","Original 
Name","Payables","Postal"))

Here is the output I would like to see:
PayeeName  Department     PaymentDT   Amount
Bob        Modified Name  2016-01-01  5
Tracy      Payables       2015-01-01  34
Tom        Postal         2015-01-01  87

I was able to get this working by using the sqldf library, but it runs for a 
very long time in my actual dataset and I'd like to use data.table if at all 
possible.
library(sqldf)
joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount
            FROM dtDistributions A, dtDepartments B
            WHERE A.DEPARTMENT = B.Department
            AND B.EffDT = (SELECT MAX(ED.EffDT)
                            FROM dtDepartments ED
                            WHERE B.Department = ED.Department
                            AND ED.EffDT <= A.PaymentDT)"

finalDT <- data.table(sqldf(joinString))



-----Message Disclaimer-----

This e-mail message is intended only for the use of the individual or entity to 
which it is addressed, and may contain information that is privileged, 
confidential and exempt from disclosure under applicable law. If you are not 
the intended recipient, any dissemination, distribution or copying of this 
communication is strictly prohibited. If you have received this communication 
in error, please notify us immediately by reply email to conn...@principal.com 
and delete or destroy all copies of the original message and attachments 
thereto. Email sent to or from the Principal Financial Group or any of its 
member companies may be retained as required by law or regulation.

Nothing in this message is intended to constitute an Electronic signature for 
purposes of the Uniform Electronic Transactions Act (UETA) or the Electronic 
Signatures in Global and National Commerce Act ("E-Sign") unless a specific 
statement to the contrary is included in this message.

If you no longer wish to receive any further solicitation from the Principal 
Financial Group you may unsubscribe at 
https://www.principal.com/do-not-contact-form any time.

If you are a Canadian resident and no longer wish to receive commercial 
electronic messages you may unsubscribe at 
https://www.principal.com/do-not-email-request-canadian-residents any time.





This message was secured by Zix(R).

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to