[R] data.table merge equivalent for all.x

2011-11-26 Thread ONKELINX, Thierry
Dear all,

I'm trying to use data.table to summarise a table and merge the summary into
another table. The code below does what I want, but I would like to do it the
proper data.table way.

library(data.table)
tab1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = "ID")
tab2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = "ID2")
junk <- aggregate(tab2[, B], by = list(D = tab2[, D]), FUN = sum)
merge(tab1, junk, by = "D", all.x = TRUE)

This is my attempt using data.table():

junk <- tab2[, mean(B), by = D]
tab1[junk]
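
A note on why tab1[junk] is unlikely to give the all.x result: as far as I understand the join rules, X[Y] matches the leading column(s) of Y against the key of X. Here tab1 is keyed on ID (11:20) while the first column of junk is D (1:5), so nothing matches. The sketch below only illustrates that; the set.seed() call and the name `res` are my own additions, not part of the original post.

```r
library(data.table)

set.seed(42)  # arbitrary seed, added only so the sketch is reproducible
tab1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = "ID")
tab2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = "ID2")

# Summary with one row per D group (columns D and V1)
junk <- tab2[, mean(B), by = D]

# junk's first column (D = 1:5) is looked up in tab1's key (ID = 11:20).
# No value matches, so the columns coming from tab1 are filled with NA.
res <- tab1[junk]
print(res)
```

The result has one row per row of junk rather than one per row of tab1, which is the opposite of what all.x = TRUE asks for.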

Best regards,

Thierry




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data.table merge equivalent for all.x

2011-11-26 Thread Dennis Murphy
Hi:

There may well be a more efficient way to do this, but here's one take.

library('data.table')
# Want to merge by D in the end, so set D as part of the key:
t1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = c('ID', 'D'))
t2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = c('ID2', 'D'))

# The j expression applies sum() to each column of .SD within each D group.
# .SD ('Subset of Data') holds every column except the grouping column D,
# so ID2 is summed along with B. The result 'junk' is a data table.
junk <- t2[, lapply(.SD, sum), by = D]

tables()   # junk has no key
# set a key for junk so that it can be merged
setkey(junk, 'D')
# t1 and junk have a common key variable D, so the left join is
merge(t1, junk, by = 'D', all.x = TRUE)

# check against
t1
junk
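
For comparison, here is a minimal sketch of the same left join written as a keyed X[Y] join instead of merge(). It relies on X[Y] returning one row for every row of Y (with NA where X has no match), so putting t1 in the Y position keeps all of its rows. The names `sums`, `left` and `viaMerge`, the seed, and the choice to sum only B are assumptions made for this illustration.

```r
library(data.table)

set.seed(1)  # arbitrary seed, added only so the sketch is reproducible
t1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10)
t2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10))

# Sum B within each D group; naming the column avoids the default V1
sums <- t2[, list(B = sum(B)), by = D]

# Key both tables on the join column
setkey(sums, D)
setkey(t1, D)

# Each row of t1 is looked up in sums. Rows of t1 with no matching D
# are kept and get NA for B, which mimics all.x = TRUE for t1.
left <- sums[t1]

# Same join via merge(), for comparison
viaMerge <- merge(t1, sums, by = "D", all.x = TRUE)

print(left)
print(viaMerge)
```

The two results hold the same rows; only the column order differs, because X[Y] puts the columns of X first.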

HTH,
Dennis



