Re: [R] combining dataframes with different numbers of columns

2006-11-07 Thread Denis Chabot
Thank you very much, Hadley, Stephen, and Sundar. Your suggestions  
all solve my problem, even if the narrower dataframe contains  
variables that are new to the wider dataframe.

I'm very glad I took the time to write instead of persisting with my  
ugly and time-consuming solution.

Denis

> Dear list members,
>
> I have to combine dataframes together. However they contain  
> different numbers of variables. It is possible that all the  
> variables in the dataframe with fewer variables are contained in  
> the dataframe with more variables, though it is not always the case.
>
> There are key variables identifying observations. These could be  
> used in a merge statement, although this won't quite work for me  
> (see below).
>
> I was hoping to find a way to combine dataframes where I needed  
> only to ensure the key variables were present. The total number of  
> variables in the final dataframe would be the total number of  
> different variables in both initial dataframes. Variables that were  
> absent in one dataframe would automatically get missing values in  
> the joint dataframe.
>
> Here is a simple example. The initial dataframes are a and b. All  
> variables in b are also in a.
>
> a <- data.frame(X=seq(1,10), matrix(runif(100, 0,15), ncol=10))
> b <- data.frame(X=seq(16,20), X4=runif(5,0,15))
>
> A merge does not work because the common variable X4 becomes 2  
> variables, X4.x and X4.y.
>
> c <- merge(a,b,by="X", all=TRUE)
>
> This can be fixed but it requires several steps (although my  
> solution is probably not optimal):
>
> names(c)[5] <- "X4"
> c$X4[is.na(c$X4)] <- c$X4.y[is.na(c$X4)]
> c <- c[,1:11]
>
> This quickly becomes tiresome with my real-life dataframes, where  
> different columns would need "repair" from one case to the next.
>
> I think I still prefer making the narrower dataframe like the wider  
> one:
>
> library(Hmisc)  # provides upData()
> b2 <- upData(b, X1=NA, X2=NA, X3=NA, X5=NA, X6=NA, X7=NA, X8=NA,  
> X9=NA, X10=NA)
> b2 <- b2[,c(1, 3:5, 2, 6:11)]  # reorder columns to match a
>
> d <- rbind(a, b2)
>
> But again this requires quite a bit of fine-tuning from one case to  
> the next in my real-life dataframes.
>
> I suspect R has a neat way to do this and I just cannot come up  
> with the proper search term to find help on my own.
>
> Or this can be automated: can one compare variable lists from 2  
> dataframes and add missing variables in the "narrower" dataframe?
>
> Ideally, the solution would be able to handle the situation where  
> the narrower dataframe contains one or more variables that are  
> absent from the wider one. If this was the case, I'd like the new  
> variable to be present in the combined dataframe, with missing  
> values given to the observations from the wider dataframe.
>
> Thanks in advance,
>
> Denis Chabot
>
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] combining dataframes with different numbers of columns

2006-11-07 Thread hadley wickham
> Or, try this:
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/77358.html

It's interesting to compare your implementation:

rbind.all <- function(...) {
  x <- list(...)
  cn <- unique(unlist(lapply(x, colnames)))
  for(i in seq(along = x)) {
    if(any(m <- !cn %in% colnames(x[[i]]))) {
      na <- matrix(NA, nrow(x[[i]]), sum(m))
      dimnames(na) <- list(rownames(x[[i]]), cn[m])
      x[[i]] <- cbind(x[[i]], na)
    }
  }
  do.call(rbind, x)
}

with mine:

rbind.fill <- function (...) {
  dfs <- list(...)
  if (length(dfs) == 0)
    return(list())
  all.names <- unique(unlist(lapply(dfs, names)))
  # compact() is not base R; assumed here to drop NULL entries from a list
  do.call("rbind", compact(lapply(dfs, function(df) {
    if (length(df) == 0 || nrow(df) == 0)
      return(NULL)
    missing.vars <- setdiff(all.names, names(df))
    if (length(missing.vars) > 0)
      df[, missing.vars] <- NA
    df
  })))
}

They're pretty similar!
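
For readers who want to try the idea without the reshape package, here is a minimal base-R sketch of the same pad-with-NA-then-rbind approach. The function name rbind_fill_sketch and the extra column Z are made up for illustration, and Filter() stands in for compact(), which is not defined in this thread. It is a sketch under those assumptions, not the package's implementation.

```r
# Sketch only: a base-R variant of the pad-then-rbind idea.
# rbind_fill_sketch is a hypothetical name, not a function from any package.
rbind_fill_sketch <- function(...) {
  dfs <- list(...)
  # Drop NULLs and data frames that have no columns or no rows.
  dfs <- Filter(function(df) length(df) > 0 && nrow(df) > 0, dfs)
  if (length(dfs) == 0) return(data.frame())
  all_names <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    # Add every column this data frame lacks, filled with NA.
    for (v in setdiff(all_names, names(df))) df[[v]] <- NA
    # Use one common column order for all pieces.
    df[, all_names, drop = FALSE]
  })
  do.call(rbind, padded)
}

# Denis's example, plus a column Z (hypothetical) that exists only in b.
a <- data.frame(X = 1:10, matrix(runif(100, 0, 15), ncol = 10))
b <- data.frame(X = 16:20, X4 = runif(5, 0, 15), Z = rnorm(5))
d <- rbind_fill_sketch(a, b)
dim(d)   # 15 rows, 12 columns
```

Rows that came from a have NA in Z, and rows that came from b have NA in every column except X, X4 and Z.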

Hadley



Re: [R] combining dataframes with different numbers of columns

2006-11-07 Thread Sundar Dorai-Raj


hadley wickham said the following on 11/7/2006 8:46 PM:
> On 11/7/06, Denis Chabot <[EMAIL PROTECTED]> wrote:
>> Dear list members,
>>
>> I have to combine dataframes together. However they contain different
>> numbers of variables. It is possible that all the variables in the
>> dataframe with fewer variables are contained in the dataframe with
>> more variables, though it is not always the case.
>>
>> There are key variables identifying observations. These could be used
>> in a merge statement, although this won't quite work for me (see below).
>>
>> I was hoping to find a way to combine dataframes where I needed only
>> to ensure the key variables were present. The total number of
>> variables in the final dataframe would be the total number of
>> different variables in both initial dataframes. Variables that were
>> absent in one dataframe would automatically get missing values in the
>> joint dataframe.
> 
> Have a look at rbind.fill in the reshape package.
> 
> library(reshape)
> rbind.fill(data.frame(a=1), data.frame(b=2))
> rbind.fill(data.frame(a=1), data.frame(a=2, b=2))
> 
> 
> Hadley
> 


Or, try this:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/77358.html

HTH,

--sundar



Re: [R] combining dataframes with different numbers of columns

2006-11-07 Thread Stephen D. Weigand
Denis,

On Nov 7, 2006, at 8:30 PM, Denis Chabot wrote:

> Dear list members,
>
> I have to combine dataframes together. However they contain different
> numbers of variables. It is possible that all the variables in the
> dataframe with fewer variables are contained in the dataframe with
> more variables, though it is not always the case.
>
> There are key variables identifying observations. These could be used
> in a merge statement, although this won't quite work for me (see 
> below).
>
> I was hoping to find a way to combine dataframes where I needed only
> to ensure the key variables were present. The total number of
> variables in the final dataframe would be the total number of
> different variables in both initial dataframes. Variables that were
> absent in one dataframe would automatically get missing values in the
> joint dataframe.
>
> Here is a simple example. The initial dataframes are a and b. All
> variables in b are also in a.
>
> a <- data.frame(X=seq(1,10), matrix(runif(100, 0,15), ncol=10))
> b <- data.frame(X=seq(16,20), X4=runif(5,0,15))
>
> A merge does not work because the common variable X4 becomes 2
> variables, X4.x and X4.y.
>
> c <- merge(a,b,by="X", all=TRUE)


[snipped]


> Thanks in advance,
>
> Denis Chabot
>

Will

   merge(a, b, by = intersect(names(a), names(b)), all = TRUE)

do what you want? (Note that this 'by' value is the default, so the
argument can be left out.)
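
As a quick check of the difference, the sketch below runs both merge calls on Denis's example data. The object names by_x and by_all are made up for illustration.

```r
a <- data.frame(X = 1:10, matrix(runif(100, 0, 15), ncol = 10))
b <- data.frame(X = 16:20, X4 = runif(5, 0, 15))

# Keying on X alone leaves the shared X4 duplicated as X4.x and X4.y.
by_x <- merge(a, b, by = "X", all = TRUE)

# Keying on every shared column (here X and X4) keeps a single X4.
by_all <- merge(a, b, by = intersect(names(a), names(b)), all = TRUE)

names(by_x)
names(by_all)
```

Two things to keep in mind: merge() puts the key columns first, so the column order differs from that of a, and two rows with the same X but different X4 values are not matched, so both are kept as separate rows.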

Hope this helps,

Stephen

Stephen Weigand
Rochester, Minnesota, USA



Re: [R] combining dataframes with different numbers of columns

2006-11-07 Thread hadley wickham
On 11/7/06, Denis Chabot <[EMAIL PROTECTED]> wrote:
> Dear list members,
>
> I have to combine dataframes together. However they contain different
> numbers of variables. It is possible that all the variables in the
> dataframe with fewer variables are contained in the dataframe with
> more variables, though it is not always the case.
>
> There are key variables identifying observations. These could be used
> in a merge statement, although this won't quite work for me (see below).
>
> I was hoping to find a way to combine dataframes where I needed only
> to ensure the key variables were present. The total number of
> variables in the final dataframe would be the total number of
> different variables in both initial dataframes. Variables that were
> absent in one dataframe would automatically get missing values in the
> joint dataframe.

Have a look at rbind.fill in the reshape package.

library(reshape)
rbind.fill(data.frame(a=1), data.frame(b=2))
rbind.fill(data.frame(a=1), data.frame(a=2, b=2))


Hadley



[R] combining dataframes with different numbers of columns

2006-11-07 Thread Denis Chabot
Dear list members,

I have to combine dataframes together. However they contain different  
numbers of variables. It is possible that all the variables in the  
dataframe with fewer variables are contained in the dataframe with  
more variables, though it is not always the case.

There are key variables identifying observations. These could be used  
in a merge statement, although this won't quite work for me (see below).

I was hoping to find a way to combine dataframes where I needed only  
to ensure the key variables were present. The total number of  
variables in the final dataframe would be the total number of  
different variables in both initial dataframes. Variables that were  
absent in one dataframe would automatically get missing values in the  
joint dataframe.

Here is a simple example. The initial dataframes are a and b. All  
variables in b are also in a.

a <- data.frame(X=seq(1,10), matrix(runif(100, 0,15), ncol=10))
b <- data.frame(X=seq(16,20), X4=runif(5,0,15))

A merge does not work because the common variable X4 becomes 2  
variables, X4.x and X4.y.

c <- merge(a,b,by="X", all=TRUE)

This can be fixed but it requires several steps (although my solution  
is probably not optimal):

names(c)[5] <- "X4"
c$X4[is.na(c$X4)] <- c$X4.y[is.na(c$X4)]
c <- c[,1:11]

This quickly becomes tiresome with my real-life dataframes, where  
different columns would need "repair" from one case to the next.

I think I still prefer making the narrower dataframe like the wider one:

library(Hmisc)  # provides upData()
b2 <- upData(b, X1=NA, X2=NA, X3=NA, X5=NA, X6=NA, X7=NA, X8=NA,  
X9=NA, X10=NA)
b2 <- b2[,c(1, 3:5, 2, 6:11)]  # reorder columns to match a

d <- rbind(a, b2)

But again this requires quite a bit of fine-tuning from one case to  
the next in my real-life dataframes.

I suspect R has a neat way to do this and I just cannot come up with  
the proper search term to find help on my own.

Or this can be automated: can one compare variable lists from 2  
dataframes and add missing variables in the "narrower" dataframe?

Ideally, the solution would be able to handle the situation where the  
narrower dataframe contains one or more variables that are absent  
from the wider one. If this was the case, I'd like the new variable  
to be present in the combined dataframe, with missing values given to  
the observations from the wider dataframe.

Thanks in advance,

Denis Chabot
