Hi,

Here's another way to look at the problem. Instead of manually adding a new column after k datasets have been read in, read the individual data files into a list; this works as long as they all have the same variable names and the same class (in this case, data.frame). Then create a vector of names for the list components and use 'apply family' logic to get the column means, returning the combined results as a data frame or matrix.

Here's a toy example to illustrate the point. First, three data frames are created and saved to external files:

# Create some artificial data and ship to external files
d1 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))
d2 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))
d3 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))

# row.names = FALSE keeps the row numbers out of the files, so that
# read.csv() does not bring them back as an extra column
write.csv(d1, file = "d1.csv", row.names = FALSE, quote = FALSE)
write.csv(d2, file = "d2.csv", row.names = FALSE, quote = FALSE)
write.csv(d3, file = "d3.csv", row.names = FALSE, quote = FALSE)

###

# Now, read them back in and store them in a list object

# Vector of file names to process
files <- paste0("d", 1:3, ".csv")

# Create the list of data frames and assign names to list components
L <- lapply(files, function(x) read.csv(x, header = TRUE))
names(L) <- paste0("d", 1:3)

# Compute column means from each list component and row bind them

# Method 1: base R
do.call(rbind, lapply(L, colMeans))

# Method 2: plyr package
library(plyr)
ldply(L, colMeans)

Dennis

On Wed, Nov 18, 2015 at 2:19 AM, Jesús Para Fernández <j.para.fernan...@hotmail.com> wrote:
> Hi everyone
>
> I have a dataframe "data" which is the result of joining multiple csv files
> (400 rows and 600 cols in every csv). The "data" dataframe has n rows and m
> columns (200000 rows and 600 cols), and I have added a new column,
> "csvdata", in which I specify the number of the csv to which those data
> belong.
>
> So, the dataframe "data" looks like:
>
> x1  x2  x3  ....  xn  csvdata
> 21  23  32  ....  12  1
> 27  21  39  ....  14  1
> 24  22  30  ....  11  1
> ..............................................
> 21  24  32  ....  19  2
> 27  21  39  ....  14  2
> ..............................................
> 27  22  30  ....  11  n
>
> I want to store into a matrix the mean values of different subsets of data
> of every csv, for example:
>
> region 1,1 (rows 1:20, columns 1:20) for every "csvdata" value
> region 2,1 (rows 21:40, columns 1:20) for every "csvdata" value
> ....
>
> And so on for the whole data.frame.
>
> I have tried:
>
> area1<-tapply(as.matrix(data[1:20,1]),datos$csvdata,mean,na.rm=T)
> area2<-tapply(as.matrix(data[1:20,1]),datos$csvdata,mean,na.rm=T)
>
> But this is the error I obtain:
>
> Error in tapply(data[1:30, ], datos$nueva, mean, na.rm = T) :
>   arguments must have same length
>
> I'm sure that it is not very complex to do, but I have no idea how to
> do it.
>
> Thanks for all.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
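
A follow-up on the region means asked about in the quoted message: tapply() needs one group label per data value, which is why passing 20 values together with the full-length csvdata column fails with "arguments must have same length". With the list approach, one way to get the block means might look like the rough sketch below. The helper name block_means, the block sizes and the toy data are my own assumptions for illustration, not something taken from the original data.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# mean of each (block_rows x block_cols) tile of a data frame
block_means <- function(df, block_rows, block_cols) {
  m <- as.matrix(df)
  # Tile index for every row and every column: 1, 1, ..., 2, 2, ...
  row_id <- (seq_len(nrow(m)) - 1) %/% block_rows + 1
  col_id <- (seq_len(ncol(m)) - 1) %/% block_cols + 1
  # One tile label per cell, so the labels have the same length as the data
  tapply(m, list(row_id[row(m)], col_id[col(m)]), mean, na.rm = TRUE)
}

# Toy stand-ins for the real files: 3 data frames of 40 rows x 4 columns
set.seed(1)
L <- lapply(1:3, function(i) as.data.frame(matrix(rpois(160, 20), nrow = 40)))
names(L) <- paste0("d", 1:3)

# One matrix of region means per data frame
# (2 row blocks x 2 column blocks for this toy data)
region_means <- lapply(L, block_means, block_rows = 20, block_cols = 2)
region_means$d1
```

For the real files (400 rows and 600 columns each), calling the helper with block_rows = 20 and block_cols = 20 should return a 20 x 30 matrix of region means for each csv.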