See if this works for you:
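# read each file into a list, then rbind into a single data frame
input <- do.call(rbind, lapply(files, function(.file){
    X <- read.csv(.file)
    # label each row with its file name minus the extension (June08.csv -> June08)
    X$label <- sub('\\.csv$', '', .file)
    X
}))
# use the reshape package to go from long to wide format
library(reshape)
# note: the column is 'Item_Name' in your str() output
i.melt <- melt(input, id.vars=c("label", "Item_Name"), measure.vars="Occurance")
# one row per item, one column per file label; missing combinations become NA
output <- cast(i.melt, Item_Name ~ label)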
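If you would rather not depend on reshape, a base R alternative is to rename each file's 'Occurance' column to its label and fold merge() over the list. This is only a minimal sketch: the toy data frames below stand in for your read.csv() results (the values are made up), and the names 'datasets' and 'renamed' are just for illustration.

```r
# Toy stand-ins for the data frames read from June01.csv, June02.csv, ...
# (hypothetical values; only the column names follow the original post)
datasets <- list(
  June01 = data.frame(Item_Name = c("Birds", "Turtles"),
                      Occurance = c(30L, 50L)),
  June02 = data.frame(Item_Name = c("Birds", "Fish"),
                      Occurance = c(10L, 20L)),
  June03 = data.frame(Item_Name = c("Fish", "Turtles", "Crabs"),
                      Occurance = c(5L, 15L, 25L))
)

# Keep only the key and Occurance, renaming Occurance to the dataset label
renamed <- lapply(names(datasets), function(label) {
  d <- datasets[[label]][, c("Item_Name", "Occurance")]
  names(d)[2] <- label
  d
})

# Full outer join across all data frames: all = TRUE keeps the union of
# items, and items absent from a given dataset get NA in that column
allData <- Reduce(function(a, b) merge(a, b, by = "Item_Name", all = TRUE),
                  renamed)
print(allData)
```

With real files, 'datasets' could be built with lapply(files, read.csv) and named with the labels, so a new .csv in the directory needs no extra typing.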
On Fri, Jun 12, 2009 at 9:27 AM, Jon Loehrke <[email protected]> wrote:
> Hi R list,
> I would like to automate, or speed up, the process by which I take
> several separate datasets, stored in .csv format, import them, and merge
> them by a common variable. So far I have greatly sped up the loading
> process but cannot think of a way to automate the merger of all
> datasets into a common data.frame.
> My apologies if this has been covered, any R search suggestions are
> appreciated.
>
> # All scripts function out of the base directory
> rm(list=ls())
> setwd('/Users/myuser/Documents/workfolder/')
>
> # Check files and list all .csv in directory
> files<-list.files()
> files<-files[grep('\\.csv$', files)]
> # Create labels for each file (ex. June08.csv becomes June08)
> labels<-gsub('\\.csv$', '', files)
>
> # Load all .csv datasets and assign name
>
> item<-vector() # preallocate an index of all items in datasets
> for(i in 1:length(files)){
> X<-read.csv(files[i])
> item<-union(item, X$Item_Name)
> assign(labels[i], X)
> }
> # What is loaded
> ls()
> # [1] "files" "i" "item" "June01" "June02" "June03" "labels"
>
> # What does everything look like?
> str(June03)
> #'data.frame': 992 obs. of 8 variables:
> # $ Item_Name : Factor w/ 992 levels "Birds","Fish",..: 1 2 3 4 5 6 7 8 9 10 ...
> # $ Occurance : int 30 30 50 450 75 550 100 500 250 75 ...
>
> str(June01)
> #'data.frame': 819 obs. of 8 variables:
> # $ Item_Name : Factor w/ 819 levels "Birds","Turtles",..: 1 2 3 4 5 6 7 8 9 10 ...
> # $ Occurance : int 30 50 450 750 550 100 500 250 275 450 ...
>
> # Here is where I'm stuck...
> # I would like to:
> #   Create a data.frame with an index column composed of the union of all items
> #   Create columns in the frame by merging the 'Occurance' from each loaded
> #   dataset, labeled by dataset name (e.g. June01)
> #   Automate this procedure so that I do not have to manually type in each
> #   column addition when I have a new dataset.
>
> # This is my current strategy, but when I have new datasets I have to
> # manually set up the preallocation and merger
>
> allData<-data.frame(Item=item, June01 =NA, June02=NA, June03 =NA)
> allData[match(June01$Item_Name, allData$Item ),]$June01 <-
> June01$Occurance
> allData[match(June02$Item_Name, allData$Item ),]$June02 <-
> June02$Occurance
> allData[match(June03$Item_Name, allData$Item ),]$June03 <-
> June03$Occurance
>
> # Any help to automate this process is greatly appreciated!!!
>
> sessionInfo()
> #R version 2.9.0 (2009-04-17)
> #i386-apple-darwin8.11.1
> #
> #locale:
> #en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> #
> #attached base packages:
> #[1] stats graphics grDevices utils datasets methods base
>
>
> Jon Loehrke
> Graduate Research Assistant
> Department of Fisheries Oceanography
> School for Marine Science and Technology
> University of Massachusetts
> 200 Mill Road, Suite 325
> Fairhaven, MA 02719
> [email protected]
> T 508-910-6393
> F 508-910-6396
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?