Re: [R] Automate a data load and merge

jim holtman Fri, 12 Jun 2009 09:28:08 -0700

See if this works for you:

# read each file into a list, then rbind into a single data frame
input <- do.call(rbind, lapply(files, function(.file){
    X <- read.csv(.file)
    X$label <- gsub('\\.csv$', '', .file)  # add file label (e.g. June08)
    X
}))
# use the reshape package to spread the labels into columns
library(reshape)
# melt to long form; column names as shown by str() in the question
i.melt <- melt(input, id=c("label", "Item_Name"), measure="Occurance")
output <- cast(i.melt, Item_Name ~ label)
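
For anyone who wants to try the idea without the reshape package, here is a
minimal self-contained sketch in base R. It assumes two made-up CSV files
written to a temporary directory (the file names and values are purely
illustrative) and uses tapply() for the long-to-wide step, which is a
substitute for melt()/cast() rather than the same method:

```r
# Toy data: two small CSV files in a temporary directory (illustrative only)
tmp <- tempfile("csvdemo")
dir.create(tmp)
write.csv(data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30, 50)),
          file.path(tmp, "June01.csv"), row.names = FALSE)
write.csv(data.frame(Item_Name = c("Birds", "Fish"), Occurance = c(10, 20)),
          file.path(tmp, "June02.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Stack all files into one long data frame, tagging rows with the file label
input <- do.call(rbind, lapply(files, function(.file) {
    X <- read.csv(.file, stringsAsFactors = FALSE)
    X$label <- sub("\\.csv$", "", basename(.file))
    X
}))

# Pivot to wide: one row per item, one column per file; absent items stay NA
wide <- with(input, tapply(Occurance, list(Item_Name, label), sum))
output <- data.frame(Item_Name = rownames(wide), wide, row.names = NULL)
```

Adding a new file to the directory adds a new column without any change to
the code, since the column names come from the file names.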




On Fri, Jun 12, 2009 at 9:27 AM, Jon Loehrke <[email protected]> wrote:

> Hi R list,
>        I would like to automate, or speed up, the process by which I take
> several separate datasets, stored in .csv format, import them, and merge
> them by a common variable.  So far I have greatly sped up the loading
> process but cannot think of a way to automate the merger of all
> datasets into a common data.frame.
>        My apologies if this has been covered, any R search suggestions are
> appreciated.
>
> # All scripts function out of the base directory
> rm(list=ls())
> setwd('/Users/myuser/Documents/workfolder/')
>
> # Check files and list all .csv in directory
> files<-list.files()
> files<-files[grep('.csv', files)]
> # Create labels for each file (ex. June08.csv becomes June08)
> labels<-gsub('.csv', '', files)
>
> # Load all .csv datasets and assign name
>
> item<-vector() # preallocate an index of all items in datasets
> for(i in 1:length(files)){
>        X<-read.csv(files[i])
>        item<-union(item, X$Item_Name)
>        assign(labels[i], X)
>        }
> # What is loaded
> ls()
> # [1] "files"    "i"        "item"     "June01" "June02" "June03" "labels"
>
> # What does everything look like?
> str(June03)
> #'data.frame':  992 obs. of  8 variables:
> # $ Item_Name        : Factor w/ 992 levels "Birds","Fish",..: 1 2 3 4 5 6 7 8 9 10 ...
> # $ Occurance     : int  30 30 50 450 75 550 100 500 250 75 ...
>
> str(June01)
> #'data.frame':  819 obs. of  8 variables:
> # $ Item_Name        : Factor w/ 819 levels "Birds","Turtles",..: 1 2 3 4 5 6 7 8 9 10 ...
> # $ Occurance     : int  30 50 450 750 550 100 500 250 275 450 ...
>
> # Here is where I'm stuck...
> #I would like to:
> #       Create a data.frame with an index column composed of the union of
> #       all items
> #       Create columns in the frame by merging the 'Occurance' from each
> #       loaded dataset, labeled by dataset name (e.g. June01)
> #       Automate this procedure so that I do not have to manually type in
> #       each column addition when I have a new dataset.
>
> # This is my current strategy, but when I have new datasets I have to
> # manually set up the preallocation and merger
>
> allData<-data.frame(Item=item, June01 =NA, June02=NA,  June03 =NA)
> allData[match(June01$Item_Name, allData$Item ),]$June01 <-
> June01$Occurance
> allData[match(June02$Item_Name, allData$Item ),]$June02 <-
> June02$Occurance
> allData[match(June03$Item_Name, allData$Item ),]$June03 <-
> June03$Occurance
>
> # Any help to automate this process is greatly appreciated!!!
>
> sessionInfo()
> #R version 2.9.0 (2009-04-17)
> #i386-apple-darwin8.11.1
> #
> #locale:
> #en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> #
> #attached base packages:
> #[1] stats     graphics  grDevices utils     datasets  methods   base
>
>
> Jon Loehrke
> Graduate Research Assistant
> Department of Fisheries Oceanography
> School for Marine Science and Technology
> University of Massachusetts
> 200 Mill Road, Suite 325
> Fairhaven, MA 02719
> [email protected]
> T 508-910-6393
> F 508-910-6396
>
>
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
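
The preallocate-and-match strategy in the quoted message can also be
automated directly, without any extra package. The following is only a
sketch: the `datasets` list and its values are hypothetical stand-ins for
what read.csv() would return, and in practice the list could be built with
something like lapply(files, read.csv) and named with the labels.

```r
# Hypothetical stand-ins for the data frames that read.csv() would return
datasets <- list(
    June01 = data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30, 50)),
    June02 = data.frame(Item_Name = c("Birds", "Fish"), Occurance = c(10, 20))
)

# Union of all item names across the datasets
item <- unique(unlist(lapply(datasets, function(d) as.character(d$Item_Name))))

# One column per dataset, filled by matching item names; absent items stay NA
allData <- data.frame(Item = item, stringsAsFactors = FALSE)
for (nm in names(datasets)) {
    d <- datasets[[nm]]
    allData[[nm]] <- d$Occurance[match(allData$Item, as.character(d$Item_Name))]
}
```

Keeping the data frames in a named list, rather than creating one variable
per file with assign(), is what lets the loop pick up new datasets without
editing the code.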



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

