[R] should I use apply, sapply, etc instead of a "for loop"?

Thomas Pujol Wed, 20 Jun 2007 11:59:09 -0700

I have been trying to learn the various "apply" functions but am still learning 
their appropriate use.  I appreciate any help the R community can offer me.  
Sorry for the length of this post.


Background:

I have data on my hard drive organized in the following manner:

The data pertains to many different "samples" of data (e.g. sample 001, 
sample 002, sample 003, etc.).

Each "sample" contains many different "data frames" for a large number of 
different data-items. 
(e.g. sat score, median income of zip-code, gender, GPA, etc)

The data frames and files are each named with the data-item name as the 
"prefix" of the name and the "sample number" as the suffix of the name.
e.g. sat.001, income.001, sat.002, income.002
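
To make the naming scheme concrete, here is a minimal sketch that builds 
every item/sample name combination. The vectors "items" and "samples" are 
illustrative assumptions, not the full set of data-items or samples.

```r
# Hypothetical data-item prefixes and zero-padded sample suffixes
items   <- c("sat", "income")
samples <- sprintf("%03d", 1:2)

# One name per item/sample pair, joined with "." as described above
object_names <- as.vector(outer(items, samples, paste, sep = "."))
print(object_names)
```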

Each data frame has approximately 5,000 rows, 1 for each "person".

Note: The files are somewhat large, and most of my analysis will be completed 
within each "sample".  (Thus, I think that I should probably keep the files 
stored as separate files, and not combine them into a larger list or data 
frame. I also do not think I want to load all the files for multiple samples 
at once, as this may take up too much memory.)  Also, I have simplified the 
description of the files; many contain multiple columns of data.


###############
I have written a "for" loop that does the following:

a. For each "sample period" I load two files.
b. I perform a function on the data contained in these two files.
c. I take the results and save them as a new file.
d. I proceed to the next sample.

Is there a "better" (i.e. more elegant and/or efficient) way to do this, 
perhaps with one of the "apply" functions? (e.g. apply, sapply, lapply, tapply?)

#e.g. my simplified code

#this creates example data:
sat.001=c(500,400,750)
sat.002=c(245,455,767)

income.001=c(5020,4200,7250)
income.002=c(2425,4525,7627)

filenames=c('sat.001', 'sat.002', 'income.001', 'income.002')
sapply(filenames,function(x) { save( list=x , file = paste(x ,'.r', sep ='')  ) 
})
rm(sat.001,sat.002,income.001,income.002,filenames)
ls() #
##############
#my for loop

divide = function(x,y) {x/y} 
#creates a custom function


#inputs to my loop:
samplenames=c('001','002')
x.name='sat'
y.name='income'
fun='divide'

for (i in 1:length(samplenames) ) {

x.name.suf = paste(x.name,samplenames[i],sep='.') 
#name of x file on hard drive

y.name.suf = paste(y.name,samplenames[i],sep='.') 
#name of y file on hard drive

x=get(load(file = paste(x.name.suf ,'r', sep ='.')  , envir = .GlobalEnv) ) 
#loads and gets the x file

y=get(load(file = paste(y.name.suf ,'r', sep ='.')  , envir = .GlobalEnv) ) 
#loads and gets the y file

temp=get(fun)(x,y) 
#applies custom function specified in arguments above
# to data  contained in x and y files

save( list='temp' , file = paste(fun,x.name ,y.name,samplenames[i],sep='.') ) 
#save the results in files with name that specifies 
#name of function, name of x, name of y, and sample number
#files will be used for later analysis

rm(list=x.name.suf)
rm(list=y.name.suf)
rm(x.name.suf,y.name.suf,x,y,temp)
}

rm(divide,samplenames,x.name,y.name,fun,i)
ls()
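
For comparison, one possible lapply formulation of the loop above is sketched 
below. It is only a sketch under stated assumptions: "load_one" and 
"process_sample" are hypothetical helper names, and the files are written to 
tempdir() so the example is self-contained. Loading into a private 
environment means nothing has to be removed with rm() afterwards. Since most 
of the time is presumably spent reading and writing files, this is likely to 
be tidier than the "for" loop rather than noticeably faster.

```r
# Assumption: use a temporary directory so no real data files are touched
data_dir <- tempdir()

# Recreate the example data as files named <item>.<sample>.r
example_data <- list(
  sat.001    = c(500, 400, 750),    sat.002    = c(245, 455, 767),
  income.001 = c(5020, 4200, 7250), income.002 = c(2425, 4525, 7627)
)
data_env <- list2env(example_data)
for (nm in names(example_data)) {
  save(list = nm, envir = data_env,
       file = file.path(data_dir, paste(nm, "r", sep = ".")))
}

# Hypothetical helper: load a one-object file into a private environment
# and return that object, leaving the global workspace untouched
load_one <- function(path) {
  env <- new.env()
  get(load(path, envir = env), envir = env)
}

# Hypothetical helper: process one sample and return the output file name
process_sample <- function(sample, x.name, y.name, fun, dir) {
  x <- load_one(file.path(dir, paste(x.name, sample, "r", sep = ".")))
  y <- load_one(file.path(dir, paste(y.name, sample, "r", sep = ".")))
  temp <- get(fun, mode = "function")(x, y)
  out <- file.path(dir, paste(fun, x.name, y.name, sample, sep = "."))
  save(temp, file = out)
  out
}

divide <- function(x, y) x / y
samplenames <- c("001", "002")

# lapply calls process_sample once per sample; the result is a list of
# the output file names
out_files <- lapply(samplenames, process_sample,
                    x.name = "sat", y.name = "income",
                    fun = "divide", dir = data_dir)
```

Each saved result can be read back later with load_one(out_files[[1]]).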


 

