>On 4/19/07, Marra, David <[EMAIL PROTECTED]> wrote:
>> Thank you to everyone who contributed to understanding the multi-core
>> problem better. I took Elijah's advice and purchased a Leopard
>> pre-release DVD and will post performance results here when it arrives
>> and if the results are interesting (should be!). I'll also post the
>> result from Simon's BLAS test in a few hours.
>>
>> In the meantime, there is a speed problem to solve. Appreciate advice
>> anyone may have on potential approaches for speeding up the following
>> function. Based on previous comments, fewer calls to memory may be
>> important...
>>
>> results <- function(x){
>>   fit <- lm(Y ~ get(cmb[x,1]) + get(cmb[x,2]) + get(cmb[x,3]) +
>>               get(cmb[x,4]) + get(cmb[x,5]),
>>             data=data1)
>>   list(R2=summary(fit)$adj.r.squared) }
>>
>> This call to lm function is nested in a parSapply function that iterates
>> down the rows of the "cmb" matrix. Each row of cmb has, in the example
>> above, 5 character values (such as "var1", "var2",..."var5")
>> corresponding to variable names in the "data1" dataframe. The function
>> iterates down the rows, generating regressions, each with a different
>> combination of variables. (x just goes from 1 to whatever number of rows
>> are in cmb.) Finally the function delivers the R2 for each combination.
>
>Can you be more specific about what x is here? What you write makes
>it sound as if x is a single row but you wouldn't be able to do a
>linear model fit on a single row. It must be more than one row.
>
>The immediate way to speed things up is to use lm.fit directly instead
>of going through lm. The lm function is a convenience function to
>take a formula/data representation of a linear model along with
>several optional arguments and create the model matrices. In this
>case you can create the model matrix for all the rows in a single
>call, provided that it fits into memory, then farm out the individual
>fits. Also, the call to summary does a lot more than calculate an
>adjusted R-squared. You can calculate this single statistic directly
>from the dimensions of the problem and the "effects" component of the
>lm fit.

I will try to clarify. The purpose of the function is to create x
different lm models and extract their R2s. If x is 1:500, that means 500
unique models, each with a different combination of arguments.

The 500 unique combinations of argument names are stored in cmb, one
combination in each row. If there are 500 combinations of 4 arguments
each, the cmb matrix has 500 rows and 4 columns. For example, row 29
might contain the following 4 character values: "Var2", "Var7", "Var18",
"Var30". Literally, just characters. A text file, if you will. The
characters "Var2" would be in the first column, "Var7" in the second,
"Var18" in the third, and "Var30" in the fourth.

The function I would like to speed up, if it is possible, then gets the
variable names from cmb and the data from data1. data1 is a large
dataframe with all the variables, Var1 to Var30, and their data.

>>
>> Any speed-up ideas?
>>
>> David
>>
>> _______________________________________________
>> R-SIG-Mac mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
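
For anyone following the thread, the lm.fit suggestion in the reply above might look roughly like the sketch below. It is only an illustration under stated assumptions: the data are simulated, there are 8 candidate variables instead of 30, and the names X_all, adj_r2 and fast are made up here. It also assumes every model has an intercept as its first column, so that the first element of the "effects" vector belongs to the intercept.

```r
# Simulated stand-ins for data1 and cmb (hypothetical data, not the poster's).
set.seed(1)
n <- 200
data1 <- as.data.frame(matrix(rnorm(n * 8), n, 8))
names(data1) <- paste0("Var", 1:8)
data1$Y <- with(data1, 1 + 2 * Var1 - Var3 + rnorm(n))

# Each row of cmb names one combination of 4 predictors (70 rows here).
cmb <- t(combn(paste0("Var", 1:8), 4))

# Build the numeric matrix of all candidate predictors once, up front,
# instead of letting lm() rebuild a model frame for every combination.
X_all <- as.matrix(data1[, paste0("Var", 1:8)])
y <- data1$Y

adj_r2 <- function(i) {
  # Pick this row's columns by name and put the intercept first.
  X <- cbind(1, X_all[, cmb[i, ], drop = FALSE])
  fit <- lm.fit(X, y)
  eff <- fit$effects
  p <- fit$rank
  rss <- sum(eff[-seq_len(p)]^2)  # residual sum of squares
  mss <- sum(eff[2:p]^2)          # regression sum of squares about the mean
  r2 <- mss / (mss + rss)
  # Adjusted R-squared from the dimensions of the problem.
  1 - (1 - r2) * (length(y) - 1) / fit$df.residual
}

fast <- vapply(seq_len(nrow(cmb)), adj_r2, numeric(1))
```

The same adj_r2 function could presumably be handed to parSapply in place of the original results function, provided X_all, cmb and y are made available on the worker nodes first (for example with clusterExport).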
