Hi Sergey, I believe the code below should get you close to what you want.
For dates, I usually store them as "POSIXct" objects in data frames, although according to Gabor Grothendieck and Thomas Petzoldt's R Help Desk article <http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf>, I should probably be using "chron" dates and times... Nonetheless, POSIXct classes are what I know, so to get the month out of your column (replace "8.29.97" with your variable), you can do the following:

month = format(strptime("8.29.97",format="%m.%d.%y"),format="%m")

Or,

month = as.data.frame(strsplit("8.29.97","\\."))[1,]

In any case, here is some code in which I define and apply a series of functions (effectively a successive application of split() and lapply()).

Best regards,

ST

# define data (I just made this up)
df <- data.frame(month=as.character(rep(1:3,each=30)),
                 fac=factor(rep(1:2,each=15)),
                 data1=round(runif(90),2),
                 data2=round(runif(90),2))

# define functions to split the data and another
# to get statistics
doSplits <- function(df) {
  unlist(lapply(split(df,df$month),function(x) split(x,x$fac)),recursive=FALSE)
}

getStats <- function(x,f) {
  return(as.data.frame(lapply(x[unlist(lapply(x,mode))=="numeric" &
                                unlist(lapply(x,class))!="factor"],f)))
}

# create a matrix of data, means, and standard deviations
listMatrix <- cbind(Data=doSplits(df),
                    Means=lapply(doSplits(df),getStats,mean),
                    SDs=lapply(doSplits(df),getStats,sd))

# function to subtract means and divide by standard deviations
transformData <- function(x) {
  newdata <- x$Data
  matchedNames <- match(names(x$Means),names(x$Data))
  newdata[matchedNames] <- sweep(sweep(data.matrix(x$Data[matchedNames]),2,unlist(x$Means),"-"),
                                 2,unlist(x$SDs),"/")
  return(newdata)
}

# apply to data
newDF <- lapply(as.data.frame(t(listMatrix)),transformData)

# Define Fold function
# (return x explicitly: a for() loop itself evaluates to NULL)
Fold <- function(f, x, L) {
  for(e in L) x <- f(x, e)
  x
}

# Apply this to the data
finalData <- Fold(rbind,vector(),newDF)

--- Sergey Goriatchev <[EMAIL PROTECTED]> wrote:

> Hi, fellow R users.
>
> I have a question about sapply and split combination.
>
> I have a big dataframe (40000 observations, 21 variables). First
> variable (factor) is "date" and it is in format "8.29.97", that is, I
> have monthly data. Second variable (also factor) has levels 1 to 6
> (fractiles 1 to 5 and missing value with code 6). The other 19
> variables are numeric.
> For each month I have several hundred observations of 19 numeric and 1
> factor.
>
> I am normalizing the numeric variables by dividing val1 by val2, where:
>
> val1: (for each month, for each numeric variable) difference between
> mean of ith numeric variable in fractile 1, and mean of ith numeric
> variable in fractile 5.
>
> val2: (for each month, for each numeric variable) standard deviation
> for ith numeric variable.
>
> Basically, as far as I understand, I need to use split() function several
> times.
> To calculate val1 I need to use split() twice - first to split by
> month and then split by fractile. Is this even possible to do (since
> after first application of split() I get a list)??
>
> Is there a smart way to perform this normalization computation?
>
> My knowledge of R is not so advanced, but I need to know an efficient
> way to perform calculations of this kind.
>
> Would really appreciate some help from experienced R users!
>
> Regards,
> S
>
> --
> Laziness is nothing more than the habit of resting before you get tired.
> - Jules Renard (writer)
>
> Experience is one thing you can't get for nothing.
> - Oscar Wilde (writer)
>
> When you are finished changing, you're finished.
> - Benjamin Franklin (Diplomat)
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
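
For reference, here is a minimal sketch of the val1/val2 normalization described in the quoted question, using a single split() by month followed by sapply(). The data and the names dat, fractile, x1, x2, numCols and normOneMonth are all made up for illustration, and the sketch assumes that val2 is the standard deviation over all of a month's rows (including the missing-value fractile 6); adjust that if the intended definition differs.

```r
# Hypothetical toy data: two months, fractiles 1 to 6, two numeric variables.
# All column names here are assumptions, not taken from the real data set.
set.seed(1)
dat <- data.frame(
  date     = rep(c("8.29.97", "9.30.97"), each = 12),
  fractile = factor(rep(rep(1:6, each = 2), times = 2)),
  x1       = round(runif(24), 2),
  x2       = round(runif(24), 2)
)

# Build a year-month key from the "m.d.y" date strings
dat$month <- format(strptime(as.character(dat$date), format = "%m.%d.%y"),
                    "%Y-%m")

# Names of the numeric columns to normalize
numCols <- c("x1", "x2")

# For one month's rows, compute for each numeric column:
#   val1 = mean in fractile 1 minus mean in fractile 5
#   val2 = standard deviation over all rows of that month (assumption)
normOneMonth <- function(d) {
  sapply(d[numCols], function(v) {
    val1 <- mean(v[d$fractile == "1"]) - mean(v[d$fractile == "5"])
    val2 <- sd(v)
    val1 / val2
  })
}

# One split() by month is enough; the fractile selection happens inside
# normOneMonth(). The result has one row per variable, one column per month.
result <- sapply(split(dat, dat$month), normOneMonth)
print(result)
```

With the real data, numCols would hold the names of the 19 numeric columns, and t(result) gives one row per month if that layout is more convenient.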
