### Re: [R] Using split() several times in a row?

SteT == Stephen Tucker [EMAIL PROTECTED] on Fri, 30 Mar 2007 18:41:39 -0700 (PDT) writes:

[..]

> For dates, I usually store them as POSIXct classes in data frames, but according to Gabor Grothendieck and Thomas Petzoldt's R Help Desk article http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf, I should probably be using chron date and times...

I don't think you should (and I doubt Gabor and Thomas would recommend this in every case): POSIXct (and 'POSIXlt', 'POSIXt', 'Date') are part of standard R, and whereas they may not seem as convenient in all cases as chron etc., I'd rather recommend sticking to them in such a case.

> Nonetheless, POSIXct classes are what I know, so I can show you that to get the month out of your column (replace "8.29.97" with your variable), you can do the following:
>
>     month = format(strptime("8.29.97", format = "%m.%d.%y"), format = "%m")
>
> Or,
>
>     month = as.data.frame(strsplit("8.29.97", "\\."))[1, ]

[..etc..veryuseful..advice]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
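
As a small illustration of staying with the classes in standard R, the month can also be pulled out via the base `Date` class. This is only a sketch: the sample values and the names `x`, `d`, `month_chr` and `month_int` are made up for the example and do not come from the thread.

```r
# Hypothetical sample values in the m.d.yy layout described in the thread.
x <- c("8.29.97", "9.30.97", "12.31.97")

# Parse into the base R Date class (no time-of-day, hence no time zone).
d <- as.Date(x, format = "%m.%d.%y")

# Month as a zero-padded string, and as an integer.
month_chr <- format(d, "%m")
month_int <- as.integer(month_chr)
```

Because a `Date` carries no time of day, this route sidesteps the time zone question for purely calendar data such as month-end observations.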

### Re: [R] Using split() several times in a row?

On 3/31/07, Martin Maechler [EMAIL PROTECTED] wrote:

> SteT == Stephen Tucker [EMAIL PROTECTED] on Fri, 30 Mar 2007 18:41:39 -0700 (PDT) writes:
>
> [..]
>
> > For dates, I usually store them as POSIXct classes in data frames, but according to Gabor Grothendieck and Thomas Petzoldt's R Help Desk article http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf, I should probably be using chron date and times...
>
> I don't think you should (and I doubt Gabor and Thomas would recommend this in every case): POSIXct (and 'POSIXlt', 'POSIXt', 'Date') are part of standard R, and whereas they may not seem as convenient in all cases as chron etc., I'd rather recommend sticking to them in such a case.

There is one change that has occurred since the article that, in my mind, would let you safely use POSIX, but it's pretty drastic. At the time of the article you could not set the time zone to GMT in the R process on Windows, but now you can do this:

    Sys.putenv(TZ = "GMT")

and you can also change it back like this:

    Sys.putenv(TZ = "")

The problem is that you can never be sure which time zone a time is interpreted in within various functions (although you can be pretty sure it's either the local time zone or GMT). By setting the process to GMT you make the two alternatives the same, so it no longer matters.

Short of the above, the recommendations of the article should be followed. It's not a matter of convenience. It's a matter of being error prone and introducing subtle time-zone-related errors into your code which are very hard to track down or, worse, even to realize that you have. Those who claim that it's not a problem simply have not used dates and times enough, or they would not say that. I have seen posters make such comments on this list, only to run into subtle time zone problems later that they never would have had if they had followed the advice in the article.

I've used R and dates a lot and therefore have made a lot of programming errors, and these recommendations come from bitter experience looking back to see how I could have avoided them.
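
The effect of putting the whole process into GMT can be sketched as follows. Two assumptions to flag: the sketch uses `Sys.setenv()` and `Sys.unsetenv()`, the names current R provides, rather than the `Sys.putenv()` spelling used in this thread, and the timestamp and variable names are invented for the example.

```r
# Remember the current setting so it can be restored afterwards.
old_tz <- Sys.getenv("TZ", unset = NA)

# Assumption: Sys.setenv() is used here in place of the thread's Sys.putenv().
Sys.setenv(TZ = "GMT")

# With the process in GMT, "local time" and GMT are the same thing.
t_local <- as.POSIXct("1997-08-29 23:30:00")              # interpreted via TZ
t_gmt   <- as.POSIXct("1997-08-29 23:30:00", tz = "GMT")  # explicit GMT
same_instant <- as.numeric(t_local) == as.numeric(t_gmt)

# Formatting in the process time zone and converting to Date now agree.
same_date <- format(t_local, "%Y-%m-%d") == format(as.Date(t_local), "%Y-%m-%d")

# Restore the previous setting.
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```

In a process whose local zone is far from GMT, a late-evening timestamp like the one above is exactly the kind of value for which the two interpretations could land on different calendar days.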

### Re: [R] Using split() several times in a row?

Hi Sergey,

I believe the code below should get you close to what you want.

For dates, I usually store them as POSIXct classes in data frames, but according to Gabor Grothendieck and Thomas Petzoldt's R Help Desk article http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf, I should probably be using chron date and times...

Nonetheless, POSIXct classes are what I know, so I can show you that to get the month out of your column (replace "8.29.97" with your variable), you can do the following:

    month = format(strptime("8.29.97", format = "%m.%d.%y"), format = "%m")

Or,

    month = as.data.frame(strsplit("8.29.97", "\\."))[1, ]

In any case, here is some code, in which I follow a series of function applications and definitions (which effectively includes successive application of split() and lapply()).

Best regards,

ST

    # define data (I just made this up)
    df <- data.frame(month = as.character(rep(1:3, each = 30)),
                     fac = factor(rep(1:2, each = 15)),
                     data1 = round(runif(90), 2),
                     data2 = round(runif(90), 2))

    # define functions to split the data and another
    # to get statistics
    doSplits <- function(df) {
      unlist(lapply(split(df, df$month), function(x) split(x, x$fac)),
             recursive = FALSE)
    }
    getStats <- function(x, f) {
      return(as.data.frame(lapply(x[unlist(lapply(x, mode)) == "numeric" &
                                    unlist(lapply(x, class)) != "factor"], f)))
    }

    # create a matrix of data, means, and standard deviations
    listMatrix <- cbind(Data = doSplits(df),
                        Means = lapply(doSplits(df), getStats, mean),
                        SDs = lapply(doSplits(df), getStats, sd))

    # function to subtract means and divide by standard deviations
    transformData <- function(x) {
      newdata <- x$Data
      matchedNames <- match(names(x$Means), names(x$Data))
      newdata[matchedNames] <-
        sweep(sweep(data.matrix(x$Data[matchedNames]), 2, unlist(x$Means), "-"),
              2, unlist(x$SDs), "/")
      return(newdata)
    }

    # apply to data
    newDF <- lapply(as.data.frame(t(listMatrix)), transformData)

    # Define Fold function
    # (x is returned explicitly: a for loop itself evaluates to NULL in current R)
    Fold <- function(f, x, L) {
      for (e in L) x <- f(x, e)
      x
    }

    # Apply this to the data
    finalData <- Fold(rbind, vector(), newDF)

--- Sergey Goriatchev [EMAIL PROTECTED] wrote:

> Hi, fellow R users.
>
> I have a question about sapply and split combination. I have a big dataframe (4 observations, 21 variables). First variable (factor) is date and it is in format 8.29.97, that is, I have monthly data. Second variable (also factor) has levels 1 to 6 (fractiles 1 to 5 and missing value with code 6). The other 19 variables are numeric. For each month I have several hundred observations of 19 numeric and 1 factor.
>
> I am normalizing the numeric variables by dividing val1 by val2, where:
>
> val1: (for each month, for each numeric variable) difference between mean of ith numeric variable in fractile 1, and mean of ith numeric variable in fractile 5.
>
> val2: (for each month, for each numeric variable) standard deviation for ith numeric variable.
>
> Basically, as far as I understand, I need to use split() function several times. To calculate val1 I need to use split() twice - first to split by month and then split by fractile. Is this even possible to do (since after first application of split() I get a list)?? Is there a smart way to perform this normalization computation? My knowledge of R is not so advanced, but I need to know an efficient way to perform calculations of this kind.
>
> Would really appreciate some help from experienced R users!
>
> Regards,
> S
>
> --
> Laziness is nothing more than the habit of resting before you get tired. - Jules Renard (writer)
> Experience is one thing you can't get for nothing. - Oscar Wilde (writer)
> When you are finished changing, you're finished. - Benjamin Franklin (Diplomat)
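
For reference, the val1 / val2 computation described in the question can be sketched with two nested calls to split(). This is one reading of the question, not code posted in the thread: the data are made up, the names `fractile`, `x1`, `x2`, `num_cols` and `normalise_month` are hypothetical, and val2 is taken to be the standard deviation over all observations of the month.

```r
set.seed(1)  # made-up data, for illustration only
df <- data.frame(
  month    = rep(c("8.29.97", "9.30.97"), each = 50),
  fractile = factor(rep(1:5, times = 20)),
  x1 = runif(100),
  x2 = runif(100)
)
num_cols <- c("x1", "x2")  # hypothetical names of the numeric variables

# First split: one data frame per month.
by_month <- split(df, df$month)

normalise_month <- function(m) {
  # Second split: within the month, one data frame per fractile.
  by_frac <- split(m[num_cols], m$fractile)
  # val1: mean in fractile 1 minus mean in fractile 5, per variable.
  val1 <- colMeans(by_frac[["1"]]) - colMeans(by_frac[["5"]])
  # val2: standard deviation over the whole month, per variable (assumed).
  val2 <- sapply(m[num_cols], sd)
  val1 / val2
}

# One row per month, one column per numeric variable.
result <- do.call(rbind, lapply(by_month, normalise_month))
```

Since the first split() returns a list of data frames, lapply() is what lets split() be applied again to each piece. Rows in the missing-value fractile (code 6 in the question) would simply form a sixth piece that val1 ignores, though under this reading they would still enter val2.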