[R] Winsorizing Multiple Variables
Hi All,

I want to take a matrix (or data frame) and winsorize each variable, so that I can, for example, correlate the winsorized variables. The code below winsorizes a single vector, but when it is applied to several vectors, each one comes back sorted independently in ascending order, so a given observation is no longer on the same row in every vector.

So I need to winsorize each variable and then return it to its original order. Alternatively, I need another solution that takes a data frame, winsorizes each variable, and returns a new data frame with all the variables in the original order.

Thanks for any help!

-Karl

# The function I'm working from
win <- function(x, tr = .2, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  y <- sort(x)
  n <- length(x)
  ibot <- floor(tr * n) + 1
  itop <- length(x) - ibot + 1
  xbot <- y[ibot]
  xtop <- y[itop]
  y <- ifelse(y <= xbot, xbot, y)
  y <- ifelse(y >= xtop, xtop, y)
  win <- y
  win
}

# Produces an example data frame; ss is the observation id, var1 to var4 are the variables I want to winsorize.
ss = c(1:5); var1 = rnorm(5); var2 = rnorm(5); var3 = rnorm(5); var4 = rnorm(5)
as.data.frame(cbind(ss, var1, var2, var3, var4)) -> data
data

# Winsorizes each variable, but sorts them independently so the observations no longer line up.
sapply(data, win)

___
M. Karl Healey
Ph.D. Student
Department of Psychology
University of Toronto
Sidney Smith Hall
100 St. George Street
Toronto, ON M5S 3G3
k...@psych.utoronto.ca

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
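The first option mentioned in the question (winsorize on the sorted values, then return them to their original order) can be sketched with order(). This is only an illustration, not code from the thread: the name win_keep_order is hypothetical, and the sketch assumes x contains no missing values.

```r
# Hypothetical helper (not from the thread): winsorize on the sorted copy,
# then undo the sort. Assumes x has no missing values.
win_keep_order <- function(x, tr = 0.2) {
  o <- order(x)                      # permutation that sorts x
  y <- x[o]                          # sorted copy, as in the original win()
  n <- length(y)
  ibot <- floor(tr * n) + 1          # same cutoff indices as the original win()
  itop <- n - ibot + 1
  xbot <- y[ibot]
  xtop <- y[itop]
  y[y <= xbot] <- xbot               # pull the low tail up to the lower cutoff
  y[y >= xtop] <- xtop               # pull the high tail down to the upper cutoff
  y[order(o)]                        # put each value back in its original position
}

x <- c(10, 1, 7, 3, 100)
win_keep_order(x)
```

Because order(o) is the inverse of the sorting permutation, the result has the same length and row order as the input, so the columns of a data frame stay aligned.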
Re: [R] Winsorizing Multiple Variables
Might work better to determine the top and bottom cutoffs for each column with quantile(), using an appropriate quantile option, and then process each variable in place with your ifelse logic.

I did find a somewhat different definition of winsorization, with no sorting, in this code copied from a Patrick Burns posting from earlier this year on R-SIG-Finance:

function(x, winsorize = 5) {
  s <- mad(x) * winsorize
  top <- median(x) + s
  bot <- median(x) - s
  x[x > top] <- top
  x[x < bot] <- bot
  x
}

-- 
David Winsemius

On Jan 16, 2009, at 3:50 PM, Karl Healey wrote:

> Hi All, I want to take a matrix (or data frame) and winsorize each variable. [...]
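A minimal sketch of how the MAD-based definition quoted in this message might be applied to every variable of a data frame while leaving the rows in place. The function name win_mad, the seed, and the choice winsorize = 1 are assumptions made for illustration; they do not come from the thread.

```r
# MAD-based winsorizing, following the function quoted in this message.
# The name win_mad is hypothetical.
win_mad <- function(x, winsorize = 5) {
  s <- mad(x) * winsorize
  top <- median(x) + s
  bot <- median(x) - s
  x[x > top] <- top                  # clamp in place, no sorting involved
  x[x < bot] <- bot
  x
}

set.seed(1)                          # arbitrary seed, for reproducibility only
data <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5))

vars <- c("var1", "var2")            # winsorize these columns, leave the id alone
wdata <- data
wdata[vars] <- lapply(data[vars], win_mad, winsorize = 1)
wdata
```

Since values are replaced by logical indexing rather than by sorting, each observation stays on its original row.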
Re: [R] Winsorizing Multiple Variables
Don't sort y. Calculate xbot and xtop using

xtemp <- quantile(y, c(tr, 1 - tr), na.rm = na.rm)
xbot <- xtemp[1]
xtop <- xtemp[2]

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Healey
Sent: Friday, January 16, 2009 2:51 PM
To: r-help@r-project.org
Subject: [R] Winsorizing Multiple Variables

Hi All, I want to take a matrix (or data frame) and winsorize each variable. [...]
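Putting this suggestion into the original function might look like the sketch below. It is one possible reading, not a definitive fix: it uses pmin()/pmax() in place of ifelse(), and it leaves missing values in place instead of dropping them so that the rows stay aligned. Note that the default quantile() type interpolates, so the cutoffs will generally differ from the order-statistic cutoffs in the original win().

```r
# Sketch: win() with the sort removed; cutoffs come from quantile().
win <- function(x, tr = 0.2, na.rm = FALSE) {
  cut <- quantile(x, c(tr, 1 - tr), na.rm = na.rm, names = FALSE)
  # Clamp every value to [cut[1], cut[2]]; positions (and any NAs) are kept.
  pmin(pmax(x, cut[1]), cut[2])
}

set.seed(42)                         # arbitrary seed, for reproducibility only
data <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5),
                   var3 = rnorm(5), var4 = rnorm(5))

vars <- paste0("var", 1:4)           # leave the id column ss untouched
wdata <- data
wdata[vars] <- lapply(data[vars], win)

cor(wdata[vars])                     # correlations of the winsorized variables
```

Assigning into wdata[vars] keeps the result a data frame with the id column intact, whereas sapply(data, win) would return a matrix and winsorize ss as well.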
Re: [R] Winsorizing Multiple Variables
Thanks to Michael for giving a nice solution to Karl's question.

This identified a bug in the winsor function of the psych package, which has now been fixed in version 1.0.63 (the current development version). Although my winsor.means function in 1.0.62 (and earlier) worked correctly, my winsor function gave an incorrect result when applied to matrices or data.frames.

Bill

At 1:24 PM -0800 1/16/09, Michael Conklin wrote:

> Don't sort y. Calculate xbot and xtop using [...]

-- 
William Revelle            http://personality-project.org/revelle.html
Professor                  http://personality-project.org/personality.html
Department of Psychology   http://www.wcas.northwestern.edu/psych/
Northwestern University    http://www.northwestern.edu/
Attend ISSID/ARP:2009      http://issid.org/issid.2009/
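For readers who would rather use the packaged function, a short usage sketch follows. It assumes the psych package is installed in a version that includes the fix described in this message, and that winsor() takes a trim argument; the example data and seed are made up for illustration.

```r
# Assumes the psych package is installed (a version with the fix noted above).
if (requireNamespace("psych", quietly = TRUE)) {
  set.seed(1)                        # arbitrary seed, for reproducibility only
  data <- data.frame(var1 = rnorm(20), var2 = rnorm(20))

  # Winsorize each column; rows are expected to stay in their original order.
  wdata <- psych::winsor(data, trim = 0.2)

  print(dim(wdata))
  print(cor(wdata))
}
```

The requireNamespace() guard simply skips the example when psych is not available.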