[R] Winsorizing Multiple Variables

2009-01-16 Thread Karl Healey

Hi All,

I want to take a matrix (or data frame) and winsorize each variable
so that I can, for example, correlate the winsorized variables.


The code below will winsorize a single vector, but when applied to
several vectors, each ends up sorted independently in ascending order,
so a given observation is no longer on the same row for each vector.


So I need to winsorize the variable but then return it to its original
order, or another solution that will take a data frame, winsorize each
variable, and return a new data frame with all the variables in the
original order.


Thanks for any help!

-Karl


#The function I'm working from

win<-function(x,tr=.2,na.rm=F){

   if(na.rm)x<-x[!is.na(x)]
   y<-sort(x)
   n<-length(x)
   ibot<-floor(tr*n)+1
   itop<-length(x)-ibot+1
   xbot<-y[ibot]
   xtop<-y[itop]
   y<-ifelse(y<=xbot,xbot,y)
   y<-ifelse(y>=xtop,xtop,y)
   win<-y
   win
}

#Produces an example data frame; ss is the observation id, var1 to var4
#are the variables I want to winsorize.


ss=c(1:5);var1=rnorm(5);var2=rnorm(5);var3=rnorm(5);var4=rnorm(5)
as.data.frame(cbind(ss,var1,var2,var3,var4))->data

data

#Winsorizes each variable, but sorts them independently so the
#observations no longer line up.


sapply(data,win)


___
M. Karl Healey
Ph.D. Student

Department of Psychology
University of Toronto
Sidney Smith Hall
100 St. George Street
Toronto, ON
M5S 3G3

k...@psych.utoronto.ca

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Winsorizing Multiple Variables

2009-01-16 Thread David Winsemius
It might work better to determine the top and bottom cutoffs for each
column with quantile(), using an appropriate quantile option, and then
process each variable in place with your ifelse logic.


I did find a somewhat different definition of winsorization with no
sorting in this code copied from a Patrick Burns posting from earlier
this year on R-SIG-Finance:


function(x, winsorize=5) {
   s <- mad(x) * winsorize
   top <- median(x) + s
   bot <- median(x) - s
   x[x > top] <- top
   x[x < bot] <- bot
   x
}
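
As a rough sketch of how a function like that might be applied to a
data frame such as Karl's: the names win_mad, dat, vars and out are made
up for this illustration, and winsorize = 1 is picked only so that some
clamping is likely to show up with five points per column. Because
values are clamped where they stand, every observation stays on its row.

```r
# Hypothetical name; the body follows the function quoted above.
win_mad <- function(x, winsorize = 5) {
  s   <- mad(x) * winsorize   # scaled median absolute deviation
  top <- median(x) + s        # upper cutoff
  bot <- median(x) - s        # lower cutoff
  x[x > top] <- top           # clamp in place, no sorting
  x[x < bot] <- bot
  x
}

set.seed(1)                   # arbitrary seed, for reproducibility only
dat  <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5))
vars <- c("var1", "var2")     # columns to winsorize; ss is left alone

out <- dat
out[vars] <- lapply(dat[vars], win_mad, winsorize = 1)
out
```

Assigning the lapply() result back into out[vars] keeps the result a
data frame and leaves the id column untouched.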

--
David Winsemius


Re: [R] Winsorizing Multiple Variables

2009-01-16 Thread Michael Conklin
Don't sort y. Calculate xbot and xtop using
xtemp<-quantile(y,c(tr,1-tr),na.rm=na.rm)
xbot<-xtemp[1]
xtop<-xtemp[2]
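
Put together, that suggestion might look something like the sketch
below. The names win_q, dat, vars and wdat are invented for this
illustration, and this is one reading of the advice rather than the
exact function anyone in the thread ended up with. Two assumptions to
flag: quantile() with its default type interpolates between order
statistics, so the cutoffs will not in general equal the y[ibot] and
y[itop] values of the original function; and NAs are left in place
instead of being dropped, so that columns keep the same length.

```r
# Hypothetical name; cutoffs come from quantile(), values are clamped
# in their original positions.
win_q <- function(x, tr = 0.2, na.rm = FALSE) {
  xtemp <- quantile(x, c(tr, 1 - tr), na.rm = na.rm, names = FALSE)
  xbot  <- xtemp[1]
  xtop  <- xtemp[2]
  x <- ifelse(x <= xbot, xbot, x)   # raise low values to the lower cutoff
  x <- ifelse(x >= xtop, xtop, x)   # lower high values to the upper cutoff
  x
}

set.seed(42)                        # arbitrary seed, for reproducibility only
dat <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5),
                  var3 = rnorm(5), var4 = rnorm(5))
vars <- setdiff(names(dat), "ss")   # everything except the id column

wdat <- dat
wdat[vars] <- lapply(dat[vars], win_q)
wdat
cor(wdat[vars])                     # correlate the winsorized variables
```

Using lapply() on the selected columns, rather than sapply() on the
whole data frame, keeps ss out of the winsorizing and returns a data
frame instead of a matrix.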



Re: [R] Winsorizing Multiple Variables

2009-01-16 Thread William Revelle

Thanks to Michael for giving a nice solution to Karl's question.

This identified a bug in the psych package winsor function, which has
now been fixed in version 1.0.63 (the current development version).
Although my winsor.means function in 1.0.62 (and earlier) worked
correctly, my winsor function when applied to matrices or data.frames
gave an incorrect result.
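
For readers who want to try the package route, a minimal sketch might
look like the following. It assumes a psych version with the fix
described above is installed and that winsor() accepts a trim argument;
dat and wdat are made-up names, and the call is guarded so the snippet
does nothing if psych is not available.

```r
if (requireNamespace("psych", quietly = TRUE)) {
  set.seed(7)                          # arbitrary seed, for reproducibility only
  dat  <- data.frame(var1 = rnorm(20), var2 = rnorm(20))
  wdat <- psych::winsor(dat, trim = 0.2)   # winsorize each column
  print(cor(wdat))                     # correlate the winsorized variables
}
```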


Bill






--
William Revelle http://personality-project.org/revelle.html
Professor   http://personality-project.org/personality.html
Department of Psychology http://www.wcas.northwestern.edu/psych/
Northwestern University http://www.northwestern.edu/
Attend  ISSID/ARP:2009   http://issid.org/issid.2009/
