[R] standardize columns selectively within a dataframe

2010-09-01 Thread Olga Lyashevska
Dear all,

I have a dataframe:
df <- dataframe(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))

I want to obtain a new dataframe with columns a and b being standardized
((x-mean(x))/sd(x)); the other two columns (c,d) I want to leave
unchanged. What is the best way to achieve this? I have been trying to
use subscripts but did not succeed so far. 

Any tips?

Many thanks,
Olga

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] standardize columns selectively within a dataframe

2010-09-01 Thread David Winsemius


On Sep 1, 2010, at 10:35 AM, Olga Lyashevska wrote:


> Dear all,
>
> I have a dataframe:
> df <- dataframe(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
>
> I want to obtain a new dataframe with columns a and b being standardized
> ((x-mean(x))/sd(x)); the other two columns (c,d) I want to leave
> unchanged. What is the best way to achieve this? I have been trying to
> use subscripts but did not succeed so far.


df[ , 1:2] <- scale(df[ , 1:2])
df
   a  b c  d
1 -1 -1 7 10
2  0  0 8 11
3  1  1 9 12
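For comparison, a minimal sketch (not from the thread) that applies the formula from the question directly to columns chosen by name. It assumes the data are built with data.frame(); `standardize` and `cols` are hypothetical names introduced here. sd() uses the n - 1 denominator, which matches what scale() divides by when it also centers.

```r
# Example data from the question (built with data.frame())
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6),
                 c = c(7, 8, 9), d = c(10, 11, 12))

# Hypothetical helper: (x - mean(x)) / sd(x) for one numeric vector
standardize <- function(x) (x - mean(x)) / sd(x)

# Columns to standardize, chosen by name; c and d are left untouched
cols <- c("a", "b")
df[cols] <- lapply(df[cols], standardize)
df  # a and b become -1, 0, 1; c and d are unchanged
```

Selecting by name rather than by position (1:2) keeps the code correct if the column order of the data frame changes.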


--

David Winsemius, MD
West Hartford, CT



Re: [R] standardize columns selectively within a dataframe

2010-09-01 Thread Adaikalavan Ramasamy

If you want to scale within columns, you could try

cbind( scale(df[,1:2]), df[ ,-c(1:2)] )
   a  b c  d
1 -1 -1 7 10
2  0  0 8 11
3  1  1 9 12

And it is data.frame(), not dataframe(), btw.
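A short sketch of what that cbind() call returns, using the question's data built with data.frame(). Because one argument is a data frame, cbind() uses its data-frame method, so the result is a data frame rather than a matrix. One caveat worth knowing: the scaled columns are placed first, so the original column order is kept here only because a and b already come first. The name `res` is illustrative.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6),
                 c = c(7, 8, 9), d = c(10, 11, 12))

# scale() returns a matrix; because the other argument is a data frame,
# cbind() dispatches to its data-frame method and returns a data frame
res <- cbind(scale(df[ , 1:2]), df[ , -c(1:2)])
class(res)   # "data.frame"
names(res)   # "a" "b" "c" "d"

# If the scaled columns were not the leading ones, restore the original
# column order afterwards (a no-op for this example)
res <- res[names(df)]
```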


On 01/09/2010 15:35, Olga Lyashevska wrote:

> Dear all,
>
> I have a dataframe:
> df <- dataframe(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
>
> I want to obtain a new dataframe with columns a and b being standardized
> ((x-mean(x))/sd(x)); the other two columns (c,d) I want to leave
> unchanged. What is the best way to achieve this? I have been trying to
> use subscripts but did not succeed so far.
>
> Any tips?
>
> Many thanks,
> Olga



Re: [R] standardize columns selectively within a dataframe

2010-09-01 Thread Olga Lyashevska
Thanks! It is exactly what I was looking for!

Cheers
Olga



Re: [R] standardize columns selectively within a dataframe

2010-09-01 Thread David Winsemius


On Sep 1, 2010, at 10:42 AM, David Winsemius wrote:



> On Sep 1, 2010, at 10:35 AM, Olga Lyashevska wrote:
>
>> Dear all,
>>
>> I have a dataframe:
>> df <- dataframe(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
>>
>> I want to obtain a new dataframe with columns a and b being standardized
>> ((x-mean(x))/sd(x)); the other two columns (c,d) I want to leave
>> unchanged. What is the best way to achieve this? I have been trying to
>> use subscripts but did not succeed so far.
>
> df[ , 1:2] <- scale(df[ , 1:2])
> df
>    a  b c  d
> 1 -1 -1 7 10
> 2  0  0 8 11
> 3  1  1 9 12


I suspect you might have tried (df-mean(df))/sd(x) and gotten
unsatisfactory results; I know I did. If you had really wanted to
persist and do it from first principles, so to speak, or perhaps as
homework, then consider the sweep() function. It takes an array, a
margin (the second argument) and a vector of summary statistics (the
third argument), and applies a function, "-" by default, between each
slice of the array along that margin and the matching statistic. You
wanted to work on columns (margin 2), so this accomplishes the
subtraction of the column means followed by division by the column
standard deviations:


mm <- as.matrix(df[ , 1:2])
sweep(mm, 2L, colMeans(mm))   # using the default "-" operator

      a  b
[1,] -1 -1
[2,]  0  0
[3,]  1  1
sweep(sweep(df[ , 1:2], 2L, colMeans(mm)), 2L, apply(mm, 2L, sd), "/")
   a  b
1 -1 -1
2  0  0
3  1  1

(Your test columns happened to have standard deviations of 1 already,
so they only needed to be centered. This is essentially how scale()
does its work, and the help pages of scale() and sweep() cross-reference
each other.)
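A small check of that connection, as a sketch on made-up data whose columns do not already have unit standard deviations: centering and then dividing with two sweep() calls gives the same numbers as scale(). apply(m, 2L, sd) supplies the column standard deviations; check.attributes = FALSE is needed because scale() also attaches "scaled:center" and "scaled:scale" attributes. The names m, centered and standardized are illustrative.

```r
# Made-up columns whose standard deviations are not 1
m <- cbind(x = c(2, 4, 9), y = c(10, 20, 60))

centered <- sweep(m, 2L, colMeans(m))                        # subtract column means
standardized <- sweep(centered, 2L, apply(m, 2L, sd), "/")  # divide by column SDs

# scale() gives the same values, plus its "scaled:*" attributes
all.equal(scale(m), standardized, check.attributes = FALSE)  # TRUE
```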


This is probably a good time to mention Patrick Burns's The R Inferno,
which has an entry on sweep (p. 57) as well as tips on the drop=FALSE
maneuver (p. 54), which I tried first for this problem without success.
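For readers unfamiliar with the drop = FALSE maneuver: by default, selecting a single column of a data frame with [ , j] returns a plain vector, whereas drop = FALSE keeps a one-column data frame, which in turn keeps the column name when passed to scale(). A minimal sketch on the question's data (built with data.frame()); it only illustrates the default behavior and is not necessarily what was tried above. The name `s` is illustrative.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6),
                 c = c(7, 8, 9), d = c(10, 11, 12))

class(df[ , 1])                # "numeric": the single column is dropped to a vector
class(df[ , 1, drop = FALSE])  # "data.frame": the column stays a data frame

# With drop = FALSE the column name survives scale()
s <- scale(df[ , 1, drop = FALSE])
colnames(s)                    # "a"
colnames(scale(df[ , 1]))      # NULL: the name was lost when the column was dropped
```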

--

David Winsemius, MD
West Hartford, CT



Re: [R] standardize columns selectively within a dataframe

2010-09-01 Thread Olga Lyashevska
On Wed, 2010-09-01 at 12:42 -0400, David Winsemius wrote:

> I suspect you might have tried (df-mean(df))/sd(x) and gotten
> unsatisfactory results; I know I did.

Yes, indeed! A few times, but why is that?
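One likely reason, sketched below: arithmetic between a matrix (or data frame) and a vector recycles the vector down the columns in storage order, not one value per column. In current versions of R, mean() on a whole data frame returns NA with a warning rather than column means, so colMeans() stands in for it here. The names m, mu, wrong and right are illustrative only.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6),
                 c = c(7, 8, 9), d = c(10, 11, 12))
m  <- as.matrix(df[ , c("a", "b")])
mu <- colMeans(m)          # a = 2, b = 5

# m - mu recycles mu in column-major order: column a has 2, 5, 2
# subtracted and column b has 5, 2, 5, instead of one mean per column
wrong <- m - mu            # column a: -1, -3, 1; column b: -1, 3, 1

# sweep() along margin 2 pairs each column with its own mean
right <- sweep(m, 2L, mu)  # both columns: -1, 0, 1
```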

> If you had really wanted to
> persist and do it from first principles, so to speak, or perhaps as
> homework, then consider the sweep() function. It takes an array, a
> margin (the second argument) and a vector of summary statistics (the
> third argument), and applies a function, "-" by default, between each
> slice of the array along that margin and the matching statistic. You
> wanted to work on columns (margin 2), so this accomplishes the
> subtraction of the column means followed by division by the column
> standard deviations:
>
> mm <- as.matrix(df[ , 1:2])
> sweep(mm, 2L, colMeans(mm))   # using the default "-" operator
>       a  b
> [1,] -1 -1
> [2,]  0  0
> [3,]  1  1
> sweep(sweep(df[ , 1:2], 2L, colMeans(mm)), 2L, apply(mm, 2L, sd), "/")
>    a  b
> 1 -1 -1
> 2  0  0
> 3  1  1

I am glad you brought up sweep here. I have also been trying to use it,
but never fully understood what exactly it does, so I could not get it
working properly. Very clear explanation, thanks!

> (Your test columns happened to have standard deviations of 1 already,
> so they only needed to be centered. This is essentially how scale()
> does its work, and the help pages of scale() and sweep() cross-reference
> each other.)
>
> This is probably a good time to mention Patrick Burns's The R Inferno,
> which has an entry on sweep (p. 57) as well as tips on the drop=FALSE
> maneuver (p. 54), which I tried first for this problem without success.

Thanks for the references! Your solution with scale() is nice and neat,
but for the sake of learning it is useful to persist.  

Cheers,
Olga
