[R] Alternative to Scale Function?

2009-09-11 Thread Noah Silverman

Hi,

Is there an alternative to the scale function where I can specify my own 
mean and standard deviation?


I've come across an interesting issue where this would help.

I'm training and testing on completely different sets of data.  The 
testing set is smaller than the training set.


Using the standard scale function of R seems to introduce some error.  
Because it scales data WITHIN each set, the same number may be scaled to 
different values, since the range of the training and testing sets may 
differ.
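
A minimal sketch of the problem, using made-up numbers (the vectors `train` and `test` are hypothetical and not taken from the poster's data). The value 5 occurs in both sets, but each call to scale uses that set's own mean and sd, so the results differ:

```r
# Hypothetical toy data: the value 5 appears in both sets
train <- c(1, 3, 5, 7, 9)
test  <- c(5, 6, 7)

# scale() centers and scales each vector using its own mean and sd
train.std <- as.numeric(scale(train))
test.std  <- as.numeric(scale(test))

train.std[train == 5]   # 5 is the training mean, so it maps to 0
test.std[test == 5]     # 5 is below the test mean of 6 (sd 1), so it maps to -1
```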


My thought was to scale the larger training set of data, then use the 
mean and SD of the training data to scale the testing data according to 
the same parameters.  That way a number will transform to the same 
result regardless of whether it is in the training or testing set.


I can't be the first one to have looked at this.  Does anyone know of a 
function in R, or an alternative to scale, where I can control the 
parameters?


Thanks!

--
Noah

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Alternative to Scale Function?

2009-09-11 Thread Noah Silverman

I think I just answered my own question.

The scale function will return the mean and sd of the data.

So the process is fairly simple:
1. Scale the training data variable.
2. Note the mean and sd from the scale.
3. Manually scale the test data using the mean and sd from the 
   training data.


That should make sure that a value is transformed the same way regardless 
of which data set it is in.
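
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the matrices `train` and `test` are made-up toy data, and it relies on scale storing the values it used in the "scaled:center" and "scaled:scale" attributes of its result (see ?scale).

```r
# Hypothetical toy data (not from the thread)
set.seed(1)
train <- matrix(rnorm(40, mean = 10, sd = 2), ncol = 2)
test  <- matrix(rnorm(10, mean = 10, sd = 2), ncol = 2)

# Step 1: scale the training data
train.std <- scale(train)

# Step 2: read back the column means and sds that scale() used
mu    <- attr(train.std, "scaled:center")
sigma <- attr(train.std, "scaled:scale")

# Step 3: manually apply the training parameters to the test data.
# t(test) has one row per variable, so mu and sigma recycle correctly.
test.std <- t((t(test) - mu) / sigma)
```

With this, a given value in a given column is transformed identically whether it sits in `train` or `test`.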


Do I have this correct, or can anybody contribute any more to the concept?

Thanks!


--
Noah



Re: [R] Alternative to Scale Function?

2009-09-11 Thread Gavin Simpson
On Fri, 2009-09-11 at 13:10 -0700, Noah Silverman wrote:
 Hi,
 
 Is there an alternative to the scale function where I can specify my own 
 mean and standard deviation?

A couple of calls to sweep?

See ?sweep

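set.seed(123)
dat <- data.frame(matrix(runif(10*10), ncol = 10))
xbar <- colMeans(dat)          # column means
sigma <- apply(dat, 2, sd)     # column standard deviations

## subtract the column means, then divide by the column sds
dat.std <- sweep(sweep(dat, 2, xbar, "-"), 2, sigma, "/")

## compare
scale(dat)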
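
As a hedged extension of the same idea to the train/test case in the question: compute the parameters on the training set only, then run the two sweep calls on the test set. The names `train` and `test` and the random data are assumptions for illustration, not part of the original reply.

```r
set.seed(123)
train <- data.frame(matrix(runif(10 * 10), ncol = 10))  # stands in for the training set
test  <- data.frame(matrix(runif(4 * 10), ncol = 10))   # hypothetical smaller test set

# Parameters come from the training data only
xbar  <- colMeans(train)
sigma <- apply(train, 2, sd)

# Same two sweep() calls, applied to the test data with the training parameters
test.std <- sweep(sweep(test, 2, xbar, "-"), 2, sigma, "/")
```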

HTH

 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Alternative to Scale Function?

2009-09-11 Thread Mark Difford

 The scale function will return the mean and sd of the data.

By default. Read ?scale.
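
For readers following the pointer to ?scale: the `center` and `scale` arguments also accept numeric vectors with one value per column, so the training parameters can be passed in directly. A minimal sketch, assuming made-up matrices `train` and `test`:

```r
# Hypothetical toy data (not from the thread)
set.seed(42)
train <- matrix(rnorm(30), ncol = 3)
test  <- matrix(rnorm(12), ncol = 3)

# Parameters estimated on the training data only
mu    <- colMeans(train)
sigma <- apply(train, 2, sd)

# center and scale take one value per column of the input
test.std <- scale(test, center = mu, scale = sigma)
```

The result carries `mu` and `sigma` in its "scaled:center" and "scaled:scale" attributes, which makes it easy to check which parameters were applied.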

Mark.





Re: [R] Alternative to Scale Function?

2009-09-11 Thread Noah Silverman
Genius,

That certainly is much faster than what I had worked out on my own.

I looked at sweep, but couldn't understand the rather thin help page.  
Your example makes it really clear.

Thank You!!!

--
Noah

