[R] Alternative to Scale Function?
Hi,

Is there an alternative to the scale function where I can specify my own mean and standard deviation?

I've come across an interesting issue where this would help. I'm training and testing on completely different sets of data, and the testing set is smaller than the training set. Using the standard scale function of R seems to introduce some error. Since it scales data WITHIN the set, it may scale the same number to different values, because the mean and SD of the training and testing sets may be different.

My thought was to scale the larger training set of data, then use the mean and SD of the training data to scale the testing data according to the same parameters. That way a number will transform to the same result regardless of whether it is in the training or testing set.

I can't be the first one to have looked at this. Does anyone know of a function in R, or a scale alternative, where I can control the parameters?

Thanks!

-- Noah

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
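A small illustration of the issue described in this question, as a sketch only: the vectors `train` and `test` are made-up toy data, not the poster's. The same raw value (2) ends up with a different scaled value depending on which set it is scaled within.

```r
# Hypothetical toy data: the value 2 appears in both sets.
train <- c(1, 2, 3, 10)
test  <- c(1, 2, 3)

# scale() centers and scales each set using that set's own mean and SD.
train_scaled <- as.vector(scale(train))
test_scaled  <- as.vector(scale(test))

train_scaled[2]  # 2, scaled with the training mean and SD
test_scaled[2]   # 2, scaled with the test mean and SD
```

The two printed values differ because each call to scale() estimates its own centering and scaling parameters.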
Re: [R] Alternative to Scale Function?
I think I just answered my own question.

The scale function will return the mean and sd of the data, so the process is fairly simple:

1. Scale the training data variable.
2. Note the mean and SD from the scale.
3. Manually scale the test data using the mean and SD from the training data.

That should make sure that a value is transformed the same way regardless of which data set it is in.

Do I have this correct, or can anybody contribute any more to the concept?

Thanks!

-- Noah
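The three steps above can be sketched as follows. This is a minimal example on simulated data; the names `train`, `test`, `mu` and `s` are illustrative assumptions. It relies on scale() storing the values it used in the "scaled:center" and "scaled:scale" attributes of its result.

```r
set.seed(1)
# Hypothetical single variable, simulated for illustration.
train <- rnorm(100, mean = 50, sd = 10)
test  <- rnorm(20,  mean = 50, sd = 10)

# Step 1: scale the training data.
train_scaled <- scale(train)

# Step 2: note the mean and SD that scale() used.
mu <- attr(train_scaled, "scaled:center")
s  <- attr(train_scaled, "scaled:scale")

# Step 3: manually scale the test data with the training parameters.
test_scaled <- (test - mu) / s
```

With this approach a given raw value maps to the same scaled value in either set, since both use `mu` and `s` from the training data.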
Re: [R] Alternative to Scale Function?
On Fri, 2009-09-11 at 13:10 -0700, Noah Silverman wrote:
> Hi, Is there an alternative to the scale function where I can specify my own mean and standard deviation?

A couple of calls to sweep? See ?sweep

    set.seed(123)
    dat <- data.frame(matrix(runif(10*10), ncol = 10))
    xbar <- colMeans(dat)
    sigma <- apply(dat, 2, sd)
    dat.std <- sweep(sweep(dat, 2, xbar, "-"), 2, sigma, "/")
    ## compare
    scale(dat)

HTH

-- 
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
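To connect the sweep approach to the original train/test question, here is a hedged sketch in which the column means and SDs come from a training set and are then swept out of a separate test set. The data frames `train` and `test` are simulated stand-ins, not data from the thread.

```r
set.seed(123)
# Hypothetical training and test sets with the same 10 columns.
train <- data.frame(matrix(runif(10 * 10), ncol = 10))
test  <- data.frame(matrix(runif(4 * 10),  ncol = 10))

# Parameters are estimated on the training set only.
xbar  <- colMeans(train)
sigma <- apply(train, 2, sd)

# Subtract the training means, then divide by the training SDs, column by column.
test.std <- sweep(sweep(test, 2, xbar, "-"), 2, sigma, "/")
```

The same `xbar` and `sigma` can be swept out of `train` itself, so both sets are standardized with identical parameters.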
Re: [R] Alternative to Scale Function?
Noah Silverman-3 wrote:
> The scale function will return the mean and sd of the data.

By default. Read ?scale.

Mark.

-- 
View this message in context: http://www.nabble.com/Alternative-to-Scale-Function--tp25407625p25408289.html
Sent from the R help mailing list archive at Nabble.com.
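As ?scale describes, the `center` and `scale` arguments also accept numeric vectors with one value per column, so the training parameters can be passed straight back into scale(). A minimal sketch on simulated matrices (`train` and `test` are assumed names, not from the thread):

```r
set.seed(42)
# Hypothetical training and test matrices with 3 columns each.
train <- matrix(rnorm(50 * 3), ncol = 3)
test  <- matrix(rnorm(10 * 3), ncol = 3)

# Default behaviour: center on column means, scale by column SDs.
train.std <- scale(train)

# Reuse the training parameters instead of the defaults.
test.std <- scale(test,
                  center = attr(train.std, "scaled:center"),
                  scale  = attr(train.std, "scaled:scale"))
```

This avoids any manual arithmetic: the test set is centered and scaled with exactly the values that were applied to the training set.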
Re: [R] Alternative to Scale Function?
Genius,

That certainly is much faster than what I had worked out on my own. I looked at sweep, but couldn't understand the rather thin help page. Your example makes it really clear.

Thank You!!!

-- Noah