[R] standardization
Hi, I have a data frame which contains 5 columns and 1000 records. I want to standardize each cell so that each column ranges between 0 and 1. I think I must use a loop? Could you help me? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] standardization
On Fri, 13 Jul 2007, David Barron wrote: Try having a look at the scale and sweep functions. sweep applies to arrays, not data frames, and scale converts to a matrix. For a data frame,
df2 <- df1
df2[] <- lapply(df1, function(x) {r <- range(x, na.rm=TRUE); (x-r[1])/diff(r)})
seems simple enough. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. Tel: +44 1865 272861 (self), +44 1865 272866 (PA); Fax: +44 1865 272595
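A minimal sketch of the lapply() approach above, wrapped as a reusable function and applied to made-up data with the 1000 x 5 shape from the question. The name range01_df is hypothetical, every column is assumed to be numeric, and a constant column would give diff(r) == 0 and therefore NaN values.

```r
# Rescale every column of a data frame to [0, 1] without an explicit loop.
# range01_df is a hypothetical helper name; all columns are assumed numeric.
range01_df <- function(df) {
  out <- df                         # keeps data.frame class, names, row names
  out[] <- lapply(df, function(x) {
    r <- range(x, na.rm = TRUE)     # c(min, max), ignoring NAs
    (x - r[1]) / diff(r)            # NaN if the column is constant
  })
  out
}

set.seed(1)
df1 <- data.frame(matrix(rnorm(1000 * 5), nrow = 1000, ncol = 5))
df2 <- range01_df(df1)
sapply(df2, range)                  # each column should now span 0 to 1
```

Assigning into `out[]` rather than `out` is what keeps the result a data frame instead of a plain list.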
Re: [R] standardization
Try having a look at the scale and sweep functions. David On 13/07/07, Amir_17 [EMAIL PROTECTED] wrote: Hi, I have a data frame which contains 5 columns and 1000 records. I want to standardize each cell so that each column ranges between 0 and 1. I think I must use a loop? Could you help me? -- David Barron Said Business School University of Oxford Park End Street Oxford OX1 1HP
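Since sweep() works on arrays rather than data frames, here is a minimal sketch of the sweep() route on a numeric matrix (a data frame would first need as.matrix()). The matrix m is made-up example data.

```r
# Column-wise [0, 1] rescaling of a numeric matrix with sweep().
set.seed(2)
m <- matrix(runif(20, min = 10, max = 50), nrow = 5, ncol = 4)  # made-up data

mins <- apply(m, 2, min)            # per-column minimum
rngs <- apply(m, 2, max) - mins     # per-column range (max - min)

m01 <- sweep(m, 2, mins, "-")       # subtract each column's minimum
m01 <- sweep(m01, 2, rngs, "/")     # divide each column by its range

apply(m01, 2, range)                # rows: 0 and 1 for every column
```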
[R] Standardization Range
Dear R-Helpers, I want to perform a standardization of a variable with the range method, i.e.: Standardization (range) == (var-min(var))/(max(var)-min(var)) Do you know how I can do this? Thank you in advance. Sergio Della Franca
Re: [R] Standardization Range
Hi Sergio, Sergio Della Franca wrote: Dear R-Helpers, I want to perform a standardization of a variable with the range method, i.e.: Standardization (range) == (var-min(var))/(max(var)-min(var)) Do you know how I can do this? As you do ... but don't use var, which is the name of the function that computes the variance. Try something like:
stdrange <- function(x) {(x-min(x))/(max(x)-min(x))}
var=1:10 # not a good idea, just for fun
stdrange(var)
[1] 0.000 0.111 0.222 0.333 0.444 0.556 0.667
[8] 0.778 0.889 1.000
Thank you in advance. Sergio Della Franca -- Stéphane DRAY ([EMAIL PROTECTED]) Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I 43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France Tel: 33 4 72 43 27 57 Fax: 33 4 72 43 13 88 http://biomserv.univ-lyon1.fr/~dray/
Re: [R] Standardization Range
You can still use scale() (as you have been told); look at the help page for more info, especially at the Arguments section, e.g.,
mat <- matrix(rnorm(100*10), 100, 10)
rng <- apply(mat, 2, range)
scale(mat, scale = rng[2, ] - rng[1, ])
or you could even use apply() directly, e.g.,
apply(mat, 2, function(x) (x - mean(x)) / diff(range(x)) )
I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: Sergio Della Franca [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Wednesday, March 28, 2007 11:00 AM Subject: [R] Standardization Range Dear R-Helpers, I want to perform a standardization of a variable with the range method, i.e.: Standardization (range) == (var-min(var))/(max(var)-min(var)) Do you know how I can do this? Thank you in advance. Sergio Della Franca
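With the default center = TRUE, the scale() call above subtracts the column means, so it computes (x - mean(x)) / (max(x) - min(x)) rather than the (var - min(var)) / (max(var) - min(var)) asked for in this thread. A minimal sketch, using the same kind of made-up mat, that passes the column minima as center to get the [0, 1] version:

```r
# Range standardization (x - min(x)) / (max(x) - min(x)) via scale().
# scale() first subtracts 'center', then divides by 'scale', column by column.
set.seed(3)
mat <- matrix(rnorm(100 * 10), 100, 10)
rng <- apply(mat, 2, range)            # row 1: column minima, row 2: maxima

mat01 <- scale(mat, center = rng[1, ], scale = rng[2, ] - rng[1, ])
range(mat01)                           # every column now spans 0 to 1

# For comparison: the default center = TRUE subtracts column means instead,
# giving (x - mean(x)) / (max(x) - min(x)).
matc <- scale(mat, center = TRUE, scale = rng[2, ] - rng[1, ])
```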
[R] Standardization
Dear R-Helpers, I want to perform a standardization of a variable with the range method. How can I achieve this result? Thank you in advance. Sergio Della Franca
Re: [R] Standardization
On Tue, 2007-03-27 at 16:52 +0200, Sergio Della Franca wrote: Dear R-Helpers, I want to perform a standardization of a variable with the range method. How can I achieve this result? Thank you in advance. Sergio Della Franca See ?scale HTH, Marc Schwartz
Re: [R] Standardization
I am not sure I understand your question, but you may want to have a look at ?scale. It might get you started. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sergio Della Franca Sent: Tuesday, March 27, 2007 10:52 AM To: r-help@stat.math.ethz.ch Subject: [R] Standardization Dear R-Helpers, I want to perform a standardization of a variable with the range method. How can I achieve this result? Thank you in advance. Sergio Della Franca
Re: [R] Standardization
Sorry, let me explain my problem better. Standardization (range) == (var-mean(var))/(max(var)-min(var)) Thank you in advance 2007/3/27, Bos, Roger [EMAIL PROTECTED]: I am not sure I understand your question, but you may want to have a look at ?scale. It might get you started.
Re: [R] Standardization
On 3/27/07, Sergio Della Franca [EMAIL PROTECTED] wrote: Dear R-Helpers, I want to perform a standardization of a variable with the range method. How can I achieve this result? One way is the rescaler method in the reshape package. It can scale to a common range, to mean 0 and sd 1, or to ranks. Compared to scale, which others have mentioned, it works on data frames, leaving categorical variables unchanged. Regards, Hadley
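The reshape function itself is not reproduced here. As a base-R illustration of the same idea (rescale the numeric columns of a data frame, leave categorical ones alone), a minimal sketch; rescale_numeric and the data frame d are hypothetical, and this is not the rescaler() API.

```r
# Rescale only the numeric columns of a data frame to [0, 1],
# leaving factor and character columns untouched.
# rescale_numeric is a hypothetical helper, not reshape's rescaler().
rescale_numeric <- function(df) {
  num <- vapply(df, is.numeric, logical(1))   # which columns are numeric
  df[num] <- lapply(df[num], function(x) {
    r <- range(x, na.rm = TRUE)
    (x - r[1]) / diff(r)
  })
  df
}

d <- data.frame(height = c(150, 165, 180),
                group  = factor(c("a", "b", "a")),
                weight = c(50, 70, 90))
res <- rescale_numeric(d)
res   # height and weight become 0, 0.5, 1; group is unchanged
```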
Re: [R] Standardization
So, scale() is the answer. Have you looked at the help? Giovanni Date: Tue, 27 Mar 2007 17:37:08 +0200 From: Sergio Della Franca [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Sorry, let me explain my problem better. Standardization (range) == (var-mean(var))/(max(var)-min(var)) Thank you in advance
-- Giovanni Petris [EMAIL PROTECTED] Associate Professor Department of Mathematical Sciences University of Arkansas - Fayetteville, AR 72701 Ph: (479) 575-6324, 575-8630 (fax) http://definetti.uark.edu/~gpetris/
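For the clarified formula (var - mean(var))/(max(var) - min(var)), a minimal sketch of the scale() route for a single numeric vector, checked against the formula written out directly. The vector x is made-up data; scale() returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes, which as.vector() drops.

```r
# (x - mean(x)) / (max(x) - min(x)) for one numeric vector via scale().
x <- c(2, 4, 6, 8, 20)                                # made-up example data
z <- scale(x, center = TRUE, scale = diff(range(x)))
z <- as.vector(z)                                     # drop matrix form and attributes

# The same standardization written out directly, for comparison.
z2 <- (x - mean(x)) / (max(x) - min(x))
```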
Re: [R] standardization of values before call to pam() or clara()
Dylan == Dylan Beaudette [EMAIL PROTECTED] on Mon, 22 May 2006 17:33:47 -0700 writes:
Dylan> Greetings, I am experimenting with the cluster package and am starting to scratch my head regarding the *best* way to standardize my data. Both functions can pre-standardize columns in a data frame. According to the manual: Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. This works well when input variables are all in the same units. When I include new variables with a different intrinsic range, the ones with the largest relative values tend to be _weighted_ more heavily. This is certainly not surprising, but it complicates things. Does there exist a robust technique to effectively rescale each of the variables, regardless of their intrinsic range, to some set range, say [0, 1]? I have tried dividing a variable by the maximum value of that variable, but I am not sure if this is statistically correct.
A more usual scaling standardization is accomplished by the function -- guess what? -- scale(). It defaults to standardizing to mean 0 and standard deviation 1, but you can also use it to do a [0,1] scaling. Note that you are very wise to think about the importance of variable scaling / weighting for cluster analysis. But people have been here before, and invented the much more general notion of a distance/dissimilarity between observational units. -- The functions daisy() {in cluster} and dist() {from stats} provide such dissimilarity objects. These can be used as input for pam() or clara() as well, and in constructing them you are much more flexible than when trying to find a proper scaling of your x-matrix.
Note that daisy() in particular has been designed for computing sensible dissimilarities for the case when the X matrix has a collection of continuous {e.g. interval-scaled} and categorical (e.g. binary) variables. I recommend you get a textbook on clustering to read up more on the subject. Regards, Martin Maechler, ETH Zurich
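Both routes mentioned in this thread can be sketched with the cluster package (a recommended package shipped with R): rescaling the columns to [0, 1] with scale() before pam(), or building Gower dissimilarities with daisy() and passing those to pam(). The data frame x, its column names and k = 3 are made up for illustration. As far as the cluster documentation goes, clara() takes a data matrix and has no dissimilarity input, so for clara() the rescaling route is the one that applies.

```r
library(cluster)   # provides pam(), clara() and daisy()

# Made-up data: two variables with very different intrinsic ranges.
set.seed(4)
x <- data.frame(ph   = runif(30, 4, 9),
                clay = runif(30, 0, 600))

# Route 1: rescale each column to [0, 1] with scale(), then cluster.
rng  <- sapply(x, range)                         # row 1: minima, row 2: maxima
x01  <- scale(x, center = rng[1, ], scale = rng[2, ] - rng[1, ])
fit1 <- pam(x01, k = 3)

# Route 2: compute dissimilarities first and give them to pam().
# For numeric columns, daisy()'s Gower metric divides each absolute
# difference by that variable's range; it also accepts factor columns.
d    <- daisy(x, metric = "gower")
fit2 <- pam(d, k = 3)                            # recognised as a dissimilarity

table(fit1$clustering, fit2$clustering)          # compare the two partitions
```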
[R] standardization of values before call to pam() or clara()
Greetings, I am experimenting with the cluster package and am starting to scratch my head regarding the *best* way to standardize my data. Both functions (pam() and clara()) can pre-standardize columns in a data frame. According to the manual: Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. This works well when input variables are all in the same units. When I include new variables with a different intrinsic range, the ones with the largest relative values tend to be _weighted_ more heavily. This is certainly not surprising, but it complicates things. Does there exist a robust technique to effectively rescale each of the variables, regardless of their intrinsic range, to some set range, say [0, 1]? I have tried dividing a variable by the maximum value of that variable, but I am not sure if this is statistically correct. Any ideas or thoughts would be greatly appreciated. Cheers, -- Dylan Beaudette Soils and Biogeochemistry Graduate Group University of California at Davis 530.754.7341
[R] standardization
SAS Enterprise Miner recommends standardizing using X / STDEV(X) versus [X - mean(X)] / STDEV(X). Any thoughts on this? Pros? Cons? Philip
Re: [R] standardization
Philip Bermingham [EMAIL PROTECTED] writes: SAS Enterprise Miner recommends standardizing using X / STDEV(X) versus [X - mean(X)] / STDEV(X). Any thoughts on this? Pros? Cons? When??? This makes absolutely no sense out of context. -- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Blegdamsvej 3, 2200 Cph. N, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907 ([EMAIL PROTECTED])
Re: [R] standardization
Peter Dalgaard wrote: SAS Enterprise Miner recommends standardizing using X / STDEV(X) versus [X - mean(X)] / STDEV(X). This makes absolutely no sense out of context. To paraphrase Tanenbaum: The nice thing about standardization is that there are so many ways to do it. Baz [[ Free On-line Dictionary of Computing: Andrew Tanenbaum, in his Computer Networks book, once said, "The nice thing about standards is that there are so many of them to choose from." ]]
Re: [R] standardization
Barry Rowlingson [EMAIL PROTECTED] writes: "The nice thing about standards is that there are so many of them to choose from." Curiously enough, the same quote came up today on dk.edb.system.unix in the context of translations. -- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Blegdamsvej 3, 2200 Cph. N, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907 ([EMAIL PROTECTED])
Re: [R] standardization
My thoughts on this are: do not trust what SAS says, and least of all what Enterprise Miner says. Robust statisticians recommend standardizing using e.g. (X - median(X)) / ( MAD(X) / 0.675 ) Best, Matthias SAS Enterprise Miner recommends standardizing using X / STDEV(X) versus [X - mean(X)] / STDEV(X). Any thoughts on this? Pros? Cons? Philip
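A minimal sketch of that robust standardization in R, on made-up data with one outlier. R's mad() already multiplies the raw median absolute deviation by constant = 1.4826 (roughly 1 / 0.6745), so (x - median(x)) / mad(x) corresponds to the formula above. The mean/sd version is included for contrast.

```r
# Robust standardization: (x - median(x)) / (MAD(x) / 0.6745).
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 35.0)       # made-up data, one outlier

# mad() defaults to constant = 1.4826, i.e. about 1 / qnorm(0.75).
z_robust <- (x - median(x)) / mad(x)

# Same thing with the raw MAD and the 0.6745 factor spelled out.
raw_mad <- mad(x, constant = 1)                # median(abs(x - median(x)))
z_check <- (x - median(x)) / (raw_mad / qnorm(0.75))

# Classical version for contrast: the outlier inflates sd(x), which
# shrinks the outlier's own z-score.
z_classic <- (x - mean(x)) / sd(x)
```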
RE: [R] standardization
You asked another question about clustering, so I presume you want to standardize some variables before clustering. In SAS, PROC STDIZE offers 18 standardization methods. See http://support.sas.com/91doc/getDoc/statug.hlp/stdize_sect12.htm#stat_stdize_stdizesm for details. If you're really concerned about this, I would suggest running simulations to compare the performance of various standardization methods (relative to your data and what you're after). hth, b. -Original Message- From: Philip Bermingham [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 18, 2005 8:34 AM To: r-help@stat.math.ethz.ch Subject: [R] standardization SAS Enterprise Miner recommends standardizing using X / STDEV(X) versus [X - mean(X)] / STDEV(X). Any thoughts on this? Pros? Cons? Philip
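One concrete point for the clustering context: X / sd(X) and (X - mean(X)) / sd(X) differ only by a constant shift in each column, so Euclidean distances between observations, and hence clustering built on them, come out identical either way. The choice matters only for methods that are not shift-invariant. A minimal sketch on made-up data:

```r
# X / sd(X) versus (X - mean(X)) / sd(X): per column, the two results
# differ only by the constant mean(X) / sd(X), so row-to-row Euclidean
# distances are the same.
set.seed(5)
X <- cbind(rnorm(50, mean = 100, sd = 1),
           rnorm(50, mean = 100, sd = 10),
           rnorm(50, mean = 100, sd = 100))      # made-up data

s <- apply(X, 2, sd)
X_scaled <- scale(X, center = FALSE, scale = s)  # X / sd(X)
X_std    <- scale(X, center = TRUE,  scale = s)  # (X - mean(X)) / sd(X)

d_scaled <- as.vector(dist(X_scaled))            # Euclidean by default
d_std    <- as.vector(dist(X_std))
all.equal(d_scaled, d_std)
```

The distances are compared with as.vector() because dist objects also store the originating call as an attribute, which differs between the two.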