Greetings,

I am experimenting with the cluster package and am starting to scratch my head over the *best* way to standardize my data. Both functions can pre-standardize the columns of a data frame. According to the manual:

  "Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation."

This works well when the input variables are all in the same units. When I include new variables with a different intrinsic range, the ones with the largest relative values tend to be _weighted_ more heavily. This is certainly not surprising, but it complicates things.

Is there a robust technique to re-scale each of the variables, regardless of its intrinsic range, to some fixed range, say [0, 1]? I have tried dividing each variable by its maximum value, but I am not sure whether this is statistically sound.

Any ideas or thoughts would be greatly appreciated.

Cheers,

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
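
To make the question concrete, here is a minimal base-R sketch of the three rescalings mentioned above. The data frame and the helper names (`mad_scale`, `range_scale`, `max_scale`) are hypothetical and written only for illustration; they are not part of the cluster package. Note that "mean absolute deviation" here is not what base R's `mad()` computes (that is the median absolute deviation).

```r
# Hypothetical toy data: two variables with very different ranges
# (values are made up for illustration).
x <- data.frame(
  elev = c(120, 340, 560, 980, 1500),
  ph   = c(5.1, 6.3, 6.8, 7.2, 8.0)
)

# Standardization as described in the manual excerpt:
# subtract the column mean, then divide by the mean absolute deviation.
mad_scale <- function(v) {
  centered <- v - mean(v)
  centered / mean(abs(centered))
}

# Range (min-max) rescaling: maps each column onto [0, 1].
# Assumes the column is not constant (max > min).
range_scale <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}

# Dividing by the maximum only: the largest value becomes 1, but the
# smallest becomes 0 only if the column minimum is 0.
max_scale <- function(v) v / max(v)

x_mad   <- as.data.frame(lapply(x, mad_scale))
x_range <- as.data.frame(lapply(x, range_scale))
x_max   <- as.data.frame(lapply(x, max_scale))
```

With `range_scale` every column spans exactly 0 to 1, whereas `max_scale` leaves the lower end wherever min/max happens to fall, so variables with a large offset relative to their spread still end up compressed into a narrow band near 1.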