On Tue, 2004-11-09 at 12:59, Alessio Boattini wrote:
> Dear All,
>
> I would like to ask clarifications on the gower distance matrix calculated by
> the function gdist in the library mvpart.
> Here is a dummy example:
>
> > library(mvpart)
> Loading required package: survival
> Loading required package: splines
>  mvpart package loaded: extends rpart to include
>  multivariate and distance-based partitioning
> > x=matrix(1:6, byrow=T, ncol=2)
> > x
>      [,1] [,2]
> [1,]    1    2
> [2,]    3    4
> [3,]    5    6
> > gdist(x, method="euclid")
>          1        2
> 2 2.828427
> 3 5.656854 2.828427
>
> ##########################
> doing the calculations by hand according to the formula in gdist help page I
> get the same results. The formula given is:
> 'euclidean' d[jk] = sqrt(sum (x[ij]-x[ik])^2)
> #################################
>
> > sqrt(8)
> [1] 2.828427
> > gdist(x, method="gower")
>           1         2
> 2 0.7071068
> 3 1.4142136 0.7071068
>
> #######################################
> doing the calculations by hand according to the formula in gdist help page
> cannot reproduce the same results. The formula given is:
> 'gower' d[jk] = sum (abs(x[ij]-x[ik])/(max(i)-min(i))
> ##########################################
>
> Could anybody please shed some light?
>
There seems to be a bug in the documentation: the function uses a
different calculation from the one the help page specifies. Look at the
'gdist' code. Just to make things easier: in the function body, Gower is
method 6 and Euclidean distance is method 2. Gower's original paper is
available through http://www.jstor.org/ (Biometrics Vol. 27, No. 4,
p. 857-871; 1971).

cheers, jari oksanen
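
As a quick check on the dummy example, here is a minimal base-R sketch that compares two candidate formulas by hand. It does not call gdist or reproduce its source. It only assumes, judging from the printed output, that the function may be taking a Euclidean distance on range-scaled columns. The names rng, xs, d_help and d_alt are made up for illustration.

```r
# Same dummy matrix as in the question.
x <- matrix(1:6, byrow = TRUE, ncol = 2)

# Range (max - min) of each column; both are 4 here.
rng <- apply(x, 2, function(col) max(col) - min(col))

# Divide each column by its range.
xs <- sweep(x, 2, rng, "/")

# (a) Formula as printed on the help page:
#     sum over variables of abs(difference) / range.
d_help <- dist(xs, method = "manhattan")

# (b) ASSUMPTION, inferred from the output only:
#     sqrt of the sum over variables of (difference / range)^2.
d_alt <- dist(xs, method = "euclidean")

print(d_help)  # 1, 2, 1
print(d_alt)   # 0.7071068, 1.4142136, 0.7071068
```

Variant (b) gives the same numbers as gdist(x, method="gower") in the question, while variant (a) gives 1 and 2. This agrees with the printed values on this one example only; the 'gdist' code itself remains the place to confirm what is actually computed.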
