hclust() cannot handle missing values in the distances returned by dist().

dist() will return missing values if
1. all elements in a row are missing, or
2. for some pair of rows, every column has a missing value in at least
   one of the two rows.

In the former case, it is better to remove the row with all values
missing, as it is completely uninformative. The latter is harder to
detect and I am not sure how to deal with it.
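
For what it is worth, here is a rough sketch of how both cases could be
detected. The matrix x and the names all_missing, x_clean and bad_pairs
are made up for illustration; this only flags the problem rows and pairs,
it does not decide what to do with them.

```r
# Hypothetical example data: rows are the objects to be clustered.
x <- rbind(
  a = c(1, 2, NA, NA),
  b = c(NA, NA, 3, 4),
  c = c(1, 2, 3, 4),
  d = c(NA, NA, NA, NA)
)

# Case 1: rows in which every element is missing.
all_missing <- rowSums(!is.na(x)) == 0
x_clean <- x[!all_missing, , drop = FALSE]

# Case 2: pairs of rows that share no column with both values present.
# dist() returns NA for such pairs, so look for NA in the upper triangle.
d <- as.matrix(dist(x_clean))
bad_pairs <- which(is.na(d) & upper.tri(d), arr.ind = TRUE)

all_missing   # TRUE marks rows to drop before calling dist()
bad_pairs     # row/column indices (into x_clean) of the offending pairs
```

If bad_pairs has zero rows after removing the all-missing rows, the
result of dist(x_clean) contains no NA and can be passed to hclust().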

Here is how dist() calculates its output for the following data:

   NA    3    5
    2    4    6

dist( rbind( c(NA, 3, 5) , c(2,4,6) ) ) = 1.732051 
= sqrt( [ (6-5)^2 + (4-3)^2  ] x 3/2 )

The factor 3/2 (the total number of columns divided by the number of
columns in which both values are present) scales up the sum of squared
differences to account for the missing pair.
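
The arithmetic can be checked by hand with a short sketch like the one
below. The variable names (ok, ss, by_hand, from_dist) are my own, and
the scaling rule coded here is the one described above for the default
Euclidean method.

```r
x <- rbind(c(NA, 3, 5), c(2, 4, 6))

# Columns in which both rows have a value.
ok <- !is.na(x[1, ]) & !is.na(x[2, ])

# Sum of squared differences over the usable columns only.
ss <- sum((x[1, ok] - x[2, ok])^2)

# Scale up by (total columns) / (columns used), then take the square root.
by_hand <- sqrt(ss * ncol(x) / sum(ok))

# Compare with what dist() reports for the same two rows.
from_dist <- as.numeric(dist(x))
all.equal(by_hand, from_dist)
```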

Hope this helps.

--
Adaikalavan Ramasamy 



> Dear Sir,
> 
> This is Ms. Setsuko Kinoshita writing from Japan.
> 
> I have a question about missing values in hierarchical clustering.
> In earlier versions of R, hierarchical clustering was not available
> for data with missing values. I used Euclidean distance and the
> complete linkage method, i.e. "plot(hclust(dist()),hang=-1)".
> 
> How are missing values treated in hierarchical clustering in the
> latest R 1.7.1 release? For example, are they replaced by an average?
> 
> Yours Sincerely,
> 
> -----
> Setsuko Kinoshita
> 
> Social and Environmental Medicine,
> Graduate School of Comprehensive Human Sciences,
> University of Tsukuba
> 1-1-1, Tennoudai, Tsukuba,
> Ibaraki, 305-8575, Japan
> Tel&Fax: +81-29-853-3489
> E-mail:[EMAIL PROTECTED](office)
> E-mail:[EMAIL PROTECTED](private)
> 
> ______________________________________________
> [EMAIL PROTECTED] mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
