Marek Ancukiewicz <[EMAIL PROTECTED]> writes:

> Dear Thomas,
>
> The question becomes: how do we rank missing values? In
> version 1.8.1 at least, cor() uses the default handling of
> missing values by rank() [by the na.last parameter], that is,
> missing values are assigned the highest rank. However, if
> nothing is known about the meaning of NA, what would be the
> basis of such an assumption? Assigning the NAs the highest
> values, the lowest values, or any other values requires some
> additional information.
>
> It seems that the default handling of missing values should be
> to assign them missing ranks: within cor(), rank() should be
> called with na.last="keep".
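The two rank() behaviours discussed in the quoted message can be compared directly. The following is only a minimal sketch: the data are made up for illustration, and the rank correlation is computed by hand from the ranks rather than through cor()'s own rank-based methods, whose internals differ between R versions.

```r
# Illustrative data (made up): x and y are missing in the same two positions.
x <- c(3.1, NA, 1.2, 5.4, NA, 2.0)
y <- c(9.0, NA, 7.5, 2.2, NA, 8.1)

# Default na.last = TRUE: the NAs are given the highest ranks.
rx_last <- rank(x)
ry_last <- rank(y)

# na.last = "keep": the NAs stay NA in the rank vector.
rx_keep <- rank(x, na.last = "keep")
ry_keep <- rank(y, na.last = "keep")

# Pearson correlation of the ranks under each convention.
rho_last <- cor(rx_last, ry_last)
rho_keep <- cor(rx_keep, ry_keep, use = "complete.obs")

print(rbind(rx_last, rx_keep))
print(c(rho_last = rho_last, rho_keep = rho_keep))
```

With these particular numbers, ranking the NAs last makes the two jointly missing positions look like perfectly concordant high values, so rho_last comes out larger than rho_keep.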

Yes, and that is what 1.9.0beta is doing (it's not like this issue
hasn't been brought up before, just that the fix didn't quite fix it).

I think what we have now is still buggy, but at least it isn't biasing
rho towards +1 whenever x and y tend to be both missing at the same
time. It's fairly easy to do something more sensible in the
complete.cases case, but getting pairwise.complete.cases right is
tricky. 1.9.0 is in deep code freeze, so I don't think we should
change things at this point, except perhaps add a note to the help
page.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])             FAX: (+45) 35327907

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
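
For the complete-cases situation mentioned in the reply, one possible approach is to drop incomplete pairs first and rank only what remains. This is a sketch under that assumption, not the code used in R itself; spearman_complete is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper, not part of R: Spearman's rho on complete cases only.
spearman_complete <- function(x, y) {
  ok <- complete.cases(x, y)      # TRUE where both x and y are observed
  cor(rank(x[ok]), rank(y[ok]))   # Pearson correlation of the ranks
}

# Illustrative data (made up).
x <- c(3.1, NA, 1.2, 5.4, NA, 2.0)
y <- c(9.0, NA, 7.5, 2.2, NA, 8.1)
rho <- spearman_complete(x, y)
print(rho)
```

A pairwise version would presumably have to repeat this for every pair of columns, since the set of complete observations, and therefore the ranks, can differ from pair to pair.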