Re: [Rd] Incorrect handling of NA's in cor() (PR#6750)

Peter Dalgaard Fri, 09 Apr 2004 11:40:27 -0700

Marek Ancukiewicz <[EMAIL PROTECTED]> writes:

> Dear Thomas,
> 
> The question becomes: how do we rank missing values?  In
> version 1.8.1 at least, cor () uses default handling of
> missing values by rank() [by na.last parameter], that is
> missing values are assigned the highest rank. However, if
> nothing is known about the meaning of NA what would be the
> basis of such an assumption?  Assigning the NAs highest,
> lowest values, or any other values requires some additional
> information.
> 
> It seems that the default handling on missing values should be
> to assign them missing ranks: within cor(), rank() should be
> called with na.last="keep".


Yes, and that is what 1.9.0beta is doing (it's not like this issue
hasn't been brought up before, just that the fix didn't quite fix it).
I think what we have now is still buggy, but at least it isn't biasing
rho towards +1 whenever x and y tend to be both missing at the same
time.

It's fairly easy to do something more sensible in the complete.cases
case, but getting pairwise.complete.cases right is tricky. 1.9.0
is in deep code freeze, so I don't think we should change things at
this point, except perhaps add a note to the help page.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])             FAX: (+45) 35327907

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Incorrect handling of NA's in cor() (PR#6750)

Reply via email to