On Fri, 14 Mar 2008, Simon Anders wrote:

> I recently ran into a problem with 'merge' that stems from the way
> missing values in the key column (i.e., the column specified
> in the "by" argument) are handled. I wonder whether the current behavior
> is fully consistent.
> ...
> > x <- data.frame( key = c(1:3,3,NA,NA), val = 10+1:6 )
> > y <- data.frame( key = c(NA,2:5,3,NA), val = 20+1:7 )
> ...
> > merge( x, y, by="key" )
>    key val.x val.y
> 1   2    12    22
> 2   3    13    23
> 3   3    13    26
> 4   3    14    23
> 5   3    14    26
> 6  NA    15    21
> 7  NA    15    27
> 8  NA    16    21
> 9  NA    16    27
>
> As one should expect, there are now four lines with key value '3',
> because the key '3' appears twice both in x and in y. According to the
> logic of merge, a row should be produced in the output for each pairing
> of a row from x and a row from y where the values of 'key' are equal.
>
> However, the 'NA' values are treated exactly the same way. It seems that
> 'merge' considers the pairing of lines with 'NA' in both 'key' columns
> an allowed match. IMHO, this runs against the convention that two NAs
> are not considered equal. ('NA==NA' does not evaluate to 'TRUE'.)
>
> It might be more consistent if merge did not include any rows in the
> output with an "NA" in the key column.
>
> Maybe, one could add a flag argument to 'merge' to switch between this
> behaviour and the current one? A note in the help page might be nice, too.

Splus (versions 8.0, 7.0, and 6.2) gives:
   > merge( x, y, by="key" )
     key val.x val.y
   1   2    12    22
   2   3    13    23
   3   3    14    23
   4   3    13    26
   5   3    14    26
Is that what you expect?  There is no argument
to Splus's merge to make it include the NA's
in the way R's merge does.  Should there be such
an argument?
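
For anyone who wants the S-PLUS result in R today, here is a minimal sketch using the x and y from the original post. It simply drops rows with a missing key before calling merge; the names x_ok, y_ok, m, na_equal and na_matched are made up for illustration. The first two lines only show the distinction at issue: '==' propagates NA, whereas match() is documented to let NA match NA, which is consistent with the pairing that R's merge shows above.

```r
# Data from the original post.
x <- data.frame(key = c(1:3, 3, NA, NA), val = 10 + 1:6)
y <- data.frame(key = c(NA, 2:5, 3, NA), val = 20 + 1:7)

# '==' propagates NA, but match() treats NA as matching NA.
na_equal   <- NA == NA        # NA, not TRUE
na_matched <- match(NA, NA)   # position 1, i.e. a match

# Drop rows with a missing key, then merge as usual.
x_ok <- x[!is.na(x$key), ]
y_ok <- y[!is.na(y$key), ]
m <- merge(x_ok, y_ok, by = "key")
print(m)
```

This leaves one row for key 2 and the four pairings for key 3, with no NA keys in the output. The order of rows within a key may differ from the S-PLUS listing.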

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
