What version of R are you using? I don't get the same result on my system:
> R.version.string # Windows XP
[1] "R version 2.1.0, 2005-06-10"
> p <- c('a', 'c', '', ''); a <- c(10, 20, 30, 40); d1 <-
+ data.frame(Promoter=p, ip=a) # Note duplicate empty names in p.
> p <- c('b', 'c', 'd', ''); a <- c(15, 20, 30, 40); d2 <-
+ data.frame(Promoter=p, ip=a)
> all <- merge(x=d1, y=d2, by="Promoter", all=T)
> all <- merge(x=all, y=d2, by="Promoter", all=T)
> all
  Promoter ip.x ip.y ip
1            30   40 40
2            40   40 40
3        a   10   NA NA
4        c   20   20 20
5        b   NA   15 15
6        d   NA   30 30

On 6/16/05, Frank Gibbons <[EMAIL PROTECTED]> wrote:
> Run this:
>
> >p <- c('a', 'c', '', ''); a <- c(10, 20, 30, 40); d1 <-
> >data.frame(Promoter=p, ip=a) # Note duplicate empty names in p.
> >p <- c('b', 'c', 'd', ''); a <- c(15, 20, 30, 40); d2 <-
> >data.frame(Promoter=p, ip=a)
> >all <- merge(x=d1, y=d2, by="Promoter", all=T)
> >all <- merge(x=all, y=d2, by="Promoter", all=T)
> >all
>
> Data is this:
>
> >d1
> >  Promoter ip
> >1        a 10
> >2        c 20
> >3          30
> >4          40
> >
> >d2
> >  Promoter ip
> >1        b 15
> >2        c 20
> >3        d 30
> >4          40
>
> Output looks like this:
>
> >  Promoter ip.x ip.y ip
> >1            40   30 30
> >2            40   40 30
> >3            40   30 40
> >4            40   40 40
> >5        b   15   NA NA
> >6        c   20   20 20
> >7        d   30   NA NA
> >8        a   NA   10 10
>
> The weird thing about this is (in my view) that each instance of '' is
> considered unique, so with each successive merge, all combinatorial
> possibilities are explored, like a SQL outer join (Cartesian product). For
> non-empty names, an inner join is performed.
>
> Dealing with genomic data (10^4 datapoints), it's easy to have a couple of
> blanks buried in the middle of things, and to combine several replicates
> with successive merges. I couldn't understand how my three replicates of
> 6000 points, in which I expected substantial overlap in the labels, were
> taking so long to merge and ultimately generating 57000 labels. The culprit
> turned out to be a few hundred blanks buried in the middle.
>
> Why does the empty ("null") name merit special treatment? Perhaps I'm
> missing something.
> I hesitate to submit this as a bug, since technically I
> guess you could say that blank names, especially duplicates, are not
> kosher. But on the other hand, this combinatorial behaviour seems to occur
> only for blanks.
>
> -Frank
>
> PhD, Computational Biologist,
> Harvard Medical School BCMP/SGM-322, 250 Longwood Ave, Boston MA 02115, USA.
> Tel: 617-432-3555 Fax:
> 617-432-3557 http://llama.med.harvard.edu/~fgibbons
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
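
On the "only for blanks" point, here is a rough sketch of what may be going on. It is my reading of merge(), not something confirmed in this thread. merge() pairs every row of x with every row of y that shares a key value, so any key that is duplicated multiplies rows, whether it is blank or not. In the example data the blank just happens to be the only duplicated key. The e1/e2 data frames and the drop_blank() helper below are made up for illustration, and dropping blank rows is only one possible workaround.

```r
# Data from the thread: the empty name '' is duplicated in d1.
d1 <- data.frame(Promoter = c("a", "c", "", ""), ip = c(10, 20, 30, 40))
d2 <- data.frame(Promoter = c("b", "c", "d", ""), ip = c(15, 20, 30, 40))

# Hypothetical variant: same shape, but the duplicated key is "z", not ''.
e1 <- data.frame(Promoter = c("a", "c", "z", "z"), ip = c(10, 20, 30, 40))
e2 <- data.frame(Promoter = c("b", "c", "d", "z"), ip = c(15, 20, 30, 40))

m_blank <- merge(x = d1, y = d2, by = "Promoter", all = TRUE)
m_z     <- merge(x = e1, y = e2, by = "Promoter", all = TRUE)

# If blanks were special, these two counts would differ.
print(sum(m_blank$Promoter == ""))
print(sum(m_z$Promoter == "z"))

# Hypothetical helper (an assumption, not part of base R):
# drop rows whose key is blank before merging.
drop_blank <- function(df, key = "Promoter") {
  df[df[[key]] != "", , drop = FALSE]
}

m_clean <- merge(x = drop_blank(d1), y = drop_blank(d2),
                 by = "Promoter", all = TRUE)
print(m_clean)
```

If the rows with blank names carry data you need, giving each blank a unique placeholder label before merging would be another option. Which one fits depends on what the blanks mean in your data.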