What version of R is this (please do see the posting guide)? In both 2.1.0 and 2.1.1 beta I get

> all
  Promoter ip.x ip.y ip
1            30   40 40
2            40   40 40
3        a   10   NA NA
4        c   20   20 20
5        b   NA   15 15
6        d   NA   30 30

so cannot reproduce your result. Are you sure that the `blanks' really
are empty and not some character that is printing as empty on your
unstated OS?

BTW ' ' is what is normally called `blank'.

BTW, these are not `names' but character strings: `names' has other
meanings in R.

On Thu, 16 Jun 2005, Frank Gibbons wrote:

> Run this:
>
>> p <- c('a', 'c', '', ''); a <- c(10, 20, 30, 40); d1 <-
>> data.frame(Promoter=p, ip=a) # Note duplicate empty names in p.
>> p <- c('b', 'c', 'd', ''); a <- c(15, 20, 30, 40); d2 <-
>> data.frame(Promoter=p, ip=a)
>> all <- merge(x=d1, y=d2, by="Promoter", all=T)
>> all <- merge(x=all, y=d2, by="Promoter", all=T)
>> all
>
> Data is this:
>
>> d1
>>   Promoter ip
>> 1        a 10
>> 2        c 20
>> 3          30
>> 4          40
>>
>> d2
>>   Promoter ip
>> 1        b 15
>> 2        c 20
>> 3        d 30
>> 4          40
>
> Output looks like this:
>
>>   Promoter ip.x ip.y ip
>> 1            40   30 30
>> 2            40   40 30
>> 3            40   30 40
>> 4            40   40 40
>> 5        b   15   NA NA
>> 6        c   20   20 20
>> 7        d   30   NA NA
>> 8        a   NA   10 10
>
> The weird thing about this is (in my view) that each instance of '' is
> considered unique, so with each successive merge, all combinatorial
> possibilities are explored, like a SQL outer join (Cartesian product). For
> non-empty names, an inner join is performed.
>
> Dealing with genomic data (10^4 datapoints), it's easy to have a couple of
> blanks buried in the middle of things, and to combine several replicates
> with successive merges. I couldn't understand how my three replicates of
> 6000 points, in which I expected substantial overlap in the labels, were
> taking so long to merge and ultimately generating 57000 labels. The culprit
> turned out to be a few hundred blanks buried in the middle.
>
> Why does the empty ("null") name merit special treatment? Perhaps I'm
> missing something.
> I hesitate to submit this as a bug, since technically I
> guess you could say that blank names, especially duplicates, are not
> kosher. But on the other hand, this combinatorial behaviour seems to occur
> only for blanks.
>
> -Frank
>
> PhD, Computational Biologist,
> Harvard Medical School BCMP/SGM-322, 250 Longwood Ave, Boston MA 02115, USA.
> Tel: 617-432-3555 Fax:
> 617-432-3557 http://llama.med.harvard.edu/~fgibbons
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
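
A minimal sketch of how one might check whether the `blanks' really are empty, and keep them out of a merge, using the toy data from this thread. The helper name is_blank and the object names key_lengths, key_bytes and merged are hypothetical, stringsAsFactors = FALSE is an assumption (the original post used the default), and trimws() needs a much newer R than the 2.1.x discussed here.

```r
# Toy data copied from the thread; stringsAsFactors = FALSE keeps the keys as
# plain character strings (an assumption, not what the original post did).
d1 <- data.frame(Promoter = c("a", "c", "", ""), ip = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Promoter = c("b", "c", "d", ""), ip = c(15, 20, 30, 40),
                 stringsAsFactors = FALSE)

# Number of characters in each key: truly empty strings report 0.
key_lengths <- nchar(d1$Promoter)

# Raw bytes of each key, to expose characters that merely print as empty.
key_bytes <- lapply(d1$Promoter, charToRaw)

# Hypothetical helper: TRUE for NA, empty, or whitespace-only keys.
is_blank <- function(x) is.na(x) | !nzchar(trimws(x))

# Drop blank keys before merging so duplicates among them cannot multiply rows.
merged <- merge(d1[!is_blank(d1$Promoter), ], d2[!is_blank(d2$Promoter), ],
                by = "Promoter", all = TRUE)
```

With the blank keys removed, merged has one row per non-blank promoter (a, b, c, d), with NA wherever a promoter appears in only one of the two data frames. Whether dropping the blank rows is acceptable depends on the data; giving each blank a distinct placeholder label would be another option.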