>...
>Simon, no, the drop=FALSE argument has nothing to do with what
>Christian was talking about.  The kind of thing he meant is PR# 8192,
>"Subject: [ subscripting sometimes loses names":
>
>  http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>
>In R, subscripting with "[" USUALLY retains names, but R has various
>edge cases where it (IMNSHO) inappropriately discards them.  This
>occurs with both .Primitive("[") and "[.data.frame".  This has been
>known for years, but I have not yet tried digging into R's
>implementation to see where and how the names are actually getting
>lost.
>
>Incidentally, versions of S-Plus since approximately S-Plus 6.0 back
>in 2001 show similar buggy edge case behavior.  Older versions of
>S-Plus, c. S-Plus 3.3 and earlier, had the correct, name preserving
>behavior.  I presume that the original Bell Labs S had correct
>name-preserving behavior, and then the S-Plus developers broke it
>sometime along the way.
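To make the behavior under discussion concrete, here is a small sketch of how "[" treats names when a single column is extracted from a matrix versus a data frame. The objects m and df are made up for illustration, and this shows ordinary current R behavior rather than the specific edge cases filed under PR# 8192.

```r
# Hypothetical example objects: a matrix and a data frame with the same row names.
rn <- c("a", "b", "c")
m  <- matrix(1:6, nrow = 3, dimnames = list(rn, c("x", "y")))
df <- data.frame(x = 1:3, y = 4:6, row.names = rn)

# Matrix: the extracted column carries the row names along as names.
m_col <- m[, 1]
names(m_col)

# Data frame: the extracted column is the bare component, with no names.
df_col <- df[, 1]
names(df_col)

# With drop = FALSE the result is still a data frame, so the row names
# stay where they were, as a separate attribute of the data frame.
df_sub <- df[, 1, drop = FALSE]
rownames(df_sub)
```

The point of the sketch is only that the names on m[, 1] come from the matrix dimnames, whereas a data frame column never had names of its own to lose.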
(Later comments on the thread pointed out the difference between x[,1]
for matrices and data frames.)

I rewrote the S-PLUS data frame code around then, to fix various
inconsistencies and improve efficiency.  This was probably my change,
and I would do it again.

Note that the components of a data frame do not have names attached
to them; the row names are a separate object.  Extracting a component
vector or matrix from a data frame should not attach names to the
result, for these reasons:
* memory (attaching row names to an object can more than double
  the size of the object),
* speed,
* some objects cannot take names, and attaching them could change
  the class and other behavior of an object, and
* the names are usually/often (depending on the user) meaningless,
  artifacts of an early design decision that all data frames have
  row names.

Data frames differ from matrices in two ways that matter here:
* columns in matrices are all the same kind, and are simple objects
  (numeric, etc.), whereas components of data frames can be nearly
  arbitrary objects, and
* row names get added to a data frame whether a user wants them or
  not, whereas row names on a matrix have to be specified.

A historical note: unique row names on data frames were a design
decision made when people worked with small data frames, and they are
convenient at that size.  But they are a problem for large data
frames.  I was writing for all users, not just those with small data
frames and meaningful names.

I like R's 'automatic' row names.  They are a big help when working
with huge data frames (and I do this often, at Google).  But this
doesn't go far enough; subscripting and other operations sometimes
convert the automatic names to real names, and check/enforce
uniqueness, which is a big waste of time when working with large
data frames.  I'll comment more on this in a new thread.

Tim Hesterberg
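A rough sketch of the memory point and of 'automatic' row names, for anyone who wants to try it. The vector length and the "row1", "row2", ... labels are arbitrary choices for illustration, and the exact sizes depend on the platform and R version, so no numbers are quoted here.

```r
# Hypothetical example data; n and the name labels are illustrative only.
n      <- 1e4
labels <- paste0("row", seq_len(n))

# A plain numeric vector versus the same vector with names attached.
x       <- numeric(n)
x_named <- x
names(x_named) <- labels

size_plain <- as.numeric(object.size(x))
size_named <- as.numeric(object.size(x_named))
size_named / size_plain   # how many times larger the named copy is

# 'Automatic' row names are stored compactly; .row_names_info() reports
# a negative row count for them and a positive one for real row names.
df_auto <- data.frame(x = x)
df_real <- data.frame(x = x, row.names = labels)

is_auto_default <- .row_names_info(df_auto) < 0
is_auto_labeled <- .row_names_info(df_real) < 0
```

Here is_auto_default should come out TRUE and is_auto_labeled FALSE, and the size ratio should be well above 2 for names like these.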