Andy had written:

> >... The drop=FALSE argument has nothing to do with what
> >Christian was talking about.  The kind of thing he meant is PR# 8192,
> >"Subject: [ subscripting sometimes loses names":
> >
> >  http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>

On Sun, Feb 1, 2009 at 12:25 PM, Tim Hesterberg <timhesterb...@gmail.com> wrote:

> (Later comments on the thread pointed out the difference between
> x[,1] for matrices and data frames.)
>
> I rewrote the S-PLUS data frame code around then, to fix
> various inconsistencies and improve efficiency.
> This was probably my change, and I would do it again.
>
> Note that the components of a data frame do not have names
> attached to them; the row names are a separate object.
> Extracting a component vector or matrix from a data frame should not
> attach names to the result, because of:
> * memory (attaching row names to an object can more than double the
>  size of the object),
> * speed
> * some objects cannot take names, and attaching them could change
>  the class and other behavior of an object, and
> * the names are usually/often (depending on the user) meaningless,
>  artifacts of an early design decision that all data frames have row names.
>
> Data frames differ from matrices in two ways that matter here:
> * columns in matrices are all the same kind, and are simple objects
>  (numeric, etc.), whereas components of data frames can be nearly
>  arbitrary objects, and
> * row names get added to a data frame whether a user wants them or not,
>  whereas row names on a matrix have to be specified.
>
> A historical note - unique row names on data frames were a design
> decision made when people worked with small data frames, and are
> convenient for small data frames.  But they are a problem for large
> data frames.  I was writing for all users, not just those with small
> data frames and meaningful names.
>

Hi Tim,

Thank you for explaining this so carefully.  It's very valuable to hear the
rationale behind a design decision spelled out like this.  I accept that
yours is the right solution for general use.

In our case, we deal with relatively few rows, up to a few thousand, with
meaningful names.  And we mostly use data frames.  Because of our special
situation, we wrote our own "[" methods, which normally do what's right for
us.  That's why, in one debugging session, it was necessary to "get" the
overridden, stock R method from package:base.  In that case, the object
happened to be a matrix, not a data frame, and R got a segmentation fault.
And that's why I submitted the bug report that sparked this discussion.
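For anyone following along, here is a minimal sketch of the two behaviours discussed above, as they appear in a current R session (not necessarily the 2009 release, and not a reproduction of the crash).  The objects `m`, `df` and `base_bracket` are hypothetical names made up for illustration.

```r
# Hypothetical example objects: a 2 x 2 matrix and a data frame,
# both with row names "a" and "b".
m  <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))
df <- data.frame(x = 1:2, y = 3:4, row.names = c("a", "b"))

# Extracting a column from a matrix keeps the row names as names ...
names(m[, 1])    # "a" "b"
# ... but extracting a component from a data frame does not attach them.
names(df[, 1])   # NULL

# If "[" has been masked by a user definition, the stock primitive can
# still be fetched from the base environment and called directly.
base_bracket <- get("[", envir = baseenv())
base_bracket(m, , 1)   # named integer vector: a = 1, b = 2
```

Fetching from `baseenv()` sidesteps any definition of `"["` that masks it on the search path.  Note that the primitive still dispatches internally on the class of its first argument, so S3 methods such as `[.data.frame` remain in effect for classed objects.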

/Christian


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel