I posted this to r-bugzilla already <https://bugs.r-project.org/show_bug.cgi?id=18860>, but I would be interested to see if anyone here has comments (fewer people track r-bugzilla closely ...)

The issue: if a variable reference in a model.frame() formula is missing from the data but matches a function found elsewhere in the environment (e.g. `count()` in `dplyr`), the user gets the opaque error message "object is not a matrix".

If the variable reference is missing from the data but instead matches a list or list-like object (e.g., a data frame), then we get the more useful error "invalid type (TYPE) for variable 'VARIABLE'".

The cause is that model.frame() tries to compute the number of rows of the data before it tests the type of each column, so a function fails in the row count before the type check can report it. If we switch the order of operations so that the columns are tested first, we get a much more useful error message.
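
To make the ordering argument concrete, here is a toy sketch in C. It is not a transcription of model.c or util.c: the `toy_*` types and the `check_*` functions are hypothetical stand-ins, and the only assumption carried over from the description above is that a row count fails on a function but succeeds on a list, while the type check rejects both.

```c
#include <stddef.h>

/* Hypothetical stand-ins for R object types; these are NOT R's SEXP types. */
typedef enum { TOY_NUMERIC, TOY_LIST, TOY_FUNCTION } toy_type;

typedef struct {
    const char *name;   /* variable name as written in the formula */
    toy_type    type;   /* what the name resolved to */
    int         length; /* element count; ignored for functions */
} toy_var;

/* Toy analogue of a row count: a function has no rows, so this fails
 * with the opaque message. Lists and numeric vectors succeed. */
static const char *toy_nrows(const toy_var *v, int *nr) {
    if (v->type == TOY_FUNCTION) return "object is not a matrix";
    *nr = v->length;
    return NULL;
}

/* Toy analogue of the per-column type test: only numeric is accepted. */
static const char *toy_check_type(const toy_var *v) {
    if (v->type != TOY_NUMERIC) return "invalid type for variable";
    return NULL;
}

/* Order described as the current behaviour: row count, then type test.
 * Returns the first error message, or NULL if every variable passes. */
const char *check_rows_first(const toy_var *vars, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int nr;
        const char *err = toy_nrows(&vars[i], &nr);
        if (err) return err;
        err = toy_check_type(&vars[i]);
        if (err) return err;
    }
    return NULL;
}

/* Proposed order: type test, then row count. */
const char *check_types_first(const toy_var *vars, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int nr;
        const char *err = toy_check_type(&vars[i]);
        if (err) return err;
        err = toy_nrows(&vars[i], &nr);
        if (err) return err;
    }
    return NULL;
}
```

In this sketch a variable that resolves to a function yields "object is not a matrix" from `check_rows_first()` but the type message from `check_types_first()`, whereas a variable that resolves to a list yields the type message under either order, which mirrors the two cases described above.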

This change seems harmless, but of course I wouldn't be surprised if someone can come up with a reasonable scenario where it causes problems ...

More details and examples are in the bug report linked above. The relevant bits of source code are here:

model.c row/column checking:

https://github.com/r-devel/r-svn/blob/63369bfed9330b9461ec1d8b90d7251c0118508f/src/library/stats/src/model.c#L138-L152

nrows() function:

https://github.com/r-devel/r-svn/blob/63369bfed9330b9461ec1d8b90d7251c0118508f/src/main/util.c#L81-L94

  cheers
   Ben Bolker

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
