On Wed, 2005-08-17 at 21:48 -0400, Gabor Grothendieck wrote:
> If its just a matter of specifying two data frames how about just
> letting the user specify them as the first two arguments without
> injecting formulas into it so that any of these are allowed but
> data frames are still not allowed in formulas other than in the
> data argument:
>
>   yourfunction(df1, df2)
>   yourfunction(y ~ sp1 + sp2)
>   yourfunction(y ~., df)
>
> This could easily be implemented by having yourfunction be
> generic in which case the first one would dispatch
> yourfunction.data.frame and the second and third would
> dispatch yourfunction.formula.
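For illustration only, a minimal sketch of the dispatch layout suggested in the quoted message. The method bodies, the argument names `y` and `data`, and the returned list are assumptions made for this example; they are not code from the package under discussion.

```r
# Hypothetical generic; `yourfunction` is the placeholder name used in the thread.
yourfunction <- function(x, ...) UseMethod("yourfunction")

# Two data frames given directly: x is the response, y holds the predictors.
yourfunction.data.frame <- function(x, y, ...) {
    stopifnot(is.data.frame(y), nrow(x) == nrow(y))
    list(response = as.matrix(x), predictors = as.matrix(y))
}

# Formula interface: data frames appear only in the `data` argument.
# With data = NULL, model.frame() looks variables up in the formula's environment.
yourfunction.formula <- function(x, data = NULL, ...) {
    mf <- model.frame(x, data = data)
    mm <- model.matrix(attr(mf, "terms"), mf)
    # Drop the intercept column, if present, so only the predictors remain
    mm <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
    list(response = model.response(mf), predictors = mm)
}

# Made-up example data
df <- data.frame(y = c(1, 2, 3, 4), sp1 = c(0, 1, 0, 1), sp2 = c(2, 0, 1, 3))
from_frames  <- yourfunction(df["y"], df[c("sp1", "sp2")])  # data.frame method
from_formula <- yourfunction(y ~ ., df)                     # formula method
```

Both calls end up with the same predictor matrix; only the way the user names the two data sets differs.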
Hi Gabor,

yourfunction() is already generic; I have .default and .formula methods.
The default implementation of the method (Co-correspondence analysis) is
akin to a regression and uses a form of multivariate PLS, so one data
matrix plays the role of the response and the other the predictor. That
is the reason for wanting to use a formula interface.

Cheers,

G

> On 8/17/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > > >>>>> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> > > >>>>>     on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> > >
> > > GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> > > GS> wrote:
> > > >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > > >> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > > >> > > It can handle data frames like this:
> > > >> > >
> > > >> > >   model.frame(y1)
> > > >> > > or
> > > >> > >   model.frame(~., y1)
> > > >> >
> > > >> > Thanks Gabor,
> > > >> >
> > > >> > Yes, I know that works, but I want the function
> > > >> > coca.formula to accept a formula like this y2 ~ y1,
> > > >> > with both y1 and y2 being data frames. It is
> > > >>
> > > >> The expressions I gave work generally (i.e. lm, glm,
> > > >> ...), not just in model.matrix, so would it be ok if the
> > > >> user just does this?
> > > >>
> > > >>   yourfunction(y2 ~., y1)
> > >
> > > GS> Thanks again Gabor for your comments,
> > >
> > > GS> I'd prefer the y1 ~ y2 as data frames - as this is the
> > > GS> most natural way of doing things. I'd like to have (y2
> > > GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
> > > GS> work - silently without any trouble.
> > >
> > > I'm sorry, Gavin, I tend to disagree quite a bit.
> > >
> > > The formula notation has quite a history in the S language, and
> > > AFAIK never was the idea to use data.frames as formula
> > > components, but rather as "environments" in which formula
> > > components are looked up --- exactly as Gabor has explained.
> >
> > Hi Martin, thanks for your comments,
> >
> > But then one could have a matrix of variables on the rhs of the formula
> > and it would work - whether this is a documented feature or un-intended
> > side-effect of matrices being stored as vectors with dims, I don't know.
> >
> > And whilst the formula may have a long history, a number of packages
> > have extended the interface to implement a specific feature, which don't
> > work with standard functions like lm, glm and friends. I don't see how
> > what I wanted to achieve is greatly different to that or using a matrix.
> >
> > > To break with such a deeply rooted principle,
> > > you should have very very good reasons, because you're breaking
> > > the concepts on which all other uses of formulae are based.
> > > And this would potentially lead to much confusion of your users,
> > > at least in the way they should learn to think about what
> > > formulae mean.
> >
> > In the end I managed to treat y1 ~ y2 (both data frames) as a special
> > case, which allows the existing formula notation to work as well, so I
> > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> > is what I wanted all along, to extend my interface (not do anything to
> > R's formulae), but to also work in the traditional sense.
> >
> > The model I am writing code for really is modelling the relationship
> > between two matrices of data. In one version of the method, there is
> > real equivalence between both sides of the formula so it would seem odd
> > to treat the two sides of the formula differently.
> > At least to me ;-)
> >
> > > Martin
> > >
> > > >> If it really is important to do it the way you describe,
> > > >> are the data frames necessarily numeric? If so you could
> > > >> preprocess your formula by placing as.matrix around all
> > > >> the variables representing data frames using something
> > > >> like this:
> > > >>
> > > >> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html
> > >
> > > GS> Yes, they are numeric matrices (as data frames). I've
> > > GS> looked at this, but I'd prefer to not have to do too
> > > GS> much messing with the formula.
> > >
> > > >> Of course, if they are necessarily numeric maybe they can
> > > >> be matrices in the first place?
> > >
> > > GS> Because read.table etc. produce data.frames and this is
> > > GS> the natural way to work with data in R.
> > >
> > > but it is also slightly inefficient if they are numeric.
> > > There are places for data frames and for matrices.
> >
> > I agree - and in the code I've written, y1 and y2 quickly get coerced to
> > matrices before the real number crunching begins.
> >
> > However, all the other R modelling functions I have used work with
> > data.frames. Arguably, it could cause more confusion to write a function
> > that looked, walked and quacked like an R modelling function but needed
> > the user to apply an extra step to use - a step not usually required
> > under normal R usage.
> >
> > All the best,
> >
> > Gav
> >
> > > Why should it be a problem to use
> > >   M <- as.matrix(read.table(..))
> > > ?
> > >
> > > For large files, it could be quite a bit more efficient,
> > > needing a bit more of code, to
> > > use scan() to read the numeric data directly :
> > >
> > >   h1 <- scan(..., n=1) ## <read variable names>
> > >   nc <- length(h1)
> > >   a <- matrix(scan(...., what = numeric(), ...),
> > >               ncol = nc, dimnames = list(NULL, h1))
> > >
> > > maybe this would be useful to be packaged into
> > > a small utility with usage
> > >
> > >   read.matrix(..., type = numeric(), ...)
> > >
> > > GS> Following your suggestions, I altered my code to
> > > GS> evaluate the rhs of the formula and check if it was of
> > > GS> class "data.frame". If it is then I stop processing and
> > > GS> return it as a data.frame as this point. If not, it
> > > GS> eventually gets passed on to model.frame() for it to
> > > GS> deal with it.
> > >
> > > GS> So far - limited testing - it seems to do what I wanted
> > > GS> all along. I'm sure there's a gotcha in there somewhere
> > > GS> but at least the code runs so I can check for problems
> > > GS> against my examples.
> > >
> > > GS> Right, back to writing documentation...
> > >
> > > GS> G
> > >
> > > >> > more intuitive, to my mind at least for this particular
> > > >> > example and analysis, to specify the formula with a
> > > >> > data frame on the rhs.
> > > >> >
> > > >> > model.frame doesn't work with the formula "~ y1" if the
> > > >> > object y1, in the environment when model.frame
> > > >> > evaluates the formula, is a data.frame. It works if y1
> > > >> > is a matrix, however. I'd like to work around this
> > > >> > problem, say by creating an environment in which y1 is
> > > >> > modified to be a matrix, if possible. Can this be done?
> > > >> >
> > > >> > At the moment I have something working by grabbing the
> > > >> > bits of the formula and then using get() to grab the
> > > >> > named object.
> > > >> > Of course, this won't work if someone
> > > >> > wants to use R's formula interface with the following
> > > >> > formula y2 ~ var1 + var2 + var3, data = y1, or to use the
> > > >> > subset argument common to many formula
> > > >> > implementations. I'd like to have the function work in
> > > >> > as general a manner as possible, so I'm fishing around
> > > >> > for potential solutions.
> > > >> >
> > > >> > All the best,
> > > >> >
> > > >> > Gav
> > > >> >
> > > >> > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > > >> > > > Hi I'm having a problem with model.frame,
> > > >> > > > encapsulated in this example:
> > > >> > > >
> > > >> > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> > > >> > > >              nrow = 5, byrow = TRUE)
> > > >> > > > y1 <- as.data.frame(y1)
> > > >> > > > rownames(y1) <- paste("site", 1:5, sep = "")
> > > >> > > > colnames(y1) <- paste("spp", 1:4, sep = "")
> > > >> > > > y1
> > > >> > > >
> > > >> > > > model.frame(~ y1)
> > > >> > > > Error in model.frame(formula, rownames, variables, varnames,
> > > >> > > > extras, extranames, :
> > > >> > > >         invalid variable type
> > > >> > > >
> > > >> > > > temp <- as.matrix(y1)
> > > >> > > > model.frame(~ temp)
> > > >> > > >   temp.spp1 temp.spp2 temp.spp3 temp.spp4
> > > >> > > > 1         3         1         0         1
> > > >> > > > 2         0         1         1         0
> > > >> > > > 3         0         0         1         0
> > > >> > > > 4         0         0         1         1
> > > >> > > > 5         0         1         1         1
> > > >> > > >
> > > >> > > > Ideally the above wouldn't have names like
> > > >> > > > temp.var1, temp.var2, but one could deal with that
> > > >> > > > later.
> > > >> > > >
> > > >> > > > I have tracked down the source of the error message
> > > >> > > > to line 1330 in model.c - here I'm stumped as I
> > > >> > > > don't know any C, but it looks as if the code is
> > > >> > > > looping over the variables in the formula and checking if
> > > >> > > > they are the right "type". So a matrix of variables
> > > >> > > > gets through, but a data.frame doesn't.
> > > >> > > >
> > > >> > > > It would be good if model.frame could cope with
> > > >> > > > data.frames in formulae, but seeing as I am
> > > >> > > > incapable of providing a patch, is there a way around
> > > >> > > > this problem?
> > > >> > > >
> > > >> > > > Below is the head of the function I am currently
> > > >> > > > using, including the function for parsing the
> > > >> > > > formula - borrowed and hacked from
> > > >> > > > ordiParseFormula() in package vegan.
> > > >> > > >
> > > >> > > > I can work out the class of the rhs of the
> > > >> > > > formula. Is there a way to create a suitable
> > > >> > > > environment for the data argument of parseFormula()
> > > >> > > > such that it contains the rhs dataframe coerced to a
> > > >> > > > matrix, which then should get through
> > > >> > > > model.frame.default without error? How would I go
> > > >> > > > about manipulating/creating such an environment? Any
> > > >> > > > other ideas?
> > > >> > > >
> > > >> > > > Thanks in advance
> > > >> > > >
> > > >> > > > Gav
> > > >> > > >
> > > >> > > > coca.formula <- function(formula, method = c("predictive", "symmetric"),
> > > >> > > >                          reg.method = c("simpls", "eigen"), weights = NULL,
> > > >> > > >                          n.axes = NULL, symmetric = FALSE, data)
> > > >> > > > {
> > > >> > > >     parseFormula <- function (formula, data)
> > > >> > > >     {
> > > >> > > >         browser()
> > > >> > > >         Terms <- terms(formula, "Condition", data = data)
> > > >> > > >         flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
> > > >> > > >         specdata <- formula[[2]]
> > > >> > > >         X <- eval(specdata, data, parent.frame())
> > > >> > > >         X <- as.matrix(X)
> > > >> > > >         formula[[2]] <- NULL
> > > >> > > >         if (formula[[2]] == "1" || formula[[2]] == "0")
> > > >> > > >             Y <- NULL
> > > >> > > >         else {
> > > >> > > >             mf <- model.frame(formula, data, na.action = na.fail)
> > > >> > > >             Y <- model.matrix(formula, mf)
> > > >> > > >             if (any(colnames(Y) == "(Intercept)")) {
> > > >> > > >                 xint <- which(colnames(Y) == "(Intercept)")
> > > >> > > >                 Y <- Y[, -xint, drop = FALSE]
> > > >> > > >             }
> > > >> > > >         }
> > > >> > > >         list(X = X, Y = Y)
> > > >> > > >     }
> > > >> > > >     if (missing(data))
> > > >> > > >         data <- parent.frame()
> > > >> > > >     #browser()
> > > >> > > >     dat <- parseFormula(formula, data)
> > > >> > > >
> > > >> > > > --
> > > >> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > >> > > > Gavin Simpson                     [T] +44 (0)20 7679 5522
> > > >> > > > ENSIS Research Fellow             [F] +44 (0)20 7679 7565
> > > >> > > > ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
> > > >> > > > UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
> > > >> > > > 26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
> > > >> > > > London. WC1H 0AP.
> > > >> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > >> > > >
> > > >> > > > ______________________________________________
> > > >> > > > R-devel@r-project.org mailing list
> > > >> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London. WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
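
The special case described in the thread (evaluate the rhs, return early if it is a data frame, otherwise hand over to model.frame()) could look roughly like the sketch below. This is not the actual coca.formula code: the helper name `parse_two_sided`, the second data frame `y2` and its values are hypothetical, and the use of delete.response() is a choice made for this example.

```r
# Hypothetical helper: returns the lhs as matrix X and the rhs as matrix Y.
parse_two_sided <- function(formula, data = NULL) {
    env <- environment(formula)
    # Evaluate the lhs in `data` first, then in the formula's environment.
    # eval() treats envir = NULL as an empty list and falls back to enclos.
    X <- as.matrix(eval(formula[[2]], data, env))

    rhs <- formula[[3]]
    # Special case: the rhs is a single name (not ".") bound to a data frame
    if (is.name(rhs) && !identical(rhs, as.name("."))) {
        val <- tryCatch(eval(rhs, data, env), error = function(e) NULL)
        if (is.data.frame(val))
            return(list(X = X, Y = as.matrix(val)))
    }

    # Ordinary route: drop the response and let model.frame() do the work
    tt <- delete.response(terms(formula, data = data))
    mf <- model.frame(tt, data = data, na.action = na.fail)
    Y <- model.matrix(attr(mf, "terms"), mf)
    Y <- Y[, colnames(Y) != "(Intercept)", drop = FALSE]
    list(X = X, Y = Y)
}

# y1 as in the original post; y2 is made-up data with matching rows
y1 <- as.data.frame(matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
                           nrow = 5, byrow = TRUE))
rownames(y1) <- paste("site", 1:5, sep = "")
colnames(y1) <- paste("spp", 1:4, sep = "")
y2 <- data.frame(a = c(1, 0, 2, 0, 1), b = c(0, 1, 1, 3, 0),
                 row.names = rownames(y1))

res_special <- parse_two_sided(y2 ~ y1)                    # data frame on the rhs
res_dot     <- parse_two_sided(y2 ~ ., data = y1)          # traditional "." form
res_terms   <- parse_two_sided(y2 ~ spp1 + spp2, data = y1)
```

The early return keeps the data frame away from model.frame(), which rejects it, while the other two calls go through the standard formula machinery unchanged.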