On Fri, 14 Feb 2003 08:06:45 +0000 (GMT), [EMAIL PROTECTED] said:
> The issue here is that coef() tells you the coefficients in R's
> internal parametrization of the model, and that is of no use to you
> unless you have a means of creating a model matrix in C, SQL or (heaven
> forbid) Perl. The information needed to re-create a model matrix is
> stored in the lm fit, but in ways that are going to be hard to use
> anywhere else (since they include R functions). This is not perverse:
> what R does is very general, *far* more so than SPSS. Formulae in lm
> can include poly() and ns() terms, for example.
>

I understand that. And indeed a perfectly general function export is a
very big job. However, once we can export the model into a reasonably
generic textual form, simply including the text name of any R functions
in the export, then users can create special-case translators for the
parts that they need. We try to make this as easy for ourselves as
possible, for instance by doing all required transformations in SQL
(where possible) before importing to R, which means that all the terms
in the linear model are often untransformed variables. The only thing we
don't do in SQL normally is creating the contrasts, since this is
something that SQL is not well suited for.
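To make the "special-case translator" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the export format (a coefficient value plus a list of (variable, level) components) is hypothetical and is not something R produces, the VAR_LEVEL enum naming on the C side is invented, and factor terms are assumed to use indicator (treatment) contrasts, which will not hold for other contrast codings.

```python
from typing import List, Optional, Tuple

# Hypothetical export format (an assumption, not R output): each term is
# a coefficient value plus a list of (variable, level) components, where
# level is None for a plain numeric variable.
Component = Tuple[str, Optional[str]]
Term = Tuple[float, List[Component]]


def component_to_c(var: str, level: Optional[str]) -> str:
    """Render one component of a term as a C sub-expression."""
    if level is None:
        return var  # untransformed numeric variable, used as-is
    # Indicator column: 1 when the factor equals this level, else 0.
    # Assumes levels are visible in C as enum constants named VAR_LEVEL.
    return f"({var} == {var.upper()}_{level.upper()})"


def model_to_c(terms: List[Term]) -> str:
    """Render the whole linear predictor as one C expression."""
    pieces = []
    for coef, components in terms:
        factors = [repr(coef)]
        factors += [component_to_c(var, level) for var, level in components]
        pieces.append(" * ".join(factors))
    return " + ".join(pieces)


terms = [
    (1.5, []),                                               # intercept
    (0.25, [("var3", None)]),                                # numeric term
    (-2.0, [("var1", "a"), ("var2", "b"), ("var3", None)]),  # interaction
]
print(model_to_c(terms))
```

The generated expression can be pasted into the inner loop and compiled, so nothing at run time depends on R. Terms built from poly(), ns() or non-indicator contrasts would each need their own translator.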
> The only practical solution it seems to us is to ask R to create the
> model matrix for new data. Then the things you are talking about are
> just the colnames of that matrix, and don't need to be interpreted.
>

Yes, that makes things pretty easy then, but it's not an option in all
cases. We need to embed our models into C code. Previously we had a
routine to take the SPSS output, convert it into C code, and then
recompile the C code into our simulation. The linear model is utilised
in the inner loop of the simulation so needs to be very fast; CORBA or
SOAP calls to uncompiled code in the inner loop slow things down a great
deal. In addition, the simulation is accessed by many people - requiring
all of them to install R would make the roll-out procedure much more
complex.

> You may want to read the sources to find out how R does it: that area
> is one of the most complex parts of the internals, and one in which
> bugs continue to emerge.
>

I'm glad to hear it is considered complex! ;-) I've actually been
reading that bit of the code quite a bit over the last two days and
haven't been getting that far. My lack of familiarity with the language,
combined with the lack of comments in that section of code, and the very
concise/non-descriptive variable names often used in the code, make this
even harder. Still, it's a useful exercise for learning more about the
language.

> > The difficulty I am having is that the output of coef() is not really
> > parsable, since there is no marker in the name of an coefficient of
> > separate out the components. For instance, in SPSS the name of a
> > coefficient might be:
> >
> > var1=[a]*var2=[b]*var3
> >
> > ...which is easy to write a little script to pull that apart and
> > turn it into a line of SQL, C, or whatever. In S however the name
> > looks like:
> >
> > var1avar2bvar3
> >
> > ...which provides no way to pull the bits apart.
>
> I find that impossible to understand anyway, but doubt that it
> corresponds to SPSS.
> For a variable V, label Va does not mean V=[a]
> except in unusual special cases.
>

I should firstly mention that I got this slightly wrong - I showed above
the SPLUS output, not the R output. R actually looks like this:

  var1a:var2b:var3

The ':'s certainly help a lot, but still there's the problem of handling
factor levels, which are concatenated with the variable name without a
delimiter (at least, in all the linear models I've run so far, this is
the case).

I think with all the great feedback and ideas I've got so far on the
list and in private mail (thanks everyone!) I have enough information to
make a start. If I create anything that might be more generally useful
I'll post back of course.

Many thanks,
  Jeremy

--
  Jeremy Howard
  [EMAIL PROTECTED]

______________________________________________
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help
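As a postscript on the missing delimiter between a factor's name and its level: one possible workaround, sketched here in Python, is to split names such as var1a:var2b:var3 on ':' and then match each piece against variable names and levels that are already known from the data. The function name, the factor_levels dictionary and the numeric_vars list are all hypothetical inputs that the caller would have to supply. The sketch also assumes that no variable name or level contains ':' and that each piece is either a numeric variable or a factor name followed directly by one of its levels.

```python
from typing import Dict, List, Optional, Tuple


def split_coef_name(
    name: str,
    factor_levels: Dict[str, List[str]],
    numeric_vars: List[str],
) -> List[Tuple[str, Optional[str]]]:
    """Split an R-style coefficient name into (variable, level) pairs.

    level is None for numeric variables; the intercept yields [].
    """
    if name == "(Intercept)":
        return []
    components: List[Tuple[str, Optional[str]]] = []
    for piece in name.split(":"):
        if piece in numeric_vars:
            components.append((piece, None))
            continue
        # Try longer factor names first, so "var10" wins over "var1".
        candidates = sorted(factor_levels, key=len, reverse=True)
        matches = [
            (var, piece[len(var):])
            for var in candidates
            if piece.startswith(var) and piece[len(var):] in factor_levels[var]
        ]
        if not matches:
            raise ValueError(f"cannot interpret component {piece!r}")
        components.append(matches[0])
    return components


levels = {"var1": ["a", "c"], "var2": ["b"]}
print(split_coef_name("var1a:var2b:var3", levels, ["var3"]))
```

Names that remain ambiguous under this scheme (for example a factor var1 with level 0a alongside a factor var10 with level a) are resolved in favour of the longer variable name, which is only a heuristic; pieces that match nothing raise an error so they can be handled by hand.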
