A few more things for you to consider:
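1) ?formula will tell you all about the formula syntax in R. a * b expands to
a + b + a:b, and a:b is an "interaction term". For two numeric variables, a:b is
simply their product, so x + x:log(x) fits C + B*x + A*x*log(x), whereas
x * log(x) would expand to x + log(x) + x:log(x) and add an extra log(x) term.
You can also use the I() function to inhibit the formula from treating things
like + and * in formula-special ways; I(x * log(x)) is an equivalent way to
write that product term.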
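A quick way to check these expansion rules yourself, using only base R. This is a sketch: the data frame df, the simulated runtime data, and its coefficients are made up for illustration. It lists the terms each formula generates and confirms that, for numeric variables, n + n:log(n) and n + I(n * log(n)) fit the same A * n * log(n) + B * n + C model, while n * log(n) brings in an extra log(n) term:

```r
# Which model terms does each formula generate?
attr(terms(y ~ a * b), "term.labels")      # a, b, a:b

# For numeric a and b, the a:b column of the model matrix is a * b.
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 5, 1, 3))
mm <- model.matrix(~ a:b, data = df)
mm[, "a:b"]                                # 2 10 3 12

# Three ways of writing an "n log n" model:
attr(terms(runtime ~ n + n:log(n)), "term.labels")       # n, n:log(n)
attr(terms(runtime ~ n + I(n * log(n))), "term.labels")  # n, I(n * log(n))
attr(terms(runtime ~ n * log(n)), "term.labels")         # n, log(n), n:log(n)

# The first two fit the same model A*n*log(n) + B*n + C (simulated data).
set.seed(1)
sim <- data.frame(n = seq(10, 1000, by = 10))
sim$runtime <- 0.5 * sim$n * log(sim$n) + 2 * sim$n + 30 +
  rnorm(nrow(sim), sd = 20)
fitColon <- lm(runtime ~ n + n:log(n), data = sim)
fitI     <- lm(runtime ~ n + I(n * log(n)), data = sim)
all.equal(fitted(fitColon), fitted(fitI))  # TRUE
```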
2) I think you should modify your approach to R slightly. You are correct that
beginners are not well served by having too much thrown at them. But I think
they are also poorly served by being fed lots of specialized functions that each
do one highly specific task in an idiosyncratic way. So...
2a) If you are providing them scripts, make sure that they "play well together"
and represent a coherent system. Doing things "the R way" is part of this, but
there are actually multiple R ways (and also many things that are definitely
not an R way). So...
2b) Begin by determining which standard tools and packages you want to use and
make the rest of what you give them fit into that system. This reduces the
cognitive load for your students and allows them to do more with less.
2c) In fact, I recommend that you make a list of all the code you used in your
most recent course. Then arrange the code into things that you find essential
and nonessential. Also mark the things your students found easy and hard. Now
try to get rid of as much as you can without losing things students can do.
This may require replacing some old favorites with some new favorites. You
will know you are succeeding if, when you introduce something new, your students
can pretty much guess how it works before you show them.
3) The mosaic package (I am the maintainer) provides one particular way of
doing this. It begins by assuming you will want to show students the modeling
language (to use things like lm()), so it emphasizes using formulas. For this
reason, the primary graphics system chosen is lattice rather than ggplot2.
(There are a few things that support ggplot2 users as well, and you might take
a look at mplot(), in particular.) We have also added functions that provide
formula interfaces to many numerical summaries, including our favorite:
> favstats( disp ~ cyl, data=mtcars )
  .group   min     Q1 median     Q3   max     mean       sd  n missing
1      4  71.1  78.85  108.0 120.65 146.7 105.1364 26.87159 11       0
2      6 145.0 160.00  167.6 196.30 258.0 183.3143 41.56246  7       0
3      8 275.8 301.75  350.5 390.00 472.0 353.1000 67.77132 14       0
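For comparison, roughly the same per-group summaries can be produced in base R with aggregate(), which accepts the same formula + data interface. This is only an approximation of favstats(): it has no missing-value count, and summaryByGroup is a hypothetical name, not a mosaic function:

```r
# Hypothetical helper: several numerical summaries per group via aggregate().
summaryByGroup <- function(formula, data) {
  aggregate(formula, data = data, FUN = function(v) {
    # type-7 quartiles (R's default), matching the Q1/Q3 columns above
    q <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)
    c(min = min(v), Q1 = q[1], median = q[2], Q3 = q[3], max = max(v),
      mean = mean(v), sd = sd(v), n = length(v))
  })
}

res <- summaryByGroup(disp ~ cyl, data = mtcars)
res  # one row per cylinder count; the summaries form a matrix column
```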
In the end, most of what beginners need to do can be done with a single
template:
goal( formula, data = mydata )
where formula is one of
y ~ x
y ~ x | z
~ x
~ x | z
For data manipulation (if that is important to you), we are moving to using the
dplyr and tidyr packages.
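As a sketch of how far the goal( formula, data = mydata ) template goes with tools that ship with R (lattice is one of R's recommended packages; the particular plots and tables chosen here are just illustrations):

```r
library(lattice)  # a recommended package that ships with R

# y ~ x
fit <- lm(mpg ~ hp, data = mtcars)
p1 <- xyplot(mpg ~ hp, data = mtcars)
# y ~ x | z   (one panel per number of cylinders)
p2 <- xyplot(mpg ~ hp | factor(cyl), data = mtcars)
# ~ x
counts <- xtabs(~ cyl, data = mtcars)
p3 <- histogram(~ mpg, data = mtcars)
# ~ x | z   (one panel per transmission type)
p4 <- histogram(~ mpg | factor(am), data = mtcars)

print(p2)  # lattice plots are drawn when printed
```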
4) The internals of your functions are less important (for student use) than
the API, so choose your API and variable names very carefully. For example,
nearly all R functions that receive data call the argument "data", not
"dataFrame". Why have your function be different from all the rest? That forces
students to remember when to use "data" and when to use "dataFrame".
Compare your functions to the functions in your list in 2c.
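One quick way to check that convention against the functions on your list is to look at their formal arguments. A sketch: the function list is just a sample, and fitLine is a hypothetical wrapper showing the same convention:

```r
# Does each function name its data argument "data"?
fns <- list(lm = lm, glm = glm, nls = nls, loess = loess, xtabs = xtabs)
usesData <- sapply(fns, function(f) "data" %in% names(formals(f)))
usesData  # TRUE for every function in this sample

# A hypothetical wrapper that follows the same convention.
fitLine <- function(formula, data, ...) {
  lm(formula, data = data, ...)
}
fitLine(mpg ~ hp, data = mtcars)
```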
5) The mosaic package also provides functions called makeFun() and plotFun()
that make it easy to extract from a model a functional representation of the
fit and to plot that function on top of a scatter plot. Try example(makeFun).
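If mosaic is not an option, a rough base-R version of the same idea is to wrap predict() in a function and draw it with curve(). This is not how makeFun() is implemented, and fitFun is a hypothetical name:

```r
fit <- lm(mpg ~ hp, data = mtcars)

# Hypothetical helper: the fitted model as a function of hp.
fitFun <- function(hp) unname(predict(fit, newdata = data.frame(hp = hp)))
fitFun(c(100, 200))

# Scatter plot with the fitted function drawn on top.
plot(mpg ~ hp, data = mtcars)
curve(fitFun(x), add = TRUE, col = "blue")
```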
6) I'm including below another version of your function that provides a formula
interface:
models <- dp4dsFit(mpg ~ hp, data = mtcars)
It also rearranges the data a bit to be less wasteful of space (at the cost of
some slightly trickier ggplot2 code). These plots could be done in lattice as
well, if you decided to go that route. I've not done any serious debugging or
tried to make other improvements; I just wanted to demonstrate how to write
something with a formula interface, in case you decide to go more in that
direction.
Enjoy!
--rjp
dp4dsFit <- function(formula, data)
{
  # data must be a data frame, since the fitted values are added to it below.
  # The code presumes the formula has the shape y ~ x. Fancier code
  # could check this and throw an error when badly shaped formulas
  # are attempted.
  # The code could be made more beautiful using mosaic::lhs() and
  # mosaic::rhs() to extract the left and right sides of the formula.
  yName <- paste(deparse(formula[[2]]), collapse = "")  # left side (response)
  xName <- paste(deparse(formula[[3]]), collapse = "")  # right side (predictor)

  # Build y ~ poly(x, 2) and y ~ x + x:log(x) from the user's y ~ x.
  # For numeric x, x:log(x) is the single column x * log(x).
  quadraticFormula <-
    substitute(y ~ poly(x, 2),
               list(y = formula[[2]], x = formula[[3]]))
  nlognFormula <-
    substitute(y ~ x + x:log(x),
               list(y = formula[[2]], x = formula[[3]]))

  # Splicing the formulas into the calls makes the printed models show
  # the actual formula, e.g. lm(formula = mpg ~ poly(hp, 2), data = data).
  dp4dsQuadraticFit <- eval(
    substitute(lm(f, data = data), list(f = quadraticFormula))
  )
  dp4dsNlogNFit <- eval(
    substitute(lm(f, data = data), list(f = nlognFormula))
  )

  # Store the fitted values as extra columns
  # (could also use dplyr::mutate() to add in extra variables).
  data <- transform(
    data,
    predicted_quad  = predict(dp4dsQuadraticFit),
    predicted_nlogn = predict(dp4dsNlogNFit)
  )
  R <- list(dp4dsQuadraticFit = dp4dsQuadraticFit,
            dp4dsNlogNFit = dp4dsNlogNFit,
            data = data,
            yName = yName,
            xName = xName)
  class(R) <- c("dp4dsFit", class(R))
  return(R)
}

print.dp4dsFit <- function(x, ...) {
  cat("=============\nQuadratic fit\n=============\n")
  print(x$dp4dsQuadraticFit, ...)
  cat("==========\nn lg n fit\n==========\n")
  print(x$dp4dsNlogNFit, ...)
  invisible(x)
}

summary.dp4dsFit <- function(object, plot = FALSE, ...) {
  # summarize each model; the result reuses print.dp4dsFit() for display
  R <- lapply(object[c("dp4dsQuadraticFit", "dp4dsNlogNFit")], summary)
  if (plot) print(plot(object))
  class(R) <- c("dp4dsFit", class(R))
  return(R)
}

plot.dp4dsFit <- function(x, y, xLabel = x$xName, yLabel = x$yName, ...) {
  library(ggplot2)
  # Evaluate the x and y expressions (e.g. "hp", or "log(hp)") in the
  # stored data and collect everything in one data frame with fixed names.
  plotData <- data.frame(
    xVal  = eval(str2lang(x$xName), x$data),
    yVal  = eval(str2lang(x$yName), x$data),
    quad  = x$data$predicted_quad,
    nlogn = x$data$predicted_nlogn
  )
  ggplot(plotData, aes(x = xVal)) +
    geom_point(aes(y = yVal), size = 3) +
    geom_line(aes(y = quad, colour = "quadratic")) +
    geom_line(aes(y = nlogn, colour = "n log n")) +
    xlab(label = xLabel) +
    ylab(label = yLabel) +
    guides(colour = guide_legend(title = "model"))
}
## now you can do it all in one:
models <- dp4dsFit(mpg ~ hp, data = mtcars)
summary(models, plot=TRUE)
## or just plot it
plot(models)
## or just look at the model summaries
summary(models)
## or do something else entirely:
plot(models[[1]], which = 1)
mosaic::mplot(models[[2]], which = 1, system="gg")
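For comparison, the two derived formulas could also be built without substitute() by using base R's update(), where "." stands for the corresponding side of the original formula. A sketch, assuming a single predictor as dp4dsFit() does:

```r
userFormula <- mpg ~ hp

# "." on the left means the old response, on the right the old predictor.
quadraticFormula <- update(userFormula, . ~ poly(., 2))    # mpg ~ poly(hp, 2)
nlognFormula     <- update(userFormula, . ~ . + .:log(.))  # mpg ~ hp + hp:log(hp)

quadFit  <- lm(quadraticFormula, data = mtcars)
nlognFit <- lm(nlognFormula, data = mtcars)
```

One trade-off: print(nlognFit) then shows formula = nlognFormula in the call rather than the expanded formula, which is presumably why the code above splices the formula into the lm() call with substitute().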
On Dec 31, 2014, at 12:49 AM, Warford, Stan
<[email protected]> wrote:
Thanks for the prompt responses. What a great list!
I am going with Ista's solution. I appreciate the R way, but the whole point of
this script is to shield students from having to know R as much as possible. I
don't want to give them any choices. In fact, last year I had them use Deducer,
thinking that point and click would be easy and they could experiment to their
hearts' content, but that was a complete disaster. This year using these
pre-written scripts with RStudio was much better. Even I have only learned
enough R to show students how to do a curve fit. I am a complete novice.
Dennis questioned the model.
Q: Do you want x:log(x) or x * log(x) in the second geom_smooth() formula?
I hope I am doing this correctly. Computer science theory predicts n lg n
behavior for some data sets and quadratic for others. I hope I am fitting to
A * n * log(n) + B * n + C
where * in the above expression represents multiplication. I was under the
impression that : in the model formula was multiplication. Can someone verify
that?
Thanks,
Stan
J. Stanley Warford
Professor of Computer Science
Pepperdine University
Malibu, CA 90263
[email protected]
310-506-4332
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching