A few more things for you to consider:

1) ?formula will tell you all about the formula syntax in R.  a * b expands to 
a + b + a:b, and a:b is an "interaction term", which for numeric variables is 
basically multiplication.  You can also use the I() function to inhibit the 
formula from treating things like + and * in formula-special ways.
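
To see these expansions for yourself, a small sketch along the following lines may help. It uses only base R and the built-in mtcars data. The names y, a, and b in the first call are placeholders that never need to exist, since terms() only parses the formula.

```r
# How R expands the * operator in a formula: the labels are a, b, and a:b
attr(terms(y ~ a * b), "term.labels")

# With a numeric predictor, hp:log(hp) and I(hp * log(hp)) build the same
# column, so the two models below have the same coefficients.
fit_colon <- lm(mpg ~ hp + hp:log(hp), data = mtcars)
fit_I     <- lm(mpg ~ hp + I(hp * log(hp)), data = mtcars)
all.equal(unname(coef(fit_colon)), unname(coef(fit_I)))
```

Both fits correspond to A * n * log(n) + B * n + C; the I() version simply makes the arithmetic explicit.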

2) I think you should modify your approach to R slightly.  You are correct that 
beginners are not well served by having too much thrown at them.  But I think 
they are also poorly served by being fed lots of specialized functions that do 
one highly specific task in an idiosyncratic way.  So...

2a) If you are providing them scripts, make sure that they "play well together" 
and represent a coherent system.  Doing things "the R way" is part of this, but 
there are actually multiple R ways (and also many things that are definitely 
not an R way). So...

2b) Begin by determining which standard tools and packages you want to use and 
make the rest of what you give them fit into that system.  This reduces the 
cognitive load for your students and allows them to do more with less.

2c) In fact, I recommend that you make a list of all the code you used in your 
most recent course. Then arrange the code into things that you find essential 
and nonessential.  Also mark the things your students found easy and hard.  Now 
try to get rid of as much as you can without losing things students can do.  
This may require replacing some old favorites with some new favorites.  You 
will know you are succeeding if when you introduce something new, your students 
can pretty much guess how it works before you show them.

3) The mosaic package (I am the maintainer) provides one particular way of 
doing this.  It begins by assuming you will want to show students the modeling 
language (to use things like lm()), so it emphasizes using formulas.  For this 
reason, the primary graphics system chosen is lattice rather than ggplot2.  
(There are a few things that support ggplot2 users as well, and you might take 
a look at mplot(), in particular.)  We have also added functions that provide 
formula interfaces to many numerical summaries, including our favorite:

> favstats( disp ~ cyl, data=mtcars )
  .group   min     Q1 median     Q3   max     mean       sd  n missing
1      4  71.1  78.85  108.0 120.65 146.7 105.1364 26.87159 11       0
2      6 145.0 160.00  167.6 196.30 258.0 183.3143 41.56246  7       0
3      8 275.8 301.75  350.5 390.00 472.0 353.1000 67.77132 14       0

In the end, most of what beginners need to do can be done with a single 
template:

goal( formula, data = mydata )

where formula is one of

y ~ x
y ~ x | z
  ~ x
  ~ x | z
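
As a rough illustration of this template using functions that ship with R (base R plus the lattice package, which is normally installed alongside R), the same formula-and-data pattern covers models, summaries, and plots. The choice of mtcars and of these particular functions is mine, for illustration only.

```r
library(lattice)

fit    <- lm(mpg ~ hp, data = mtcars)                      # y ~ x
means  <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)  # y ~ x
scat   <- xyplot(mpg ~ hp, data = mtcars)                  # y ~ x
panels <- xyplot(mpg ~ hp | factor(cyl), data = mtcars)    # y ~ x | z
hist1  <- histogram(~ mpg, data = mtcars)                  #   ~ x
hist2  <- histogram(~ mpg | factor(cyl), data = mtcars)    #   ~ x | z

print(panels)  # lattice plots are objects; printing one draws it
```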

For data manipulation (if that is important to you), we are moving to using the 
dplyr and tidyr packages.

4) The internals of your functions are less important (for student use) than 
the API, so choose your API and variable names very carefully.  For example, 
nearly all R functions that receive data call the variable "data", not 
"dataFrame".  Why have your function be different from all the rest? That makes 
students need to remember when to use "data" and when to use "dataFrame".  
Compare your functions to the functions in your list in 2c.
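
To make the point concrete, here is a minimal sketch of a helper whose signature mirrors the standard tools. The function meanBy() is hypothetical, invented only for this illustration; it simply forwards to aggregate().

```r
# meanBy() is a hypothetical helper, named here only for illustration.
# Its arguments are called formula and data, just like lm() and aggregate().
meanBy <- function(formula, data, ...) {
  aggregate(formula, data = data, FUN = mean, ...)
}

# The call reads like every other formula-based call the students know
res <- meanBy(mpg ~ cyl, data = mtcars)
fit <- lm(mpg ~ cyl, data = mtcars)
```

Because the argument names match, a student who has seen lm() can likely guess how to call meanBy() without being told.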

5) The mosaic package also provides functions called makeFun() and plotFun() 
that make it easy to extract from a model a functional representation of the 
fit and to plot that function on top of a scatter plot.  Try example(makeFun).
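
The idea behind this can be approximated in base R, without mosaic. The sketch below is not how makeFun() or plotFun() are implemented; it only imitates what they give you, using predict() and curve() on the built-in mtcars data.

```r
fit <- lm(mpg ~ hp, data = mtcars)

# Wrap the fitted model in an ordinary function of hp
f <- function(hp) predict(fit, newdata = data.frame(hp = hp))

f(100)                         # predicted mpg at hp = 100
plot(mpg ~ hp, data = mtcars)  # scatter plot of the data
curve(f(x), add = TRUE)        # fitted line drawn on top
```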

6) I'm including below another version of your function that provides a formula 
interface:

models <- dp4dsFit(mpg ~ hp, data = mtcars)

It also rearranges the data a bit to be less wasteful of space (at the cost of 
some slightly trickier ggplot2 code).  These plots could be done in lattice as 
well, if you decided to go that route.  I've not done any serious debugging or 
tried to make other improvements; I just wanted to demonstrate how to write 
something with a formula interface, in case you decide to go more in that 
direction.
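
A function with a formula interface usually starts by pulling the formula apart. One possible sketch of that step, including a check for badly shaped formulas, follows. The helper checkSimpleFormula() is hypothetical, and it assumes (more strictly than necessary) that each side of the formula is a single variable name.

```r
# checkSimpleFormula() is a hypothetical helper for illustration only.
# It accepts formulas of the shape y ~ x and returns the two sides.
checkSimpleFormula <- function(formula) {
  # A two-sided formula has length 3: `~`, left side, right side
  if (!inherits(formula, "formula") || length(formula) != 3L)
    stop("formula must be two-sided, e.g. y ~ x")
  if (!is.name(formula[[2]]) || !is.name(formula[[3]]))
    stop("each side of the formula must be a single variable name")
  list(y = formula[[2]], x = formula[[3]])
}

parts <- checkSimpleFormula(mpg ~ hp)

# substitute() drops the pieces into a new formula template
quadratic <- eval(substitute(y ~ poly(x, 2), parts))
quadratic
```

Here formula[[2]] is the left side (the response) and formula[[3]] is the right side (the predictor).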

Enjoy!

--rjp

dp4dsFit <- function(formula, data = parent.frame())
{
  # code below presumes formula has y ~ x shape.  Fancier code
  # could check this and throw an error when badly shaped formulas
  # are attempted.
  # The code could be made more beautiful using mosaic::lhs() and
  # mosaic::rhs() to extract left and right sides of formula.
  # In a two-sided formula, [[2]] is the left side (y) and
  # [[3]] is the right side (x).
  yName <- paste(deparse(formula[[2]]), collapse="")
  xName <- paste(deparse(formula[[3]]), collapse="")
  quadraticFormula <-
    substitute(y ~ poly(x, 2),
               list(y = formula[[2]], x = formula[[3]]))
  nlognFormula <-
    substitute(y ~ x + x : log(x),
               list(y = formula[[2]], x = formula[[3]]))

  dp4dsQuadraticFit <- eval(
    substitute( lm(f, data=data), list(f=quadraticFormula))
    )
  dp4dsNlogNFit <- eval(
    substitute(lm(f, data=data), list(f=nlognFormula))
  )

  # could also use dplyr::mutate() to add in extra variables
  data <- transform(
    data,
    predicted_quad = predict(dp4dsQuadraticFit),
    predicted_nlogn = predict(dp4dsNlogNFit)
  )
  R <- list(dp4dsQuadraticFit = dp4dsQuadraticFit,
            dp4dsNlogNFit = dp4dsNlogNFit,
            data = data,
            yName = yName,
            xName = xName)
  class(R) <- c("dp4dsFit", class(R))
  return(R)
}

print.dp4dsFit <- function(x, ...) {
  cat("=============\n",
      "Quadratic fit\n",
      "=============\n", sep = "")
  print(x$dp4dsQuadraticFit)
  cat("==========\n",
      "n lg n fit\n",
      "==========\n", sep = "")
  print(x$dp4dsNlogNFit)
  invisible(x)
}

summary.dp4dsFit <- function(object, plot = FALSE, ...) {
  # summarize the two fitted models (the first two list elements)
  R <- sapply(object[1:2],
              summary,
              simplify=FALSE)
  if(plot) print(plot(object))
  # reuse print.dp4dsFit() to display the two summaries
  class(R) <- c("dp4dsFit", class(R))
  return(R)
}

# combine several aesthetic mappings into one
aes_c <- function( ... ) {
  res <- c( ... )
  class(res) <- "uneval"
  res
}

plot.dp4dsFit <- function(x, y, xLabel = x$xName, yLabel = x$yName, ...) {
  library(ggplot2)
  ggplot() +
    geom_point(data = x$data,
               aes_string(x = x$xName, y = x$yName),
               size = 3) +
    geom_line(data = x$data,
              aes_c(
                aes_string(x = x$xName),
                aes(y = predicted_quad, colour = "quadratic"))) +
    geom_line(data = x$data,
              aes_c(
                aes_string(x = x$xName),
                aes(y = predicted_nlogn, colour = "n log n"))) +
    xlab(label = xLabel) +
    ylab(label = yLabel) +
    guides(colour = guide_legend(title="model"))
}

## now you can do it all in one:
models <- dp4dsFit(mpg ~ hp, data = mtcars)
summary(models, plot=TRUE)
## or just plot it
plot(models)
## or just look at the model summaries
summary(models)
## or do something else entirely:
plot(models[[1]], which = 1)
mosaic::mplot(models[[2]], which = 1, system="gg")


On Dec 31, 2014, at 12:49 AM, Warford, Stan <[email protected]> wrote:

Thanks for the prompt responses. What a great list!

I am going with Ista's solution. I appreciate the R way, but the whole point of 
this script is to shield students from having to know R as much as possible. I 
don't want to give them any choices. In fact, last year I had them use Deducer 
thinking that point and click would be easy and they could experiment to their 
heart's content, but that was a complete disaster. This year using these 
pre-written scripts with RStudio was much better. Even I have only learned 
enough R to show students how to do a curve fit. I am a complete novice.

Dennis questioned the model.

Q: Do you want x:log(x) or x * log(x) in the second geom_smooth() formula?

I hope I am doing this correctly. Computer science theory predicts n lg n 
behavior for some data sets and quadratic for others. I hope I am fitting to

A * n * log(n) + B * n + C

where * in the above expression represents multiplication. I was under the 
impression that : in the model formula was multiplication. Can someone verify 
that?

Thanks,
Stan

J. Stanley Warford
Professor of Computer Science
Pepperdine University
Malibu, CA 90263
[email protected]
310-506-4332

_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching

