Laura M Marx wrote:
> Hi there,
>
> I've looked through the very helpful advice about adding fitted lines to
> plots in the r-help archive, and can't find a post where someone has offered
> a solution for my specific problem. I need to plot logistic regression fits
> from three differently-sized data subsets on a plot of the entire dataset.
> A description and code are below:
>
> I have an unbalanced dataset consisting of three different species (hem,
> yb, and sm), with unequal numbers of wood pieces in each species group. I
> am trying to generate a plot that will show the size of the wood piece on
> the X axis, the probability of it having tree seedlings growing on it on the
> Y (a binomial yes or no variable), and three fitted curves showing how the
> probability of having tree seedlings changes with increasing wood piece size
> for each species.
>
> I have no problem generating fits using GLM, and no problem creating the
> plot. However, if I try to add a fitted curve based only on the hem data
> subset to a plot that shows the entire dataset, I get an error message that
> the lengths of those data sets differ. "Error in xy.coords(x,y) : x and y
> lengths differ". I could see R's point -- you can't plot a regression line
> of babies born as a function of stork abundance on a graph of cherries
> produced (Y) versus rainfall (X), which for all the program knows, I'm
> trying to do. As a temporary fix, I added NAs to the end of the hem, yb,
> and sm subsets to make them the same length as the entire dataset. I can
> now add my fitted curves to the plot, but the lines are not connected. That
> is, if the hem group only contains wood pieces that are 1, 4, and 10 meters
> long, the plot has an X axis that ranges from 1 to 10, but line segments for
> the hem group regression line only appear above 1, 4, and 10. How can I fix
> this?
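The error comes from pairing vectors of different lengths: a model fitted to the hem rows returns one fitted value per hem row, while logarea from the full dataset has one value per wood piece. (Separately, lines() leaves a gap wherever x or y is NA, which is why NA-padded vectors draw as disconnected pieces.) The sketch below reproduces the length mismatch on simulated data; the `species` column, the group sizes, the subset name `hem_only`, and all values are hypothetical stand-ins for the poster's data.

```r
## Simulated stand-in for the poster's data. The data frame name
## `logreg` matches the post; the `species` column and all values
## are hypothetical.
set.seed(1)
n_per  <- c(hem = 12, yb = 30, sm = 18)   # unbalanced group sizes
logreg <- data.frame(
  species = rep(names(n_per), times = n_per),
  logarea = runif(sum(n_per), min = 1, max = 10)
)
logreg$hempresence <- rbinom(nrow(logreg), size = 1,
                             prob = plogis(-3 + 0.5 * logreg$logarea))

## Fit a logistic regression to the hem rows only
hem_only <- logreg[logreg$species == "hem", ]
hemhem   <- glm(hempresence ~ logarea, family = binomial(logit),
                data = hem_only)

length(fitted(hemhem))   # 12: one fitted value per hem row
length(logreg$logarea)   # 60: one x value per row of the full data

## Pairing the two vectors reproduces the error from the post
msg <- tryCatch(xy.coords(logreg$logarea, fitted(hemhem)),
                error = conditionMessage)
msg
```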
>
> An ideal solution would not require me to make the hem subset of my
> data the same length as the full dataset, either (although the summaries of
> regressions with the NAs (or zeroes) added and taken away are identical).
> I'd also settle for a work-around that would have R connect the pieces of
> the curve so that I get a solid line rather than small dots and dashes where
> actual data exist. Thanks so much for your help!
>
> Laura Marx
> Michigan State University, Dept. of Forestry
>
> #Note: hemdata has all the rows that are not hemlock species replaced with
> #"NA"s.
> hemhem=glm(hempresence~logarea, family=binomial(logit), data=hemdata)
> hemyb=glm(hempresence~logarea, family=binomial(logit), data=birchdata)
> hemsm=glm(hempresence~logarea, family=binomial(logit), data=mapledata)
>
> attach(logreg) #logreg is the full dataset
> plot(logarea, hempresence, xlab = "Surface area of log (m2)",
> ylab="Probability of hemlock seedling presence", type="n", font.lab=2,
> cex.lab=1.5, axes=TRUE)
> lines(logarea,fitted(hemhem), lty=1, lwd=2)
> lines(logarea,fitted(hemyb), lty="dashed", lwd=2)
> lines(logarea,fitted(hemsm), lty="dotted", lwd=2)
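One way to avoid building separate NA-padded copies of the data (hemdata, birchdata, mapledata) is glm()'s `subset` argument, which fits each model to a slice of the full data frame. This is a minimal sketch on simulated data; it assumes the full dataset has a column identifying the log species, and the column name `species` and its levels "hem", "yb", "sm" are hypothetical.

```r
## Simulated stand-in for the full dataset (hypothetical `species`
## column and values; only the names logreg, logarea, hempresence
## come from the post).
set.seed(2)
n_per  <- c(hem = 12, yb = 30, sm = 18)
logreg <- data.frame(
  species = rep(names(n_per), times = n_per),
  logarea = runif(sum(n_per), min = 1, max = 10)
)
logreg$hempresence <- rbinom(nrow(logreg), size = 1,
                             prob = plogis(-3 + 0.5 * logreg$logarea))

## One model per log species, each fitted only to its own rows;
## `subset` is evaluated inside `data`, so no NA padding is needed.
hemhem <- glm(hempresence ~ logarea, family = binomial(logit),
              data = logreg, subset = species == "hem")
hemyb  <- glm(hempresence ~ logarea, family = binomial(logit),
              data = logreg, subset = species == "yb")
hemsm  <- glm(hempresence ~ logarea, family = binomial(logit),
              data = logreg, subset = species == "sm")

## Each model uses only its own group's observations
sapply(list(hem = hemhem, yb = hemyb, sm = hemsm), nobs)   # 12, 30, 18
```

fitted() on these models still returns one value per row of the group, so drawing a curve across the whole X axis of the full-data plot still calls for predict() with a `newdata` argument, as in the reply.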
Hi, Laura,

Would ?predict.glm be better?

plot(logarea, hempresence, xlab = "Surface area of log (m2)",
     ylab = "Probability of hemlock seedling presence", type = "n",
     font.lab = 2, cex.lab = 1.5, axes = TRUE)
## lines() joins points in the order supplied, so sort by logarea first
o <- order(logarea)
lines(logarea[o], predict(hemhem, newdata = logreg, type = "response")[o],
      lty = 1, lwd = 2)
lines(logarea[o], predict(hemyb, newdata = logreg, type = "response")[o],
      lty = "dashed", lwd = 2)
lines(logarea[o], predict(hemsm, newdata = logreg, type = "response")[o],
      lty = "dotted", lwd = 2)

Without seeing more description of your data, this is still a guess.

--sundar

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
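A common variation on the predict() approach is to evaluate each fitted model on an evenly spaced, sorted grid spanning the full range of logarea, rather than at the observed data points. The curves are then smooth and unbroken across the whole X axis, and the ordering of the raw data no longer matters. This is a sketch on simulated data; the `species` column, the grid size of 100, and all values are assumptions, not part of the original thread.

```r
## Simulated stand-in for the full dataset (hypothetical `species`
## column and values).
set.seed(3)
n_per  <- c(hem = 12, yb = 30, sm = 18)
logreg <- data.frame(
  species = rep(names(n_per), times = n_per),
  logarea = runif(sum(n_per), min = 1, max = 10)
)
logreg$hempresence <- rbinom(nrow(logreg), size = 1,
                             prob = plogis(-3 + 0.5 * logreg$logarea))

## One logistic regression per log species
fits <- lapply(c(hem = "hem", yb = "yb", sm = "sm"), function(sp)
  glm(hempresence ~ logarea, family = binomial(logit),
      data = logreg[logreg$species == sp, ]))

## Sorted grid over the full range of logarea; the column name must
## match the predictor in the model formula.
grid <- data.frame(logarea = seq(min(logreg$logarea), max(logreg$logarea),
                                 length.out = 100))

## Predicted probabilities on the grid, one vector per species
preds <- lapply(fits, predict, newdata = grid, type = "response")

plot(hempresence ~ logarea, data = logreg, type = "n", ylim = c(0, 1),
     xlab = "Surface area of log (m2)",
     ylab = "Probability of hemlock seedling presence",
     font.lab = 2, cex.lab = 1.5)
ltys <- c(hem = "solid", yb = "dashed", sm = "dotted")
for (sp in names(preds)) {
  lines(grid$logarea, preds[[sp]], lty = ltys[[sp]], lwd = 2)
}
legend("topleft", legend = names(preds), lty = ltys, lwd = 2)
```

Because the grid covers the range of the full dataset, each curve may extend beyond the sizes actually observed in that species group; that part of the curve is an extrapolation. Restricting each grid to range(logarea) within its own group avoids this.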