Hi:
On Sun, Oct 17, 2010 at 8:46 AM, Federico Bonofiglio <[email protected]>wrote: > Dear Masters, > > I have a question to submit > > > consider the following script > > m<-4.95 > obs<-rpois(36,m) # i generate 36 realization from a poisson(m) > > hist(obs,freq=F) > curve(dpois(x,m),add=T,col="red") #i wish to overlay on the histogram the > theorical poisson density function > A histogram is meant to be a crude density estimate of a continuous random variable; similarly, curve() is meant to display a continuous function of x. You have a [discrete] Poisson probability mass function (pmf) with a corresponding sample of size 36 from it. These should be displayed as probability mass functions and bar charts, respectively. I wouldn't be surprised if this were homework, but it's an opportunity to teach you a few things about the graphics packages , including lattice and ggplot2, which will be worth your time if you intend to pursue statistics beyond the present. Luckily(?), I have a little time today.... :) Discrete distributions should be plotted as discrete distributions. Bar charts are good for this task - the separation placed between categories is meant to denote that the values on the x-axis are discrete. You can superimpose a theoretical Poisson pmf onto a bar chart rather easily in both ggplot2 and lattice. I'll leave the base graphics solution to the end. There are two ways to go: (1) Compare the relative frequency bar chart of the sample with its theoretical pmf; (2) Compare the cumulative relative frequencies with the theoretical cumulative pmf. # Approach 1: Bar charts # Preliminary step: data manipulation Your goal is to overlay the theoretical pmf onto the empirical distribution, but the latter has to be a relative frequency distribution in order for the two to be comparable. The hardest part, in some sense, is to arrange the data properly so that the observed relative frequencies and the Poisson probabilities can all be plotted - from a sample of size 36, it is possible that one or more x values will be missing and cause gaps in the plot. We need to handle that contingency first before plotting. (I found that out the hard way while proofing the code.) The below is a little messy, but it's functional. m <- 4.95 obs <- rpois(36, 4.95) # Create a data frame from the frequency table generated from obs d <- as.data.frame(table(obs)) d$Obs <- as.numeric(as.character(d$obs) ) # convert obs from factor to numeric d$rfreq <- with(d, Freq/sum(Freq)) # compute the relative frequencies # Now create a data frame that will compute Poisson probabilities for x = 0 to max(d$Obs): x <- seq(0, max(d$Obs)) dpm <- data.frame(x = x, pm = dpois(x, 4.95)) # Merge d and dpm, retaining all the rows of dpm - there may be NAs # in some places with respect to d dd <- merge(d, dpm, by.x = 'Obs', by.y = 'x', all.y = TRUE) dd$obs <- NULL # remove factor obs dd$Freq <- NULL # remove raw frequencies - no longer needed # Convert NA relative frequencies to zero dd$rfreq[is.na(dd$rfreq)] <- 0 dd str(dd) # everything should be numeric # Add the graphics packages to the session (make sure they're installed first) library(ggplot2) library(lattice) # Superimposed relative frequency bar charts with Poisson pmf: # the first is the lattice version, the second from ggplot2. # Lattice: the panel function is necessary to superimpose two different # graphs onto the same graphics surface. Type 'h' plots vertical lines from 0 # to the y-value for each x; lwd represents line width. xyplot(pm ~ Obs, data = dd, panel = function(x, y, ...) { panel.barchart(x, dd$rfreq, horizontal = FALSE, col = 'orange', ...) panel.xyplot(x, y, type = 'h', col = 'blue', lwd = 4) }, xlab = 'x', ylab = '', ylim = c(0, 0.3)) # ggplot2 is a modular approach to graphics. One puts down graphical layers # with geoms to which other graphical elements can be added. The resulting # graph is pretty much the same as the xyplot, but the default background is # different. Since geom_segment() doesn't use the fill 'aesthetic' (it's black # no matter what), we adjust the legend accordingly. Notice how the quoted # strings in fill = become the labels of the legend. Adjust the ylab() as needed. p <- ggplot(dd, aes(x = Obs)) p + geom_bar(aes(y = rfreq, fill = 'observed'), stat = 'identity') + geom_segment(aes(xend = x, yend = 0, y = pm, fill = 'Poisson(4.95)'), size = 4) + scale_fill_manual('Type', values = c('observed' = 'orange', 'Poisson(4.95)' = 'black')) + xlab('x') + ylab("") + ylim(0, 0.3) Another way to compare empirical and theoretical distributions is to compare their corresponding cdf's. # Approach 2: Empirical cdfs: # ggplot2's geom_step() will plot a step function from its input # arguments. In this case, it is convenient to 'melt' the data frame dd so that # the empirical and theoretical cumulative distributions can easily be plotted # separately in both ggplot2 and lattice. # 'Melt' dd- we want the Obs values to remain the # same for each stacked variable, so it is the 'id' variable: ddm <- melt(dd, id = 'Obs') # to be used for the legend...the value of the each argument # needs to be parameterized because the # number of rows in dd can vary from one sample to another ddm$Type <- factor(rep(c('Observed', 'Poisson(4.95)'), each = nrow(dd))) # Concatenate cumulative relative frequency of sample and cdf of Poisson, # add to melted data frame ddm: dd$crf <- cumsum(dd$rfreq) # cumulative rel freq of sample dd$cpm <- cumsum(dd$pm) # Poisson cdf ddm$cdf <- with(dd, c(crf, cpm)) # add to ddm # Plot of empirical and theoretical cdfs: first one is the ggplot2 version, # second is the lattice version # scale_colour_manual() is used to override default colors r <- ggplot(ddm, aes(x = Obs, y = cdf, groups = Type, colour = Type)) r + geom_step(size = 1) + scale_colour_manual(values = c('Observed' = 'red', 'Poisson(4.95)' = 'blue')) + xlab('x') + ylab("Cumulative relative frequency/probability") # For a different look... last_plot() + theme_bw() + opts(legend.position = c(0.2, 0.9)) # lattice versions # Different ways of specifying a legend key, differing in position. First one # is in upper left corner inside plot region, other two are outside it. mykey1 = list(corner = c(0, 1), text = list(levels(ddm$Type), cex = 0.8), lines = list(lty = 1, col = c('red', 'blue'), lwd = 2), title = 'Type', cex.title = 1.1) mykey2 <- list(space = 'right', text = list(levels(ddm$Type), cex = 0.8), lines = list(lty = 1, col = c('red', 'blue'), lwd = 2), title = 'Type', cex.title = 1.1) mykey3 <- list(space = 'top', text = list(levels(ddm$Type), cex = 0.8), lines = list(lty = 1, col = c('red', 'blue'), lwd = 2), title = 'Type', cex.title = 1.1, columns = 2) xyplot(cdf ~ Obs, data = ddm, groups = Type, type = 's', col = c('red', 'blue'), lwd = 2, key = mykey1, xlab = 'x', ylab = 'Cumulative relative frequency/probability') # Substitute mykey2 and mykey3 for mykey1 to get different looks wrt the legend. I had to do the base graphics version just because, which turned out to be a pain because you have to engage in trial and error to get the spacing correct in the Poisson pmf graph (the second bar chart). Here is one way to do it: # Base graphics version: u <- with(dd, barplot(rfreq, col = 'orange')) # u is a matrix with one column containing thex-coordinates where the bars are centered with(dd, barplot(pm, width = 0.4, space = c(1.3, rep(2, length(u) - 1)), col = 'blue', add = TRUE)) axis(1, at = as.vector(u), labels = c((1:length(u)) - 1)) # insert x-axis labels # Generate a legend - the coordinates will likely need to be adjusted from sample to sample # (first two arguments) legend(8.5, 0.23, legend = c('Observed', 'Poisson(4.95)'), lty = 1, lwd = 2, col = c('orange', 'blue')) # Note: if you change width, you will have to adjust the spacing accordingly. That's a fun little exercise. I'll let you figure out how to do the cumulative distribution plot in base graphics. I'm tired :) HTH, Dennis errors are returned saing the x vector doesn't contain integers.... > > really bizarre i can't give explanation (R version 2.11.1, no source codes) > > would u be so kind to suggest me a solution??? > > thank u > > FB student of statistics at milano bicocca > > [[alternative HTML version deleted]] > > ______________________________________________ > [email protected] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

