Re: [R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

David Winsemius Sat, 03 Mar 2012 17:20:56 -0800

I do agree with the recommendation to consider car::dataEllipse() forplotting a best fit for the theoretical bivariate Normal situation.However, the bagplot function from Has Peter Wold: http://www.wiwi.uni-bielefeld.de/~wolf/#softwareis more likely to "line up" and be suitable for drawing "contours"on with a two-dimensional kernel density estimate for a 2d ECDF in adistribution-agnostic analysis. I had never thought of trying tointegrate its output with an `rgl` representation of a 2d density, butit seems like an interesting and perhaps challenging exercise.

The real question is how much prior assumptions and knowledge aboutdistributions in nature and the joint relationships of the variablesunder consideration should constrain the efforts of the OP regardinga graphical representation. In medical situation, my experience isthat log-normal or Gamma distributions are more common than Normalones. This is in part because preclinical and clinical disease willskew measures.

(And I do thank drflxms for initiating a very interesting discussion.I share his belief that graphical representations are important tools.)


--
David.

On Mar 3, 2012, at 7:10 PM, Michael Friendly wrote:

On 3/3/2012 12:34 PM, drflxms wrote:
Thanx a lot Greg for the hint and letting me not alone with this!

Tried ellipse and it works well. But I am looking for something more
precise. The ellipse fits the real border to the nearest possible
ellipse. I want the "real" contour, if possible.
The 'contours' you are talking about here are estimates of the level
sets of your bivariate distribution with varying density.  There is
no "real" contour -- only what you can estimate from your data.

Following
your subject line, "plotting confidence interval on scatter plot ofbivariate normal distribution", the most straight-forward approach
is in terms of a data ellipse, e.g.

library(car)
dataEllipse(x,y, levels=c(0.40, 0.68))

where the ellipses plotted have the property that the 0.40 coverage
data ellipse has univariate mean +- 1SD shadows on the X,Y axes,
while the 0.68 ellipse has bivariate mean +- 1SD shadows on those
axes.  This is a precise answer to the question you posed, and others
have supplied other versions, but I think car::dataEllipse() is themost
generally useful, but this depends on the assumption that your data
are at least approximately bivariate normal.
Meanwhile I found an interesting function named draw.contour
http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
provided by "Forester", an "Assistant Professor in the Department of
Fisheries". - I love it how statistics brings people from completely
different specialities together. - I am a neurologist.
Similar plots of empirical contours of density for (x,y) can beobtainedwith contour() on the result of MASS::kde2d() andKernSmooth::bdke2D().
These may seem to you to be "more precise", but then they don't give
a way to estimate confidence regions with any coverage. Take yourchoice.
This function results not exactly in the same curve, I got with thecodeI posted last, but is in deed very close by (parallel). So I thinkboth
solutions are probably true and differ only because of rounding and
method (..!?).

Still I am not completely happy with

1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
symbolic, formula driven solution instead.
2. me not comprehending the code completely.

While 2. will be hopefully solved soon by me delving into the code, 1
remains...

Best, Felix


Am 03.03.12 17:28, schrieb Greg Snow:
Look at the ellipse package (and the ellipse function in thepackage)for a simple way of showing a confidence region for bivariate dataon
a plot (a 68% confidence interval is about 1 SD if you just want to
show 1 SD).
On Sat, Mar 3, 2012 at 7:54 AM, drflxms<drfl...@googlemail.com>wrote:
Dear all,

I created a bivariate normal distribution:

set.seed(138813)
n<-100
x<-rnorm(n); y<-rnorm(n)

and plotted a scatterplot of it:

plot(x,y)

Now I'd like to add the 2D-standard deviation.
I found a thread regarding plotting arbitrary confidenceboundaries from
Pascal Hänggi
http://www.mail-archive.com/r-help@r-project.org/msg24013.html
which cites the even older thread
http://tolstoy.newcastle.edu.au/R/help/03b/5384.html
As I am unfortunately only a very poor R programmer, the code ofPascalHänggi is a myth to me and I am not sure whether I was able totranslatethe recommendation of Brain Ripley in the later thread (whichprovides
no code) into the the correct R code. Brain wrote:
You need a 2D density estimate (e.g. kde2d in MASS) then computethedensity values at the points and draw the contour of the densitywhichincludes 95% of the points (at a level computed from the sortedvaluesvia quantile()). [95% confidence interval was desired in threadinstead
of standard deviation...]

So I tried this...
den<-kde2d(x, y, n=n) #as I chose n to be the same as duringcreatingthe distributions x and y (see above), a z-value is assigned toevery
combination of x and y.

# create a sorted vector of z-values (instead of the matrix stored
inside the den object
den.z<-sort(den$z)

# set desired confidence border to draw and store it in variable
confidence.border<- quantile(den.z, probs=0.6827, na.rm = TRUE)
# draw a line representing confidence.border on the existingscatterplot
par(new=TRUE)
contour(den, levels=confidence.border, col = "red", add = TRUE)
Unfortunately I doubt very much this is correct :( In fact I amsurethis is wrong, because the border for probs=0.05 is drawn outsidethe
values.... So please help and check.
Pascal Hänggis code seems to work, but I don't understand themagic he
does with

pp<- array()
for (i in 1:1000){
       z.x<- max(which(den$x<  x[i]))
       z.y<- max(which(den$y<  y[i]))
       pp[i]<- den$z[z.x, z.y]
}

before doing the very same as I did above:

confidencebound<- quantile(pp, 0.05, na.rm = TRUE)

plot(x, y)
contour(den, levels = confidencebound, col = "red", add = TRUE)


My problems:
1.) setting probs=0.6827 is somehow a dirty trick which I canonly useby simply knowing that this is the percentage of values inside+-1sdwhen a distribution is normal. Is there a way doing this with"native"
sd function?
sd(den.z) is not correct, as den.z is in contrast to x and y notnormalany more. So ecdf(den.z)(sd(den.z)) results in a percentile of0.5644 in
this example instead of the desired 0.6827.
2.) I would like to have code that works with any desiredconfidence.Unfortunately setting probs to the desired confidence wouldprobably be
wrong (?!) as it relates to den.z instead of x and y, which are the
underlying distributions I am interested in. To put it short Iwant the
confidence of x/y and not of den.z.


I am really completely stuck. Please help me out of this! Felix


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Reply via email to