Re: [R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Michael Friendly Sat, 03 Mar 2012 16:12:24 -0800

On 3/3/2012 12:34 PM, drflxms wrote:

Thanx a lot Greg for the hint and letting me not alone with this!


Tried ellipse and it works well. But I am looking for something more
precise. The ellipse fits the real border to the nearest possible
ellipse. I want the "real" contour, if possible.


The 'contours' you are talking about here are estimates of the level
sets of your bivariate distribution with varying density.  There is
no "real" contour -- only what you can estimate from your data.

Following

your subject line, "plotting confidence interval on scatter plot ofbivariate normal distribution", the most straight-forward approach

is in terms of a data ellipse, e.g.

library(car)
dataEllipse(x,y, levels=c(0.40, 0.68))

where the ellipses plotted have the property that the 0.40 coverage
data ellipse has univariate mean +- 1SD shadows on the X,Y axes,
while the 0.68 ellipse has bivariate mean +- 1SD shadows on those
axes.  This is a precise answer to the question you posed, and others
have supplied other versions, but I think car::dataEllipse() is the most
generally useful, but this depends on the assumption that your data
are at least approximately bivariate normal.


Meanwhile I found an interesting function named draw.contour
http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
provided by "Forester", an "Assistant Professor in the Department of
Fisheries". - I love it how statistics brings people from completely
different specialities together. - I am a neurologist.


Similar plots of empirical contours of density for (x,y) can be obtained
with contour() on the result of MASS::kde2d() and KernSmooth::bdke2D().
These may seem to you to be "more precise", but then they don't give
a way to estimate confidence regions with any coverage.  Take your choice.


This function results not exactly in the same curve, I got with the code
I posted last, but is in deed very close by (parallel). So I think both
solutions are probably true and differ only because of rounding and
method (..!?).

Still I am not completely happy with

1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
symbolic, formula driven solution instead.
2. me not comprehending the code completely.

While 2. will be hopefully solved soon by me delving into the code, 1
remains...

Best, Felix


Am 03.03.12 17:28, schrieb Greg Snow:

Look at the ellipse package (and the ellipse function in the package)
for a simple way of showing a confidence region for bivariate data on
a plot (a 68% confidence interval is about 1 SD if you just want to
show 1 SD).

On Sat, Mar 3, 2012 at 7:54 AM, drflxms<drfl...@googlemail.com>  wrote:

Dear all,

I created a bivariate normal distribution:

set.seed(138813)
n<-100
x<-rnorm(n); y<-rnorm(n)

and plotted a scatterplot of it:

plot(x,y)

Now I'd like to add the 2D-standard deviation.

I found a thread regarding plotting arbitrary confidence boundaries from
Pascal Hänggi
http://www.mail-archive.com/r-help@r-project.org/msg24013.html
which cites the even older thread
http://tolstoy.newcastle.edu.au/R/help/03b/5384.html

As I am unfortunately only a very poor R programmer, the code of Pascal
Hänggi is a myth to me and I am not sure whether I was able to translate
the recommendation of Brain Ripley in the later thread (which provides
no code) into the the correct R code. Brain wrote:

You need a 2D density estimate (e.g. kde2d in MASS) then compute the
density values at the points and draw the contour of the density which
includes 95% of the points (at a level computed from the sorted values
via quantile()). [95% confidence interval was desired in thread instead
of standard deviation...]

So I tried this...

den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating
the distributions x and y (see above), a z-value is assigned to every
combination of x and y.

# create a sorted vector of z-values (instead of the matrix stored
inside the den object
den.z<-sort(den$z)

# set desired confidence border to draw and store it in variable
confidence.border<- quantile(den.z, probs=0.6827, na.rm = TRUE)

# draw a line representing confidence.border on the existing scatterplot
par(new=TRUE)
contour(den, levels=confidence.border, col = "red", add = TRUE)

Unfortunately I doubt very much this is correct :( In fact I am sure
this is wrong, because the border for probs=0.05 is drawn outside the
values.... So please help and check.
Pascal Hänggis code seems to work, but I don't understand the magic he
does with

pp<- array()
for (i in 1:1000){
        z.x<- max(which(den$x<  x[i]))
        z.y<- max(which(den$y<  y[i]))
        pp[i]<- den$z[z.x, z.y]
}

before doing the very same as I did above:

confidencebound<- quantile(pp, 0.05, na.rm = TRUE)

plot(x, y)
contour(den, levels = confidencebound, col = "red", add = TRUE)


My problems:

1.) setting probs=0.6827 is somehow a dirty trick which I can only use
by simply knowing that this is the percentage of values inside +-1sd
when a distribution is normal. Is there a way doing this with "native"
sd function?
sd(den.z) is not correct, as den.z is in contrast to x and y not normal
any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in
this example instead of the desired 0.6827.

2.) I would like to have code that works with any desired confidence.
Unfortunately setting probs to the desired confidence would probably be
wrong (?!) as it relates to den.z instead of x and y, which are the
underlying distributions I am interested in. To put it short I want the
confidence of x/y and not of den.z.


I am really completely stuck. Please help me out of this! Felix


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Reply via email to