I do agree with the recommendation to consider car::dataEllipse() for
plotting a best fit for the theoretical bivariate Normal situation.
However, the bagplot function from Has Peter Wold: http://www.wiwi.uni-bielefeld.de/~wolf/#software
is more likely to "line up" and be suitable for drawing "contours"
on with a two-dimensional kernel density estimate for a 2d ECDF in a
distribution-agnostic analysis. I had never thought of trying to
integrate its output with an `rgl` representation of a 2d density, but
it seems like an interesting and perhaps challenging exercise.
The real question is how much prior assumptions and knowledge about
distributions in nature and the joint relationships of the variables
under consideration should constrain the efforts of the OP regarding
a graphical representation. In medical situation, my experience is
that log-normal or Gamma distributions are more common than Normal
ones. This is in part because preclinical and clinical disease will
skew measures.
(And I do thank drflxms for initiating a very interesting discussion.
I share his belief that graphical representations are important tools.)
--
David.
On Mar 3, 2012, at 7:10 PM, Michael Friendly wrote:
On 3/3/2012 12:34 PM, drflxms wrote:
Thanx a lot Greg for the hint and letting me not alone with this!
Tried ellipse and it works well. But I am looking for something more
precise. The ellipse fits the real border to the nearest possible
ellipse. I want the "real" contour, if possible.
The 'contours' you are talking about here are estimates of the level
sets of your bivariate distribution with varying density. There is
no "real" contour -- only what you can estimate from your data.
Following
your subject line, "plotting confidence interval on scatter plot of
bivariate normal distribution", the most straight-forward approach
is in terms of a data ellipse, e.g.
library(car)
dataEllipse(x,y, levels=c(0.40, 0.68))
where the ellipses plotted have the property that the 0.40 coverage
data ellipse has univariate mean +- 1SD shadows on the X,Y axes,
while the 0.68 ellipse has bivariate mean +- 1SD shadows on those
axes. This is a precise answer to the question you posed, and others
have supplied other versions, but I think car::dataEllipse() is the
most
generally useful, but this depends on the assumption that your data
are at least approximately bivariate normal.
Meanwhile I found an interesting function named draw.contour
http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
provided by "Forester", an "Assistant Professor in the Department of
Fisheries". - I love it how statistics brings people from completely
different specialities together. - I am a neurologist.
Similar plots of empirical contours of density for (x,y) can be
obtained
with contour() on the result of MASS::kde2d() and
KernSmooth::bdke2D().
These may seem to you to be "more precise", but then they don't give
a way to estimate confidence regions with any coverage. Take your
choice.
This function results not exactly in the same curve, I got with the
code
I posted last, but is in deed very close by (parallel). So I think
both
solutions are probably true and differ only because of rounding and
method (..!?).
Still I am not completely happy with
1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
symbolic, formula driven solution instead.
2. me not comprehending the code completely.
While 2. will be hopefully solved soon by me delving into the code, 1
remains...
Best, Felix
Am 03.03.12 17:28, schrieb Greg Snow:
Look at the ellipse package (and the ellipse function in the
package)
for a simple way of showing a confidence region for bivariate data
on
a plot (a 68% confidence interval is about 1 SD if you just want to
show 1 SD).
On Sat, Mar 3, 2012 at 7:54 AM, drflxms<drfl...@googlemail.com>
wrote:
Dear all,
I created a bivariate normal distribution:
set.seed(138813)
n<-100
x<-rnorm(n); y<-rnorm(n)
and plotted a scatterplot of it:
plot(x,y)
Now I'd like to add the 2D-standard deviation.
I found a thread regarding plotting arbitrary confidence
boundaries from
Pascal Hänggi
http://www.mail-archive.com/r-help@r-project.org/msg24013.html
which cites the even older thread
http://tolstoy.newcastle.edu.au/R/help/03b/5384.html
As I am unfortunately only a very poor R programmer, the code of
Pascal
Hänggi is a myth to me and I am not sure whether I was able to
translate
the recommendation of Brain Ripley in the later thread (which
provides
no code) into the the correct R code. Brain wrote:
You need a 2D density estimate (e.g. kde2d in MASS) then compute
the
density values at the points and draw the contour of the density
which
includes 95% of the points (at a level computed from the sorted
values
via quantile()). [95% confidence interval was desired in thread
instead
of standard deviation...]
So I tried this...
den<-kde2d(x, y, n=n) #as I chose n to be the same as during
creating
the distributions x and y (see above), a z-value is assigned to
every
combination of x and y.
# create a sorted vector of z-values (instead of the matrix stored
inside the den object
den.z<-sort(den$z)
# set desired confidence border to draw and store it in variable
confidence.border<- quantile(den.z, probs=0.6827, na.rm = TRUE)
# draw a line representing confidence.border on the existing
scatterplot
par(new=TRUE)
contour(den, levels=confidence.border, col = "red", add = TRUE)
Unfortunately I doubt very much this is correct :( In fact I am
sure
this is wrong, because the border for probs=0.05 is drawn outside
the
values.... So please help and check.
Pascal Hänggis code seems to work, but I don't understand the
magic he
does with
pp<- array()
for (i in 1:1000){
z.x<- max(which(den$x< x[i]))
z.y<- max(which(den$y< y[i]))
pp[i]<- den$z[z.x, z.y]
}
before doing the very same as I did above:
confidencebound<- quantile(pp, 0.05, na.rm = TRUE)
plot(x, y)
contour(den, levels = confidencebound, col = "red", add = TRUE)
My problems:
1.) setting probs=0.6827 is somehow a dirty trick which I can
only use
by simply knowing that this is the percentage of values inside
+-1sd
when a distribution is normal. Is there a way doing this with
"native"
sd function?
sd(den.z) is not correct, as den.z is in contrast to x and y not
normal
any more. So ecdf(den.z)(sd(den.z)) results in a percentile of
0.5644 in
this example instead of the desired 0.6827.
2.) I would like to have code that works with any desired
confidence.
Unfortunately setting probs to the desired confidence would
probably be
wrong (?!) as it relates to den.z instead of x and y, which are the
underlying distributions I am interested in. To put it short I
want the
confidence of x/y and not of den.z.
I am really completely stuck. Please help me out of this! Felix
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.