Re: [R] Density estimation graphs
Mark Wardle wrote:

> Dear all, I'm struggling with a plot and would value any help! ...
> Is there a better way? As always, I'm sure there's a one-liner rather than my crude technique!

As always, I've spent ages trying to sort this, and then the minute after sending an email, I find the polygon() function. Ignore previous message!

Best wishes,
Mark

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Density estimation graphs
Dear all,

I'm struggling with a plot and would value any help! I'm attempting to highlight a histogram and density plot to show a proportion of cases above a threshold value. I wanted to cross-hatch the area below the density curve. The breaks and bandwidth are deliberate integer values because of the type of data I'm looking at.

I've managed to do this, but I don't think it is very good! It would be difficult, for example, to do a cross-hatch using this technique.

    allele.plot <- function(x, threshold=NULL, hatch.col='black',
                            hatch.border=hatch.col, lwd=par('lwd'), ...) {
      h <- hist(x, breaks=max(x), plot=FALSE)
      d <- density(x, bw=1)
      plot(d, lwd=lwd, ...)
      if (!is.null(threshold)) {
        d.t <- d$x > threshold
        d.x <- d$x[d.t]
        d.y <- d$y[d.t]
        d.l <- length(d.x)
        # draw all but first line of hatch
        for (i in 2:d.l) {
          lines(c(d.x[i],d.x[i]), c(0,d.y[i]), col=hatch.col, lwd=1)
        }
        # draw first line in hatch border colour
        lines(c(d.x[1],d.x[1]), c(0,d.y[1]), col=hatch.border, lwd=lwd)
        # and now re-draw density plot lines
        lines(d, lwd=lwd)
      }
    }

    # some pretend data
    s8 = rnorm(100, 15, 5)
    threshold = 19 # an arbitrary cut-off
    allele.plot(s8, threshold, hatch.col='grey', hatch.border='black')

Is there a better way? As always, I'm sure there's a one-liner rather than my crude technique!

Best wishes,
Mark

-- 
Dr. Mark Wardle
Clinical research fellow and specialist registrar, Neurology
University Hospital Wales and Cardiff University, UK
Re: [R] Density estimation graphs
On Mar 15, 2007, at 12:37 PM, Mark Wardle wrote:

> Dear all, I'm struggling with a plot and would value any help! I'm attempting to highlight a histogram and density plot to show a proportion of cases above a threshold value. I wanted to cross-hatch the area below the density curve. The breaks and bandwidth are deliberate integer values because of the type of data I'm looking at.
>
> I've managed to do this, but I don't think it is very good! It would be difficult, for example, to do a cross-hatch using this technique.

Don't know about a cross-hatch, but in general I use polygon for highlighting areas like that:

    allele.plot <- function(x, threshold=NULL, hatch.col='black',
                            hatch.border=hatch.col, lwd=par('lwd'), ...) {
      h <- hist(x, breaks=max(x), plot=FALSE)
      d <- density(x, bw=1)
      plot(d, lwd=lwd, ...)
      if (!is.null(threshold)) {
        d.t <- d$x > threshold
        d.x <- d$x[d.t]
        d.y <- d$y[d.t]
        polygon(c(d.x[1],d.x,d.x[1]), c(0,d.y,0), col=hatch.col, lwd=1)
      }
    }

    # some pretend data
    s8 = rnorm(100, 15, 5)
    threshold = 19 # an arbitrary cut-off
    allele.plot(s8, threshold, hatch.col='grey', hatch.border='black')

Perhaps this can help a bit. Btw, what was d.l for?

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
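For the cross-hatch part of the question, polygon() itself can draw shading lines through its density and angle arguments, so two passes at opposite angles give a cross-hatch. The following is only a sketch along those lines; shade_tail and its arguments are illustrative names, not something from the thread, and it assumes at least one grid point lies above the threshold.

```r
# Sketch: shade the area under a kernel density curve above a threshold.
# `shade_tail` and its arguments are illustrative names, not from the thread.
shade_tail <- function(x, threshold, bw = 1, col = "grey", cross = FALSE) {
  d <- density(x, bw = bw)
  plot(d, main = "")
  keep <- d$x > threshold
  px <- d$x[keep]
  py <- d$y[keep]
  # close the polygon along the baseline at both ends
  poly_x <- c(px[1], px, px[length(px)])
  poly_y <- c(0, py, 0)
  if (cross) {
    # two passes of shading lines at opposite angles give a cross-hatch
    polygon(poly_x, poly_y, density = 20, angle = 45, col = col)
    polygon(poly_x, poly_y, density = 20, angle = 135, col = col)
  } else {
    polygon(poly_x, poly_y, col = col, border = NA)
  }
  lines(d)  # redraw the curve on top of the shading
  # approximate shaded area (Riemann sum on the density grid)
  invisible(sum(py) * (d$x[2] - d$x[1]))
}

set.seed(1)
s8 <- rnorm(100, 15, 5)
area <- shade_tail(s8, threshold = 19, cross = TRUE)
```

The returned value is a rough estimate of the proportion of the estimated density above the cut-off, which may be handy for labelling the plot.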
Re: [R] Density Estimation
On Thu, Jun 08, 2006 at 08:31:26PM +0200, Pedro Ramirez wrote:

> > In mathematical terms the optimal bandwidth for density estimation decreases at rate n^{-1/5}, while the one for the distribution function decreases at rate n^{-1/3}, if n is the sample size. In practical terms, one must choose an appreciably smaller bandwidth in the second case than in the first one.
>
> Thanks a lot for your remark! I was not aware of the fact that the optimal bandwidths for density and distribution do not decrease at the same rate.
>
> > Besides the computational aspect, there is a statistical one: the optimal choice of bandwidth for estimating the density function is not optimal (and possibly not even just sensible) for estimating the distribution function, and the stated problem is equivalent to estimation of the distribution function.
>
> The given interval 0 < x < 3 was only an example; in fact I would like to estimate the probability for intervals such as 0 <= x < 1, 1 <= x < 2, 2 <= x < 3, 3 <= x < 4, and compare it with the estimates of a corresponding histogram. In this case the stated problem is not anymore equivalent to the estimation of the distribution function.

Why not? The probabilities you are interested in are of the form F(1)-F(0), F(2)-F(1), and so on, where F(.) is the cumulative distribution function (and it must be continuous, since its derivative exists).

> What do you think, can I go ahead in this case with the optimal bandwidth for the density? Thanks a lot for your help!

No.

> Best wishes
> Pedro

best wishes,
Adelchi

-- 
Adelchi Azzalini [EMAIL PROTECTED]
Dipart.Scienze Statistiche, Università di Padova, Italia
tel. +39 049 8274147, http://azzalini.stat.unipd.it/
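To make the F(1)-F(0), F(2)-F(1), ... reading concrete, here is a small sketch that builds an approximate distribution function from the grid returned by density() and differences it at the interval ends, next to the matching histogram proportions. The name cdf_hat is made up for this illustration, and the bandwidth is simply the density() default, which (per the discussion in this thread) is probably larger than one would want for estimating F.

```r
x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
d <- density(x, n = 1024)

# cumulative trapezoid rule over the density grid gives an approximate F
steps <- diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2
cum <- c(0, cumsum(steps))
cdf_hat <- approxfun(d$x, cum, yleft = 0, yright = max(cum))

breaks <- 0:4
p_kde <- diff(cdf_hat(breaks))   # F(1)-F(0), F(2)-F(1), F(3)-F(2), F(4)-F(3)

# matching histogram proportions on [0,1), [1,2), [2,3), [3,4)
h <- hist(x, breaks = 0:13, right = FALSE, plot = FALSE)
p_hist <- h$counts[1:4] / length(x)

print(cbind(lower = breaks[-5], upper = breaks[-1], p_kde, p_hist))
```

Passing a smaller bw to density() in the first step is the obvious knob to turn when following the advice about bandwidths for the distribution function.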
Re: [R] Density Estimation
On Wed, 07 Jun 2006 19:54:32 +0200, Pedro Ramirez wrote:

> > Not a direct answer to your question, but if you use a logspline density estimate rather than a kernel density estimate then the logspline package will help you and it has built in functions for dlogspline, qlogspline, and plogspline that do the integrals for you.
> >
> > If you want to stick with the KDE, then you could find the area under each of the kernels for the range you are interested in (need to work out the standard deviation used from the bandwidth, then use pnorm for the default gaussian kernel), then just sum the individual areas.
> >
> > Hope this helps,
>
> Thanks a lot for your quick help! I think I will follow your first suggestion (logspline density estimation) instead of summing over the kernel areas because at the boundaries of the range truncated kernel areas can occur, so I think it is easier to do it with logsplines. Thanks again for your help!!
>
> Pedro

Besides the computational aspect, there is a statistical one: the optimal choice of bandwidth for estimating the density function is not optimal (and possibly not even just sensible) for estimating the distribution function, and the stated problem is equivalent to estimation of the distribution function.

In mathematical terms the optimal bandwidth for density estimation decreases at rate n^{-1/5}, while the one for the distribution function decreases at rate n^{-1/3}, if n is the sample size. In practical terms, one must choose an appreciably smaller bandwidth in the second case than in the first one.

best wishes,
Adelchi
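As a rough feel for what the two rates imply, the short calculation below tabulates n^{-1/5} and n^{-1/3} for a few sample sizes and their ratio, n^{-2/15}. This only compares orders of magnitude; the constants in front of the rates are ignored, so the numbers are not usable bandwidths.

```r
# Illustrative only: compare how the two rates shrink with sample size n.
n <- c(50, 200, 1000, 5000)
rate_density <- n^(-1/5)   # order of the optimal bandwidth for the density
rate_cdf     <- n^(-1/3)   # order of the optimal bandwidth for the distribution function
ratio <- rate_cdf / rate_density   # equals n^(-2/15)

print(data.frame(n, rate_density, rate_cdf, ratio))
```

The ratio is below one for every n > 1 and keeps falling as n grows, which is the sense in which the second bandwidth has to be appreciably smaller.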
Re: [R] Density Estimation
> In mathematical terms the optimal bandwidth for density estimation decreases at rate n^{-1/5}, while the one for the distribution function decreases at rate n^{-1/3}, if n is the sample size. In practical terms, one must choose an appreciably smaller bandwidth in the second case than in the first one.

Thanks a lot for your remark! I was not aware of the fact that the optimal bandwidths for density and distribution do not decrease at the same rate.

> Besides the computational aspect, there is a statistical one: the optimal choice of bandwidth for estimating the density function is not optimal (and possibly not even just sensible) for estimating the distribution function, and the stated problem is equivalent to estimation of the distribution function.

The given interval 0 < x < 3 was only an example; in fact I would like to estimate the probability for intervals such as 0 <= x < 1, 1 <= x < 2, 2 <= x < 3, 3 <= x < 4, and compare it with the estimates of a corresponding histogram. In this case the stated problem is not anymore equivalent to the estimation of the distribution function. What do you think, can I go ahead in this case with the optimal bandwidth for the density?

Thanks a lot for your help!

Best wishes
Pedro
[R] Density Estimation
Dear R-list,

I have made a simple kernel density estimation by

    x <- c(2,1,3,2,3,0,4,5,10,11,12,11,10)
    kde <- density(x, n=100)

Now I would like to know the estimated probability that a new observation falls into the interval 0 < x < 3.

How can I integrate over the corresponding interval? In several R packages for kernel density estimation I did not find a corresponding function. I could apply Simpson's Rule for integrating, but perhaps somebody knows a better solution.

Thanks a lot for help!

Pedro
Re: [R] Density Estimation
Not a direct answer to your question, but if you use a logspline density estimate rather than a kernel density estimate then the logspline package will help you and it has built in functions for dlogspline, qlogspline, and plogspline that do the integrals for you.

If you want to stick with the KDE, then you could find the area under each of the kernels for the range you are interested in (need to work out the standard deviation used from the bandwidth, then use pnorm for the default gaussian kernel), then just sum the individual areas.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
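The second suggestion can be sketched in a few lines. With the default Gaussian kernel, the bw used by density() is the standard deviation of each kernel, so the area of one kernel over (a, b) is a difference of two pnorm() values, and the KDE probability is the average of those areas over the data. The helper name kde_prob is made up for this sketch, and bw.nrd0() is used on the assumption that the density() default bandwidth is wanted.

```r
x <- c(2, 1, 3, 2, 3, 0, 4, 5, 10, 11, 12, 11, 10)
h <- bw.nrd0(x)   # default bandwidth of density(); it is the sd of the Gaussian kernel

# probability mass each kernel puts on (a, b), averaged over the data points
kde_prob <- function(a, b, x, h) {
  mean(pnorm(b, mean = x, sd = h) - pnorm(a, mean = x, sd = h))
}

p <- kde_prob(0, 3, x, h)
print(p)
```

Because each kernel is integrated exactly, nothing special is needed at the ends of the data range, and the result should land close to what numerical integration of the density() output gives for the same bandwidth.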
Re: [R] Density Estimation
Pedro wrote:

> I have made a simple kernel density estimation by
>
>     x <- c(2,1,3,2,3,0,4,5,10,11,12,11,10)
>     kde <- density(x, n=100)
>
> Now I would like to know the estimated probability that a new observation falls into the interval 0 < x < 3.
>
> How can I integrate over the corresponding interval? In several R packages for kernel density estimation I did not find a corresponding function. I could apply Simpson's Rule for integrating, but perhaps somebody knows a better solution.

One possibility is to use splinefun():

    spiffy <- splinefun(kde$x, kde$y)
    integrate(spiffy, 0, 3)
    0.2353400 with absolute error < 2e-09

cheers,

Rolf Turner
[EMAIL PROTECTED]
Re: [R] Density Estimation
> Not a direct answer to your question, but if you use a logspline density estimate rather than a kernel density estimate then the logspline package will help you and it has built in functions for dlogspline, qlogspline, and plogspline that do the integrals for you.
>
> If you want to stick with the KDE, then you could find the area under each of the kernels for the range you are interested in (need to work out the standard deviation used from the bandwidth, then use pnorm for the default gaussian kernel), then just sum the individual areas.
>
> Hope this helps,

Thanks a lot for your quick help! I think I will follow your first suggestion (logspline density estimation) instead of summing over the kernel areas because at the boundaries of the range truncated kernel areas can occur, so I think it is easier to do it with logsplines. Thanks again for your help!!

Pedro
[R] Density Estimation
Hallo

I am trying to use the package locfit to follow the example given in an introductory note by C. Loader concerning density estimation. It involves the geyser dataset (107 observations on durations, included in the package). I have tried the following (using the latest version of R):

    fit.of <- locfit(~geyser, flim=c(1,6), alpha=c(0.15,0.9))
    plot(fit.of, get.data=T, mpv=200)

This produces a plot (after several warnings). My question is: how can I get the plot to cover the range 1 to 6 for durations? The plot covers the observed data range only. It appears there is a problem with flim=c(1,6): flim is not actually correct, and consequently c(1,6) is not used correctly. I have also tried to use xlim=c(1,6), but without success.

I need some help on this please.

Thanks
Jacob

Jacob L van Wyk
Department of Statistics
University of Johannesburg APK
P O Box 524
Auckland Park 2006
South Africa
Tel: +27-11-489-3080
Fax: +27-11-489-2832
Re: [R] Density estimation with monotonic constaints
There are multiple functions for density estimation in R, but I don't know of any for estimating a monotonically decreasing density. If you haven't already, I encourage you to use, e.g., the help.search and RSiteSearch functions to find and explore their capabilities.

Why do you ask? Are you interested in analyzing particular data set(s) or are you doing research on density estimation? If it were my problem, I might just try something like the function density and then evaluate the results to find out if it satisfied my constraints. If it did and if I were only interested in that data set, I'd be done. If not, I'd increase the smoothing until I got something that was monotonic. If I wanted a more general method, I might wrap a call to a function like density inside another function, and automatically adjust the smoothing until it satisfied some optimality criterion I might devise.

If I didn't get what I wanted doing that, I might list, e.g., the density function and walk through it line by line until I figured out what I needed to change to get what I wanted. I just listed density and found that it consists solely of a call to UseMethod. To get beyond that, I tried methods(density), which told me there was only one method, called density.default. Then requesting density.default gave me the code for that. Another tip: I find debug extremely helpful for walking through code like this.

I suspect this will not solve your problem, but I hope at least it helps. If you'd like further assistance from this listserv, please submit another post. However, I encourage you first to PLEASE do read the posting guide! www.R-project.org/posting-guide.html. Doing so might increase your chances of getting useful information more quickly.

spencer graves

Debayan Datta wrote:

> Hi All,
> I have a sample x={x1,x2,..,xn} from a distribution with density f. I wish to estimate the density. I know a priori that the density is monotonically decreasing. Is there a way to do this in R?
> Thanks
> Debayan
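The "wrap it and keep increasing the smoothing" idea could look something like the sketch below. Two things here are assumptions added for the illustration and are not from the post: the data are taken to be non-negative, and the sample is reflected about zero, because without reflection a heavily smoothed Gaussian KDE peaks near the sample mean rather than at the left edge. The Gaussian KDE is also evaluated directly with dnorm() rather than through density(), to keep the monotonicity check simple. All names (monotone_kde, step, max_iter, tol) are hypothetical.

```r
# Sketch: widen the bandwidth until a reflected Gaussian KDE is non-increasing.
# `monotone_kde`, `step`, `max_iter` and `tol` are hypothetical names for illustration.
monotone_kde <- function(x, grid_n = 256, step = 1.1, max_iter = 200, tol = 1e-12) {
  stopifnot(all(x >= 0))
  h <- bw.nrd0(x)                       # starting bandwidth
  grid <- seq(0, max(x) + 3 * h, length.out = grid_n)
  # KDE of the sample reflected about 0, restricted to [0, Inf)
  kde_at <- function(g, h) mean(dnorm(g, x, h) + dnorm(g, -x, h))
  for (i in seq_len(max_iter)) {
    y <- vapply(grid, kde_at, numeric(1), h = h)
    ok <- all(diff(y) <= tol)           # the criterion: no upward steps on the grid
    if (ok) break
    h <- h * step                       # otherwise smooth a little more
  }
  list(x = grid, y = y, bw = h, monotone = ok)
}

set.seed(42)
sample_x <- rexp(200)
fit <- monotone_kde(sample_x)
```

This is a crude device rather than a proper shape-constrained estimator: the price of monotonicity is whatever extra bias the inflated bandwidth introduces, and monotonicity is only checked on the evaluation grid.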
[R] Density estimation with monotonic constaints
Hi All,

I have a sample x={x1,x2,..,xn} from a distribution with density f. I wish to estimate the density. I know a priori that the density is monotonically decreasing. Is there a way to do this in R?

Thanks
Debayan
[R] density estimation
Hi,

I have been looking for a method of estimating a parametric model from the output (x, y) of the R function density. Below is my thought, and I wonder if it looks OK.

Suppose that we build a single gaussian model for each input data point x (x is the mean); the overall model may be a sum of these gaussian models built on each x, i.e.

    P(y) = \sum_x P(y|x, \sigma),

where y is any new data point. Is this right? Is any normalization applied?

Thanks in advance for any suggestion that you may offer me!

Best regards,
Hui
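For a Gaussian kernel the usual estimator is indeed a sum of normal densities centred on the data points, with sigma equal to the bandwidth and a normalization of 1/n so that the whole thing integrates to one. The sketch below writes that out and compares it with density() for the same bandwidth; kde_manual is an illustrative name, and the data are simulated only to have something to run.

```r
set.seed(7)
x <- rnorm(50)
h <- bw.nrd0(x)

# P(y) = (1/n) * sum_i dnorm(y, mean = x_i, sd = h); the 1/n is the normalization
kde_manual <- function(y) {
  vapply(y, function(v) mean(dnorm(v, mean = x, sd = h)), numeric(1))
}

total <- integrate(kde_manual, min(x) - 8 * h, max(x) + 8 * h,
                   subdivisions = 1000)$value
d <- density(x, bw = h)
max_gap <- max(abs(kde_manual(d$x) - d$y))   # gap to density()'s binned approximation
print(c(total = total, max_gap = max_gap))
```

Any remaining gap between the two comes from density() using a binned, FFT-based approximation rather than the exact sum.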
[R] density estimation
Hello,

Sorry for my English. I would like to estimate the density of a multivariate variable (f(x,y) or f(x,y,z), for example) in order to calculate mutual information. How is this possible with R?

Thanks
Bernard

Bernard Palagos
Unité Mixte de Recherche Cemagref - Agro.M - CIRAD
Information et Technologie pour les Agro-Procédés
Cemagref - BP 5095
34033 MONTPELLIER Cedex 1
France
http://www.montpellier.cemagref.fr/teap/default.htm
Tel: 04 67 04 63 13
Fax: 04 67 04 37 82
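For the two-variable case, one possible route (offered as a sketch, not as an answer given in the thread) is kde2d() from the MASS package, which returns a bivariate kernel density estimate on a grid. Turning the grid into cell probabilities gives a simple plug-in estimate of mutual information. The helper name mi_kde, the grid size, and the simulated data are all assumptions for the illustration; the result depends on the bandwidth and grid, so it should be read as a rough figure.

```r
library(MASS)   # kde2d() ships with R's recommended MASS package

# Sketch: plug-in mutual information from a 2-D kernel density estimate.
# `mi_kde` is a hypothetical helper; the grid size n is an arbitrary choice.
mi_kde <- function(x, y, n = 60) {
  est <- kde2d(x, y, n = n)
  p <- est$z / sum(est$z)          # joint cell probabilities on the grid
  px <- rowSums(p)                 # marginal of x
  py <- colSums(p)                 # marginal of y
  indep <- outer(px, py)           # the joint under independence
  ok <- p > 0                      # skip empty cells to avoid log(0)
  sum(p[ok] * log(p[ok] / indep[ok]))
}

set.seed(3)
u <- rnorm(500)
v_dep <- 0.9 * u + sqrt(1 - 0.9^2) * rnorm(500)   # correlated with u
v_ind <- rnorm(500)                               # independent of u
mi_dep <- mi_kde(u, v_dep)
mi_ind <- mi_kde(u, v_ind)
print(c(dependent = mi_dep, independent = mi_ind))
```

The value is in nats; the three-variable case f(x,y,z) needs a different estimator, since kde2d() handles two dimensions only.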
[R] density estimation with weighted sample
Dear all,

I would like to perform density estimation with a weighted sample (output of an Importance Sampling procedure) in R. Could anybody give me advice on what function to use (and in which package)?

Thanks a lot,
Lorenzo
Re: [R] density estimation with weighted sample
On Thu, 7 Apr 2005, Tomassini, Lorenzo wrote:

> I would like to perform density estimation with a weighted sample (output of an Importance Sampling procedure) in R. Could anybody give me an advice on what function to use (in which package)?

This could mean

1) You have a sample with weights w, so `w=4' means `I have 4 of those'.

2) You have a sample from a density proportional to w(x)f(x) and want to estimate f.

Your title suggests the first, your comment the second.

If it is the second, use any package (even density() in R) to estimate the density g of the sampled distribution, form ghat/w and rescale to unit area. If you know a lot about w (e.g. in stereology) there are specialized methods which are better.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
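Both readings can be sketched with density() alone. For the first, density() takes a weights argument (weights summing to one); for the second, the estimate of g is divided by w on the grid and rescaled to unit area. The toy setup for the second reading is an assumption chosen so the answer is known: f is N(0,1) and w(x) = exp(x), which makes the sampled density g equal to N(1,1). Variable names are illustrative.

```r
set.seed(11)

# Reading 1: frequency weights. density() accepts weights that sum to one.
vals <- c(1.2, 2.5, 2.9, 4.1, 6.0)
w <- c(4, 1, 2, 2, 1)                       # "I have 4 of the first value", etc.
d1 <- density(vals, weights = w / sum(w), bw = bw.nrd0(rep(vals, w)))

# Reading 2: draws come from g(x) proportional to w(x) * f(x); recover f.
# Toy setup (an assumption for this sketch): f is N(0,1) and w(x) = exp(x),
# so the sampled density g is N(1,1).
wfun <- function(x) exp(x)
draws <- rnorm(5000, mean = 1, sd = 1)
g_hat <- density(draws)
f_raw <- g_hat$y / wfun(g_hat$x)            # ghat / w
dx <- g_hat$x[2] - g_hat$x[1]
f_hat <- f_raw / (sum(f_raw) * dx)          # rescale to unit area
f_mean <- sum(g_hat$x * f_hat) * dx         # should sit near 0 for this toy f
```

The bandwidth in the first part is passed explicitly, computed from the expanded sample, because the automatic selectors in density() do not take the weights into account. In the second part the division by w magnifies noise wherever w is small, which is one reason specialized methods may do better.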
RE: [R] density estimation: compute sum(value * probability) for
On 13-Nov-04 bogdan romocea wrote:

> Dear R users,
>
> However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:
>
>     sum(den$y)
>     [1] 0.2611142
>
> Would it perhaps be ok to simply do
>
>     sum(den$x*den$y) * (1/sum(den$y))
>     [1] 1073.22
>
> ?

What you're missing is the dx! A density estimation estimates the probability density function g(x) such that int[g(x)*dx] = 1, and R's 'density' function returns estimated values of g at a discrete set of points.

An integral can be approximated by a discrete summation of the form

    sum(g(x.i)*delta.x)

You can recover the set of x-values at which the density is estimated, and hence the implicit value of delta.x, from the returned density.

Example:

    X <- rnorm(1000)
    f <- density(X)
    x <- f$x
    delta.x <- x[2] - x[1]
    g <- f$y
    sum(g*delta.x)
    [1] 1.000976

Hoping this helps,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 14-Nov-04 Time: 08:50:53
-- XFMail --
RE: [R] density estimation: compute sum(value * probability) for given distribution
First thing you probably should realize is that density is _not_ probability. A probability density function _integrates_ to one, not _sum_ to one. If X is an absolutely continuous RV with density f, then Pr(X=x)=0 for all x, and Pr(a < X < b) = \int_a^b f(x) dx.

sum x*Pr(X=x) (over all possible values of x) for a discrete distribution is just the expectation, or mean, of the distribution. The expectation for a continuous distribution is \int x f(x) dx, where the integral is over the support of f. This is all elementary math stat that you can find in any textbook.

Could you tell us exactly what you are trying to compute, or why you're computing it?

HTH,
Andy

> From: bogdan romocea
>
> Dear R users,
>
> This is a KDE beginner's question. [...] However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:
>
>     sum(den$y)
>     [1] 0.2611142
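A small numerical version of the point about \int x f(x) dx: on the grid returned by density(), both the total area and the expectation are sums weighted by the grid spacing. The data below are simulated as a stand-in (the poster's cap vector is not available), and the default evaluation range is used so that the tails of the estimate are not cut off at min(cap) and max(cap).

```r
set.seed(5)
cap <- rgamma(200, shape = 8, rate = 8 / 1066)   # made-up stand-in for the poster's data

den <- density(cap)              # default range, so the tails are not cut off
dx <- den$x[2] - den$x[1]        # grid spacing, the "dx" of the integral

area <- sum(den$y) * dx               # approximates int f(x) dx, close to 1
kde_mean <- sum(den$x * den$y) * dx   # approximates int x f(x) dx

print(c(area = area, kde_mean = kde_mean, sample_mean = mean(cap)))
```

With a symmetric kernel the mean of the kernel estimate coincides with the sample mean up to discretization error, so mean(cap) gives essentially the same number without any density estimation.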
RE: [R] density estimation: compute sum(value * probability) for given distribution
Andy,

Thanks a lot for the clarifications. I was running a simulation a number of times and trying to come up with a number to summarize the results. And, I failed to realize from the beginning that what I was trying to compute was just the mean.

Regards,
b.
Re: [R] density estimation: compute sum(value * probability) for given distribution
bogdan romocea wrote:

> Dear R users,
>
> This is a KDE beginner's question. I have this distribution:
>
>     length(cap)
>     [1] 200
>     summary(cap)
>        Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       459.9   802.3   991.6  1066.0  1242.0  2382.0
>
> I need to compute the sum of the values times their probability of occurrence. The graph is fine,
>
>     den <- density(cap, from=min(cap), to=max(cap), give.Rkern=F)
>     plot(den)
>
> However, how do I compute sum(values*probabilities)?

I don't get the point. You are estimating using a gaussian kernel. Hint: What's the probability to get x=0 for a N(0,1) distribution? So sum(values*probabilities) is zero!

> The probabilities produced by the density function sum to only 26%:

and could also sum to, e.g., 783453.9, depending on the number of observations and the estimated parameters of the density ...

>     sum(den$y)
>     [1] 0.2611142
>
> Would it perhaps be ok to simply do
>
>     sum(den$x*den$y) * (1/sum(den$y))
>     [1] 1073.22
>
> ?

No. den$x is a point where the density function is equal to den$y, but den$y is not the probability to get den$x (you know, the stuff with intervals)! I fear you are mixing theory from discrete with continuous distributions.

Uwe Ligges

> Thank you,
> b.
[R] density estimation: compute sum(value * probability) for given distribution
Dear R users,

This is a KDE beginner's question. I have this distribution:

    length(cap)
    [1] 200
    summary(cap)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      459.9   802.3   991.6  1066.0  1242.0  2382.0

I need to compute the sum of the values times their probability of occurrence. The graph is fine,

    den <- density(cap, from=min(cap), to=max(cap), give.Rkern=F)
    plot(den)

However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:

    sum(den$y)
    [1] 0.2611142

Would it perhaps be ok to simply do

    sum(den$x*den$y) * (1/sum(den$y))
    [1] 1073.22

?

Thank you,
b.
[R] Density Estimation
Hi there,

Sorry if this is a rather long post. I have a simple list of single-feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this, i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all?

Thanks in advance,
Brian.
[R] Density Estimation
Dear Brian,

I suggest using the density() function to get an estimate of the pdf you are looking for (I believe it is unknown). You can then plot the points returned by density() using plot(), which gives you a graphical representation of your unknown pdf. From its shape, and with the help of the graph, you could try to work out what kind of pdf it might be (normal, gamma, Weibull, etc.). After that you can estimate the parameters of the pdf from your data with LS or ML methods. Then you can calculate the goodness of fit for each candidate pdf and use the best one.

I hope this is of some help.

Cordially,
Vito Ricci

[EMAIL PROTECTED] wrote:

> Hi there,
>
> Sorry if this is a rather long post. I have a simple list of single-feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this, i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all?
>
> Thanks in advance,
> Brian.
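The fit-and-compare workflow described in the reply above might look roughly like the following. This is only a sketch under stated assumptions: the data are simulated, the two candidate families (normal and log-normal) were picked because their ML estimates have closed forms, and AIC is used as the yardstick because the post does not name a particular goodness-of-fit measure. The variable names are made up for the example.

```r
set.seed(42)

# Hypothetical data; in practice this would be the observed points.
x <- rlnorm(300, meanlog = 1, sdlog = 0.5)

# Step 1: look at the kernel density estimate to get an idea of the shape.
d <- density(x)

# Step 2: maximum-likelihood estimates for each candidate family.
mu    <- mean(x)
sigma <- sqrt(mean((x - mu)^2))          # ML estimate (divides by n)
mlog  <- mean(log(x))
slog  <- sqrt(mean((log(x) - mlog)^2))   # ML estimate on the log scale

# Step 3: log-likelihood of the data under each fitted model.
ll_normal    <- sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
ll_lognormal <- sum(dlnorm(x, meanlog = mlog, sdlog = slog, log = TRUE))

# Step 4: compare with AIC = 2 * (number of parameters) - 2 * logLik; smaller is better.
aic <- c(normal    = 2 * 2 - 2 * ll_normal,
         lognormal = 2 * 2 - 2 * ll_lognormal)
best <- names(which.min(aic))

print(aic)
print(best)
```

Once a family has been chosen, its d/p/q/r functions (here dlnorm, plnorm and so on) can be used with the fitted parameters to evaluate densities or tail probabilities for new points.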
Re: [R] Density Estimation
Try fitting it with a Johnson function -- see SuppDists. If you can fit it you will then be able to use the functions in SuppDists just as you can for any other distribution supported by R.

Brian Mac Namee wrote:

> Hi there,
>
> Sorry if this is a rather long post. I have a simple list of single-feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this, i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all?
>
> Thanks in advance,
> Brian.

--
Bob Wheeler --- http://www.bobwheeler.com/
ECHIP, Inc. --- Randomness comes in bunches.
Re: [R] Density Estimation
Hi!

The function density returns an object of class "density". This object has x and y components, which you can access with $x and $y.

Use approx and runif, e.g.:

    dd <- density(rnorm(100, 3, 5))
    plot(dd)

Using the function ?approx you can compute the density value for any x.

    # Rejection sampling from the estimated density; the x is a dummy here.
    mydist <- function(x, dd) {
      while (1) {
        # propose a point uniformly over the range of the density grid
        tmp <- runif(1, min=min(dd$x), max=max(dd$x))
        # interpolated density height at the proposal
        lev <- approx(dd$x, dd$y, tmp)$y
        # accept with probability lev / max(dd$y)
        if (runif(1, 0, max(dd$y)) <= lev) {
          return(tmp)
        }
      }
    }
    x <- 0
    mydist(x, dd)
    res <- rep(0, 500)
    res <- sapply(res, mydist, dd)
    lines(density(res), col=2)

/E.

On 9/15/2004 at 12:36 PM Brian Mac Namee wrote:

> Hi there,
>
> Sorry if this is a rather long post. I have a simple list of single-feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this, i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all?
>
> Thanks in advance,
> Brian.

Dipl. bio-chem. Witold Eryk Wolski
MPI-Moleculare Genetic, Ihnestrasse 63-73, 14195 Berlin
tel: 0049-30-83875219
mail: [EMAIL PROTECTED]
http://www.molgen.mpg.de/~wolski
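As a small complement to the approx() call in the reply above, the interpolation can be wrapped once with approxfun() so that the estimated density can be evaluated at any new point. This is a sketch, not something proposed in the thread: the data are simulated, the name kde is made up, and returning 0 outside the grid is an assumption of the example. The returned values are density heights, not probabilities.

```r
set.seed(7)

# Hypothetical observed data.
obs <- rnorm(100, mean = 3, sd = 5)
dd  <- density(obs)

# Linear interpolation of the density grid; 0 outside the range of the grid.
kde <- approxfun(dd$x, dd$y, yleft = 0, yright = 0)

# Density heights at some new, unseen points.
new_points <- c(-100, 3, 100)
print(kde(new_points))
```

A point in the bulk of the data gets a comparatively large height and a point far outside the data gets 0 here, which gives a relative sense of how typical a new point is under the estimate.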
RE: [R] Density Estimation
On 15-Sep-04 Brian Mac Namee wrote:

> Sorry if this is a rather long post. I have a simple list of single-feature data points from which I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use this to generate a probability for the new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with this, i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all?

It's not clear what you're really after, but it looks as though you may be wanting to sample from the distribution estimated by 'density'. A possible approach, which you could refine, is exemplified by

    x <- rnorm(1000)
    d <- density(x, n=4096)
    y <- sample(d$x, size=1000, prob=d$y)

Check performance with

    hist(y)

Looks OK to me! See ?density and ?sample.

On an alternative interpretation, perhaps you want to first estimate the density based on data you already have, and then when you have got further data (but these would then be seen and not unseen) come to a judgement about whether these new points are compatible with coming from the distribution you have estimated.

A possible approach to this question (again susceptible to refinement) would be as follows.

1. Use a fine-grained grid for 'density', i.e. a large value for n.

2. Replace each of the points in the new data by the nearest point in this grid. Call these values z1, z2, ... , zk corresponding to index values i1, i2, ... , ik in d$x.

3. Evaluate the probability P(z1,...,zk) from the density as the product of d$y[i] where i <- c(i1,...,ik). Better still, evaluate the logarithm of this. Call the result L.

4. Now simulate a large number of draws of k values from d on the lines of sample(d$x, size=k, prob=d$y) as above, and evaluate L for each of these.

Where is the value of L from (3) situated in the distribution of these values of L from (4)? If (say) only 1 per cent of the simulated values of L from d are less than the value of L from (3), then you have a basis for a test that your new data did not come from the distribution you have estimated from your old data, in that the new data are from the low-density part of the estimated distribution.

There are of course alternative ways to view this question. The value of k is relevant. In particular, if k is small (say 3 or 4) then the suggestion in (4) is probably the best way to approach it. However, if k is large then you can use a test on the lines of Kolmogorov-Smirnov, with the reference distribution estimated as the cumulative distribution of d$y and the distribution being tested as the empirical cumulative distribution of your new data.

Even sharper focus is available if you are in a position to make a parametric model for your data, but your description does not suggest that this is the case.

Best wishes,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 15-Sep-04 Time: 15:07:33
-- XFMail --
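Steps 1 to 4 above could be sketched along the following lines. This is a rough illustration under assumptions rather than a definitive implementation: the "old" and "new" data are simulated (the new points are deliberately shifted so that they fall in a low-density region), the helper log_lik and the other names are hypothetical, and replace = TRUE in sample() is a choice made here so that the simulated draws are independent.

```r
set.seed(123)

# Hypothetical data: the points already seen, and k = 4 new points to be judged.
old <- rnorm(1000)
new <- rnorm(4, mean = 4)

# Step 1: density estimate on a fine grid.
d <- density(old, n = 4096)

# Steps 2 and 3: snap each point to its nearest grid value and sum the log
# density heights there (the logarithm of the product of d$y[i]).
log_lik <- function(z, d) {
  idx <- vapply(z, function(v) which.min(abs(d$x - v)), integer(1))
  sum(log(d$y[idx]))
}
L_obs <- log_lik(new, d)

# Step 4: simulate many sets of k draws from the estimated density and
# compute the same quantity for each set.
k <- length(new)
L_sim <- replicate(2000, {
  z <- sample(d$x, size = k, replace = TRUE, prob = d$y)
  log_lik(z, d)
})

# Fraction of simulated values at or below the observed one: a Monte Carlo p-value.
p_value <- mean(L_sim <= L_obs)
print(p_value)
```

A small p_value indicates that the new points sit in a lower-density region than most sets of k points drawn from the estimate, which is the basis for the test described in the message.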
RE: [R] Density Estimation
help.search("kernel density") reports

    KernSec(GenKern)      Univariate kernel density estimate
    KernSur(GenKern)      Bivariate kernel density estimation
    bkde(KernSmooth)      Compute a Binned Kernel Density Estimate
    bkde2D(KernSmooth)    Compute a 2D Binned Kernel Density Estimate
    dpik(KernSmooth)      Select a Bandwidth for Kernel Density Estimation
    kde2d(MASS)           Two-Dimensional Kernel Density Estimation

amongst others, and package sm also has a user-friendly selection.

So, apart from pointing out alternatives, I wanted to point out how easy it was to find the information originally requested.

On Sat, 10 Apr 2004, Ko-Kang Kevin Wang wrote:

> > -Original Message-
> > From: [EMAIL PROTECTED]
> >
> > Dear Sir/Madam;
> >
> > Would you please tell me what is the command that allows the estimation of the Kernel Density for some data.
> >
> > Thanks,
>
> ?density

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
[R] Density Estimation
Dear Sir/Madam;

Would you please tell me what is the command that allows the estimation of the Kernel Density for some data.

Thanks,
Thami Rachidi
RE: [R] Density Estimation
> -Original Message-
> From: [EMAIL PROTECTED]
>
> Dear Sir/Madam;
>
> Would you please tell me what is the command that allows the estimation of the Kernel Density for some data.
>
> Thanks,

?density