On 03-Nov-05 Kilian Hagemann wrote: > Hi there, > > I'm new to R but I thought this is the most likely place I > could get advice or hints w.r.t the following problem: > > I have a series of measurements xi with associated uncertainties dxi. > I would like to construct the probability density histogram of this > data where each density estimate has an associated error that is > derived from the dxi. > In other words, for large dxi the histogram should also display > large uncertainties and vice versa. I need this for a curve fitting > algorithm. > > I have seen many crude ways of working out the error in each bin based > on the bin count alone, but that's obviously independent of the dxi > and thus not what I'm after. > > So, > > 1) Is there an R package that can do this (there's nothing in the > refence of > 2.1.1)? If so, what algorithm does it use? > > 2) Could anybody please point me in the right direction (papers, books, > websites etc.) > > Thanks, > > -- > Kilian Hagemann
I don't know about an R package that would deal with this directly, but I can think of an aproach, not difficult to implement in R, which may be helpful. I'm going to assume (at any rate for the time being), that you are interested in a "per-bin" uncertainty, i.e. that you want to be able to to answer "For each bin, what is the uncertainty in the count for this bin, regardless of any other bins?" I.e. you are ignoring the fact that there is correlation between bins (what goes into one bin can not go into another). 1. Say you have N observations (i = 1:N). Draw a preliminary histogram, and from this decide on a good set of fixed breaks. 2. Extend this list by a few bins on either side (you may have to return to this point, depending on the outcome of later stages). Say this gives you K bins (j = 1:K). 3. For each data-point xi, with associated dxi, and for each bin j, use this to compute the "probability" pij that a point with mean xi and "error" dxi should fall into bin j. This might be based on something as naive as integrating a Normal distribution with mean xi and SD dxi over the range of the bin j. 4. You then have an array P = p[i,j], say, where in row i you have the computed probabilities for bins 1:K 5. Now: for bin j, you have an N-column of pij values. The expected number of the N which might "really" be in bin j is then Ej = sum(P[,j]) and its variance (assuming that the "errors" in the xi are independent of each other) is Vj = sum(P[,j]*(1 - P[,j])) 6. So now you have the "bin that might have been", with expected value Ej and standard deviation Sj = sqrt(Vj). Now draw a "histogram" (you can use 'lines()' for this in R) with "bin heights" Ej, and "errors bars" +/- Sj. It is at this stage that you may have to go back to stage 2. In order to be sure that you will not overlook xi values that spill outside the range of the bins you chose in stage 2, you need to verify that the bins you are using extend beyond the range of the original data, and that the two end-bins have negligible E1 and EK. 7. NOTE that the Ej will in general be different from the counts in bin j from the original data. This is due to "overspill": if you have an original bin with a small count, which has next to it a bin with a large count, the uncertainty about whether some of the latter should really be in the former will contribute positively to the bin with the small count, by a larger amount than the bin with the small count will contribute to the bin with the large count. As you can see, this will have an effect of somewhat "flattening" the histogram, and of smoothing irregular variation from bin to bin. This is just an outline of a possible approach, which you may be able to develop to better suit your purposes if they are different from what I've been assuming. Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <[EMAIL PROTECTED]> Fax-to-email: +44 (0)870 094 0861 Date: 03-Nov-05 Time: 15:45:35 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html