For the record, some of these claims are untrue: > df(0, 2, 2) [1] 1 > df(0, 1.3, 2) [1] Inf > x <- 1e-170 > pbeta(x, x, x) [1] 0.5
qnbinom(1e-10,1e3,1e-7,TRUE,FALSE) is an error, and so is qnbinom(1E-300,0.000002,10000000000) Here we can guess at what you meant, maybe correctly. There were comments in the source code about needing a better search, and I have recently implemented one. So > qnbinom(1e-10, 1e3, 1e-7) # instant [1] 8117986721 > qnbinom(0.5, 10000000000, 0.000000002) [1] 5e+18 > qnbinom(1e-300, 10000000000, 0.000002) [1] 4.998138e+15 seem to be solved. There were two problems with dbeta, one easily overcome (f underflows) the other fundamental to the way dbinom_raw is computed (n*p can underflow). What I cannot see is why a formula which worked correctly in this region was replaced by one that did not. It is precisely in order not to generate such errors that I used TOMS 708 only in the area where the existing algorithm is problematic. (It may be better elswehere, but I did not have the time to do the requisite analysis. It seems neither do some other people: I would prefer not to spend the time to clear up after such unneeded changes.) On pt, thank you for the report. pt(x, df=1) is not interesting for |x| > 1e150, but it is for smaller values of df and those were underflowing. It is easy to use an asymptotic formula to regain the accuracy. R was potentially generating reports of lack of convergence and loss of accuracy in quite a number of its algorithms, but for reasons unknown to me these were being buried (ML_ERROR did nothing, and has not for a very long time). It's a matter of debate whether in some of these it would be better to return NaN as well, but warnings should have been generated (and now are). As for `panic' (your word: why is it 'panic' to submit a correct bug report?), a major platform returning +Inf for a log probability is very bad news, as is another failing a regression test by getting NaN for a probability which is 0.5. On Sat, 28 Jan 2006 [EMAIL PROTECTED] wrote: > On 23/02/05 I suggested that given R had included TOMS 708 to correct for t= > he=20 > poor performance of pbeta, TOMS 654 should be included to fix all the pgamm= > a=20 > problems. I was slightly surprised to find Morten's code had been included= > =20 > instead 2 days later. I noticed but did not worry that the reference to me = > had=20 > been removed.=20 > > The derivation of the asymptotic expansion for the gamma distribution used = > by=20 > Morten can be found at http://members.aol.com/iandjmsmith/PoissonApprox.htm= > =20 > It is fairly easy to understand and find error bounds for and hence include= > =20 > sensibly in an algorithm to calculate pgamma. > > The basis and accuracy of the some of the algorithms I use is discussed in= > =20 > http://members.aol.com/iandjmsmith/Accuracy.htm In this case, the absolute = > error=20 > in the log of the probability gives a good indication of the accuracy of yo= > ur=20 > answer. In the least extreme example you consider=20 > (pgamma(0.9*1e25,1e25,log=3DT)the absolute error would be about 5360515 and= > if you exponentiated the result=20 > the relative error would be about 10 to the power 2328042. So the answer yo= > u=20 > wish to calculate is K times 10 to the power -2.32804237034994E+22, where K= > is=20 > somewhere between 10 to the power plus or minus 2328042. In other words whe= > n=20 > you make the changes to correct this problem, your calculation will still= > =20 > return values with no real meaning but at least users might be aware of thi= > s which=20 > would be no bad thing! For me this answer is possibly so meaningless that N= > an=20 > would be preferable. > > I did mention to Morten that I had updated my code but I believed that for= > =20 > Gnumeric he was quite satisfied with what he had. If you look at the VBA co= > de at=20 > http://members.aol.com/iandjmsmith/Examples.txt you can see the changes I= > =20 > made to stop the overflow problems you seem to be worried about. My code fo= > r the=20 > pdf of the gamma distribution still fails for shape parameters > 2e307 due = > to=20 > multiplication of the shap parameter by 2pi. The code for dgamma will have = > the=20 > same problem unless it is hidden by use of an 80 or more bit floating point= > =20 > processor. The code for the asymptotic expansion for the gamma distribution= > =20 > seems to be fine for any number, excluding silly ones like Nan and Inf. Ind= > eed it=20 > takes the difference from the mean as a parameter and if you supply an=20 > accurate value you get a sensible answer as mentioned in=20 > http://members.aol.com/iandjmsmith/Accuracy.htm > > I do not share your apparent sense of panic on this matter. I have no=20 > problems with error signals like NaNs because it is obvious to the user tha= > t things=20 > have gone wrong. Inaccurate answers when the user has no reason to expect t= > hem=20 > are usually far more difficult to spot and in many cases the results are ju= > st=20 > believed. That for me is a serious problem. I think you will find that the= > =20 > pgamma code of 2.0.0 did not work for small shape parameters (similar to th= > e=20 > problems exhibited by pbeta still for small parameters see PR#2894), was=20 > inaccurate for large shape parameters (> 1e5) when it resorted to the norma= > l=20 > approximation and was pretty slow in between. Indeed, the normal approximat= > ion was the=20 > cause of PR#7307. > > > I don't understand your comments about=20 > "pt_ =3D -x * (log(1 + (lambda-x)/x) - (lambda-x)/x) =3D -x * log((lambda-x= > )/x) -=20 > (lambda-x)=20 > and naively assumes that this is small enough to use a power series expansi= > on=20 > in 1/x with coefficients as powers of pt_. To make matters worse, consider = > =E2=80=A6" > In the example you go on to discuss, |(lambda-x)/x| is 0.1 and I don't thin= > k=20 > it can be bigger than 0.2. Calculating log(1+x)-x is done several ways. If = > |x|=20 > < .01 it is evaluated by a power series, if x < -0.8 or x > 1 it uses=20 > log1p(1+x)-x and for other values it uses a continued fraction which essent= > ially=20 > evaluates more of the same series used when |x| < .01. > > Your comments about replacing logspace_add with logspace_sub with simpler= > =20 > code which works at first sight to be a very sensible improvement. However,= > I=20 > would be a bit nervous that lnd-lnp could be very large and the exp of it c= > ould=20 > return infinity. I'm sure this can be accounted for in the code and lnp += > =20 > log1p(f*exp(lnd-lnp))evaluated as lnp or log(f)+lnd accordingly. > > I am not responsible for the code for calculating the logs of probabilities= > =20 > but I seem to remember that the extremely poor performance of the algorithm= > s in=20 > R2.0.0 with logged probabilities was one of the reasons Morten became=20 > interested in changing the pgamma code (see PR#7307). I have had a quick lo= > ok and=20 > once the corections mentioned above are made it should be giving nonsense a= > nswers=20 > with no difficulty. > > > Unfortunately there are still a few examples of sloppy coding and accuracy= > =20 > errors remaining in R. > > The non-central distribution functions have horrible 1- cancellation errors= > =20 > associated with them (see PR#7099) and separate code is required for the tw= > o=20 > tails of the distributions to get round the problem. > > The fix for PR#8251 is a kludge and just moves the inaccuracies to examples= > =20 > with higher non-centrality parameters. > > pt(x,1) will overflow or return 0 for values < -2e154 for 64-bit=20 > implementations. pcauchy works but I believe the pt function is also suppos= > ed to work for=20 > non integral degrees of freedom so making it work one degree of freedom via= > =20 > pcauchy is hardly much use. > > qnbinom(1e-10,1e3,1e-7,TRUE,FALSE) is slow and by varying the parameters,= > =20 > qnbinom can be made very slow indeed. I do not think there is anything wron= > g with=20 > the Cornish-Fisher expansion. It just seems that it is not always very good= > =20 > for the Negative Binomial distribution. In the example above, the initial= > =20 > approximation is out by 2e6. > > A slightly different problem which can cause qnbinom and qbinom to go into= > =20 > infinite loops is when the q-value is too big. For example=20 > qnbinom(1E-300,0.000002,10000000000) should return 4.99813787561159E+15 app= > rox but the code works=20 > with values where one of the statements y :=3D y +1 or y =3D y - 1; is exec= > uted but=20 > does not alter the value of y. > > df(0,2,2,FALSE) should be 1 not 0 > df(0,df,2,FALSE) should be infinity for df < 2 not 0 > dbeta(1e-162,1e-162,1e-162,FALSE) should be 0.5 not 0 > > Presumably R also has similar problems with the pbeta function. As I recall= > =20 > the TOMS 708 code was pretty much included without edits and therefore didn= > 't=20 > calculate logs of probabilities except by calculating the probability and t= > hen=20 > logging it. I assumed this was why it was not used for small shape paramete= > rs=20 > where the current code does not work, although it did not seem logical to m= > e.=20 > Of course, my memory is not what it was but if that is the case and there a= > re=20 > problems with modifying the TOMS code, you could try an asymptotic expansio= > n=20 > based on http://members.aol.com/iandjmsmith/BinomialApprox.htm > > This response has been very rushed. I do not write well when I have plenty = > of=20 > time and I felt I had so many different things to say so I apologise if it = > is=20 > all a bit of a jumble.=20 > > Ian Smith > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel