On Sat, 28 Feb 2004, Fulvio Copex wrote:

> Hello everyone,
> I'd like to have suggestions about a common basic statistical approach,
> hope to be useful also for other R beginners.

Isn't this about statistics beginners? You seem to be assuming univariate
data and a very specific distribution form. There are standard procedures,
but yours seems erroneous.

> When you first get some data (i.e length of river) you may want to look at its
> distribution.
> Then you probably want to find which law follows this distribution,
> and to test the goodness of the fit.
>
> For doing this simple analysis I am writing some code schematically represented
> below:
> 1)calculate the histogram:
> myhist<-hist(myData, breaks=....)
> Question: how to calculate the standard deviation of each class of the hist? I have
> not seen it in the output of hist

The counts are jointly multinomial. Approximately, each count is
Poisson-distributed, so the standard deviation is approximately the square
root of the count. (You seem to be assuming that at 3).)

> 2)Looking at the graph it seems to follow a linear model:

What does that mean? The histogram is a density estimate, and theoretically
a pdf cannot be linear unless you restrict the range, and even then you
would want to constrain the estimation. You want to fit the pdf to the
actual data, not grouped data, if you can.

> I plot the points: points(myhist$mids,myhist$counts)
> Question: How to plot also the weights (vertical segments)?
> 3)I Calculate the linear equation using "lm" (in the case of linear
> model) knowing the weights computed in points 2).
> 4)To test the goodness of the fit, a simply way is to use the reduced
> chi squared test which I haven't found on the base package. But it is
> simple to calculate like this
> chisq.reduced<-(1/N)*sum((e-o)/w^2)
> where e=expected values from fit
> o=observed values
> w=weights

That is not a correct test, as the counts are not independent (they must
sum to the sample size). I presume you have a typo: each term should be
(o-e)^2/e, where I think e and w are probably the same thing. You can use
chisq.test to do the correct test.

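A minimal sketch of the points above, in R. The data are simulated (an assumption standing in for the river lengths), and the hypothesised law, an exponential with rate 1, is picked purely for illustration. It is fully specified, so no parameters are estimated from the data and the degrees of freedom reported by chisq.test need no adjustment.

```r
set.seed(1)
# Simulated stand-in for the poster's data (an assumption, not river lengths).
myData <- rexp(200, rate = 1)

# 1) Histogram counts with approximate Poisson standard deviations.
myhist <- hist(myData, breaks = 10, plot = FALSE)
count_sd <- sqrt(myhist$counts)

# 2) Plot the counts with +/- one standard deviation as vertical segments.
plot(myhist$mids, myhist$counts, xlab = "myData", ylab = "count",
     ylim = c(0, max(myhist$counts + count_sd)))
segments(myhist$mids, myhist$counts - count_sd,
         myhist$mids, myhist$counts + count_sd)

# 4) Chi-squared goodness of fit against the fully specified Exp(1) law.
#    Breaks are quantiles of Exp(1), so each of the 8 classes has
#    expected probability 1/8 (expected count 25).
breaks <- qexp(seq(0, 1, length.out = 9), rate = 1)   # runs from 0 to Inf
observed <- table(cut(myData, breaks = breaks, include.lowest = TRUE))
gof <- chisq.test(as.vector(observed), p = rep(1/8, 8))
print(gof)
```

The p-value in gof comes from referring the statistic to a chi-squared distribution with 7 degrees of freedom (8 classes minus 1), not from comparing a "reduced" value with 1.
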
> 5) Conclusion: If my chisq is lower than 1 I can conclude the model
> approximate well my data distribution.

No, you need to refer the correct statistic to a proper chi-squared
distribution. If you fit parameters of the distribution, the theory assumes
that you fitted them by maximum-likelihood (to the grouped data).

> Is it a good analysys of the problem?

No.

> Any answers to my questions or a better standard procedure (or package)
> where this work can be done easily, for the basic kind of distribution
> types?

?fitdistr (in MASS) for how to fit univariate distributions, and
?density for how to find non-parametric density estimates.
?chisq.test for Chi-squared test of goodness of fit.
library(stepfun); ?ecdf for other ways to examine data, and
?ks.test for other fit statistics which may be more appropriate for
continuous univariate measurements.

> Any kind of answer should be appreciated, including documentations or
> tutorial.

This sort of thing is covered in most introductory statistics classes and
texts. I think you need to seek the advice of a local statistical
consultant.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
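
For readers who want to try the functions named in the reply, here is a minimal sketch under stated assumptions: the data are simulated, and the gamma family is chosen only for illustration. Note that the Kolmogorov-Smirnov p-value is not exact when the parameters were estimated from the same data, so it should be read as a rough guide only.

```r
library(MASS)   # provides fitdistr

set.seed(2)
# Simulated stand-in data (an assumption, not the poster's measurements).
myData <- rgamma(300, shape = 2, rate = 0.5)

# Maximum-likelihood fit of a gamma law to the ungrouped data.
fit <- fitdistr(myData, "gamma")
print(fit)
shape_hat <- fit$estimate["shape"]
rate_hat <- fit$estimate["rate"]

# Non-parametric views of the same data.
dens <- density(myData)   # kernel density estimate
Fn <- ecdf(myData)        # empirical CDF; in current R no stepfun load is needed

# Compare the kernel estimate (solid) with the fitted gamma pdf (dashed).
hist(myData, freq = FALSE, main = "", xlab = "myData")
lines(dens)
curve(dgamma(x, shape = shape_hat, rate = rate_hat), add = TRUE, lty = 2)

# Kolmogorov-Smirnov statistic against the fitted gamma CDF.
ks <- ks.test(myData, "pgamma", shape = shape_hat, rate = rate_hat)
print(ks)
```
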
