On 02-Dec-04 Siegfried Gonzi wrote:
> Hello:
>
> Oh yes I know it isn't so much related to R, but I gather
> there are a lot of statisticians reading the mailing list.
>
> My boss repeatedly tried to explain to me the following.
>
> ==
> Let's assume you have got daily measurements of a variable
> in natural sciences. It turned out that the aforementioned
> daily measurements follow a log-normal distribution when
> considered over the course of a year.
> Okay. He also tried to explain to me that the monthly means
> (based on the daily measurements) must follow a log-normal
> distribution too then over the course of a year.
> ==
>
> I somehow get his explanation.

Hmm, perhaps you should think again! If X and Y have log-normal
distributions (mathematically exactly), then (X+Y)/2 does not
(mathematically) have a log-normal distribution -- still less
the arithmetic mean of some 30 such variables. So one wonders
what the basis of his "explanation" was.

However, the conclusion would hold for the *geometric* mean of
the variables. X has a log-normal distribution if log(X) has a
normal distribution. So let X, Y, ... be independent and
log-normal. The geometric mean is exp((log(X)+log(Y)+...)/n),
and since log(X), log(Y), ... are normal, so is
(log(X)+log(Y)+...)/n, and so the geometric mean is log-normal.

> But I have measurements which are log-normal distributed
> when evaluated on a daily basis over the course of a year
> but they are close to a Gaussian distribution when considered
> under the light of monthly means over the course of a year.
>
> Is such a latter case feasible. And if not why.

This is, broadly, to be expected. If X1, X2, ... are independent
and with similar means and variances, then regardless of their
precise distributions the distribution of (X1+X2+...+Xn)/n
approaches the normal distribution as n->infinity ("Central
Limit Theorem"). How rapidly this happens depends on how much
the distributions of X1, X2, ... differ from a normal
distribution. One feature which can cause the approach to
"normal" to be slow is skewness: the more skew the distribution
of each of X1, X2, ..., the slower the convergence.
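
The geometric-mean argument above can be checked numerically. What
follows is only a rough sketch: the seed, the choice of 30 values per
mean, the 5000 replications and sdlog = 1 are all my own illustrative
assumptions, and skew() is a small helper defined here, not a base R
function. In theory log(geo) is exactly normal, so its sample skewness
should sit near 0; the value for log(arith) is printed for comparison.

```r
# Sketch only: seed, n, reps and sdlog = 1 are illustrative assumptions.
set.seed(1)
n    <- 30      # values per mean (roughly one month of daily data)
reps <- 5000    # number of simulated "months"
X <- matrix(rlnorm(n * reps, meanlog = 0, sdlog = 1), nrow = n)

geo   <- exp(colMeans(log(X)))  # geometric mean of each column
arith <- colMeans(X)            # arithmetic mean of each column

# Simple moment-based sample skewness (hypothetical helper, not base R)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew(log(geo))    # log of the geometric mean: normal in theory
skew(log(arith))  # log of the arithmetic mean: compare with the above

# Visual check: points close to the line suggest normality
qqnorm(log(geo)); qqline(log(geo))
```

Replacing log(geo) by log(arith) in the last line gives the matching
plot for the arithmetic mean.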
The log-normal distribution is positively skewed, sometimes
grossly so -- experiment on the lines of:

  X<-exp(0+1.0*rnorm(10000)); hist(X,n=100)
  X<-exp(0+0.8*rnorm(10000)); hist(X,n=100)
  X<-exp(0+0.6*rnorm(10000)); hist(X,n=100)
  X<-exp(0+0.4*rnorm(10000)); hist(X,n=100)
  X<-exp(0+0.3*rnorm(10000)); hist(X,n=100)
  X<-exp(0+0.2*rnorm(10000)); hist(X,n=100)
  X<-exp(0+0.1*rnorm(10000)); hist(X,n=100)

  X<-exp(1+1.0*rnorm(10000)); hist(X,n=100)
  X<-exp(1+0.8*rnorm(10000)); hist(X,n=100)
  X<-exp(1+0.7*rnorm(10000)); hist(X,n=100)
  X<-exp(1+0.6*rnorm(10000)); hist(X,n=100)
  X<-exp(1+0.4*rnorm(10000)); hist(X,n=100)
  X<-exp(1+0.2*rnorm(10000)); hist(X,n=100)
  X<-exp(1+0.1*rnorm(10000)); hist(X,n=100)
  X<-exp(1+0.05*rnorm(10000)); hist(X,n=100)

(hoping this brings the query "on-topic") to get an impression
of the variety. A few of these look approximately normal as
they stand; the majority do not.

As for exploring the "central limit" tendency, you can try
things like

  N<-500;X<-exp(0+1.0*rnorm(N*1000)); Y<-matrix(X,nrow=N); M<-colMeans(Y);hist(M,n=20)
  hist(X,n=100)

[The first line draws a histogram of 1000 means, each of N=500
log-normal variates. The second shows a histogram of the
original N*1000 variates, allowing you to compare the two and
perceive the extent to which the approach to a normal
distribution has been achieved. In this case, the means still
have a perceptibly skew distribution, and of course the original
data were very heavily skewed. You can evaluate the results for
less skew log-normals in a similar way, building on the
information from the first series of experiments. This may have
been a consideration underlying your boss's argument: If the
original data are heavily skew, then the distribution of the
monthly means may well still be quite skew and better described
by a log-normal than by a normal.
However, your observation that the monthly means seem to be
close to a normal distribution perhaps indicates this was not
the case, so probably the original data, though log-normal, were
not so skew that means of N=30 or so values were still
perceptibly non-normal. (As stated above, as N -> infinity, you
will eventually get a normal).]

So you can use R usefully to evaluate general statistical issues
of this kind!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 02-Dec-04                                       Time: 11:30:01
------------------------------ XFMail ------------------------------
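
P.S. For the monthly case itself (N=30 or so), something along the
following lines may help to see how the skewness of the means depends
on the skewness of the daily values. Treat it as a sketch only: the
seed, the 2000 replications and the grid of log-scale standard
deviations are arbitrary choices of mine, and skew() is a small helper
defined here rather than a base R function.

```r
# Sketch only: N = 30, the seed, reps and the sdlog grid are assumptions.
set.seed(2)
N    <- 30     # daily values per simulated "month"
reps <- 2000   # simulated months per setting

# Simple moment-based sample skewness (hypothetical helper, not base R)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

sds <- c(1.0, 0.6, 0.3, 0.1)   # standard deviations on the log scale
res <- t(sapply(sds, function(s) {
  X <- matrix(exp(0 + s * rnorm(N * reps)), nrow = N)
  c(sdlog   = s,
    daily   = skew(as.vector(X)),   # skewness of the raw daily values
    monthly = skew(colMeans(X)))    # skewness of the means of N values
}))
print(round(res, 2))
```

Each row of res gives one setting; comparing the "daily" and "monthly"
columns shows how much of the skewness survives the averaging.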