Thanks for the reply.
First, I used NormalDistributionImpl because I'm translating from FORTRAN
and wasn't thinking it through. Using Random makes sense.
Second, I don't really care, at least not yet, how normal my output
samples are. What I do care about is that the means of my generated
samples match the observed means. For example, if I start with an
observed maximum temperature of 30, I want to confirm that 30 lies within
the 90% confidence interval for the mean of the generated sample. What I
seem to be getting, using the same technique every time, is that half the
time I hit my goal and the other half my samples stink. If my
understanding is correct, 30 should fall inside the 90% confidence
interval about 9 times out of 10, which clearly isn't happening.
wjr
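To make the "9 times out of 10" expectation concrete, here is a minimal stdlib-only sketch that builds an approximate 90% confidence interval for a sample mean and estimates how often that interval covers the true mean over repeated samples. It assumes the normal approximation (z = 1.645) rather than the t distribution, which is reasonable for large samples, and reuses the mean 10 / SD 5 from the test program. The class and method names are hypothetical.

```java
import java.util.Random;

/**
 * Sketch: approximate 90% confidence interval for a sample mean, plus a
 * simulation of how often that interval covers the true mean.
 * Uses the normal approximation (z = 1.645), fine for large samples.
 */
public class MeanConfidenceInterval {

    /** Two-sided 90% critical value of the standard normal distribution. */
    static final double Z_90 = 1.6448536269514722;

    /** Returns {lower, upper} of the approximate 90% CI for the mean. */
    static double[] confidenceInterval90(double[] sample) {
        int n = sample.length;
        double sum = 0.0;
        for (double x : sample) {
            sum += x;
        }
        double mean = sum / n;
        double ss = 0.0;
        for (double x : sample) {
            ss += (x - mean) * (x - mean);
        }
        double sd = Math.sqrt(ss / (n - 1)); // sample standard deviation
        double halfWidth = Z_90 * sd / Math.sqrt(n);
        return new double[] {mean - halfWidth, mean + halfWidth};
    }

    /** Fraction of simulated N(10, 5) samples whose 90% CI contains 10. */
    static double coverage(long seed, int trials, int n) {
        Random ran = new Random(seed);
        double[] sample = new double[n];
        int hits = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) {
                sample[i] = 10 + 5 * ran.nextGaussian();
            }
            double[] ci = confidenceInterval90(sample);
            if (ci[0] <= 10 && 10 <= ci[1]) {
                hits++;
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        // Expect a value near 0.90: roughly 1 sample in 10 "misses" by design.
        System.out.println("coverage: " + coverage(1L, 1000, 1000));
    }
}
```

Even a perfect generator misses the 90% interval about once in ten samples, and runs of several misses or several hits in a short series of trials are to be expected.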
Phil Steitz wrote:
On 9/17/07, William J Rust <[EMAIL PROTECTED]> wrote:
I'm working on a climate simulation program that takes monthly averages
and generates daily readings that are assumed to be normally
distributed. The following program creates 10 sets of 100,000 random
deviates with mean 10 and SD 5. It then applies a t test (results below)
to ensure that the generated numbers are good enough. As the results
show, they aren't. I'm wondering whether a) I am doing something wrong or
b) there is something wrong with the stats routines.
There are a couple of problems here. First, while your inversion
method should generate approximately normally distributed values, it
is better to use the JDK-supplied method for this (much faster and a
better algorithm). There is a wrapped version of this provided in
org.apache.commons.math.random.RandomDataImpl. To use that:
import org.apache.commons.math.random.RandomData;
import org.apache.commons.math.random.RandomDataImpl;
RandomData randomData = new RandomDataImpl();
...
// Gaussian deviate with mean 10 and standard deviation 5
arry[idx] = randomData.nextGaussian(10, 5);
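For readers without Commons Math on the classpath, the same idea can be sketched directly with java.util.Random.nextGaussian(), which returns standard normal N(0, 1) deviates; shifting and scaling them gives mean 10 and SD 5. This assumes RandomDataImpl.nextGaussian(mu, sigma) performs essentially this scaling on top of the JDK generator, which is not shown here; the class and method names are hypothetical.

```java
import java.util.Random;

/** Sketch: scaling the JDK's standard normal generator to mean 10, SD 5. */
public class JdkGaussian {

    /** Returns n deviates from N(mu, sigma^2) using Random.nextGaussian(). */
    static double[] gaussianSample(Random ran, int n, double mu, double sigma) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // nextGaussian() draws from N(0, 1); shift by mu and scale by sigma.
            out[i] = mu + sigma * ran.nextGaussian();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] arry = gaussianSample(new Random(1L), 100000, 10, 5);
        double sum = 0.0;
        for (double x : arry) {
            sum += x;
        }
        System.out.println("sample mean: " + sum / arry.length);
    }
}
```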
Second, I don't understand what you are expecting from the t-test.
TestUtils.tTest(mu, array) returns the p-value associated with a
two-tailed test with the null hypothesis that the values in the array
come from a distribution with mean = mu. So small p-values, say less
than .01, would indicate that the mean appears to differ significantly
from 10. If the generator is working correctly, this should happen in
roughly one run out of every 100. Differences as large as the one
observed in your first run (p = 0.343) should occur in about 34 out of
every 100 runs, and so on. The values reported
below do not look surprising to me. They do not support rejecting the
null hypothesis that the mean is what it is supposed to be, which is a
good thing.
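To see why a p-value of 0.34, or even 0.04, is unsurprising, here is a stdlib-only sketch that repeats the experiment many times with a generator whose mean really is 10 and counts how often a two-tailed one-sample t-test would reject. It does not call Commons Math; as an assumption that holds well at these sample sizes, it approximates the t critical values by the normal ones (1.6449 for p < .10, 2.5758 for p < .01). Class and method names are hypothetical.

```java
import java.util.Random;

/**
 * Sketch: when the null hypothesis is true (the mean really is mu), a
 * two-tailed one-sample t-test at level alpha rejects in about alpha of
 * all runs. Large samples, so t critical values are approximated by normal ones.
 */
public class TTestRejectionRate {

    /** One-sample t statistic for H0: mean == mu. */
    static double tStatistic(double[] x, double mu) {
        int n = x.length;
        double sum = 0.0;
        for (double v : x) sum += v;
        double mean = sum / n;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (n - 1));
        return (mean - mu) / (sd / Math.sqrt(n));
    }

    /** Fraction of simulated runs whose |t| exceeds the given critical value. */
    static double rejectionRate(long seed, int runs, int n, double critical) {
        Random ran = new Random(seed);
        double[] x = new double[n];
        int rejections = 0;
        for (int r = 0; r < runs; r++) {
            for (int i = 0; i < n; i++) {
                x[i] = 10 + 5 * ran.nextGaussian(); // H0 is true: mean is 10
            }
            if (Math.abs(tStatistic(x, 10)) > critical) rejections++;
        }
        return (double) rejections / runs;
    }

    public static void main(String[] args) {
        // Two-tailed normal critical values for p < .10 and p < .01.
        System.out.println("p < .10: " + rejectionRate(42L, 2000, 500, 1.6449));
        System.out.println("p < .01: " + rejectionRate(42L, 2000, 500, 2.5758));
    }
}
```

If the generator is sound, the two printed rates should land near 0.10 and 0.01; p-values are spread roughly uniformly between 0 and 1 under the null hypothesis.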
To test normality of the deviates, you should apply a normality test
to the deviates themselves, e.g. a Kolmogorov-Smirnov test. Commons
math does not currently include normality tests (patches welcome :).
To do this, you would need to dump the generated arrays to a file and
then do the test with R or some other package that includes normality
tests.
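Commons Math has no normality test, but the Kolmogorov-Smirnov statistic itself is short to compute. This sketch compares the sorted deviates with the N(10, 5) CDF and uses the large-sample 5% critical value 1.36/sqrt(n). The normal CDF comes from the Abramowitz and Stegun 7.1.26 approximation to erf (my choice, accurate to about 1.5e-7); the class and method names are hypothetical. The mean and SD are fixed in advance here; if they were estimated from the same sample, the plain K-S critical values would be too large and the test conservative (Lilliefors' correction addresses this).

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: one-sample Kolmogorov-Smirnov statistic against N(mu, sigma^2). */
public class KsNormalityCheck {

    /** erf(x) via Abramowitz and Stegun 7.1.26 (absolute error below 1.5e-7). */
    static double erf(double x) {
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return Math.signum(x) * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** CDF of N(mu, sigma^2) at x. */
    static double normalCdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }

    /** Largest gap D between the empirical CDF and the N(mu, sigma^2) CDF. */
    static double ksStatistic(double[] sample, double mu, double sigma) {
        double[] x = sample.clone(); // leave the caller's array unsorted
        Arrays.sort(x);
        int n = x.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = normalCdf(x[i], mu, sigma);
            // Compare with the empirical CDF just after and just before x[i].
            d = Math.max(d, Math.max((i + 1.0) / n - f, f - (double) i / n));
        }
        return d;
    }

    public static void main(String[] args) {
        Random ran = new Random(1L);
        double[] arry = new double[100000];
        for (int i = 0; i < arry.length; i++) {
            arry[i] = 10 + 5 * ran.nextGaussian();
        }
        double d = ksStatistic(arry, 10, 5);
        // Large-sample 5% critical value for a fully specified null distribution.
        double critical = 1.36 / Math.sqrt(arry.length);
        System.out.println("D = " + d + ", 5% critical value = " + critical
                + (d > critical ? " -> reject normality" : " -> consistent with N(10, 5)"));
    }
}
```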
Unless I am missing something, I don't think a t-test is going to give
you the information that you need to verify that the generated values
are normally distributed. Another thing that you could do is to
examine the empirical distribution of the generated values: lay a
grid over the range, count how many values fall into each cell, and
compare these counts to what you would expect under the hypothesis of
normality (essentially what the K-S test does). You can use
org.apache.commons.math.random.EmpiricalDistribution to bin the generated
data and get bin counts.
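The grid-and-count comparison can also be sketched with plain arrays, without EmpiricalDistribution (whose API is not reproduced here). In this version the half-SD grid from mu - 3 sigma to mu + 3 sigma, the two open-ended tail bins, and the Pearson chi-square summary are my own choices, and the names are hypothetical. Each bin count is compared with n times the N(mu, sigma^2) probability of that bin.

```java
import java.util.Random;

/**
 * Sketch: bin the deviates on a fixed grid and compare the counts with
 * those expected under N(mu, sigma^2), summarized as Pearson's chi-square.
 */
public class BinnedNormalityCheck {

    /** erf(x) via Abramowitz and Stegun 7.1.26 (absolute error below 1.5e-7). */
    static double erf(double x) {
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return Math.signum(x) * (1.0 - poly * Math.exp(-ax * ax));
    }

    static double normalCdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }

    /**
     * Interior bins are sigma/2 wide and span mu - 3 sigma to mu + 3 sigma;
     * one extra bin catches each tail. Returns the chi-square statistic.
     */
    static double chiSquare(double[] sample, double mu, double sigma) {
        int interior = 12;                        // 6 SDs / (0.5 SD per bin)
        double lo = mu - 3 * sigma;
        double width = 0.5 * sigma;
        long[] observed = new long[interior + 2]; // [0] lower tail, [interior + 1] upper tail
        for (double x : sample) {
            int bin;
            if (x < lo) {
                bin = 0;
            } else if (x >= lo + interior * width) {
                bin = interior + 1;
            } else {
                bin = 1 + Math.min(interior - 1, (int) ((x - lo) / width));
            }
            observed[bin]++;
        }
        double stat = 0.0;
        for (int b = 0; b < observed.length; b++) {
            // Bin b spans [lo + (b - 1) * width, lo + b * width); the tail bins are open-ended.
            double pLo = (b == 0) ? 0.0 : normalCdf(lo + (b - 1) * width, mu, sigma);
            double pHi = (b == interior + 1) ? 1.0 : normalCdf(lo + b * width, mu, sigma);
            double expected = sample.length * (pHi - pLo);
            double diff = observed[b] - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        Random ran = new Random(1L);
        double[] arry = new double[100000];
        for (int i = 0; i < arry.length; i++) {
            arry[i] = 10 + 5 * ran.nextGaussian();
        }
        // 14 bins -> 13 degrees of freedom.
        System.out.println("chi-square = " + chiSquare(arry, 10, 5));
    }
}
```

With 14 bins the statistic has 13 degrees of freedom, so values near 13 are unremarkable; values far above that, checked against a chi-square table, would point to non-normality.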
If you do find that normality tests fail on the generated values using
either your inversion method or the RandomDataImpl.nextGaussian
method, please open a Jira ticket
(http://commons.apache.org/math/issue-tracking.html) including the R
script or output from the package that you used for testing. Thanks!
hth,
Phil
Thanks,
wjr
package usda.weru.cligen2;
import java.util.Random;
import org.apache.commons.math.MathException;
import org.apache.commons.math.distribution.NormalDistributionImpl;
import org.apache.commons.math.stat.inference.TestUtils;
/**
 * Generates 10 samples of 100,000 N(10, 5) deviates by inversion and
 * runs a one-sample t-test (H0: mean == 10) on each.
 *
 * @author wjr
 */
public class TestNormal {

    // Normal distribution with mean 10 and standard deviation 5.
    static NormalDistributionImpl nd = new NormalDistributionImpl(10, 5);

    public static void main(String[] args) {
        double[] arry = new double[100000];
        Random ran = new Random(1L); // fixed seed for reproducible runs
        for (int jdx = 0; jdx < 10; jdx++) {
            // Inversion method: map uniform(0, 1) draws through the inverse CDF.
            for (int idx = 0; idx < arry.length; idx++) {
                try {
                    arry[idx] = nd.inverseCumulativeProbability(ran.nextDouble());
                } catch (MathException ex) {
                    ex.printStackTrace();
                }
            }
            try {
                // Two-tailed p-value for H0: the population mean is 10.
                System.out.println("ttest " + TestUtils.tTest(10, arry));
            } catch (IllegalArgumentException ex) {
                ex.printStackTrace();
            } catch (MathException ex) {
                ex.printStackTrace();
            }
        }
    }
}
Output:
run-single:
ttest 0.3433300114960922
ttest 0.1431930575825282
ttest 0.12336027805916228
ttest 0.49478850669361796
ttest 0.9216887341410063
ttest 0.9937228334312525
ttest 0.13669784550400177
ttest 0.9646134537758599
ttest 0.9965741269090211
ttest 0.03815948891784959
BUILD SUCCESSFUL (total time: 20 seconds)
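The ten p-values above can be read against the 90% goal from the follow-up message: for a two-tailed t-test, a p-value of at least .10 is equivalent to 10 lying inside the sample's 90% confidence interval for the mean. This small sketch (hypothetical names) simply counts how many of the reported values fall below .10 and .05.

```java
/** Sketch: how many of the ten reported p-values fall below common thresholds. */
public class ReportedPValues {

    // Copied from the "ttest" lines of the output above.
    static final double[] P_VALUES = {
        0.3433300114960922, 0.1431930575825282, 0.12336027805916228,
        0.49478850669361796, 0.9216887341410063, 0.9937228334312525,
        0.13669784550400177, 0.9646134537758599, 0.9965741269090211,
        0.03815948891784959
    };

    /** Number of p-values strictly below alpha. */
    static int countBelow(double[] p, double alpha) {
        int count = 0;
        for (double v : p) {
            if (v < alpha) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Under a true null hypothesis, about 1 run in 10 should land below .10.
        System.out.println("below .10: " + countBelow(P_VALUES, 0.10)); // prints "below .10: 1"
        System.out.println("below .05: " + countBelow(P_VALUES, 0.05)); // prints "below .05: 1"
    }
}
```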
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]