The K-S test has Ho: dist(A) = dist(B) and Ha: dist(A) <> dist(B). Rejecting Ho means that maintaining that dist(A) does not differ in any way from dist(B) is untenable, given the low P-value encountered.
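
For concreteness, this is how the two-sided test is run in R with ks.test(); the data here are simulated stand-ins for your actual samples:

# Two-sample, two-sided K-S test: Ho is "same distribution"; a small
# P-value argues against maintaining Ho. Simulated data for illustration.
set.seed(1)
x <- rnorm(100)               # sample A
y <- rnorm(100, mean = 0.5)   # sample B, shifted
ks.test(x, y)   # the D statistic is in $statistic, the P-value in $p.value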

If you wish to test a hypothesis, you must be able to calculate the probability of seeing a difference as large as or larger than the one found, using the distributions assumed in formulating Ho. Otherwise you will not be able to calculate a P-value and conduct the test.

Apparently you are interested in proving that dist(A) = dist(B). There are two ways of doing this:

1. Test Ho: dist(A) = dist(B) vs. Ha: dist(A) <> dist(B), and calculate the power associated with the test. Logically, you will need power = 0.95 to give the beta = 0.05 required for the level of "reverse significance" you are looking for. This will be difficult to do without quantifying what you mean by "difference" (see the power-simulation sketch after this list).

2. Test Ho: dist(A) <> dist(B) vs. Ha: dist(A) = dist(B), but you will need to pin down what you mean by "different" so tightly that you can construct a set of simple tests for which you can obtain null probability distributions. I suggest you read up on the topic of "equivalence testing", in which two one-sided tests (TOST) are used to demonstrate "equivalence" within a specified size of difference in means (see the TOST sketch after this list).
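
For item 1, the power against a quantified difference can be estimated by brute-force simulation. A minimal sketch, assuming purely for illustration that "difference" means a mean shift of delta between two normal populations (the normal model, delta, and the function name are my assumptions, not anything built into R):

# Estimate the power of the two-sided K-S test against a mean shift
# by simulation. Normality and delta = 0.5 are illustrative choices.
ks.power <- function(n, delta, alpha = 0.05, nsim = 2000) {
  rejections <- replicate(nsim, {
    a <- rnorm(n)                  # sample A under the alternative
    b <- rnorm(n, mean = delta)    # sample B, shifted by delta
    ks.test(a, b)$p.value < alpha
  })
  mean(rejections)                 # proportion rejected = estimated power
}
ks.power(n = 100, delta = 0.5)     # want >= 0.95 for the logic in (1)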
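
For item 2, a minimal hand-rolled sketch of the TOST procedure for a difference in means; the equivalence margin eps must come from your subject-matter knowledge, and the function name is mine, not from any package:

# TOST: declare "equivalence" (|mean(A) - mean(B)| < eps) only if BOTH
# one-sided t tests reject at level alpha.
tost <- function(a, b, eps, alpha = 0.05) {
  p.lower <- t.test(a, b, mu = -eps, alternative = "greater")$p.value
  p.upper <- t.test(a, b, mu =  eps, alternative = "less")$p.value
  list(p.lower = p.lower, p.upper = p.upper,
       equivalent = max(p.lower, p.upper) < alpha)
}
set.seed(2)
tost(rnorm(200), rnorm(200), eps = 0.3)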

You need to think long and hard about what you are trying to do, and why you are doing it. Why are you doing this test? Are you really concerned about "any" difference? Or do you simply wish to test the relative locations of the medians or some other quantile? Can you assume a particular form for the probability distributions of A and B? Are you concerned with monotonic dominance? Etc.

If you can pin down what you mean by "different", you might be able to find the test you need, or construct it.

If you've got a ton of data, the simple solution is to do a K-S test and show that the power is >= 0.95 for your size of "difference", if you can quantify that size precisely.

Along the lines of George Box's oft-quoted "All models are wrong, but some are useful", the traditional viewpoint on testing is as follows: We want to be able to hang onto the assumption that Ho is true, because it is the simplest theory to work with, and therefore is parsimonious and convenient. However, naive inspection indicates that there is some type of difference visible. So we conduct a test of Ho to see if the difference is real (i.e., "significant"). For small-scale experiments, such as those statisticians usually run across, getting a significant test at the 0.05 level typically means a fairly material difference, not a minor one. If the test is significant, life gets complicated, because we must deal with the reality of the difference. If the test is not significant, we hold onto our presumption that Ho is true (even though it may not really be), because we know that the random error is large enough to obscure the size of the difference present anyway.

When we have very large datasets, this logic breaks down, because now we have enough power to detect very small differences, and significance in the test of Ho is a foregone conclusion. Now the size of the difference is the most critical issue: if it's small, it won't be material; if it's large, we can't ignore it. But now we need subject-matter expertise in lieu of statistical tradition.

This sloppiness in test logic is why most statisticians try to avoid hypothesis testing. Instead, it is more fruitful to quantify precisely what you mean by "difference", and then estimate that difference and get a 95% confidence interval for it. You then go directly to subject-matter knowledge and deal with the materiality of the size of the difference. This is better science.
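
As a sketch of that approach, here is a percentile bootstrap 95% confidence interval for one possible quantification of "difference" (the difference in medians is my illustrative choice; substitute whatever measure is material to your problem):

# Percentile bootstrap 95% CI for the difference in medians.
set.seed(3)
a <- rnorm(150); b <- rnorm(150, mean = 0.3)   # stand-ins for your data
boot.diff <- replicate(5000,
  median(sample(a, replace = TRUE)) - median(sample(b, replace = TRUE)))
median(a) - median(b)                  # point estimate
quantile(boot.diff, c(0.025, 0.975))   # judge materiality against this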


At 10:14 PM 8/21/2008, Nitin Agrawal wrote:
Would it be possible to give an example of how I
can specify a more specific null hypothesis in R?
I am not aware of how to do this for the K-S test in R.

And repeating my second question, what is a good way to measure the
difference between observed and expected samples? Is the D statistic of the KS test a good choice?

Nitin

On Thu, Aug 21, 2008 at 7:40 PM, Moshe Olshansky <[EMAIL PROTECTED]> wrote:

> Hi Nitin,
>
> I believe that you cannot take as your null hypothesis that A and B come
> from different distributions.
> Asymptotically (as both sample sizes go to infinity) the KS test has power
> 1, i.e., it will reject H0: A = B in any case where A and B have different
> distributions.
> To work with a finite sample you must be more specific: your null
> hypothesis must be not merely that A and B have different distributions,
> but, for example, that their means differ by at least some amount, or that
> a certain distance between their distributions exceeds some threshold,
> etc., and such hypotheses can be tested (and rejected).
>
>
> --- On Fri, 22/8/08, Nitin Agrawal <[EMAIL PROTECTED]> wrote:
>
> > From: Nitin Agrawal <[EMAIL PROTECTED]>
> >
> > Subject: [R] Null and Alternate hypothesis for Significance test
> > To: r-help@r-project.org
> > Received: Friday, 22 August, 2008, 6:58 AM
> > Hi,
> > I had a question about specifying the Null hypothesis in a
> > significance test.
> > Advance apologies if this has already been asked previously
> > or is a naive question.
> >
> > I have two samples A and B, and I want to test whether A
> > and B come from the same distribution. The default null hypothesis
> > would be H0: A = B. But since I am trying to prove that A and B indeed
> > come from the same distribution, I think this is not the right choice
> > for the null hypothesis (it should be one that is set up to be rejected).
> >
> > How do I specify a null hypothesis H0: A not equal to B for,
> > say, a KS test?
> > An example to do this in R would be greatly appreciated.
> >
> > On a related note: what is a good way to measure the
> > difference between observed and expected PDFs? Is the D statistic of the KS test a good choice?
> >
> > Thanks!
> > Nitin
> >


================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
