The K-S test has Ho: dist(A) = dist(B) and Ha: dist(A) <> dist(B). Rejecting Ho means that maintaining that dist(A) does not differ in any way from dist(B) is untenable, given the low P-value encountered.
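
For concreteness, this is how the two-sided test is run in R with ks.test(); the data here are simulated stand-ins for your actual samples:

# Two-sample, two-sided K-S test: Ho is "same distribution"; a small
# P-value argues against maintaining Ho. Simulated data for illustration.
set.seed(1)
x <- rnorm(100)               # sample A
y <- rnorm(100, mean = 0.5)   # sample B, shifted
ks.test(x, y)   # the D statistic is in $statistic, the P-value in $p.value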

If you wish to test a hypothesis, you must be able to calculate the probability of seeing a difference as large as or larger than the one found, using the distributions assumed in formulating Ho. Otherwise you will not be able to calculate a P-value and conduct the test.

Apparently you are interested in proving that dist(A) = dist(B). There are two ways of doing this:

1. Test Ho: dist(A) = dist(B) vs. Ha: dist(A) <> dist(B), and calculate the power associated with the test. Logically, you will need power = 0.95 to give the beta = 0.05 required for the level of "reverse significance" you are looking for. This will be difficult to do without quantifying what you mean by "difference" (see the power-simulation sketch after this list).

2. Test Ho: dist(A) <> dist(B) vs. Ha: dist(A) = dist(B), but you will need to pin down what you mean by "different" so tightly that you can construct a set of simple tests for which you can obtain null probability distributions. I suggest you read up on the topic of "equivalence testing", in which two one-sided tests (TOST) are used to demonstrate "equivalence" within a specified size of difference in means (see the TOST sketch after this list).
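
For item 1, the power against a quantified difference can be estimated by brute-force simulation. A minimal sketch, assuming purely for illustration that "difference" means a mean shift of delta between two normal populations (the normal model, delta, and the function name are my assumptions, not anything built into R):

# Estimate the power of the two-sided K-S test against a mean shift
# by simulation. Normality and delta = 0.5 are illustrative choices.
ks.power <- function(n, delta, alpha = 0.05, nsim = 2000) {
  rejections <- replicate(nsim, {
    a <- rnorm(n)                  # sample A under the alternative
    b <- rnorm(n, mean = delta)    # sample B, shifted by delta
    ks.test(a, b)$p.value < alpha
  })
  mean(rejections)                 # proportion rejected = estimated power
}
ks.power(n = 100, delta = 0.5)     # want >= 0.95 for the logic in (1)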
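
For item 2, a minimal hand-rolled sketch of the TOST procedure for a difference in means; the equivalence margin eps must come from your subject-matter knowledge, and the function name is mine, not from any package:

# TOST: declare "equivalence" (|mean(A) - mean(B)| < eps) only if BOTH
# one-sided t tests reject at level alpha.
tost <- function(a, b, eps, alpha = 0.05) {
  p.lower <- t.test(a, b, mu = -eps, alternative = "greater")$p.value
  p.upper <- t.test(a, b, mu =  eps, alternative = "less")$p.value
  list(p.lower = p.lower, p.upper = p.upper,
       equivalent = max(p.lower, p.upper) < alpha)
}
set.seed(2)
tost(rnorm(200), rnorm(200), eps = 0.3)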

You need to think long and hard about what you are trying to do, and why you are doing it. Why are you doing this test? Are you really concerned about "any" difference? Or do you simply wish to test the relative locations of the medians or some other quantile? Can you assume a particular form for the probability distributions of A and B? Are you concerned with monotonic dominance? Etc.

If you can pin down what you mean by "different", you might be able to find the test you need, or construct it.

If you've got a ton of data, the simple solution is to do a K-S test and show that the power is >= 0.95 for your size of "difference", if you can quantify that size precisely.

Along the lines of George Box's oft-quoted "All models are wrong, but some are useful", the traditional viewpoint on testing is as follows: We want to be able to hang onto the assumption that Ho is true, because it is the simplest theory to work with, and therefore is parsimonious and convenient. However, naive inspection indicates that there is some type of difference visible. So we conduct a test of Ho to see if the difference is real (i.e., "significant"). For small-scale experiments, such as those statisticians usually run across, getting a significant test at the 0.05 level typically means a fairly material difference, not a minor one. If the test is significant, life gets complicated, because we must deal with the reality of the difference. If the test is not significant, we hold onto our presumption that Ho is true (even though it may not really be), because we know that the random error is large enough to obscure the size of the difference present anyway.

When we have very large datasets, this logic breaks down, because now we have enough power to detect very small differences, and significance in the test of Ho is a foregone conclusion. Now the size of the difference is the most critical issue: if it's small, it won't be material; if it's large, we can't ignore it. But now we need subject-matter expertise in lieu of statistical tradition.

This sloppiness in test logic is why most statisticians try to avoid hypothesis testing. Instead, it is more fruitful to quantify precisely what you mean by "difference", and then estimate that difference and get a 95% confidence interval for it. You then go directly to subject-matter knowledge and deal with the materiality of the size of the difference. This is better science.
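
As a sketch of that approach, here is a percentile bootstrap 95% confidence interval for one possible quantification of "difference" (the difference in medians is my illustrative choice; substitute whatever measure is material to your problem):

# Percentile bootstrap 95% CI for the difference in medians.
set.seed(3)
a <- rnorm(150); b <- rnorm(150, mean = 0.3)   # stand-ins for your data
boot.diff <- replicate(5000,
  median(sample(a, replace = TRUE)) - median(sample(b, replace = TRUE)))
median(a) - median(b)                  # point estimate
quantile(boot.diff, c(0.025, 0.975))   # judge materiality against this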


At 10:14 PM 8/21/2008, Nitin Agrawal wrote:
Would it be possible to give an example of how I
can specify a more specific null hypothesis in R?
I am not aware of how to do this for the K-S test in R.

And repeating my second question, what is a good way to measure the
difference between observed and expected samples? Is the D statistic of the KS test a good choice?

Nitin

On Thu, Aug 21, 2008 at 7:40 PM, Moshe Olshansky <[EMAIL PROTECTED]> wrote:

> Hi Nitin,
>
> I believe that you cannot take as your null hypothesis that A and B come
> from different distributions.
> Asymptotically (as both sample sizes go to infinity) the KS test has power
> 1, i.e., it will reject H0: A = B in any case where A and B have different
> distributions.
> To work with a finite sample you must be more specific: your null
> hypothesis must be not merely that A and B have different distributions,
> but, for example, that their means differ by at least some amount, or that
> a certain distance between their distributions exceeds some threshold,
> etc., and such hypotheses can be tested (and rejected).
>
>
> --- On Fri, 22/8/08, Nitin Agrawal <[EMAIL PROTECTED]> wrote:
>
> > From: Nitin Agrawal <[EMAIL PROTECTED]>
> >
> > Subject: [R] Null and Alternate hypothesis for Significance test
> > To: r-help@r-project.org
> > Received: Friday, 22 August, 2008, 6:58 AM
> > Hi,
> > I had a question about specifying the Null hypothesis in a
> > significance test.
> > Advance apologies if this has already been asked previously
> > or is a naive question.
> >
> > I have two samples A and B, and I want to test whether A
> > and B come from the same distribution. The default null hypothesis
> > would be H0: A = B. But since I am trying to prove that A and B indeed
> > come from the same distribution, I think this is not the right choice
> > for the null hypothesis (it should be one that is set up to be rejected).
> >
> > How do I specify a null hypothesis H0: A not equal to B for,
> > say, a KS test?
> > An example to do this in R would be greatly appreciated.
> >
> > On a related note: what is a good way to measure the
> > difference between observed and expected PDFs? Is the D statistic of the KS test a good choice?
> >
> > Thanks!
> > Nitin
> >


================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
