Re: [R] Test if data uniformly distributed (newbie)

2018-04-10 Thread Huber, Florian
Dear Mr. Savicky,

I am currently working on a project in which I want to test a random number 
generator that is supposed to produce 10,000 random numbers from the continuous 
uniform distribution between 0 and 1. I am wondering whether I can use the 
chi-squared test for this, or whether the Kolmogorov-Smirnov test would be a 
better fit.
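
For a generator like the one described above, both tests can be run in a few
lines. The following is only a sketch: R's own runif() stands in for the
generator under test, and the seed and the choice of 20 equal-width bins are
assumptions made for illustration, not recommendations.

```r
# Stand-in for the generator under test (assumption: replace with its output).
set.seed(2018)
u <- runif(10000)

# Kolmogorov-Smirnov test against the continuous uniform distribution on [0, 1].
ks <- ks.test(u, "punif")

# Chi-squared goodness-of-fit test on binned data.
# Assumption: 20 equal-width bins, so the expected count is 500 per bin.
breaks <- seq(0, 1, length.out = 21)
counts <- as.vector(table(cut(u, breaks = breaks, include.lowest = TRUE)))
chi <- chisq.test(counts)   # equal cell probabilities are the default

ks$p.value
chi$p.value
```

The KS test works on the unbinned values, while the chi-squared result depends
on how many bins are chosen. Neither test says anything about dependence
between successive numbers.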

I came across one of your threads on the internet where you answer a similar 
question and thought I'd reach out to you.


Thanks in advance
Florian Huber





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test if data uniformly distributed (newbie)

2011-06-12 Thread Petr Savicky
On Fri, Jun 10, 2011 at 10:15:36PM +0200, Kairavi Bhakta wrote:
> Thanks for your answer. The reason I want the data to be uniform: It's the
> first step in a machine learning project I am working on. If I know the data
> isn't uniformly distributed, then this means there is probably something
> wrong and the following steps will be biased by the non-uniform input data.
> I'm not checking an assumption for another statistical test.
> 
> Actually, the data has been normalized because it is supposed to represent a
> probability distribution. That's why it sums to 1. My assumption is that,
> for a vector of 5, the data at that point should look like 0.20 0.20 0.20
> 0.20 0.20, but of course there is variation, and I would like to test
> whether the data comes close enough or not.

As others have told you, this is not the right format for the KS test. The
words "testing uniformity" can mean different things, and the meaning depends
on which statistical model you assume. If we have a random variable
with values in [0, 1], then testing uniformity means testing how close
its distribution is to the uniform distribution on [0, 1].
Numbers that concentrate around 0.2 will not satisfy this.
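
A small sketch of this distinction, using simulated data only: the sample
size, the seed, and the spread of 0.02 around 0.2 are assumptions chosen for
illustration and are not taken from the original files.

```r
set.seed(1)
u <- runif(300)                          # spread over the whole interval [0, 1]
v <- rnorm(300, mean = 0.2, sd = 0.02)   # concentrated around 0.2

ks_u <- ks.test(u, "punif")   # compares u with the uniform CDF on [0, 1]
ks_v <- ks.test(v, "punif")   # same comparison for the concentrated values

ks_u$p.value
ks_v$statistic   # large: the empirical CDF jumps from 0 to 1 near 0.2
ks_v$p.value     # essentially zero
```

Values that are all close to 0.2 are very far from uniform on [0, 1] in the
sense of the KS test, even though they are "uniform" in the sense of being
nearly equal to one another.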

If we have a discrete variable with k values, for which we have m
independent observations, and the number of observations of value i
is m_i, then we can test whether the variable has the uniform
distribution on {1, ..., k} using the chi-squared test. Note that
this test needs the original counts, not their normalized values,
which sum to 1. For example, if we have 20 observations and
the counts (m_1, ..., m_5) are (4, 3, 5, 2, 6), then this is quite
consistent with the assumption of a uniform distribution. On the
other hand, if we have 200 observations and the counts are
(40, 30, 50, 20, 60), then the null hypothesis of a uniform distribution
is rejected at the usual significance levels (the uniform distribution
is the default; see argument p in ?chisq.test):

  x <- c(40, 30, 50, 20, 60)
  chisq.test(x)

  Chi-squared test for given probabilities

  data:  x 
  X-squared = 25, df = 4, p-value = 5.031e-05

It is not clear whether this is suitable for your application.
If you generate the values in a different way, then another
test may be needed. Can you give more detail on how the 
numbers are generated?
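
The two count vectors discussed above can be compared side by side. This is a
sketch only; the variable names are made up for the example, and the last call
is included just to show what happens when normalized values are passed
instead of counts.

```r
small <- c(4, 3, 5, 2, 6)        # 20 observations
large <- c(40, 30, 50, 20, 60)   # 200 observations, same proportions

# The expected count is 4 per cell for `small`, so chisq.test() warns that the
# chi-squared approximation may be inaccurate; treat that p-value as rough.
res_small <- suppressWarnings(chisq.test(small))
res_large <- chisq.test(large)

res_small$statistic   # 2.5
res_small$p.value     # about 0.64
res_large$statistic   # 25
res_large$p.value     # about 5e-05

# Normalized values are treated as if they were counts with total 1, so the
# statistic shrinks to 0.125 and the sample size is lost entirely.
res_norm <- suppressWarnings(chisq.test(large / sum(large)))
res_norm$statistic
```

The proportions are identical in all three calls; only the number of
observations behind them differs, which is why the counts are needed.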

Petr Savicky.



Re: [R] Test if data uniformly distributed (newbie)

2011-06-10 Thread Kairavi Bhakta
Thanks for your answer. The reason I want the data to be uniform: It's the
first step in a machine learning project I am working on. If I know the data
isn't uniformly distributed, then this means there is probably something
wrong and the following steps will be biased by the non-uniform input data.
I'm not checking an assumption for another statistical test.

Actually, the data has been normalized because it is supposed to represent a
probability distribution. That's why it sums to 1. My assumption is that,
for a vector of 5, the data at that point should look like 0.20 0.20 0.20
0.20 0.20, but of course there is variation, and I would like to test
whether the data comes close enough or not.

At the moment I am only testing whether there are more a's than b's in the
top and bottom portions of each file (with a Wilcoxon test; I have 8 reps
of the model I am trying to build). But that felt like a very ad hoc
solution, and I figured testing for uniformity might be better, or at
least an important addition. I've also been looking into testing the
randomness of the sequence of a's and b's instead of using the Wilcoxon test,
although that may or may not involve R.
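
One common way to test the randomness of a two-valued sequence is the
Wald-Wolfowitz runs test, and it can be written in R without extra packages.
The sketch below is an assumption about what such a check might look like,
not something taken from this thread: runs_test() is a hypothetical helper
(it is not a base R function), and it uses the normal approximation, which is
only reasonable when both labels occur many times.

```r
# Hypothetical helper: Wald-Wolfowitz runs test for a sequence of "a"/"b" labels.
runs_test <- function(labels) {
  labels <- as.character(labels)
  n1 <- sum(labels == "a")
  n2 <- sum(labels == "b")
  n  <- n1 + n2
  runs <- length(rle(labels)$lengths)    # number of maximal blocks of equal labels
  mu <- 2 * n1 * n2 / n + 1              # expected number of runs under randomness
  sigma2 <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (runs - mu) / sqrt(sigma2)
  list(runs = runs, expected = mu, z = z, p.value = 2 * pnorm(-abs(z)))
}

set.seed(11)
shuffled  <- sample(c("a", "b"), 300, replace = TRUE)   # simulated labels
clustered <- rep(c("a", "b"), each = 150)               # all a's, then all b's

runs_test(shuffled)$p.value
runs_test(clustered)$runs      # 2
runs_test(clustered)$p.value   # essentially zero
```

Too few runs points to clustering of the labels, and too many points to
alternation; both count as departures from a random order.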

Kairavi.


> Yes, punif is the function to use, however the KS test (and the others)
> are based on an assumption of independence, and if you know that your data
> points sum to 1, then they are not independent (and not uniform if there are
> more than 2).  Also note that these tests only rule out distributions (with
> a given type I error rate), but cannot confirm that the data comes from a
> given distribution (just that either they do, or there is not enough power
> to distinguish between the actual and the test distributions).
>
> What is your ultimate question/goal?  Why do you care if the data is
> uniform or not?

> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
On Behalf Of Kairavi Bhakta
Sent: Friday, June 10, 2011 11:24 AM
To: r-help@r-project.org
Subject: [R] Test if data uniformly distributed (newbie)

Hello,

I have a bunch of files containing 300 data points each with values from 0
to 1 which also sum to 1 (I don't think  the last element is relevant
though). In addition, each data point is annotated as an "a" or a "b".

I would like to know in which files (if any) the data is uniformly
distributed.

I used Google and found out that a Kolmogorov-Smirnov or a Chi-square
goodness-of-fit test could be used. Then I looked up ?kolmogorov and found
"ks.test", but the example there is for the normal distribution and I am not
sure how to adapt it for the uniform distribution. I did ?runif and read
about the uniform distribution but it doesn't say what the "cumulative
distribution" is. Is it "punif", like "pnorm"? I thought of that because I
found a message on this list where someone was told to use "pnorm" instead
of "dnorm". But the help page on the uniform distribution says punif is the
"distribution function". Are the "cumulative distribution" and the
"distribution function" the same thing? Having several names for the same
thing has always confused me very much in statistics.

Also, I am not sure whether I need to specify any parameters for the
distribution and which. I thought maybe I should specify "min=0" and "max=1"
but those appear to be the defaults. Do I need to specify q, the vector
of quantiles?

So is
ks.test(x, punif)
correct or not for what I am attempting to do?
After this I will also need to find out whether the a's and b's are
distributed randomly in each file. I would be grateful for any pointers,
although I have not researched this issue yet.
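
On the naming questions above: in R's documentation the "distribution
function" is the cumulative distribution function, so punif plays the same
role for the uniform distribution that pnorm plays for the normal. A short
sketch follows, in which the data are simulated and the seed is an
assumption.

```r
q <- c(0.1, 0.25, 0.5, 0.9)
punif(q)    # cumulative distribution function: equals q on [0, 1]
dunif(q)    # density: constant 1 on [0, 1]

set.seed(7)
x <- runif(300)   # simulated stand-in for one file's values

# min = 0 and max = 1 are the defaults of punif, so these two calls agree.
# The quantiles q are not passed by hand; ks.test() evaluates punif at the data.
a <- ks.test(x, "punif")
b <- ks.test(x, "punif", min = 0, max = 1)
identical(a$statistic, b$statistic)
```

Extra arguments to ks.test() after the distribution name are handed on to the
distribution function, which is how non-default limits would be supplied.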

Kairavi.



Re: [R] Test if data uniformly distributed (newbie)

2011-06-10 Thread Greg Snow
OK, that is not the correct format for the KS test (which expects data 
ranging from 0 to 1 with a fairly flat histogram).  You could possibly test 
this with a chi-squared test.  Can you tell us more about how the numbers you 
are looking at are generated?  The chi-squared test could be used on counts of 
the values 1-5 and compared to the assumption that each is equally likely, but 
there still is the question of power and how close to uniform is uniform 
enough.  You would need huge samples to find a difference if the true 
distribution is only slightly non-uniform.
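
The point about power can be illustrated by simulation. Everything specific
in this sketch is an assumption chosen for the example: the "slightly
non-uniform" probabilities, the two sample sizes, the 5% level, and the helper
name reject_rate.

```r
set.seed(123)
p_true <- c(0.21, 0.19, 0.20, 0.20, 0.20)   # close to, but not exactly, uniform

# Fraction of simulated samples of size n in which the chi-squared test
# rejects uniformity at the 5% level.
reject_rate <- function(n, reps = 500) {
  counts <- rmultinom(reps, size = n, prob = p_true)   # one sample per column
  pvals <- apply(counts, 2, function(m) chisq.test(m)$p.value)
  mean(pvals < 0.05)
}

power_small <- reject_rate(200)     # close to the 5% false-alarm rate
power_large <- reject_rate(20000)   # the deviation is detected most of the time

power_small
power_large
```

With a few hundred observations the test almost never notices a deviation of
this size, so a non-significant result does not show that the data are
uniform.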

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


Re: [R] Test if data uniformly distributed (newbie)

2011-06-10 Thread Greg Snow
Yes, punif is the function to use, however the KS test (and the others) are 
based on an assumption of independence, and if you know that your data points 
sum to 1, then they are not independent (and not uniform if there are more than 
2).  Also note that these tests only rule out distributions (with a given type 
I error rate), but cannot confirm that the data comes from a given distribution 
(just that either they do, or there is not enough power to distinguish between 
the actual and the test distributions).
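
The dependence introduced by the sum-to-1 constraint can be seen in a small
simulation. This is a sketch under assumptions: vectors of length 5 are
produced by normalizing independent uniform draws, which may not be how the
original data were generated.

```r
set.seed(42)
k <- 5
# Each row is one vector of k uniform draws, rescaled to sum to 1.
sims <- t(replicate(2000, { r <- runif(k); r / sum(r) }))

range(rowSums(sims))         # every row sums to 1 (up to rounding)
colMeans(sims)               # each component averages about 1/k = 0.2
cor(sims[, 1], sims[, 2])    # negative, near -1/(k - 1) = -0.25
```

Because the components must add up to 1, a large value in one position forces
the others to be smaller, and that negative correlation is exactly the lack of
independence that the KS and chi-squared tests do not allow for.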

What is your ultimate question/goal?  Why do you care if the data is uniform or 
not?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


