Re: [R] Variance Computing - HELP

2003-08-19 Thread Thomas W Blackwell
The variance of Xbar (the sample mean) decreases as 1/n; the sample variance
of X does not.
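A minimal R sketch (not part of Tom's message) that checks both halves of this statement by simulation. The seed, the number of replicate samples, and the list of sample sizes are arbitrary choices made for illustration; sigma = 3 matches the original post.

```r
# For several sample sizes n, compare the average sample variance with the
# variance of the sample mean across many replicate samples.
set.seed(1)                          # arbitrary seed, for reproducibility
sigma <- 3                           # population sd, as in the original post
reps  <- 2000                        # replicate samples per n (arbitrary)
ns    <- c(10, 100, 1000)            # sample sizes to compare (arbitrary)

res <- t(sapply(ns, function(n) {
  x <- matrix(rnorm(reps * n, 0, sigma), nrow = reps)  # one sample per row
  c(n        = n,
    mean_var = mean(apply(x, 1, var)),  # stays near sigma^2 = 9 for every n
    var_xbar = var(rowMeans(x)),        # shrinks roughly like sigma^2 / n
    theory   = sigma^2 / n)             # textbook variance of Xbar
}))
print(res)
```

The mean_var column should hover around 9 in every row, while var_xbar tracks the theory column.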

-  tom blackwell  -  u michigan medical school  -  ann arbor  -

On Tue, 19 Aug 2003, Padmanabhan, Sudharsha wrote:

 I am running a few simulations for clinical trial analysis. I want some help
 regarding the following.

 We know that as the sample size increases, the variance should decrease, but
 I am getting some unexpected results. So I ran some code (shown below) to check
 the validity of this.

 large<-array(1,c(1000,1000))
 small<-array(1,c(100,1000))
 for(i in 1:1000){large[i,]<-rnorm(1000,0,3)}
 for(i in 1:1000){small[i,]<-rnorm(100,0,3)}}
 yy<-array(1,100)
 for(i in 1:100){yy[i]<-var(small[i,])}
 y1y<-array(1,1000)
 for(i in 1:1000){y1y[i]<-var(large[i,])}
 mean(yy);mean(y1y);
 [1] 8.944
 [1] 9.098

 This shows that, on average, for 1000 such samples of 1000 Normal numbers,
 the variance is higher than that of 100 samples of 1000 random numbers.

 Why is this so?
 Can someone please help me out?


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

2003-08-19 Thread Jonathan Baron
On 08/19/03 17:42, Padmanabhan, Sudharsha wrote:

Why is this so?

Don't know, but it could be a fluke.  You don't say how many
times you did it.

I did the following, with 1000 in each test.  You have 100 in the
small test and 1000 in the big one.  My numbers look pretty
close.

 bigmat <- matrix(rnorm(1000000),1000,1000) # 1000 rows of 1000 each
 smallmat <- matrix(rnorm(100000),1000,100) # 1000 rows of 100 each
 mean(apply(bigmat,1,var)) # get variance of each row, then take mean
[1] 0.344
 mean(apply(smallmat,1,var))
[1] 0.9967427

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page:http://www.sas.upenn.edu/~baron
R page:   http://finzi.psych.upenn.edu/



RE: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

2003-08-19 Thread Liaw, Andy
First of all, your subscripting is wrong.  The first index is for row, and
the second for column.  Thus large[i,] refers to the i-th row of large,
rather than the i-th column.  Also, the code as you provided it contains a
syntax error (an unmatched closing brace).

Try:

set.seed(311)  ## Always a good idea to set seed for simulation!
large <- matrix(rnorm(1000*1000), 1000, 1000)
small <- matrix(rnorm(100*1000), 100, 1000)
var.large <- apply(large, 2, var)  ## Apply the var function to each column
var.small <- apply(small, 2, var)

The result looks like:
 summary(var.large); summary(var.small)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8617  0.9705  1.0010  1.0020  1.0320  1.1520 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5846  0.9021  0.9948  0.9990  1.0850  1.5360 

As expected, the mean is about the same, but the spread is much smaller for
the larger sample size.
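For comparison, here is a hypothetical loop version of the original code with the subscripting fixed (not from Andy's message). It assumes the poster wanted 1000 replicate samples of size 100 and 1000 of size 1000, stored one sample per row; the seed is arbitrary.

```r
# Corrected loop version: one sample per ROW, so the number of rows is the
# number of replicate samples and the number of columns is the sample size.
set.seed(1)                                           # arbitrary seed
n_rep <- 1000                                         # assumed replicate count
large <- matrix(NA_real_, nrow = n_rep, ncol = 1000)  # samples of size 1000
small <- matrix(NA_real_, nrow = n_rep, ncol = 100)   # samples of size 100
for (i in 1:n_rep) {
  large[i, ] <- rnorm(1000, 0, 3)   # row i: one sample of 1000
  small[i, ] <- rnorm(100, 0, 3)    # row i: one sample of 100
}
y1y <- apply(large, 1, var)   # variance of each size-1000 sample
yy  <- apply(small, 1, var)   # variance of each size-100 sample
print(c(mean_small = mean(yy), mean_large = mean(y1y)))  # both close to 9
print(c(sd_small = sd(yy), sd_large = sd(y1y)))          # spread shrinks with n
```

Both means should land near 9; only the spread of the sample variances differs between the two sample sizes.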

This sort of thing can be computed exactly using basic mathematical statistics, BTW.
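For normally distributed data, the standard result is that (n - 1) s^2 / sigma^2 has a chi-squared distribution with n - 1 degrees of freedom, so E[s^2] = sigma^2 and Var(s^2) = 2 sigma^4 / (n - 1). A small R sketch (not from Andy's message; the helper name exact_summary is hypothetical) computing the exact mean, standard deviation, and quartiles of s^2 for the two sample sizes above, with sigma^2 = 1 to match rnorm()'s default:

```r
# Exact distribution of the sample variance s^2 for normal data:
# (n - 1) * s^2 / sigma^2 ~ chi-squared with n - 1 degrees of freedom.
exact_summary <- function(n, sigma2 = 1) {
  df <- n - 1
  q  <- qchisq(c(0.25, 0.5, 0.75), df) * sigma2 / df  # quartiles of s^2
  c(n = n,
    mean   = sigma2,                        # E[s^2] = sigma^2
    sd     = sqrt(2 * sigma2^2 / df),       # sd(s^2) = sigma^2 * sqrt(2/(n-1))
    Q1     = q[1],
    median = q[2],
    Q3     = q[3])
}
exact <- rbind(exact_summary(1000), exact_summary(100))
print(round(exact, 4))
```

The standard deviations come out at sqrt(2/999), about 0.045, and sqrt(2/99), about 0.142, and the exact quartiles can be compared with the simulated 1st Qu. and 3rd Qu. values in the summaries above.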

Andy



--
Notice:  This e-mail message, together with any attachments, contains
information of Merck  Co., Inc. (Whitehouse Station, New Jersey, USA), and/or
its affiliates (which may be known outside the United States as Merck Frosst,
Merck Sharp  Dohme or MSD) that may be confidential, proprietary copyrighted
and/or legally privileged, and is intended solely for the use of the
individual or entity named on this message.  If you are not the intended
recipient, and have received this message in error, please immediately return
this by e-mail and then delete it.



Re: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

2003-08-19 Thread James MacDonald
I think you are confused. As sample size increases, the variance of an
estimate based on that sample will decrease asymptotically to zero (e.g.,
the standard error of the mean will go to zero). However, the variance of
the sample itself will not shrink; it stays near the population variance.
Any difference you see in your data is simply due to chance. If you repeat,
the larger set may or may not have a larger variance.

 var(rnorm(10000, 0, 3))
[1] 8.958727
 var(rnorm(10000, 0, 3))
[1] 9.155332
 var(rnorm(10000, 0, 3))
[1] 9.050894
 var(rnorm(10000, 0, 3))
[1] 9.282509
 var(rnorm(100000, 0, 3))
[1] 8.990778
 var(rnorm(100000, 0, 3))
[1] 9.024343
 var(rnorm(100000, 0, 3))
[1] 8.999064
 var(rnorm(100000, 0, 3))
[1] 9.088034
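The contrast Jim draws between the sample variance and the standard error of the mean can be checked on a single growing sample. This is a minimal sketch, not from his message; the seed and the list of sample sizes are arbitrary.

```r
# For one long normal sample, compare the sample variance of the first n
# observations (settles near sigma^2 = 9) with the standard error of their
# mean (shrinks toward 0 like 3 / sqrt(n)).
set.seed(42)                       # arbitrary seed
x  <- rnorm(100000, 0, 3)          # one long sample with sd = 3
ns <- c(100, 1000, 10000, 100000)  # prefixes of x to examine (arbitrary)
out <- t(sapply(ns, function(n) {
  xn <- x[1:n]
  c(n = n, var = var(xn), se_mean = sd(xn) / sqrt(n))
}))
print(out)
```

The var column wanders around 9 with no trend, while se_mean falls by roughly a factor of sqrt(10) at each step.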


HTH

Jim



James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623



Re: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

2003-08-19 Thread Tony Plate
Perhaps you were thinking of "as sample size increases, the variance *of the
mean* decreases" (at least when the variance is finite).  If you swap mean and
var in your code, so that you take the variance of the sample means, I think
you will get what you are looking for.
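A minimal sketch of that swap (not from Tony's message), assuming the poster intended 1000 replicate samples of size 100 and 1000 of size 1000, one sample per row; the seed is arbitrary.

```r
# Swap mean and var: take the MEAN of each sample (row), then the
# VARIANCE of those means across the replicate samples.
set.seed(2)                                              # arbitrary seed
small <- matrix(rnorm(1000 * 100, 0, 3), nrow = 1000)    # 1000 samples, n = 100
large <- matrix(rnorm(1000 * 1000, 0, 3), nrow = 1000)   # 1000 samples, n = 1000
v_small <- var(apply(small, 1, mean))   # theory: 9 / 100  = 0.09
v_large <- var(apply(large, 1, mean))   # theory: 9 / 1000 = 0.009
print(c(small = v_small, large = v_large))
```

Here the larger sample size does give the smaller variance, by roughly a factor of 10.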

-- Tony Plate



RE: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

2003-08-19 Thread Wiener, Matthew
Hi.

There is no reason the sample variance of normal data should decrease as you
take larger samples.  Indeed, in your call itself you say that you want a sample
from a normal with a standard deviation of 3, and so a variance of 9.  As
expected, both of your estimates of variance are close to 9.

What should decrease is the variance of the estimate of the mean, which is
the population variance divided by the number of elements in your sample
(estimated by the sample variance divided by n).  That will indeed decrease
as n increases.
 
Also, a couple of R programming points raised by your example:

You can populate your entire matrix of random numbers with a single call,
with good time savings.  (That probably doesn't matter much in this toy
example, but might if you do larger simulations for some problem.)
For example, matrix(rnorm(100000, 0, 3), nr = 100, nc = 1000) gets you your
matrix small.

Similarly, your loop over the rows for taking variance can be replaced by
yy <- apply(small, 1, var)
which may not be faster, but is certainly easier to read.  And of course
you'd want to replace the call to var with a function that calculates the
standard error.
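One way to write such a function is as a small helper; the name se is hypothetical and not part of base R. This sketch (not from Matt's message) applies it to each row of the 100 x 1000 matrix described above, with an arbitrary seed.

```r
# Hypothetical helper: standard error of the mean of one sample,
# i.e. the sample standard deviation divided by sqrt(sample size).
se <- function(x) sd(x) / sqrt(length(x))

set.seed(3)                                                  # arbitrary seed
small <- matrix(rnorm(100000, 0, 3), nrow = 100, ncol = 1000)  # as in the text
se_rows <- apply(small, 1, se)   # one SE per row (each row holds 1000 values)
print(summary(se_rows))          # should centre near 3 / sqrt(1000), about 0.095
```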

Hope this helps,

Matt Wiener




Re: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

2003-08-19 Thread Richard A. O'Keefe
Padmanabhan, Sudharsha [EMAIL PROTECTED]
We know that as the sample size increases, the variance should
decrease,

Should it?

I can paraphrase his test case thus:

v100 <- sapply(1:100, function(i) var(rnorm(100, 0, 3)))
# We expect the elements of v100 to cluster around 3^2
v1000 <- sapply(1:1000, function(i) var(rnorm(1000, 0, 3)))
# We expect the elements of v1000 to cluster around 3^2 too.
fivenum(v100)
=  [1]  6.469134  7.884637  8.916314 10.189463 13.897817
#
fivenum(v1000)
=  [1]  7.874345  8.692326  8.967684  9.268955 10.503038
#

The population parameter sigma-squared is 3^2 = 9.
The estimates are 8.92 in one case and 8.97 in the other;
sounds about right to me.

Looking at density(v100) and density(v1000) is enlightening.
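A minimal sketch of that comparison (not from Richard's message), regenerating v100 and v1000 the same way with an arbitrary seed and overlaying their kernel density estimates:

```r
# Overlay kernel density estimates of the two sets of sample variances.
set.seed(4)                                               # arbitrary seed
v100  <- sapply(1:100,  function(i) var(rnorm(100, 0, 3)))
v1000 <- sapply(1:1000, function(i) var(rnorm(1000, 0, 3)))
d100  <- density(v100)
d1000 <- density(v1000)
plot(d1000, main = "Sample variances, sigma^2 = 9",
     xlab = "sample variance", xlim = range(d100$x, d1000$x))
lines(d100, lty = 2)                 # dashed curve: n = 100, visibly wider
abline(v = 9, col = "grey")          # true sigma^2
legend("topright", legend = c("n = 1000", "n = 100"), lty = c(1, 2))
```

Both curves centre on 9; the n = 100 curve is simply much wider.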

Means and standard deviations:

mean(v100)  var(v100)
=  9.080676  2.376193
mean(v1000) var(v1000)
=  8.98147   0.1721246

Are these not pretty much as expected?  Not that a t-test is the
ideal test for the distributions involved, but it's familiar and
since the distribution is pretty bell-shaped, it may be usable as
a rough guide to whether to be worried or not.

 t.test(v100, v1000)

Welch Two Sample t-test

data:  v100 and v1000 
t = 0.6413, df = 100.439, p-value = 0.5228
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.2077100  0.4061231 
sample estimates:
mean of x mean of y 
 9.080676  8.981469
