95% confidence interval

2001-01-31 Thread James Ankeny

 Hello, 
   I am currently taking a first course in statistics, and I was hoping that
perhaps someone might be kind enough to answer a question for me. I
understand that, while a quantitative variable may not be normally
distributed, we may calculate the mean of the sample, and use facts about
the Central Limit Theorem, to form a 95% confidence interval for the
population mean. As far as I know, this means that in repeated sampling,
about 95 out of every 100 such intervals will contain the true population
mean. This seems very useful at
first, but then something begins to confuse me. Yes, we have an interval
that may contain the true population mean, but ... if the distribution is
heavily skewed to the right, say like income, why do we want an interval for
the population mean, when we are taught that the median is a better measure
of central tendency for skewed distributions? This is what confuses me. I
hope that I have phrased my question in such a way that people can
understand what I am saying, and why I am confused. There is just one more
thing I would like to get off my chest. My textbook talks about simple
random sampling, where you can specify the probability of a sample being
selected from the population. Yet, there are examples in the book which deal
with conceptual populations, such as the set of all cars of a particular
model which may be manufactured in the future. Suppose you have a sample of
several of these autos, and you want to find a 95% confidence interval for
mean miles/gallon. How is this an SRS when you can't specify the probability
of a sample being selected, because the population is conceptual? Perhaps I
am simply looking at everything the wrong way, but this is very confusing to
me. Any help would be greatly appreciated. 
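
To make the first question concrete, here is a minimal simulation sketch. It
assumes a lognormal population (chosen only because it is right-skewed like
income; the parameters mu=0, sigma=1 and the function names are illustrative,
not from the textbook). It builds the usual large-sample interval for the mean
many times and counts how often the interval contains the true mean. Note that
the mean and the median of this population are different parameters, so an
interval for the mean answers a different question than an interval for the
median would.

```python
import math
import random
import statistics

def mean_ci(sample, z=1.96):
    """Large-sample (CLT-based) 95% confidence interval for the population mean."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return xbar - z * se, xbar + z * se

def coverage(n=200, reps=2000, mu=0.0, sigma=1.0, seed=1):
    """Fraction of simulated intervals that contain the true lognormal mean."""
    rng = random.Random(seed)
    true_mean = math.exp(mu + sigma ** 2 / 2)  # mean of a lognormal(mu, sigma)
    hits = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        lo, hi = mean_ci(sample)
        hits += lo <= true_mean <= hi  # True counts as 1
    return hits / reps

# For the assumed lognormal(0, 1) population the two "centers" differ:
TRUE_MEAN = math.exp(0.5)    # about 1.65, pulled up by the long right tail
TRUE_MEDIAN = math.exp(0.0)  # exactly 1.0

print("true mean:", TRUE_MEAN, "true median:", TRUE_MEDIAN)
print("estimated coverage of the 95% interval for the mean:", coverage())
```

With a skewed population the estimated coverage is usually close to 0.95 but
can fall somewhat short of it at moderate n; increasing n in the call to
coverage() shows the Central Limit Theorem taking hold.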

   






___
Send a cool gift with your E-Card
http://www.bluemountain.com/giftcenter/




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



basic stats question

2001-02-26 Thread James Ankeny

 Hello, 
   I have a question regarding basic probability and statistics. If I
understand correctly, the definition of independence holds for two events
that are subsets of the same sample space. In other cases, we may need to
construct a new sample space, such as with the flipping of a coin twice.
Here, we construct the new sample space S=S1xS2={HH,HT,TH,TT}, where
Si={H,T} for i=1,2. This way, two events that are independent, such as
A="head on first toss" and B="tails on second toss," are subsets of the same
sample space. 
   Now, the problem that I have is that, while it is not difficult to
construct sample spaces intuitively for textbook problems, it is difficult
to do so using these basic definitions of probability. For example, consider a
problem where a manufacturer has five seemingly identical computers, though
two are really defective and three are good. An order calls for two of the
computers, and we want the probability of the event A="order is filled with
two good computers."
Intuitively, it is obvious that if D1 and D2 are the bad computers, and
G1-G3 are the good computers, then
S={D1D2,D1G1,D1G2,D1G3,D2G1,D2G2,D2G3,G1G2,G1G3,G2G3}. Thus, P(A)= 0.30.
However, I cannot think of any way of constructing the sample space using
definitions like the cartesian product. Perhaps this is because the second
computer chosen depends on which computer is chosen first. Yet, another
similar problem in my textbook states that the probabilities of a computer
being good and defective (from a particular manufacturer) are 0.90 and 0.10,
respectively. Then, if we want to test five computers, we may construct the
sample space S=S1xS2xS3xS4xS5, where Si={G,D} for i=1,...,5. Hence, if
A="all five computers tested are good," P(A)=(0.90)^5. Why is it that we can
use the Cartesian product in this case but not in the other case? Is it that
in the first case we are not performing an experiment, but just sampling?
Perhaps I am thinking about this too much, but it would be nice to be able
to construct these sample spaces for problems using some sort of formulaic
method, as opposed to intuition (perhaps this isn't the right way to view
this subject?). Any help would be greatly appreciated. 
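
The two sample spaces in this post can be written out mechanically. The sketch
below is only an illustration (the variable names are mine, and it assumes the
two shipped computers are drawn without replacement and that the five tested
computers are independent). It enumerates the ten unordered pairs, shows that
the same answer comes from the ordered pairs, which form a subset of the
Cartesian product with repeated computers removed, and then builds the full
product {G,D}^5, whose 32 outcomes are not equally likely.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

computers = ["D1", "D2", "G1", "G2", "G3"]

def is_good(c):
    return c.startswith("G")

# Case 1: two of the five computers are shipped (no replacement, order ignored).
S = list(combinations(computers, 2))               # the 10 unordered pairs
A = [pair for pair in S if all(is_good(c) for c in pair)]
p_two_good = Fraction(len(A), len(S))              # 3/10

# Same event on ordered pairs: a subset of the Cartesian product S1 x S2
# in which the same computer cannot be picked twice.
S_ordered = [(a, b) for a, b in product(computers, repeat=2) if a != b]
A_ordered = [pair for pair in S_ordered if all(is_good(c) for c in pair)]
p_two_good_ordered = Fraction(len(A_ordered), len(S_ordered))  # 6/20 = 3/10

# Case 2: five independent tests, each good with probability 0.90.
# The sample space is the full Cartesian product {G, D}^5, but its
# outcomes are weighted by products of 0.9 and 0.1, not equally likely.
p = {"G": Fraction(9, 10), "D": Fraction(1, 10)}
S5 = list(product("GD", repeat=5))

def outcome_prob(outcome):
    return prod(p[c] for c in outcome)

total_prob = sum(outcome_prob(o) for o in S5)      # sums to 1
p_all_good = outcome_prob(("G",) * 5)              # (9/10)^5

print(p_two_good, p_two_good_ordered, len(S5), total_prob, p_all_good)
```

In the first case the second draw depends on the first, so the space is a
restricted subset of the product; in the second case the tests are independent,
so the whole product is used and the probabilities multiply.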








probability definition

2001-02-28 Thread James Ankeny

 Hello,
   I have a question regarding the definition of probability. If I
understand correctly, probability may be defined using just axioms. However,
my textbook also uses a relative frequency definition, in which a
probability is defined as being the proportion of times an outcome occurs in
repeated trials of an experiment. This makes sense when one flip of the coin
is one trial, and in repeated trials, the proportion of heads is 1/2. But
what about a situation (an example in my textbook) where the probability of
rain tomorrow is 0.70? How do you define this experiment? Perhaps you measure
rainfall, temperature, pressure, etc. for each day over a long time period.
Then the probability of rain tomorrow is the proportion of times that rain
occurred on days with similar values for temperature, humidity, etc.? This
seems a bit awkward to me. Also, how many trials of an experiment must one
perform before knowing that the proportion converges to a particular
fraction? Any
help on interpretation of relative frequency probabilities would be greatly
appreciated. In many cases, it seems difficult, at least for textbook
examples, to define what the actual experiment is. 
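
For the coin case, the relative frequency idea can be sketched with a small
simulation. This assumes a fair coin simulated by a pseudo-random generator;
the function name and seed are illustrative. It tracks the running proportion
of heads, which settles near 1/2 but never becomes exact after any finite
number of flips: its standard error after n flips is sqrt(p(1-p)/n), which
shrinks but does not reach zero.

```python
import math
import random

def running_proportion(n_flips, p=0.5, seed=0):
    """Return the proportion of heads after each of n_flips simulated flips."""
    rng = random.Random(seed)
    heads = 0
    props = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < p  # True counts as one head
        props.append(heads / i)
    return props

props = running_proportion(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    se = math.sqrt(0.5 * 0.5 / n)  # standard error of the proportion at n flips
    print(n, round(props[n - 1], 4), "standard error:", round(se, 4))
```

The printed standard errors give one answer to "how many trials": enough that
sqrt(p(1-p)/n) is as small as the precision you need.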








elementary prob./stats concepts

2001-03-22 Thread James Ankeny

  According to a textbook I have, a random sample of n objects from a random
variable X is itself composed of n random variables, namely X1,X2,...,Xn.
I am having some difficulties in figuring out how to interpret this. For
example, suppose that you are considering the population of adult males in
the U.S., and the random variable is weight. If you take a random sample of
n individuals, are the elements of the sample random (prior to observing
them, of course) because you might observe something different in another
sample due to measurement error? Or perhaps you might get something
different if you took the sample at a different time when weight has
changed? Also, if the elements of a random sample are random variables
themselves, do they have their own parameters, such as mean and standard
deviation, as well as their own density functions and cumulative
distribution functions? 

  Also, if a statistic is a function of random variables, can a statistic
take the form of a density function with a random vector representing the n
variables? I know, conceptually, that the sampling distribution of a
statistic is purely theoretical and that it represents how a statistic
varies from one sample to another. Mathematically, however, I do not
understand how to represent this, or if the sampling distribution of a
statistic is analogous to the distribution of a random variable which may
have a density function. 
  
  I do not know if these questions even make any sense, but the concepts are
fairly confusing to me. Any help would be greatly appreciated.
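
One way to see the idea is a simulation sketch. It assumes a normal population
of weights with mean 180 and standard deviation 30; these numbers are made up
purely for illustration, as are the names. Repeating the sampling many times
shows that the first element X1 varies from sample to sample with the
population's own mean and standard deviation, while the statistic x-bar has
its own distribution (the sampling distribution) with the same mean but
standard deviation sigma/sqrt(n).

```python
import math
import random
import statistics

MU, SIGMA = 180.0, 30.0  # hypothetical population mean and sd, invented here

def draw_sample(rng, n):
    """One random sample of size n: n independent draws from the population."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]

rng = random.Random(42)
n, reps = 25, 5000
samples = [draw_sample(rng, n) for _ in range(reps)]

first_elements = [s[0] for s in samples]             # realizations of X1
sample_means = [statistics.fmean(s) for s in samples]  # realizations of x-bar

print("X1:    mean", statistics.fmean(first_elements),
      "sd", statistics.stdev(first_elements))
print("x-bar: mean", statistics.fmean(sample_means),
      "sd", statistics.stdev(sample_means),
      "theory sd", SIGMA / math.sqrt(n))
```

The randomness here comes from which individuals happen to be selected, not
from measurement error or from weights changing over time.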








normal approx. to binomial

2001-04-09 Thread James Ankeny

  Hello,
I have a question regarding the so-called normal approx. to the binomial
distribution. According to most textbooks I have looked at (these are
undergraduate stats books), there is some talk of how a binomial random
variable is approximately normal for large n, and may be approximated by the
normal distribution. My question is, are they saying that the sampling
distribution of a binomial rv is approximately normal for large n?
Typically, a binomial rv is not thought of as a statistic, at least in these
books, but this is the only way that the approximation makes sense to me.
Perhaps, the sampling distribution of a binomial rv may be normal, kind of
like the sampling distribution of x-bar may be normal? This way, one could
calculate a statistic from a sample, like the number of successes, and form
a confidence interval. Please tell me if this is way off, but when they say
that a binomial rv may be normal for large n, it seems like this would only
be true if they were talking about a sampling distribution where repeated
samples are selected and the number of successes calculated.
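
A binomial count X is the sum of n independent 0/1 trials, so X = n * p-hat,
and it is the Central Limit Theorem for that sum that makes X approximately
normal. A quick numerical check is sketched below; the values n=100 and p=0.3
are arbitrary choices for illustration, and the continuity correction of 0.5
is the usual textbook adjustment. It compares the exact binomial cumulative
probability with the normal approximation using mean np and standard
deviation sqrt(np(1-p)).

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with continuity correction."""
    mu = n * p
    sd = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sd).cdf(k + 0.5)

n, p = 100, 0.3  # illustrative values only
for k in (20, 25, 30, 35, 40):
    exact = binom_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(k, round(exact, 4), round(approx, 4))
```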









simple linear regression

2001-09-12 Thread James Ankeny

  I have two questions regarding simple linear regression that I was hoping
someone could help me with.

1) According to what I have learned so far, the levels of X are fixed, so
that only Y is a random variable (the error is random as well). My question
is, what if X is a random variable as well? It seems like this could be the
case with some of my textbook examples. Does the simple model y = a + bx + e
still hold? Are the assumptions the same: that the conditional distributions
of Y are normal with the same variance, that E(Y) is a straight-line function
of X, and that the error terms are independent and normal? Also, in repeated
sampling the sample
slope is normal because Y is normal. However, if X also varies from sample
to sample, is the sample slope still normally distributed (sampling
distribution)?

2) My second question regards the prediction interval. I can perform this on
a computer, but it is difficult for me to conceptualize. If you are using
Y-hat (the estimated mean response from the fitted regression function) to
estimate a future response, does this mean that the difference,
(Y(future response) - Y-hat), is a statistic that has a sampling distribution,
from which you can derive the standard error? It seems like this might be
the case, but there is no parameter. I don't even know if what I just said
makes any sense. 
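
The second question can be explored with a simulation sketch. Everything here
is assumed for illustration: a true line with a=2, b=0.5, normal errors with
sigma=1, ten fixed X levels, and a new observation at X0=12. In each
replication it refits the line, draws one new response at X0, and records the
prediction error Y(new) - Y-hat. Under the fixed-X normal model, that error
has mean 0 and variance sigma^2 * (1 + 1/n + (X0 - xbar)^2 / Sxx), which is
what the standard error of prediction estimates.

```python
import random
import statistics

A, B, SIGMA = 2.0, 0.5, 1.0            # hypothetical true intercept, slope, error sd
X = [float(i) for i in range(1, 11)]   # fixed X levels, reused in every replication
X0 = 12.0                              # X level of the future observation

def fit(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

def prediction_errors(reps=20000, seed=7):
    """Simulate Y(new) - Y-hat at X0 over repeated samples."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        y = [A + B * xi + rng.gauss(0, SIGMA) for xi in X]
        a_hat, b_hat = fit(X, y)
        y_new = A + B * X0 + rng.gauss(0, SIGMA)
        errors.append(y_new - (a_hat + b_hat * X0))
    return errors

errors = prediction_errors()
n = len(X)
xbar = statistics.fmean(X)
sxx = sum((xi - xbar) ** 2 for xi in X)
theory_var = SIGMA ** 2 * (1 + 1 / n + (X0 - xbar) ** 2 / sxx)

print("mean of prediction errors:", statistics.fmean(errors))
print("variance: simulated", statistics.variance(errors), "theory", theory_var)
```

The "1" inside the parentheses is the variance of the new observation's own
error; the other two terms come from the uncertainty in the fitted line. No
parameter is being estimated, which is why this is called a prediction
interval rather than a confidence interval.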

 I understand that my questions are long, and perhaps not in any logical
order, but I would greatly appreciate any help with these conceptual
matters.

 Thank you








semi-studentized residual

2001-10-01 Thread James Ankeny

 Hello,
  I have a question regarding the so-called semi-studentized residual,
which is of the form (e_i)* = ( e_i - 0 ) / sqrt(MSE). Here, e_i is the ith
residual, 0 is the mean of the residuals, and sqrt(MSE) means the square
root of MSE. Now, if I understand correctly, the population simple linear
regression model assumes that the E_i, the error terms, are independent and
identically distributed N(0, sigma^2) random variables. My question is,
are semi-studentized residuals not fully studentized because MSE is not the
variance of all the residuals? It seems like MSE would be the variance of
the residuals, unless of course the residuals from the sample data are not
independent and identically distributed random variables. If not, each
residual may have its own variance, in which case we would have to find this
and studentize each residual by its own standard error? I am not sure if I
am thinking about this in the right way.
 Also, if the E_i are iid random variables, does this mean that the
observations Y_i are iid random variables within a particular level of X? (I
know that in general the Y_i are not iid r.v. since they have different
means depending on the level of X). I hope these questions make sense. Thank
you for your help.   
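
A small numerical sketch may help here. The data below are made up for
illustration, as are the variable names. In simple linear regression with
fixed X, the residual e_i has variance sigma^2 * (1 - h_ii), where
h_ii = 1/n + (x_i - xbar)^2 / Sxx is the leverage of the ith point, so the
residuals do not share one variance even though the error terms E_i do. The
semi-studentized residual divides every e_i by sqrt(MSE); the (internally)
studentized residual divides each e_i by its own estimated standard error,
sqrt(MSE * (1 - h_ii)).

```python
import math
import statistics

# Made-up data for illustration; the last point has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.2, 1.9, 3.2, 3.8, 10.1]

n = len(x)
xbar, ybar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (n - 2)   # estimates sigma^2

# Leverage of each point: Var(e_i) = sigma^2 * (1 - h_ii)
leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

semi_studentized = [e / math.sqrt(mse) for e in residuals]
studentized = [e / math.sqrt(mse * (1 - h))
               for e, h in zip(residuals, leverage)]

for xi, h, s, t in zip(x, leverage, semi_studentized, studentized):
    print(xi, round(h, 3), round(s, 3), round(t, 3))
```

The two versions differ most at the high-leverage point, where 1 - h_ii is
small and the residual's own standard error is well below sqrt(MSE).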




