Re: 3-D regression planes graph

2000-04-10 Thread Jon Cryer

The free ARC software from the University of Minnesota will do some of this.
Look at

http://stat.umn.edu/ARCHIVES/archives.html

Jon Cryer

At 01:59 PM 4/10/00 -0500, you wrote:

Hello all,

I'm looking for software that can display a 3-D regression environment (x,
y, and z variables) and draw a regression plane for each of two subgroups.

So far, Minitab does a good job of the 3-D scatterplots (regular,
wireframe, and surface (plane) plots), but there's no option (as in the
regular scatterplots) to either code data points according to categorical
variables or to overlay two graphs on the same set of axes.

I'm saving the data in both Minitab and SPSS files, and I can easily
convert to Excel (as a standard go-between spreadsheet file).

Any help will be greatly appreciated.  The effect I'm finding in my research
so far is that my two groups look similar in univariate and bivariate
settings, but the trivariate regression planes are different.  I know I
could do what I need to with regression equations (and will do so), but I'd
l-o-v-e to have some graphs to go with it.  SPSS will be fine for the actual
regression equations -- it can deal with subgroups like that.

Thank you very much in advance,

Cherilyn Young
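For the record, the requested overlay is straightforward today in Python with
numpy and matplotlib. A minimal sketch on simulated stand-in data (every name
below is hypothetical) fits z = b0 + b1*x + b2*y separately for two subgroups
and draws both planes on one set of 3-D axes:

# Sketch: one regression plane per subgroup on shared 3-D axes.
# Simulated stand-in data; all variable names are hypothetical.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 60
group = np.repeat([0, 1], n)
x = rng.uniform(0, 10, 2 * n)
y = rng.uniform(0, 10, 2 * n)
z = np.where(group == 0, 1 + 0.5 * x + 0.3 * y,
             4 - 0.2 * x + 0.8 * y) + rng.normal(0, 1, 2 * n)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
gx, gy = np.meshgrid(np.linspace(0, 10, 10), np.linspace(0, 10, 10))
for g, color in [(0, "tab:blue"), (1, "tab:orange")]:
    m = group == g
    X = np.column_stack([np.ones(m.sum()), x[m], y[m]])  # design matrix
    b, *_ = np.linalg.lstsq(X, z[m], rcond=None)         # least-squares plane
    ax.scatter(x[m], y[m], z[m], color=color, label=f"group {g}")
    ax.plot_surface(gx, gy, b[0] + b[1] * gx + b[2] * gy,
                    color=color, alpha=0.3)
ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("z")
ax.legend()
plt.show()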







density of integral(RV(t)~f(t), 0..T, dt)

2000-04-19 Thread Jon Cryer

Can't be done without knowledge of the joint distributions of
Y(t1), Y(t2), ..., Y(tn).

Jon Cryer
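One special case shows why the one-dimensional densities alone are not enough:
even if Y is a Gaussian process (an assumption beyond what the poster states),
the distribution of the integral depends on the covariance kernel, not just on
mean(t) and var(t). A sketch of the standard result, in LaTeX:

% Sketch, assuming Y is a Gaussian process with mean function m(t) and
% covariance kernel C(s,t) = Cov(Y(s), Y(t)); let Z = \int_0^T Y(t)\,dt.
% Then Z is Gaussian with
\mathrm{E}[Z] = \int_0^T m(t)\,dt ,
\qquad
\mathrm{Var}(Z) = \int_0^T \int_0^T C(s,t)\,ds\,dt .

Var(Z) involves C(s,t) at all pairs of times, which is exactly the joint
information Jon points to.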

--- Text of forwarded message ---
To: [EMAIL PROTECTED]
Date: Wed, 19 Apr 2000 16:46:32 +0200
From: Thomas Peter Burg [EMAIL PROTECTED]
Organization: University of Illinois at Urbana-Champaign
Reply-To: [EMAIL PROTECTED]
Subject: density of integral(RV(t)~f(t), 0..T, dt)

Does anyone know if there's an answer to the following problem:

I'm given a function of time Y(t), with the property that all values of
Y are random variables which are drawn from a time-dependent distribution
with known time-dependent density f(t). I.e., the probability that Y(t) < x
is Integral(f(t), -inf..x, dt):

d/dx P( Y(t) < x ) = f(t)

With these facts given, is there anything that can be said about the
distribution of

Integral(Y(tau), 0..t, dtau) ??

or its density function?

Is there a nice expression for that in terms of the known density f(t) in
general? Or maybe with specific assumptions about f? (E.g. Gaussian with
mean(t) and var(t).)

I'd greatly appreciate answers to any of these questions or any
references
that might deal with this problem.

Thanks,

Thomas Burg
Dept. of Physics,
Swiss Federal Institute of Technology

[EMAIL PROTECTED]






Jon Cryer  [EMAIL PROTECTED]
Department of Statistics and Actuarial Science, The University of Iowa
Iowa City, IA 52242   http://www.stat.uiowa.edu
office 319-335-0819   dept. 319-335-0706   FAX 319-335-3017






Re: hyp testing -Reply

2000-04-20 Thread Jon Cryer

I thought everyone knew there was a difference in Anatomy between male
and female professors! ;)

At 12:19 PM 4/20/00 +0100, you wrote:
dennis roberts wrote:
 
 At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:
 
  There's a chapter in J. Utts' mostly wonderful but flawed low-math
intro
 text "Seeing Through Statistics", in which she does much the same. She
 presents a case study based on some of her own work in which she looked at
 the question of gender discrimination in pay at her own university, and
 fails to reject the null hypothesis [no systemic difference in pay between
 male and female faculty]. She heads the example "Important, but not
 significant, differences in salaries"; comments (_perhaps_ technically
 correctly but misleadingly) that "a statistically naive reader could
 conclude that there is no problem" and in closing states:
 
 the flaw here is that ... she has population data i presume ... or about as
 close as one can come to it ... within the institution ... via the budget
 or comptroller's office ... THE salary data are known ... so, whatever
 differences are found ... DEMS are it!
 
 the notion of statistical significance in this case seems IRRELEVANT ...
 the real issue is ... given that there are a variety of factors that might
 account for such differences (numbers in ranks, time in ranks, etc. etc.)
  is the remaining difference (if there is one) IMPORTANT TO DEAL
WITH ...

Yes! This reminds me of a newspaper article and radio news item in the UK
this year about female and male professors. They had data to show that there
was a large salary difference. However, they went on to say that the largest
difference was in Anatomy. I mentioned this to a female colleague of mine
(who works in that area), who pointed out there was only one female professor
of Anatomy in the UK.

Thom










Re: normality and regression analysis

2000-05-11 Thread Jon Cryer

Mike:

It's really the error terms in the regression model that are required to
have normal distributions with constant variance. We check this by looking
at the properties of the residuals from the regression. You shouldn't expect
the response (dependent) variable to have a normal distribution with a fixed
mean since then you wouldn't be doing regression.

By the way, you have a fine Statistics Department at VPI. I am sure they
do excellent consulting.

Jon Cryer
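Jon's point -- diagnose the residuals, not the raw variables -- in a minimal
Python sketch with simulated data (all names hypothetical): a skewed predictor
makes y non-normal even though the regression errors, and hence the residuals,
are normal.

# Sketch: the response can fail a normality test while the residuals pass.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=147)        # skewed predictor
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=147)  # normal errors

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

print("Shapiro-Wilk p, raw y:     ", stats.shapiro(y).pvalue)          # typically small
print("Shapiro-Wilk p, residuals: ", stats.shapiro(residuals).pvalue)  # typically large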

At 06:39 PM 5/11/00 -0400, you wrote:
I would like to obtain a prediction equation using linear regression for
some data that I have collected.  I have read in some stats books that
linear regression has 4 assumptions, two of them being that (1) the data are
normally distributed and (2) the variance is constant.  In SAS, I have run
univariate analysis testing for normality on both my dependent and
independent variable (n=147). Both variables have distributions that are
skewed.

For the dependent variable:  skewness=0.69 and Kurtosis=0.25.
For the independent variable: skewness=0.52 and Kurtosis= -0.47.

The normality test (Shapiro-Wilk Statistic) states that both the dependent
and independent variables are not normally distributed.

I have also transformed the data (both dependent and independent variables)
using log, arcsine, and square root transformations.  When I run the
normality tests on the transformed data, the test shows that even the
transformed data is not normally distributed.

I realize that I can use nonparametric tests for correlation (I will use
Spearman), but is there a nonparametric linear regression?  If not, is it
acceptable to use linear regression analysis on data that is not normally
distributed as a way to show there is a linear relationship?

Thanks in advance, Mike












Re: normal distribution table online for download??

2000-07-05 Thread Jon Cryer

If you think you need more precision than given in the
usual tables or with a calculator, think again. You are
probably fooling yourself, since no distribution in the real
world is _exactly_ normal.

Jon Cryer

At 03:55 PM 7/5/00 GMT, you wrote:
Trying to use in financial calcs.  Hardcoded one to four decimals.  Prefer
more precision.  Thanks.  [EMAIL PROTECTED]
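Jon's caveat notwithstanding, no table is needed for extra decimals: any
numerical library carries the normal CDF to full double precision. A sketch
in Python:

# Sketch: standard normal CDF and its inverse, no table required.
from scipy.stats import norm

print(norm.cdf(1.96))    # 0.9750021048517795
print(norm.ppf(0.975))   # 1.959963984540054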










Re: Density Function in Minitab

2000-07-06 Thread Jon Cryer

Olympio:

I used the Minitab menus to produce the following code
and graph the standard normal density. To do other densities
you need to change the range of values appropriately and 
change the density calculated and stored.

Hope this helps.

Jon Cryer

MTB > Name c1 = 'z'
MTB > Set 'z'
DATA>   1( -4 : 4 / .01 )1    # could shorten to -4:4/.01; change for other densities
DATA>   End.
MTB > Name c2 = 'Density'
MTB > PDF 'z' 'Density';      # change for other densities
SUBC>   Normal 0.0 1.0.       # change for other densities
MTB > Plot 'Density'*'z';
SUBC>   Connect;              # connect as a smooth curve
SUBC>   ScFrame;              # not needed
SUBC>   ScAnnotation;         # not needed
SUBC>   Reference 2 0.        # added to put a nice base on the curve
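For readers without Minitab, a rough Python equivalent of the session above
(the range and the distribution are the knobs to change):

# Sketch: standard normal density on -4..4, mirroring the Minitab session.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

z = np.arange(-4, 4.01, 0.01)
plt.plot(z, norm.pdf(z, loc=0.0, scale=1.0))  # change for other densities
plt.axhline(0)                                # base line, like Reference 2 0
plt.xlabel("z"); plt.ylabel("Density")
plt.show()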


At 04:47 AM 7/6/00 GMT, you wrote:
Friends:

How can Minitab show the density function of a variable? Can the
program calculate it and show the formula?

Thanks
Olympio






[Attachment: density.jpg]


Re: urgent problem (statistics for management)

2000-12-13 Thread Jon Cryer

This is quite a silly problem. No wonder statistics (for business) gets so
little respect. This is time series or process data -- not a random sample
from some fixed population. There is no information about the stability of
the process over time. Very few business processes are stable over five
years. Why can't we teach meaningful statistics?

Jon Cryer

At 05:14 PM 12/13/00 +0100, you wrote:
I have some difficulties with following problem
(I need the solution urgently for tomorrow):

Production levels for Giles Fashion vary greatly according to consumer
acceptance of the latest styles. Therefore, the company's
weekly orders of wool cloth are difficult
to predict in advance. On the basis of 5 years data, the following
probability distribution for the company's weekly demand for wool
has been computed:

Amount of wool (lb)    Probability
2500                   0.30
3500                   0.45
4500                   0.20
5500                   0.05

From these data, the raw-materials purchaser computed the
expected number of pounds required. Recently, she noticed
that the company's sales were lower in the last year than in years
before.
Extrapolating, she observed that the company will be lucky
if its weekly demand averages 2,500 this year.

(a) What was the expected weekly demand for wool based
on the distribution from past data?

(b) If each pound of wool generates $5 in revenue and costs $4 to
purchase, ship, and handle, how much would Giles Fashion stand
to gain or lose each week if it orders wool based on the past
expected value and company's demand is only 2,500?

(End of the text of the problem.)

Possible solution (in my opinion):

I.
(a) I think it is obvious: if X denotes the company's weekly demand for wool
(lb), then the expected weekly demand for wool based on the distribution
from past data is E(X) = 0.3*2500 + 0.45*3500 + 0.20*4500 + 0.05*5500 = 3500.
Am I right?

(b)
Actually I am not sure what the company's weekly demand for wool in the
past data (the table of the probability distribution) means. Is it the
amount of wool the company bought weekly, or the amount of wool the company
sold (in its products) weekly? The last sentence distinguishes between the
company's orders ("it orders wool based...") and the company's demand
("and company's demand is only 2,500"). So in my opinion the company's
weekly demand for wool means the amount of wool the company sold (in its
products) weekly. Am I right?

I am not sure what the last sentence means. Does it mean that the company
orders 3500 lb of wool weekly ("it orders wool based on the past expected
value", and the past expected value = 3500 from (a)) and sells 2500 lb
weekly in its products ("and company's demand is only 2,500")? If so, the
solution seems to be: the company should expect to gain weekly
2500*$1 - 1000*$4 = -$1500, so in fact it should expect to lose $1500
weekly.
--

Am I right?

Maybe I should consider that the company's weekly demand is 2500 lb but
its orders are:

Amount of wool (lb)    Probability
2500                   0.30
3500                   0.45
4500                   0.20
5500                   0.05

(Loss | Demand=2500)    $0      -$1500   ...
probability             0.30    0.45

E(Loss | Demand=2500) = 0*0.30 + (-1500)*0.45 + ...


Please somebody correct me if I am wrong.

Jan
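A quick check of the arithmetic in (a) and of the order-3500/use-2500
scenario in (b), as a Python sketch:

# Sketch: expected weekly demand, and the weekly result when 3500 lb are
# ordered at $4/lb but only 2500 lb generate the $5/lb revenue.
demand = [2500, 3500, 4500, 5500]
prob = [0.30, 0.45, 0.20, 0.05]

expected = sum(d * p for d, p in zip(demand, prob))
print(expected)              # 3500.0, confirming (a)

gain = 2500 * 5 - 3500 * 4   # revenue on 2500 lb, cost of 3500 lb
print(gain)                  # -1500: a $1,500 weekly loss, confirming (b)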











Re: Number of classes.

2001-01-11 Thread Jon Cryer

I asked Minitab support how they did it. Here is their answer:

Date: Fri, 26 Sep 1997 15:07:50 -0400
To: [EMAIL PROTECTED]
From: Tech Support 
Subject: number of bars in MINITAB histogram

Jonathan,

I finally found an answer for you.  Here's the algorithm.

There are upper and lower bounds on the number of bars.

Lower bound = Round( (16.0*N)**(1.0/3.0) + 0.5 )
Upper bound = Lower bound + Round(0.5*N)

After you find the bounds, MINITAB will always try to get as close to the
lower bound as it can.

Then we have a "nice numbers" algorithm that finds interval midpoints,
given the constraints on the number of intervals.

But there is special code for date/time data and for highly granular data
(e.g., all 1's and 2's).

Find the largest integer p such that each data value can be written (within
fuzz) as an integer times 10**p.

Let BinWidth = 10**p.

Let BinCount =  1 + Round( ( range of data ) / BinWidth )

If BinCount is <= 10, then let the bin midpoints run from the data min to
the data max in increments of BinWidth.

Otherwise, use the "nice numbers" algorithm.

Hope this helps.

Andy Haines
Minitab, Inc.
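The bar-count bounds in Andy's note, rendered literally as a Python sketch
(the "nice numbers" midpoint step and the special cases are not reproduced):

# Sketch of the quoted bounds on the number of histogram bars.
def bar_count_bounds(n):
    lower = round((16.0 * n) ** (1.0 / 3.0) + 0.5)
    upper = lower + round(0.5 * n)
    return lower, upper

print(bar_count_bounds(100))   # (12, 62)
print(bar_count_bounds(2000))  # (32, 1032)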

At 11:01 PM 1/4/01 -0500, you wrote:
To determine the number of classes for a histogram, Excel uses the square
root of the number of observations. Is that also true when the number of
observations is greater than 200 -- say, 2000? Does MINITAB use the same
rule for determining the number of classes for a histogram?
Any help would be appreciated.












Re: Excel Graphics

2001-01-29 Thread Jon Cryer

The absolute best advice concerning the use of Excel for
graphics (or for statistics, for that matter) is: DON'T!

The _majority_ of graph types available in Excel should never
be used for any purpose, as they produce misleading graphs -- mainly
false third dimensions that can only serve to hide important features
in the graph.

Jon Cryer

At 02:26 PM 1/27/01 GMT, you wrote:
Not sure if this is the best place to ask, but can anyone point me 
towards any web sites that provide advice on using Excel for 
technical/scientific graphing.

I am not sure why exactly, but I find the graphs produced by Excel, 
compared to S-Plus or Statistica, to look out of place in a technical 
report. As I know others feel the same way, I was hoping that there 
might be some advice out there on how to improve their appearance.

Many thanks,

Graham S










Re: Student's t vs. z tests

2001-04-20 Thread Jon Cryer

Alan:

Could you please give us an example of such a situation?

"Consider first a set of measurements taken with
a measuring instrument whose sampling errors have a known standard
deviation (and approximately normal distribution)."

Jon

At 01:10 PM 4/20/01 -0400, you wrote:
(This note is largely in support of points made by Rich Ulrich and
Paul Swank.)

I disagree with the claim (expressed in several recent postings) that
z-tests are in general superseded by t-tests.  The t-test (in simple
one-sample problems) is developed under the assumption that independent
observations are drawn from a normal distribution (and hence the mean and
sample SD are independent and have specific distributional forms).
It is widely applicable because it is fairly robust against violations
of these assumptions.

However, there are also situations in which the t-test is clearly 
inferior to a z-test.  Consider first a set of measurements taken with
a measuring instrument whose sampling errors have a known standard
deviation (and approximately normal distribution).  In this case, with
a few observations (let's say 1 or 2, if you want to make it very clear),
the z-based procedure that uses the known SD will give much more useful
tests or intervals than a t-based procedure (which estimates the SD from
the data at hand).

snip
   Alan Zaslavsky
   Harvard Med School
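Alan's small-n point can be made numerically. A hedged Python sketch with
n = 2 observations compares the 95% z half-width (using the known error SD)
with the corresponding t multiplier on 1 degree of freedom:

# Sketch: 95% interval half-width factors with n = 2 observations.
from scipy import stats

sigma, n = 1.0, 2
z_half = stats.norm.ppf(0.975) * sigma / n ** 0.5   # uses the known SD
t_mult = stats.t.ppf(0.975, df=n - 1) / n ** 0.5    # multiplies a noisy SD estimate

print(round(z_half, 2))   # about 1.39 (in sigma units)
print(round(t_mult, 2))   # about 8.98 -- and the 1-df SD estimate is itself noisy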










Re: Student's t vs. z tests

2001-04-20 Thread Jon Cryer

Alan:

I don't understand your comments about the estimation of a proportion.
It sounds to me as if you are using the estimated standard error. (Surely
you are not assuming a known standard error.) You are presumably also
using the normal approximation to the binomial (or perhaps the
hypergeometric). To do so requires a "large" sample size, in which case it
doesn't matter whether you use the normal or the t distribution. Both would
be acceptable approximations (and both would be approximations). So what is
your point?

Once more I think you need to separate the issues of what statistic to use
and what distribution to use.

Jon

At 01:10 PM 4/20/01 -0400, you wrote:
(This note is largely in support of points made by Rich Ulrich and
Paul Swank.)

snip

Now consider estimation of a proportion.  Using the information that the
data consist only of 0's and 1's, and an approximate value of the
proportion, we can calculate an approximate standard error more
accurately (for p near 1/2) than we could without this information.  The
interval based on the usual variance formula p(1-p) and the z
distribution is therefore better than the one based on the t
distribution.  This is why (as Paul pointed out) everybody uses z
tests in comparing proportions, not t tests.  The same applies to
generalizations of tests of proportions as in logistic regression.

snip

   Alan Zaslavsky
   Harvard Med School










Re: Student's t vs. z tests

2001-04-23 Thread Jon Cryer

These examples come the closest I have seen to having a known variance.
However, often measuring instruments, such as micrometers, quote their
accuracy as a percentage of the size of the measurement. Thus, if you
don't know the mean you also don't know the variance.

Jon Cryer

At 09:28 AM 4/23/01 -0400, you wrote:
 Date: Fri, 20 Apr 2001 13:02:57 -0500
 From: Jon Cryer [EMAIL PROTECTED]
 
 Could you please give us an example of such a situation?
 
 Consider first a set of measurements taken with
 a measuring instrument whose sampling errors have a known standard
 deviation (and approximately normal distribution).

Sure.  Suppose we use an instrument such as a micrometer, electronic
balance or ohmmeter to measure a series of similar items.  (For
concreteness, suppose they are components coming off a mass production
machine such as a screw machine.)  As long as the measuring instrument
isn't broken, we don't have to conduct an extensive series of repeated
measurements every time we use it to determine its error variance with a
part of the given conformation.  Normality is also reasonably likely under
those circumstances.

Slightly more sophisticated version of the same: Suppose the operating
characteristics of such a machine can be characterized by slow drift (due
to tool wear, heat expansion of machine parts, settings that gradually
shift, etc.) plus independent random noise that is approximately normal.
It is plausible in that setting that the variance of measurements on a
short series of parts would be fairly constant.  (I'm not just making
this up; it's consistent with my own experience in my former career as a
machinist.)  Again, you don't have to calibrate the error variance of the
measurement (in this case, average measurement of several successive
parts to estimate the current system mean) every time you do it.







Re: Presenting results of categorical data?

2001-08-15 Thread Jon Cryer

I do not see how (probabilistic) inference is appropriate here at all.
I assume that _all_ employees are rated. There is no sampling, random
or otherwise.

Jon Cryer

At 11:14 AM 8/15/01 -0300, you wrote:


Silvert, Henry wrote:
 
 I would like to add that with this kind of data [three-level ordinal] 
 we use the median instead of the average.

   Might I suggest that *neither* is appropriate for most purposes?  In
many ways, three-level ordinal data is like dichotomous data - though
there are a couple critical differences.

   Nobody would use the median (which essentially coincides with the
mode) for dichotomous data unless they had a very specific reason for
wanting that specific bit of information (and I use the word "bit" in
its technical sense).  By contrast, the mean (= proportion) is a lossless
summary of the data up to permutation (and hence a sufficient statistic
for any inference that assumes an IID model) - about as good as you can
get.

  With three levels, the mean is of course hopelessly uninterpretable
without some way to establish the relative distances between the levels.
However, the median is still almost information-free (total calorie
content per 100-gram serving = log_2(3) < 2 bits).  I would suggest
that unless there is an extremely good reason to summarize the data as
ONE number, three-level ordinal data should be presented as a frequency
table. Technically one row could be omitted but there is no strong
reason to do so. 

   What about inference?  Well, one could create various nice
modifications on a confidence interval; most informative might be a
confidence (or likelihood) region within a homogeneous triangle plot,
but a double confidence interval for the two cutoff points would be
easier. As for testing - first decide what your question is. If it *is*
really "are the employees in state X better than those in state Y?" you
must then decide what you mean by "better". *Do* you give any weight to
the number of "exceeded expectations" responses?  Do you find 30-40-30
to be better than 20-60-20, equal, or worse? What about 20-50-30?  If
you can answer all questions of this type, by the way, you may be ready
to establish a scale to convert your data to ratio. If you can't, you
will have to forego your hopes of One Big Hypothesis Test.  

   I do realize that we have a cultural belief in total ordering and
single parameters, and we tend to take things like stock-market and
cost-of-living indices, championships and MVP awards, and quality- of-
living indices, more seriously than we should. We tend to prefer events
not to end in draws; sports that can end in a draw tend to have
(sometimes rather silly) tiebreaking mechanisms added to them. Even in
sports (chess, boxing) in which the outcomes of (one-on-one) events are
known to be sometimes intransitive, we insist on finding a champion. 
But perhaps the statistical community ought to take the lead in opposing
this bad habit!

   To say that "75% of all respondents ranked Ohio employees as having
'Met Expectations' or 'Exceeded Expectations'", as a single measure,
is not a great deal better than taking the mean in terms of information
content *or* arbitrariness. Pooling  two levels and taking the
proportion is just taking the mean with a 0-1-1 coding.  It says, in
effect, that we will consider 

   (Exceed - Meet)/(Meet - Fail) = 0 

while taking the mean with a 0-1-2 coding says that we will consider 

   (Exceed - Meet)/(Meet - Fail) = 1.

One is no less arbitrary than the other. (An amusing analogy can be
drawn with regression, when users of OLS regression, implicitly assuming
all the variation to be in the dependent variable, sometimes criticise
the users of neutral regression for being arbitrary in assuming the
variance to be equally divided.)

   -Robert Dawson
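Robert's 0-1-1 versus 0-1-2 remark, in a short Python sketch using his
20-50-30 example (counts hypothetical):

# Sketch: two equally arbitrary one-number summaries of a 3-level table.
counts = [20, 50, 30]  # (Fail, Meet, Exceed), hypothetical
n = sum(counts)

print((0*counts[0] + 1*counts[1] + 1*counts[2]) / n)  # 0.80: mean under 0-1-1 coding
print((0*counts[0] + 1*counts[1] + 2*counts[2]) / n)  # 1.10: mean under 0-1-2 coding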






Re: Free program to generate random samples

2001-09-21 Thread Jon Cryer

I wouldn't call bootstrapping sampling from a population.
Would you?

Jon Cryer

At 06:03 PM 9/21/01 GMT, you wrote:
Jon Cryer wrote:
 
 But it would be bad statistics to sample with replacement.

Whew!  saves me from having to learn about all that bootstrap
stuff!  :-)








Re: Free program to generate random samples

2001-09-21 Thread Jon Cryer
But it would be bad statistics to sample with replacement.

Jon Cryer

At 08:35 AM 9/21/01 -0300, you wrote:
>"@Home" wrote:
>> >
>> > Is there any downloadable freeware that can generate let's say 2000 random
>> > samples of size n=100 from a population of 100 numbers.
>> >
>> 
>and Randy Poe responded:
>> Um.
>> 
>> A sample of 100 from a population of 100 is going to
>> give you the entire population.
>
>	Depends whether you sample with or without replacement.
>
>	-Robert Dawson
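The with/without distinction in one Python sketch (a population of 100
numbers and 2000 samples of size 100, as in the original request):

# Sketch: 2000 samples of size 100 from a population of 100 numbers.
# With replacement the rows differ (the bootstrap idea); without
# replacement each "sample" is just the whole population reshuffled.
import numpy as np

rng = np.random.default_rng(4)
population = np.arange(100)

boot = rng.choice(population, size=(2000, 100), replace=True)
print(np.unique(boot[0]).size)      # roughly 63 distinct values per resample

perm = rng.permutation(population)  # one "sample" without replacement
print(np.unique(perm).size)         # always 100: the full population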


Re: how to compare generated values with the specified distribution basis

2001-09-20 Thread Jon Cryer

Robert:


even when N=20,  a uniform distribution can be treated as
normal for most purposes.

I assume you meant to say that for N=20, the sample mean based on a random
sample from a uniform distribution can be assumed to have a normal
distribution for most purposes.

Right?

Jon Cryer

At 01:16 PM 9/20/01 -0300, you wrote:


JHWB wrote:
 
Hm, hope I didn't make that subject too complex, resulting in zero replies.
 But hopefully you can answer this:
 
snip

   The gotcha is that while these may be roughly equivalent questions for
(say) N=20, for N small deviations from normality are important and the
test is poor at detecting them; for N large, deviations from normality
do not matter very much but the test is hypersensitive.

   For instance: even when N=20,  a uniform distribution can be treated as
normal for most purposes. However, it will generally fail the
Ryan-Joiner test at a 5% level!

   -Robert Dawson
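Both halves of the exchange -- the raw uniform sample often fails a normality
test, while its mean behaves almost normally -- can be simulated. A sketch in
Python, with the Shapiro-Wilk test standing in for Minitab's Ryan-Joiner test:

# Sketch: normality tests on uniform samples of size 20 vs. their means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
samples = rng.uniform(0, 1, size=(1000, 20))

reject = sum(stats.shapiro(row).pvalue < 0.05 for row in samples)
print(reject / 1000)                 # rejection rate on the raw uniform samples

means = samples.mean(axis=1)
print(stats.shapiro(means).pvalue)   # the 1000 sample means look close to normal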








Re: What is a confidence interval?

2001-09-26 Thread Jon Cryer

Dennis:

Example A is a mistaken interpretation of a confidence interval for a mean.
Unfortunately, this is a very common misinterpretation. What you have
described in Example A is a _prediction_ interval for an individual
observation. Prediction intervals rarely get taught except (maybe) in the
context of a regression model.

Jon

At 03:11 PM 9/26/01 -0400, you wrote:
as a start, you could relate everyday examples where the notion of CI seems 
to make sense

A. you observe a friend in terms of his/her lateness when planning to meet 
you somewhere ... over time, you take 'samples' of late values ... in a 
sense you have means ... and then you form a rubric like ... for sam ... if 
we plan on meeting at noon ... you can expect him at noon + or - 10 minutes 
... you won't always be right but, maybe about 95% of the time you will?

B. from real estate ads in a community, looking at sunday newspapers, you 
find that several samples of average house prices for a 3 bedroom, 2 bath 
place are certain values ... so, again, this is like have a bunch of means 
... then, if someone asks you (visitor) about average prices of a 3 bedroom, 
2 bath house ... you might say ... 134,000 +/- 21,000 ... of course, you 
won't always be right but  perhaps about 95% of the time?

but, more specifically, there are a number of things you can do

1. students certainly have to know something about sampling error ... and 
the notion of a sampling distribution

2. they have to realize that when taking a sample, say using the sample 
mean, that the mean they get could fall anywhere within that sampling 
distribution

3. if we know something about #1 AND, we have a sample mean ... then, #1 
sets sort of a limit on how far away the truth can be GIVEN that sample 
mean or statistic ...

4. thus, we use the statistics (ie, sample mean) and add and subtract some 
error (based on #1) ... in such a way that we will be correct (in saying 
that the parameter will fall within the CI) some % of the time ... say, 95%?

it is easy to show this via simulation ... minitab for example can help you 
do this

here is an example ... let's say we are taking samples of size 100 from a 
population of SAT M scores ... where we assume the mu is 500 and sigma is 
100 ... i will take a 1000 SRS samples ... and summarize the results of 
building 100 CIs

MTB > rand 1000 c1-c100;    # made 1000 rows ... and 100 columns ... each ROW will be a sample
SUBC>   norm 500 100.       # sampled from population with mu = 500 and sigma = 100
MTB > rmean c1-c100 c101    # got means for 1000 samples and put in c101
MTB > name c101='sampmean'
MTB > let c102=c101-2*10    # found lower point of 95% CI
MTB > let c103=c101+2*10    # found upper point of 95% CI
MTB > name c102='lowerpt' c103='upperpt'
MTB > let c104=(c102 lt 500) and (c103 gt 500)   # evaluates whether each interval captures 500
MTB > sum c104

Sum of C104

Sum of C104 = 954.00        # 954 of the 1000 intervals captured 500
MTB > let k1=954/1000
MTB > prin k1

Data Display

K1    0.954000              # pretty close to 95%
MTB > prin c102 c103 c104   # a few of the 1000 intervals are shown below

Data Display


  Row   lowerpt   upperpt   C104

1   477.365   517.365  1
2   500.448   540.448  0   here is one that missed 500 ...the 
other 9 captured 500
3   480.304   520.304  1
4   480.457   520.457  1
5   485.006   525.006  1
6   479.585   519.585  1
7   480.382   520.382  1
8   481.189   521.189  1
9   486.166   526.166  1
   10   494.388   534.388  1





_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
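Dennis's Minitab simulation translates almost line for line into Python; a
sketch:

# Sketch: coverage of xbar +/- 2*sigma/sqrt(n) for 1000 samples of n = 100
# from Normal(mu = 500, sigma = 100), mirroring the Minitab run above.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 500, 100, 100
xbar = rng.normal(mu, sigma, size=(1000, n)).mean(axis=1)

half = 2 * sigma / n ** 0.5   # = 20, the "2*10" in the Minitab code
covered = (xbar - half < mu) & (mu < xbar + half)
print(covered.mean())         # close to 0.95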









Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Jon Cryer

But then you should use a binomial (or hypergeometric) distribution.

Jon Cryer

P.S. Of course, you might approximate by an appropriate normal distribution.

At 11:39 AM 12/10/01 -0400, you wrote:

Dennis Roberts wrote:
 this is pure speculation ... i have yet to hear of any convincing case
 where the variance is known but, the mean is not

What about that other application used so prominently in texts of
business statistics, testing for a proportion?
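Jon's two options side by side, as a Python sketch for a hypothetical 60
successes in 100 trials under H0: p = 0.5:

# Sketch: exact binomial p-value vs. the normal-approximation z test.
from scipy import stats

n, k, p0 = 100, 60, 0.5
print(stats.binomtest(k, n, p0).pvalue)         # exact, two-sided

z = (k - n * p0) / (n * p0 * (1 - p0)) ** 0.5   # z = 2.0 here
print(2 * stats.norm.sf(abs(z)))                # normal approximation, ~0.0455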





Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Jon Cryer

I always thought that the precision of a scale was proportional to the
amount weighed. So don't you have to know the mean before you know the
standard deviation? But wait a minute - we are trying to assess the size
of the mean!

Jon Cryer

At 03:42 PM 12/10/01 +0000, you wrote:

Dennis Roberts wrote:
 this is pure speculation ... i have yet to hear of any convincing case
 where the variance is known but, the mean is not

A scale (weighing device) with known precision.






Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Jon Cryer

Only as an approximation.

At 12:57 PM 12/10/01 -0400, you wrote:
Art Kendall wrote:

(putting below the previous quotes for readability)

  Gus Gassmann wrote:
 
   Dennis Roberts wrote:
  
this is pure speculation ... i have yet to hear of any convincing case
where the variance is known but, the mean is not
  
   What about that other application used so prominently in texts of
   business statistics, testing for a proportion?

  the sample mean of the dichotomous (one_zero, dummy) variable is known; it
  is the proportion.

Sure. But when you test Ho: p = p0, you know (or pretend to  know) the
population variance. So if the CLT applies, you should use a z-table, no?











RE: Excel2000- the same errors in stat. computations and graphics

2002-01-05 Thread Jon Cryer

David:

I have certainly never said nor implied that Excel cannot produce reasonably
good graphics. My concern is that it makes it so easy to produce poor
graphics. The defaults are absurd and should never be used. It seems to me
that defaults should produce at least something useful. The default graphs
are certainly not good business graphs if the intent is to produce good
visual display of quantitative information! Isn't that what graphs are for?

"In business applications, accuracy is not that important, except when money
is involved."

Huh?

Jon
At 09:39 PM 1/4/2002 -0800, you wrote:
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On
Behalf Of Shareef Siddeek
Sent: Friday, January 04, 2002 1:22 PM
To: [EMAIL PROTECTED]
Subject: Excel2000- the same errors in stat. computations and
graphics
Happy new year to all.
I frequently use Excel2000 for graphic presentation, spreadsheet
maths,
simple nonlinear model fitting (using the Excel solver) with one or
two
parameters, and simulations. I thought Excel2000 corrected those
errors
found in the analysis tool pack and other in-built computational
procedures in the older 97 version. However, following articles
point
out that the developers have done nothing to corrected those errors.
I
would like your comments on this. Thanks. Siddeek
---
1. I appreciate receiving your note and the URLs.

2. One really can't effectively use EXCEL without making the effort of
learning it from the books. Some of the complaints from Cryer have to do
with the fact that he never learned how to build charts in EXCEL. This
includes chart layouts, legends, scales, axes, labels, etc. One can use the
drawing overlay features to build up text on the charts. I always recommend
spending time reading the big commercial manuals available on EXCEL 2000; I
have several. EXCEL HELP is lousy for finding the information you really
need.

3. The EXCEL stat package was an add-on developer package by GreyMatter
International Inc., Cambridge, MA, back in the early 90's. Microsoft did not
write it. Being familiar with developers, the people writing the software
have to be familiar with an enormous lexicon of object links and protocols.
Stat is not one of the courses toward a degree in computer science;
consequently much of the formula building comes from a convenient textbook.
I really am surprised at the developers/programmers out there that have no
knowledge of basic math, or of how time works (calendar-time linkage). Much
of the problem has to do with the assumption that software built-in
functions work as the programmer thinks they work, not how they actually
work. It is obvious that Bill Gates has no interest in fixing EXCEL
accuracy, only in its appearance and ability to fit in as a part of larger
program packages. His only interest now is .NET and the ability to pull off
company data in spreadsheet format using the internet as the company's
internal network.

4. There is a problem with EXCEL histograms. This has been commented on in
previous edstat e-mails. In general EXCEL produces simple graphs, primarily
for business purposes. It does not produce good scientific graphics. All it
does is get you a quick graph with a minimum of effort.

5. Part of the inaccuracy problem has to do with the fact that each EXCEL
cell by default is treated as a variant variable. Unless you format all the
numerical cells properly (as decimal or integer), you are likely to have
problems. I always format all my cells properly, declaring the type of cell
contents. If, for example, you precede a number by a space, EXCEL may
interpret the number as text. By use of the variant, empty cells can be
handled and not cause computational halts.

6. The primary use of EXCEL is in business, doing the type of calculations
and reports described in the Microsoft EXCEL User's Guide. In business
applications, accuracy is not that important, except when money is involved.
If, for example, McCullough were to declare his numbers as currency instead
of variant, his accuracy would probably improve. Considering the type of
business applications for stat (for example, see The Complete Idiot's Guide
to Business Statistics), what EXCEL does is fine. From what I have observed,
many business types have a very limited math background, and even learning
simple business stat is a major problem. For example, try getting them to
understand the difference between using z and t tests, and to understand
confidence intervals. Business people expect the computer to give them a
number. The statement by McCullough is that "...it is important for the
package to determine whether the answer is likely to be so corrupted by
cumulated rounding errors as to be worthless and if so, not to display the
answer." This policy is not acceptable to business types, and this is one
of the ongoing problems on the nets. They would rather get a wrong number
than none. In most cases, the computed

Re: Student's t vs. z tests

2001-04-19 Thread Jon Cryer

Why not introduce hypothesis testing in a binomial setting where there are
no nuisance parameters and p-values, power, alpha, beta,... may be obtained
easily and exactly from the Binomial distribution?

Jon Cryer
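A sketch of what "easily and exactly" looks like in the binomial setting, for
a hypothetical test of H0: p = 0.5 against Ha: p > 0.5 with n = 20 and
rejection region X >= 15:

# Sketch: exact alpha and exact power straight from the binomial distribution.
from scipy.stats import binom

n, c = 20, 15
alpha = binom.sf(c - 1, n, 0.5)   # P(X >= 15 | p = 0.5), about 0.0207
power = binom.sf(c - 1, n, 0.7)   # P(X >= 15 | p = 0.7), about 0.416
print(alpha, power)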

At 01:48 AM 4/20/01 -0400, you wrote:
At 11:47 AM 4/19/01 -0500, Christopher J. Mecklin wrote:
As a reply to Dennis' comments:

If we deleted the z-test and went right to t-test, I believe that 
students' understanding of p-value would be even worse...


i don't follow the logic here ... are you saying that instead of their 
understanding being "bad"  it will be worse? if so, not sure that this 
is a decrement other than trivial

what makes using a normal model ... and say zs of +/- 1.96 ... any "more 
meaningful" to understand p values ... ? is it that they only learn ONE 
critical value? and that is simpler to keep neatly arranged in their mind?

as i see it, until we talk to students about the normal distribution ... 
being some probability distribution where, you can find subpart areas at 
various baseline values and out (or inbetween) ... there is nothing 
inherently sensible about a normal distribution either ... and certainly i 
don't see anything that makes this discussion based on a normal 
distribution more inherently understandable than using a probability 
distribution based on t ... you still have to look for subpart areas ... 
beyond some baseline values ... or between baseline values ...

since t distributions and unit normal distributions look very similar ... 
except when df is really small (and even there, they LOOK the same; it is 
just that the t's are somewhat wider) ... seems like whatever applies to one ... 
for good or for bad ... applies about the same for the other ...

i would be appreciative of ANY good logical argument or empirical data that 
suggests that if we use unit normal distributions  and z values ... z 
intervals and z tests ... to INTRODUCE the notions of confidence intervals 
and/or simple hypothesis testing ... that students somehow UNDERSTAND these 
notions better ...

i contend that we have no evidence of this ... it is just something that we 
think ... and thus we do it that way






