Re: hyp testing -Reply

2000-04-20 Thread Thom Baguley

Robert Dawson wrote:
 As far as random samples are concerned: it is *very* rare for a true
 random sample, based on an equal-probability sample of the population to
 which the inference is intended to extend, to be taken.  Say a researcher is
 studying the behaviour of humans. (S)he may take a random sample from the
 student subject pool, but not from the human race; and yet the paper
 published will claim to be about "Artificially Inducing The Gag Reflex in
 Humans", not "Artificially Inducing The Gag Reflex in Students Enrolled in
 Psych 1000 at Miskatonic U. (Fall '00)". Even if some future world
 government were to allow researchers access to a list of all humans alive at
 some moment to use as a sampling frame, most researchers would not disclaim
 any applicability of their research to those dead or not yet born. The
 implicit "Platonic" population larger than that available for study is a
 problem that is always with us; a bad sample is one in which this causes
 bias.  The situation in which the entire actual population is available for
 study is an extreme case, of course.

I don't think the problem is as severe as you imply. Scientific hypotheses are
about infinite populations, because scientists draw inferences about
processes, theories and so on. The paleontologist example is interesting, because
it is obviously true that there is something about those 20 individuals as a
group which disposes them to drive certain cars (price, salary, whatever).
However, the (more) interesting claim is that being a paleontologist makes you
drive a certain kind of car. This claim embraces Fred (presently a window
cleaner) who becomes a paleontologist (after night school) and suddenly
purchases a new car. The population is effectively infinite if you want to
embrace paleontologists last year, next year, etc.

A true random sample is rarely possible and may not be a random sample of the
population to which you wish to generalize. However, generalization does
not rest solely on statistics. In fact statistical generalization is necessary,
but less important than generalization with respect to theory in most
sciences. If we know about (i.e. have useful theories of) lung (or brain, or
...) function and development then we can generalize from one sample with
lungs or brains to another sample with lungs (or brains, or ...) more
powerfully than through statistics alone. Many of the problems with
traditional statistics are really problems of weak theory or weak experimental
design. Hypothesis testing can't solve these, but neither can any other
statistical method. (Indeed some alternatives to hypothesis testing may be
more susceptible to these problems. For example, effect size calculation,
meta-analysis, etc. may place more emphasis on strong theory. This can be good if it
forces a researcher back to theory, but I can see little evidence of this, so far.)

Thom


===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Re: hyp testing -Reply

2000-04-20 Thread Jon Cryer

I thought everyone knew there was a difference in Anatomy between male
and female professors! ;)

At 12:19 PM 4/20/00 +0100, you wrote:
dennis roberts wrote:
 
 At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:
 
  There's a chapter in J. Utts' mostly wonderful but flawed low-math intro
 text "Seeing Through Statistics", in which she does much the same. She
 presents a case study based on some of her own work in which she looked at
 the question of gender discrimination in pay at her own university, and
 fails to reject the null hypothesis [no systemic difference in pay between
 male and female faculty]. She heads the example "Important, but not
 significant, differences in salaries"; comments (_perhaps_ technically
 correctly but misleadingly) that "a statistically naive reader could
 conclude that there is no problem" and in closing states:
 
 the flaw here is that ... she has population data i presume ... or about as
 close as one can come to it ... within the institution ... via the budget
 or comptroller's office ... THE salary data are known ... so, whatever
 differences are found ... DEMS are it!
 
 the notion of statistical significance in this case seems IRRELEVANT ...
 the real issue is ... given that there are a variety of factors that might
 account for such differences (numbers in ranks, time in ranks, etc. etc.)
  is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...

Yes! This reminds me of a newspaper article and radio news item in the UK this
year about female and male professors. They had data to show that there was a
large salary difference. However, they went on to say that the largest
difference was in Anatomy. I mentioned this to a female colleague of mine (who
works in that area) who pointed out there was only one female professor of
Anatomy in the UK.

Thom




Jon Cryer                [EMAIL PROTECTED]
Department of Statistics http://www.stat.uiowa.edu
 and Actuarial Science   office 319-335-0819
The University of Iowa   dept.  319-335-0706
Iowa City, IA 52242      FAX    319-335-3017






Re: hyp testing -Reply

2000-04-18 Thread Robert Dawson

Joe Ward wrote:

Yes, there occasionally were discussions in our Air Force research
whether or not we were working with the POPULATION or a SAMPLE.

As Dennis comments:
|
|  the flaw here is that ... she has population data i presume ... or about as
|  close as one can come to it ... within the institution ... via the budget
|  or comptroller's office ... THE salary data are known ... so, whatever
|  differences are found ... DEMS are it!
| 

One of my Professors used to use the Invertebrate Paleontologists as his
example of a POPULATION.  I think at that time there were less than 20
people who were Invertebrate Paleontologists.


OK. Now, suppose that you knew them all, and noticed that ten of them
drove convertibles. You would probably make some generalization about
invertebrate paleontologists, consider that this was a genuine phenomenon,
and assume that if one more invertebrate paleontologist *did* turn up, it
might well be in a convertible. [Maybe convertibles are easier than sedans
to get into if you're invertebrate? grin]

Suppose there were also exactly two extraterrestrial paleontologists in
the world, and one of them drove a convertible. You would be less likely to
think in the same way.

Now, if you discovered that around 50% of the vertebrate paleontologists
in the world drove convertibles, you would consider that you had ironclad
proof that something was going on.

I suggest that even if these groups are not true random samples (and
they are not - more on that later), the informal inferential process
described has much in common with formal statistical inference. And, if it
walks like a duck and quacks like a duck, it makes some sense to cook it
like a duck. (Similarly, if you were to toss a coin and cover it unseen, and
offer a frequentist various odds that it had landed heads, most frequentists
would put their cutoff between accepting and rejecting the wager at odds
corresponding to a 50% probability, even if they refused to admit that that
was the probability that the coin was heads-up.) There are obvious problems
with the sampling technique - though probably less than if a convenience
sample of (say) the most accessible half the population had been taken.
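
    To make the parallel with formal inference concrete, the "10 of 20 drive
convertibles" reaction corresponds roughly to an exact binomial test. A minimal
Python sketch, purely for illustration - the 20% base rate of convertible
driving in the general population is an invented number:

# Illustrative only: treat the 20 invertebrate paleontologists as if they were
# a sample and ask how surprising "10 of 20 drive convertibles" would be under
# an assumed general-population rate (the 20% figure is made up).
from scipy.stats import binomtest

result = binomtest(k=10, n=20, p=0.20, alternative='greater')
print(result.pvalue)   # a small p-value mirrors the informal "something is going on"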

As far as random samples are concerned: it is *very* rare for a true
random sample, based on an equal-probability sample of the population to
which the inference is intended to extend, to be taken.  Say a researcher is
studying the behaviour of humans. (S)he may take a random sample from the
student subject pool, but not from the human race; and yet the paper
published will claim to be about "Artificially Inducing The Gag Reflex in
Humans", not "Artificially Inducing The Gag Reflex in Students Enrolled in
Psych 1000 at Miskatonic U. (Fall '00)". Even if some future world
government were to allow researchers access to a list of all humans alive at
some moment to use as a sampling frame, most researchers would not disclaim
any applicability of their research to those dead or not yet born. The
implicit "Platonic" population larger than that available for study is a
problem that is always with us; a bad sample is one in which this causes
bias.  The situation in which the entire actual population is available for
study is an extreme case, of course.

-Robert Dawson











Re: hyp testing -Reply

2000-04-18 Thread Rich Ulrich

On Mon, 17 Apr 2000 20:07:56 GMT, Charles D Madewell
[EMAIL PROTECTED] wrote:

 As a working engineer and part time graduate student I do not even
 understand why anyone would want to do away with hypothesis testing.
 I have spent many, many hours of my graduate school life learning,
 reading, calculating, and analyzing using hypothesis tests.
 Hypothesis testing is not bad.  It is errors in designing the
 experiment that are bad and this comes from PEOPLE not the math.  What
 is the fuss?  Are you guys telling me that all of this knowledge I am
 being taught will be worthless?  Come on, find something else to say
 

The training is fine and useful.

As training in pure logic, you can't lose by it.  

Most of the research problems can be expressed in terms of hypotheses;
the people who can't express those problems that way are muddled
thinkers, or are tackling problems that (so far) are too complex for
them.

Some other research problems are questions of estimation:
 - How reliable is this rater?  You certainly want to be well above
the value of 0.  You might want to judge by whether the point estimate
is above .80 (say), or you might want to see a confidence interval
(90%? 75%? 50%?) that is entirely above some stated value like .70
(a rough sketch of this follows below the list).
 - Or, for huge samples, "significance" is obtained on every
interesting comparison, so the only useful results are the ones where
the effect size is greater than some target amount.
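
A rough Python sketch of the rater-reliability question: judge the estimate by
whether its confidence interval sits entirely above .70. The observed
reliability of .85 and the n of 60 are invented numbers, purely for
illustration:

# Illustrative only: Fisher z confidence interval for a reliability estimate
# (treated as a correlation), checked against the ".70 floor" criterion.
import numpy as np
from scipy.stats import norm

r_obs, n = 0.85, 60                      # made-up observed reliability and sample size
z = np.arctanh(r_obs)                    # Fisher z-transform
se = 1.0 / np.sqrt(n - 3)
lo = np.tanh(z - norm.ppf(0.95) * se)    # lower end of a 90% CI
hi = np.tanh(z + norm.ppf(0.95) * se)
print(round(lo, 3), round(hi, 3), lo > 0.70)   # the "entirely above .70" check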

Technically, there is not much difference between the two (hypothesis
vs estimation).  If a research team can't put the question in terms of
hypothesis testing, or tell you WHY it should not be put that way, that
is probably a good enough test of their logic and competence that you
can be safe in dismissing them.

I don't know how well they handle real data, but (a) Dennis has seemed
to fail this STANDARD, on certain hypothetical questions.  However, I
don't like those hypothetical questions, because it is too easy to
pretend that they are something else.  I think Dennis gets led off by
the hypothetical semantics.  (b) Robert Frick has published on
hypothesis testing, and some of his work seems quite unrealistic and wrong
to me, too, especially in the description of two competing hypotheses.

It might not be the only way, or eventually it might not be the best
way, but one of the best organizing principles that we have -- right
now -- is that of framing questions as hypotheses.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: hyp testing -Reply

2000-04-18 Thread dennis roberts

At 03:37 PM 4/18/00 -0400, Rich Ulrich wrote:

I don't know how well they handle real data, but (a) Dennis has seemed
to fail this STANDARD, on certain hypothetical questions.  However, I
don't like those hypothetical questions, because it is too easy to
pretend that they are something else.  I think Dennis gets led off by
the hypothetical semantics.


it's nice to know that i A)  have 'failed' a standard (it is not the first 
one) ... and if that is not enough ... B) and get led off by the 
hypothetical semantics

if nothing else, i have THOSE two things in life

the problem i see ... stated in simple terms without lots of semantical 
gobbledygook ... is that we don't spend nearly enough time on thinking about 
the questions we want to explore ... as researchers (ask joe ward) ... BUT, 
we sure spend tons of time on learning inferential statistics ... so, the 
bias in the field is the tendency to think that inferential statistics ... 
and the logic behind it ... is THE way to knowledge ... but, it is not 
(though it helps). see below

in a way ... what we should do is to BAN ANY discussion of statistical 
analysis ... UNtil we have a good grasp on the issue at hand ... or, if you 
want to say it this way (if there is some deduction from some theoretical 
position) ... formed a sensible hypothesis ... and if this takes time ... 
or we have to revise it till we get something that is reasonable ... then 
we need to take the time.

then and ONLY then should we allow ourselves to ask: how can data analysis 
help me in this quest to the answer to the questions i have posed ... or, 
help me to sort out ways in which to test this deduction from the theory 
that i have made ...

so, to get this ball rolling along SOME line of inquiry ... let's pose the 
basic question:

if we had to opt one way OR the other (there is no middle ground) ... in our 
instruction related to statistics or analysis ... which way should we go: 
take a bayesian approach ... or, the way most have been doing it for seems 
like a zillion years? (and so no one thinks i have loaded the deck ... i 
don't really care which way we would go ... my only concern here is that 
IF we have to make a decision ... how would we decide that?)

this seems like a legitimate question to ask but, certainly, it would take 
a lot of PRE data collection work (if it ever came to that point) ... to 
focus in on subparts of this overall question ... and to try to define 
important issues that would have to be dealt with ... before one could ever 
be in a position to even conduct some 'study' about this  and attempt 
to arrive at some answer ...

so, i offer a challenge: let's rationally discuss this question (not that 
this is any better than many others that could be framed) ... and restrict 
our discussion to NON statistical matters ... and see if we could develop a 
plan that if implemented ... would help us answer the question of interest ...

if we can do that ... THEN let's see what might be an appropriate way or 
ways ... to handle any data that might come out of this exercise









Re: hyp testing -Reply

2000-04-18 Thread Alan McLean

Spot on, Robert.
Alan



Robert Dawson wrote:

 Joe Ward wrote:

 Yes, there occasionally were discussions in our Air Force research
 whether or not we were working with the POPULATION or a SAMPLE.

 As Dennis comments:
 |
 |  the flaw here is that ... she has population data i presume ... or about
 | as
 |  close as one can come to it ... within the institution ... via the
 budget
 |  or comptroller's office ... THE salary data are known ... so, whatever
 |  differences are found ... DEMS are it!
 | 

 One of my Professors used to use the Invertebrate Paleontologists as his
 example of a POPULATION.  I think at that time there were less than 20
 people who were Invertebrate Paleontologists.

 OK. Now, suppose that you knew them all, and noticed that ten of them
 drove convertibles. You would probably make some generalization about
 invertebrate paleontologists, consider that this was a genuine phenomenon,
 and assume that if one more invertebrate paleontologist *did* turn up, it
 might well be in a convertible. [Maybe convertibles are easier than sedans
 to get into if you're invertebrate? grin]

 Suppose there were also exactly two extraterrestrial paleontologists in
 the world, and one of them drove a convertible. You would be less likely to
 think in the same way.

 Now, if you discovered that around 50% of the vertebrate paleontologists
 in the world drove convertibles, you would consider that you had ironclad
 proof that something was going on.

 I suggest that even if these groups are not true random samples (and
 they are not - more on that later) that the informal inferential process
 described has much in common with formal statistical inference. And, if it
 walks like a duck and quacks like a duck, it makes some sense to cook it
 like a duck. (Similarly, if you were to toss a coin and cover it unseen, and
 offer a frequentist various odds that it had landed heads, most frequentists
 would put their cutoff between accepting and rejecting the wager at odds
 corresponding to a 50% probability, even if they refused to admit that that
 was the probability that the coin was heads-up.) There are obvious problems
 with the sampling technique - though probably less than if a convenience
 sample of (say) the most accessible half the population had been taken.

 As far as random samples are concerned: it is *very* rare for a true
 random sample, based on an equal-probability sample of the population to
 which the inference is intended to extend, to be taken.  Say a researcher is
 studying the behaviour of humans. (S)he may take a random sample from the
 student subject pool, but not from the human race; and yet the paper
 published will claim to be about "Artificially Inducing The Gag Reflex in
 Humans", not "Artificially Inducing The Gag Reflex in Students Enrolled in
 Psych 1000 at Miskatonic U. (Fall '00)". Even if some future world
 government were to allow researchers access to a list of all humans alive at
 some moment to use as a sampling frame, most researchers would not disclaim
 any applicability of their research to those dead or not yet born. The
 implicit "Platonic" population larger than that available for study is a
 problem that is always with us; a bad sample is one in which this causes
 bias.  The situation in which the entire actual population is available for
 study is an extreme case, of course.

 -Robert Dawson


--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel:  +61 03 9903 2102Fax: +61 03 9903 2007







Re: hyp testing -Reply

2000-04-18 Thread Alan McLean

Hi Dennis,

Robert's observation is 'spot on' because it is the way things are, rather than the
way we would like to think things are. I (of course) agree that people writing
papers should have some sense of proportion about the claims made in their papers.
Nevertheless, if you want to study the gag reflex in humans, you simply cannot take
a simple random sample of all humans, so you have to use a surrogate population of
some sort.

In fact I would claim that except for artificial (i.e. classroom) examples, you
pretty well always have to use surrogate populations. This is of course
particularly true when the population is not well defined - in market research, for
example, when your population allegedly consists of 'customers of XYZ Store' - but
it applies equally well in pretty well any branch of research.

It has been my opinion for quite some time now that the uncertainties in
conclusions due to use of surrogate populations, plus those due to measurement (e.g.
uncertainty in interpretation of questions in a questionnaire), far exceed sampling
errors - probably even exceed nonsampling errors.

However, this is quite consistent with my observations about models. In applying
the results obtained from the surrogate population to the general 'true' population
of interest, we apply those results as a model. If the surrogate population was
well chosen (and the analysis well done) then the model is likely to be reasonably
appropriate. That is, it is likely to 'work'.

It cannot be emphasized too much that the statistical analysis - including the
definition of variables, the design, the collection of data and the analysis of the
data - is simply a part of a process of investigation. This part provides evidence
- to some extent objective - which will help the
researcher to argue his or her case. It is only part of the evidence. And that
evidence may be sufficiently strong to persuade the scientific community that the
researcher's argument is valid, or it may not.

Regards,
Alan

dennis roberts wrote:

 Robert Dawson wrote:
 
  As far as random samples are concerned: it is *very* rare for a true
  random sample, based on an equal-probability sample of the population to
  which the inference is intended to extend, to be taken.  Say a researcher is
  studying the behaviour of humans. (S)he may take a random sample from the
  student subject pool, but not from the human race; and yet the paper
  published will claim to be about "Artificially Inducing The Gag Reflex in
  Humans", not "Artificially Inducing The Gag Reflex in Students Enrolled in
  Psych 1000 at Miskatonic U. (Fall '00)".

 well, perhaps journal editors should INSIST that the author say very
 clearly ... that this only applies to students enrolled in psy 1000 at
 miskatonic u ... fall 1999 ... since that is what it is ...

 the only way we can get around this ... is to REPLICATE investigations and
 see if we can find comparable results across disparate subject pools ..
 but, unfortunately ... if you do like benton j underwood did many years
 ago: studies in the meaningfulness of learning 1 ... then 2 ... then ... 29
 ... your tenure would be 'on hold' ... you are not allowed to replicate 10
 times ... you must move onward and upward ...

 we would be MUCH better off ... reducing drastically the NUMBER of things
 we tried to be unique in (that no one else has done) ... and spend more
 time replicating work ... that is deemed to be MORE important ... in the
 long run ... our knowledge base would be better and stronger ... rather
 than relying on some p value to suggest that THIS has been researched and
 THE answer found ... now we should move on to something else ... just
 another weight placed on the poor little p ... when its back is already
 being crushed!

 the more i think about it ... the more i think our overall effort is
 misguided ... and this is but one reason why there is so much crappy
 research ... and believe me ... there is plenty to go around across all the
 disciplines

  Even if some future world
  government were to allow researchers access to a list of all humans alive at
  some moment to use as a sampling frame, most researchers would not disclaim
  any applicability of their research to those dead or not yet born. The
  implicit "Platonic" population larger than that available for study is a
  problem that is always with us; a bad sample is one in which this causes
  bias.  The situation in which the entire actual population is available for
  study is an extreme case, of course.

 i would suggest that inferential statistics .. as we know it ... is not
 robust to cruddy samples and if samples are cruddy ... what's the point in
 using some standard error that is BASED on the assumption that samples are
 NOT cruddy ... but rather, have some connection to random error ...

 you can't have it both ways ... either we make a good faith effort to
 sample in a reasonable way .. such that our standard errors can be expected
 to be about 

Re: hyp testing -Reply

2000-04-17 Thread dennis roberts

At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:

 There's a chapter in J. Utts' mostly wonderful but flawed low-math intro
text "Seeing Through Statistics", in which she does much the same. She
presents a case study based on some of her own work in which she looked at
the question of gender discrimination in pay at her own university, and
fails to reject the null hypothesis [no systemic difference in pay between
male and female faculty]. She heads the example "Important, but not
significant, differences in salaries"; comments (_perhaps_ technically
correctly but misleadingly) that "a statistically naive reader could
conclude that there is no problem" and in closing states:

the flaw here is that ... she has population data i presume ... or about as 
close as one can come to it ... within the institution ... via the budget 
or comptroller's office ... THE salary data are known ... so, whatever 
differences are found ... DEMS are it!

the notion of statistical significance in this case seems IRRELEVANT ... 
the real issue is ... given that there are a variety of factors that might 
account for such differences (numbers in ranks, time in ranks, etc. etc.) 
 is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...








Re: hyp testing -Reply

2000-04-17 Thread Robert Dawson


- Original Message -
From: dennis roberts
 At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:

  There's a chapter in J. Utts' mostly wonderful but flawed low-math intro
 text "Seeing Through Statistics", in which she does much the same. She
 presents a case study based on some of her own work in which she looked at
 the question of gender discrimination in pay at her own university, and
 fails to reject the null hypothesis [no systemic difference in pay between
 male and female faculty]. She heads the example "Important, but not
 significant, differences in salaries"; comments (_perhaps_ technically
 correctly but misleadingly) that "a statistically naive reader could
 conclude that there is no problem" and in closing states:

and Dennis Roberts replied:

 the flaw here is that ... she has population data i presume ... or about as
 close as one can come to it ... within the institution ... via the budget
 or comptroller's office ... THE salary data are known ... so, whatever
 differences are found ... DEMS are it!

 the notion of statistical significance in this case seems IRRELEVANT ...
 the real issue is ... given that there are a variety of factors that might
 account for such differences (numbers in ranks, time in ranks, etc. etc.)
  is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...


If one can totally explain all contributing factors, so that a model
with significantly fewer parameters than there are faculty fits everybody to
within a practically significant margin of error, then yes, either the model
continues to work with gender removed or it doesn't.

If, on the other hand, there are unknown sources of variation (a
reasonable assumption in any situation involving people), or more sources of
variation than there are data (another good bet if one thought hard enough),
one cannot automatically go from the observation

(*)  "The average pay of female faculty members here is less than that of
male faculty members"

to the apparently desired conclusion

(**)  "There is a gender-based _pattern_ of discrimination in faculty
salaries"

without considering the study as a pseudo-experiment, and analyzing it as
such.  One would be trying to decide: is the difference between mean male
and female faculty salaries greater than one would expect if one took N1
males and N2 females and assigned factors such as experience, rank,
skill/luck at negotiating a first contract, demand for specialties, and merit
pay actually deserved [as opposed to given on a gender basis], etc. at
random?
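
One way to carry out that pseudo-experiment is a randomization (permutation)
test: shuffle the gender labels and see how often a gap as large as the
observed one arises by chance. The Python sketch below is purely illustrative -
the salaries and labels are made up, and a real analysis would first adjust for
rank, experience and the other factors listed above:

# Illustrative randomization test; 'salary' and 'gender' are invented data.
import numpy as np

rng = np.random.default_rng(0)
salary = rng.normal(60000, 8000, size=120)       # made-up salaries
gender = np.array(['F'] * 40 + ['M'] * 80)       # made-up labels (N1 = 40, N2 = 80)

observed = salary[gender == 'M'].mean() - salary[gender == 'F'].mean()
perm_gaps = []
for _ in range(10000):
    shuffled = rng.permutation(gender)           # reassign labels at random
    perm_gaps.append(salary[shuffled == 'M'].mean() - salary[shuffled == 'F'].mean())
p_value = np.mean(np.array(perm_gaps) >= observed)   # one-sided randomization p-value
print(observed, p_value)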

This is what Utts and her coauthors were, it seems, trying to do.
However, when the tests were not significant at the chosen level they seem
to have fallen back on inferring (**) directly from (*).

-Robert Dawson






Re: hyp testing -Reply

2000-04-17 Thread ssolla

Response embedded within message:

In article [EMAIL PROTECTED],
  [EMAIL PROTECTED] wrote:

 The way this world is ---
A master's candidate, or a PhD candidate, or a professor,
or a working scientist, has put a lot into his project.
In terms of time, in terms of money, and more important
still, in terms of emotional commitment, (S)he has lived
with this project for two years or more.

 That is a source of subjective bias:  (S)he WANTS the data to
 show something, preferably to support the original idea behind
 the research, but even failing that, to show something.

 There needs to be an objective brake on this wish.  An hypothesis
 test is that brake.  NOT rejecting the null hypothesis means
 that the data has no information (about whatever aspect of the
 data the test was designed to look at),  STOP THERE; go no
 further.

I hope not to get too off topic here, but sometimes the failure to
reject the null hypothesis has more implications than successfully
rejecting it. I understand your point here, and certainly have seen it
happen both personally and in the literature. However, as long as the
experiment has a sufficient sample size to detect a meaningful effect
(not necessarily just a null of an effect size of zero), then there is
something to say. For example, the literature has been overflowing with
reports of "estrogenic compounds" such as DDT/DDE that affect sexual
development of exposed animals. If someone found that DDE has little
ability to competitively bind to estrogen receptors (which someone has
found), at least to an extent necessary to elicit strong estrogenic
activity, this would not only mean that the null hypothesis that DDE is
estrogenic was rejected, but that something ELSE must be happening; i.e.
that the known alterations to sexual development after exposure to DDE
are not due to estrogenic activity. I am sure that this sort of thing must
be happening in other fields.
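
The "sufficient sample size to detect a meaningful effect" condition is just an
ordinary power calculation. A rough Python sketch using the usual normal
approximation for a two-sample comparison; the "meaningful" effect size of
d = 0.5 and the 80% power target are invented values for illustration:

# Illustrative only: approximate per-group n for a two-sample comparison.
from scipy.stats import norm

alpha, power, d = 0.05, 0.80, 0.5                # made-up design targets
n_per_group = 2 * ((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / d) ** 2
print(round(n_per_group))                        # roughly 63 per group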


 Without some objective brake, the master's student, etc. will
 go ahead to say something about the data, even when the test
 would have told her(im) there is nothing to say.

Failure to reject null hypotheses that have been "successfully rejected"
in numerous previous experiments, and thus are generally accepted by the
scientific community at large, can have big implications, even if the
alternative explanations were not tested and thus remain unknown. It may
not happen often, but failure to reject a null hypothesis, particularly
one that was expected to be rejected, may indicate a poorly executed
study, but it may also signal that the underlying theory on which the
experiment is based is wrong. That alone is valuable.

Shane de Solla
[EMAIL PROTECTED]

snip







Re: hyp testing -Reply

2000-04-17 Thread dennis roberts

At 08:07 PM 4/17/00 +, Charles D Madewell wrote:

As a working engineer and part time graduate student I do not even
understand why anyone would want to do away with hypothesis testing.
I have spent many, many hours of my graduate school life learning,
reading, calculating, and analyzing using hypothesis tests.
Hypothesis testing is not bad.  It is errors in designing the
experiment that are bad and this comes from PEOPLE not the math.  What
is the fuss?  Are you guys telling me that all of this knowledge I am
being taught will be worthless?  Come on, find something else to say

some of us find it very difficult ... given how we learned/or were taught a 
subject matter ... AND how we have been practicing it for dozens and dozens 
of years ... to come to the realization that perhaps ... what we have been 
taught ... and what we have practiced ... is disproportional to its benefit 
and utility ...

if we take all the courses that teach (particularly at the more 
introductory levels) statistical material ... and try to establish some 
percent of that that deals with hypothesis testing and related matters ... 
VERSUS time spent on other things ... and then ask: is all that time worth 
the investment of energy?

i think the answer is clearly no ...

but, we are so slow to change ... if we change at all ...

i grew up like that ... and have spent all these years teaching that (have 
to fill those students with sufficient statistical info) ... but, the 
reality is: hypothesis testing the way we do it ... has limited utility ... 
and is overblown to the nth degree

now, that does not mean it is not important ... it is ... just not nearly 
as important as our expenditure of time suggests ... for us AND for students

sure, design is much more important than inferential statistics ... but we 
have to share some of the blame ... when we push it so ... and as the ONLY 
way to go about things ... this is not only using our time unwisely ... but 
also doing a disservice to students ...






Re: hyp testing -Reply

2000-04-17 Thread Joe Ward

Hi, Robert and all --

Yes, there occasionally were discussions in our Air Force research about
whether or not we were working with the POPULATION or a SAMPLE.

As Dennis comments:
| 
|  the flaw here is that ... she has population data i presume ... or about
| as
|  close as one can come to it ... within the institution ... via the budget
|  or comptroller's office ... THE salary data are known ... so, whatever
|  differences are found ... DEMS are it!
| 

One of my Professors used to use the Invertebrate Paleontologists as his
example of a POPULATION.  I think at that time there were fewer than 20
people who were Invertebrate Paleontologists.

-- Joe
 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
* San Antonio, TX 78228-2402San Antonio, TX 78229  *
* Phone: 210-433-6575   Phone: 210-617-5400*
* Fax: 210-433-2828 Fax: 210-617-5423  *
* [EMAIL PROTECTED]*
* http://www.ijoa.org/joeward/wardindex.html   *




- Original Message - 
From: Robert Dawson [EMAIL PROTECTED]
To: dennis roberts [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, April 17, 2000 9:54 AM
Subject: Re: hyp testing -Reply


| 
| - Original Message -
| From: dennis roberts
|  At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:
| 
|   There's a chapter in J. Utts' mostly wonderful but flawed low-math
| intro
|  text "Seeing Through Statistics", in which she does much the same. She
|  presents a case study based on some of her own work in which she looked
| at
|  the question of gender discrimination in pay at her own university, and
|  fails to reject the null hypothesis [no systemic difference in pay
| between
|  male and female faculty]. She heads the example "Important, but not
|  significant, differences in salaries"; comments (_perhaps_ technically
|  correctly but misleadingly) that "a statistically naive reader could
|  conclude that there is no problem" and in closing states:
| 
| and Dennis Roberts replied:
| 
|  the flaw here is that ... she has population data i presume ... or about
| as
|  close as one can come to it ... within the institution ... via the budget
|  or comptroller's office ... THE salary data are known ... so, whatever
|  differences are found ... DEMS are it!
| 
|  the notion of statistical significance in this case seems IRRELEVANT ...
|  the real issue is ... given that there are a variety of factors that might
|  account for such differences (numbers in ranks, time in ranks, etc. etc.)
|   is the remaining difference (if there is one) IMPORTANT TO DEAL WITH
| ...
| 
| 
| If one can totally explain all contributing factors, so that a model
| with significantly fewer parameters than there are faculty fits everybody to
| within a practically significant margin of error, then yes, either the model
| continues to work with gender removed or it doesn't.
| 
| If, on the other hand, there are unknown sources of variation (a
| reasonable assumption in any situation involving people), or more sources of
| variation than there are data (another good bet if one thought hard enough),
| one cannot automatically go from the observation
| 
| (*)  "The average pay of female faculty members here is less than that of
| male faculty members"
| 
| to the apparently desired conclusion
| 
| (**)  "There is a gender-based _pattern_ of discrimination in faculty
| salaries"
| 
| without considering the study as a pseudo-experiment, and analyzing it as
| such.  One would be trying to decide: is the difference between mean male
| and female faculty salaries greater than one would expect if one took N1
| males and N2 females and assigned factors such as experience, rank,
| skill/luck at negotiating a first contract, demand for specialties,  merit
| pay actually deserved [as opposed to given on a gender basis], etc. at
| random?
| 
| This is what Utts and her coauthors were, it seems, trying to do.
| However, when the tests were not significant at the chosen level they seem
| to have fallen back on inferring (**) directly from (*).
| 
| -Robert Dawson
| 
| 
| 

Re: hyp testing -Reply

2000-04-15 Thread Herman Rubin

In article [EMAIL PROTECTED], bill knight  [EMAIL PROTECTED] wrote:
  dennis roberts [EMAIL PROTECTED] 04/07 2:46 pm 



 ===

The way this world is --- 
   A master's candidate, or a PhD candidate, or a professor,
   or a working scientist, has put a lot into his project.
   In terms of time, in terms of money, and more important 
   still, in terms of emotional commitment, (S)he has lived
   with this project for two years or more.  

That is a source of subjective bias:  (S)he WANTS the data to 
show something, preferably to support the original idea behind
the research, but even failing that, to show something.  

There needs to be an objective brake on this wish.  An hypothesis
test is that brake.  NOT rejecting the null hypothesis means
that the data has no information (about whatever aspect of the
data the test was designed to look at),  STOP THERE; go no
further.

It is a brake, but is it a meaningful brake?  That data
HAS information; ignoring it has risks.  And if the sample
size is huge, the proper brake has been removed; the result
will almost certainly be significant, even if unimportant.
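
A quick simulation makes the point: with a million observations per group, a
difference of one hundredth of a standard deviation - surely unimportant - is
all but certain to come out "significant". The numbers are invented for
illustration:

# Illustrative only: a trivially small true difference with a huge sample size.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n = 1_000_000
a = rng.normal(0.00, 1.0, n)          # "control"
b = rng.normal(0.01, 1.0, n)          # true difference of 0.01 SD - negligible in practice
t, p = ttest_ind(a, b)
print(p)                              # typically far below .05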

Without some objective brake, the master's student, etc. will
go ahead to say something about the data, even when the test
would have told her(im) there is nothing to say.

So 100 investigators look at a problem, and on the 
average at least 5 will find significance.  So we
have 5 positive papers published, and the magnitude
of the effect is exaggerated.  This is the converse.
Then some investigator who has had statistical 
methods courses does a meta-analysis, and gets an
"important" effect.

For a given experiment, more statistical significance is
usually associated with a larger effect.  But across
experiments, this is not the case.
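
Both points are easy to simulate: run 100 small studies of a tiny true effect,
keep only those reaching p < .05 in the expected direction, and the "published"
effects come out badly exaggerated. The true effect of 0.1 SD and the n of 30
per group are invented for illustration:

# Illustrative only: selecting significant results inflates reported effects.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
true_d, n, published = 0.1, 30, []
for _ in range(100):                              # 100 investigators
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_d, 1.0, n)
    t, p = ttest_ind(b, a)
    if p < 0.05 and t > 0:                        # only "positive" results get written up
        published.append(b.mean() - a.mean())
print(len(published), np.mean(published))         # a few papers, with an inflated mean effect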

In the British study on Type 2 diabetics, one comparison
gave a p value of .052.  This was then classed as an
unimportant effect.  If it had been .048, it would have
been called important.  If the study did not have other
results, this data would have been buried.

Regard rejecting the null hypothesis as permission to look at
the data.

Looking at the data takes much more understanding of
probability and statistics than the classical view 
even permits.  

SUMMARY:
* Don't be like a certain social sciences graduate
* student at our university who, after failing to reject her
* null hypothesis, nevertheless went on to draw conclusions
* from her data.  (Worse than that, her department
* had her seminar presented as a star example.)


bill knight  http://www.math.unb.ca/~knight


-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: hyp testing -Reply

2000-04-14 Thread bill knight

  dennis roberts [EMAIL PROTECTED] 04/07 2:46 pm 
 i was not suggesting taking away from our arsenal of tricks ... but,  since
 i was one of those old guys too ... i am wondering if we were mostly led
 astray ...?
 
 the more i work with statistical methods, the less i see any meaningful (at
 the level of dominance that we see it) applications of hypothesis
 testing ...
 
 here is a typical problem ... and we teach students this!
 
 1. we design a new treatment
 2. we do an experiment
 3. our null hypothesis is that both 'methods', new and old, produce the
 same results
 4. we WANT to reject the null (especially if OUR method is better!)
 5. we DO a two sample t test (our t was 2.98 with 60 df)  and reject the
 
 null ... and in our favor!
 6. what has this told us?
 
 if this is ALL you do ... what it has told you AT BEST is that ... the
 methods probably are not the same ... but, is that the question of
 interest
 to us?
 
 no ... the real question is: how much difference is there in the two
 methods?
 
 our t test does NOT say anything about that
 
 ===

The way this world is --- 
   A master's candidate, or a PhD candidate, or a professor,
   or a working scientist, has put a lot into his project.
   In terms of time, in terms of money, and more important 
   still, in terms of emotional commitment, (S)he has lived
   with this project for two years or more.  

That is a source of subjective bias:  (S)he WANTS the data to 
show something, preferably to support the original idea behind
the research, but even failing that, to show something.  

There needs to be an objective brake on this wish.  An hypothesis
test is that brake.  NOT rejecting the null hypothesis means
that the data has no information (about whatever aspect of the
data the test was designed to look at),  STOP THERE; go no
further.

Without some objective brake, the master's student, etc. will
go ahead to say something about the data, even when the test
would have told her(im) there is nothing to say.

Regard rejecting the null hypothesis as permission to look at
the data.

SUMMARY:
* Don't be like a certain social sciences graduate
* student at our university who, after failing to reject her
* null hypothesis, nevertheless went on to draw conclusions
* from her data.  (Worse than that, her department
* had her seminar presented as a star example.)
 

bill knight  http://www.math.unb.ca/~knight





Re: hyp testing -Reply

2000-04-08 Thread Jerrold Zar

Then, follow up your t test with a statement of the effect size and its
associated confidence interval.  (A rough sketch follows the quoted example below.)

---Jerry Zar

 dennis roberts [EMAIL PROTECTED] 04/07 2:46 pm 
i was not suggesting taking away from our arsenal of tricks ... but,
since 
i was one of those old guys too ... i am wondering if we were mostly
led astray ...?

the more i work with statistical methods, the less i see any meaningful
(at 
the level of dominance that we see it) applications of hypothesis
testing ...

here is a typical problem ... and we teach students this!

1. we design a new treatment
2. we do an experiment
3. our null hypothesis is that both 'methods', new and old, produce the 
same results
4. we WANT to reject the null (especially if OUR method is better!)
5. we DO a two sample t test (our t was 2.98 with 60 df)  and reject the

null ... and in our favor!
6. what has this told us?

if this is ALL you do ... what it has told you AT BEST is that ... the 
methods probably are not the same ... but, is that the question of
interest 
to us?

no ... the real question is: how much difference is there in the two
methods?

our t test does NOT say anything about that
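
Concretely, for the example above (t = 2.98, 60 df) that follow-up might look
like the Python sketch below: report the mean difference, an effect size and a
confidence interval alongside the t test. The data are simulated stand-ins for
the "new" and "old" methods, not real results:

# Illustrative only: effect size and CI to accompany a two-sample t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
new = rng.normal(52, 10, 31)                 # made-up scores, new method
old = rng.normal(45, 10, 31)                 # made-up scores, old method

t, p = stats.ttest_ind(new, old)
diff = new.mean() - old.mean()
sp = np.sqrt(((len(new) - 1) * new.var(ddof=1) + (len(old) - 1) * old.var(ddof=1))
             / (len(new) + len(old) - 2))    # pooled standard deviation
d = diff / sp                                # standardized effect size (Cohen's d)
se = sp * np.sqrt(1 / len(new) + 1 / len(old))
ci = stats.t.interval(0.95, len(new) + len(old) - 2, loc=diff, scale=se)
print(f"t = {t:.2f}, p = {p:.3f}, diff = {diff:.1f}, d = {d:.2f}, 95% CI = {ci}")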


