Re: hyp testing -Reply
Robert Dawson wrote:

> As far as random samples are concerned: it is *very* rare for a true random sample, based on an equal-probability sample of the population to which the inference is intended to extend, to be taken. [...] The implicit "Platonic" population larger than that available for study is a problem that is always with us; a bad sample is one in which this causes bias. The situation in which the entire actual population is available for study is an extreme case, of course.

I don't think the problem is as severe as you imply. Scientific hypotheses are about infinite populations, because scientists draw inferences about processes, theories and so on. The paleontologist example is interesting, because it is obviously true that there is something about those 20 individuals as a group which disposes them to drive certain cars (price, salary, whatever). However, the (more) interesting claim is that being a paleontologist makes you drive a certain kind of car. This claim embraces Fred (presently a window cleaner) who becomes a paleontologist (after night school) and suddenly purchases a new car. The population is effectively infinite if you want to embrace paleontologists last year, next year and so on.

A true random sample is rarely possible and may not be a random sample of the population to which you wish to generalize. However, generalization does not rest solely on statistics. In fact statistical generalization is necessary, but less important than generalization with respect to theory in most sciences. If we know about (i.e. have useful theories of) lung (or brain, or ...) function and development, then we can generalize from one sample with lungs (or brains, or ...) to another more powerfully than through statistics alone.

Many of the problems with traditional statistics are really problems of weak theory or weak experimental design. Hypothesis testing can't solve these, but neither can any other statistical method. (Indeed some alternatives to hypothesis testing may be more susceptible to these problems. For example, effect size calculation, meta-analysis etc. may place more emphasis on strong theory. This can be good if it forces a researcher back to theory, but I can see little evidence of this, so far.)

Thom
Re: hyp testing -Reply
I thought everyone knew there was a difference in Anatomy between male and female professors! ;)

At 12:19 PM 4/20/00 +0100, you wrote:

> dennis roberts wrote:
>
>> At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:
>>
>>> There's a chapter in J. Utts' mostly wonderful but flawed low-math intro text "Seeing Through Statistics", in which she does much the same. [...]
>>
>> the flaw here is that ... she has population data i presume ... or about as close as one can come to it ... within the institution ... via the budget or comptroller's office ... THE salary data are known ... so, whatever differences are found ... DEMS are it! the notion of statistical significance in this case seems IRRELEVANT ... the real issue is ... given that there are a variety of factors that might account for such differences (numbers in ranks, time in ranks, etc. etc.) is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...
>
> Yes! This reminds me of a newspaper article and radio news item in the UK this year about female and male professors. They had data to show that there was a large salary difference. However, they went on to say that the largest difference was in Anatomy. I mentioned this to a female colleague of mine (who works in that area) who pointed out that there was only one female professor of Anatomy in the UK.
>
> Thom

Jon Cryer [EMAIL PROTECTED]
Department of Statistics and Actuarial Science, The University of Iowa
Iowa City, IA 52242  http://www.stat.uiowa.edu
office: 319-335-0819  dept: 319-335-0706  fax: 319-335-3017
Re: hyp testing -Reply
Joe Ward wrote:

> Yes, there occasionally were discussions in our Air Force research whether or not we were working with the POPULATION or a SAMPLE. As Dennis comments:
>
>> the flaw here is that ... she has population data i presume ... or about as close as one can come to it ... within the institution ... via the budget or comptroller's office ... THE salary data are known ... so, whatever differences are found ... DEMS are it!
>
> One of my Professors used to use the Invertebrate Paleontologists as his example of a POPULATION. I think at that time there were fewer than 20 people who were Invertebrate Paleontologists.

OK. Now, suppose that you knew them all, and noticed that ten of them drove convertibles. You would probably make some generalization about invertebrate paleontologists, consider that this was a genuine phenomenon, and assume that if one more invertebrate paleontologist *did* turn up, it might well be in a convertible. [Maybe convertibles are easier than sedans to get into if you're invertebrate? grin] Suppose there were also exactly two extraterrestrial paleontologists in the world, and one of them drove a convertible. You would be less likely to think in the same way. Now, if you discovered that around 50% of the vertebrate paleontologists in the world drove convertibles, you would consider that you had ironclad proof that something was going on.

I suggest that even if these groups are not true random samples (and they are not - more on that later), the informal inferential process described has much in common with formal statistical inference. And, if it walks like a duck and quacks like a duck, it makes some sense to cook it like a duck. (Similarly, if you were to toss a coin and cover it unseen, and offer a frequentist various odds that it had landed heads, most frequentists would put their cutoff between accepting and rejecting the wager at odds corresponding to a 50% probability, even if they refused to admit that that was the probability that the coin was heads-up.) There are obvious problems with the sampling technique - though probably fewer than if a convenience sample of (say) the most accessible half of the population had been taken.

As far as random samples are concerned: it is *very* rare for a true random sample, based on an equal-probability sample of the population to which the inference is intended to extend, to be taken. Say a researcher is studying the behaviour of humans. (S)he may take a random sample from the student subject pool, but not from the human race; and yet the paper published will claim to be about "Artificially Inducing The Gag Reflex in Humans", not "Artificially Inducing The Gag Reflex in Students Enrolled in Psych 1000 at Miskatonic U. (Fall '00)". Even if some future world government were to allow researchers access to a list of all humans alive at some moment to use as a sampling frame, most researchers would not disclaim any applicability of their research to those dead or not yet born. The implicit "Platonic" population larger than that available for study is a problem that is always with us; a bad sample is one in which this causes bias. The situation in which the entire actual population is available for study is an extreme case, of course.

-Robert Dawson
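The informal inference Robert describes maps onto a simple exact binomial calculation. A minimal Python sketch, assuming (purely for illustration) a 5% base rate of convertible driving in the population at large:

    # How surprising is each group's convertible count if its members
    # chose convertibles at the assumed 5% base rate?
    from scipy.stats import binomtest  # SciPy >= 1.7

    base_rate = 0.05  # hypothetical base rate, not from the thread
    inverts = binomtest(10, 20, base_rate, alternative="greater")
    aliens = binomtest(1, 2, base_rate, alternative="greater")
    print(inverts.pvalue)  # vanishingly small: 10 of 20 demands an explanation
    print(aliens.pvalue)   # about 0.0975: 1 of 2 is quite compatible with chance

The contrast matches the intuition in the example: ten convertibles out of twenty look like a genuine phenomenon; one out of two does not.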
Re: hyp testing -Reply
On Mon, 17 Apr 2000 20:07:56 GMT, Charles D Madewell [EMAIL PROTECTED] wrote:

> As a working engineer and part time graduate student I do not even understand why anyone would want to do away with hypothesis testing. I have spent many, many hours of my graduate school life learning, reading, calculating, and analyzing using hypothesis tests. Hypothesis testing is not bad. It is errors in designing the experiment that are bad, and this comes from PEOPLE, not the math. What is the fuss? Are you guys telling me that all of this knowledge I am being taught will be worthless? Come on, find something else to say.

The training is fine and useful. As training in pure logic, you can't lose by it. Most research problems can be expressed in terms of hypotheses; the people who can't express those problems that way are muddled thinkers, or are tackling problems that (so far) are too complex for them. Some other research problems are questions of estimation:

- How reliable is this rater? You certainly want to be well above a value of 0. You might want to judge by whether the point estimate is above .80 (say), or you might want to see a confidence interval (90%? 75%? 50%?) that is entirely above some stated value like .70.

- Or, for huge samples, "significance" is obtained on every interesting comparison, so the only useful results are the ones where the effect size is greater than some target amount.

Technically, there is not much difference between the two (hypothesis testing vs estimation). If a research team can't put the question in terms of hypothesis testing, or tell you WHY it should not be put that way, that is probably a good enough test of their logic and competence that you can be safe in dismissing them. I don't know how well they handle real data, but (a) Dennis has seemed to fail this STANDARD, on certain hypothetical questions. However, I don't like those hypothetical questions, because it is too easy to pretend that they are something else. I think Dennis gets led off by the hypothetical semantics. (b) Robert Frick has published on hypothesis testing, and some of it seems quite unrealistic and wrong to me, too, especially in the description of two competing hypotheses.

It might not be the only way, or eventually it might not be the best way, but one of the best organizing principles that we have -- right now -- is that of framing questions as hypotheses.

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
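Rich's estimation framing can be made concrete. A minimal sketch, where the reliability estimate r_hat = .85 and n = 60 ratings are invented numbers, and the Fisher z interval is one standard way to put a confidence interval around a correlation-type coefficient:

    import math
    from scipy.stats import norm

    r_hat, n, floor = 0.85, 60, 0.70   # hypothetical estimate, sample size, floor
    z = math.atanh(r_hat)              # Fisher z-transform
    se = 1.0 / math.sqrt(n - 3)        # approximate standard error on the z scale
    zc = norm.ppf(0.95)                # two-sided 90% interval
    lo, hi = math.tanh(z - zc * se), math.tanh(z + zc * se)
    print(f"90% CI ({lo:.3f}, {hi:.3f}); entirely above {floor}? {lo > floor}")

Here the interval is roughly (.78, .90), so by the ".70 floor" criterion the rater passes; a smaller n would widen the interval and could fail the same point estimate.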
Re: hyp testing -Reply
At 03:37 PM 4/18/00 -0400, Rich Ulrich wrote:

> I don't know how well they handle real data, but (a) Dennis has seemed to fail this STANDARD, on certain hypothetical questions. However, I don't like those hypothetical questions, because it is too easy to pretend that they are something else. I think Dennis gets led off by the hypothetical semantics.

it's nice to know that i A) have 'failed' a standard (it is not the first one) ... and if that is not enough ... B) get led off by the hypothetical semantics. if nothing else, i have THOSE two things in life.

the problem i see ... stated in simple terms without lots of semantical gobbledygook ... is that we don't spend nearly enough time thinking about the questions we want to explore ... as researchers (ask joe ward) ... BUT, we sure spend tons of time on learning inferential statistics ... so, the bias in the field is the tendency to think that inferential statistics ... and the logic behind it ... is THE way to knowledge ... but, it is not (though it helps). see below.

in a way ... what we should do is to BAN ANY discussion of statistical analysis ... UNTIL we have a good grasp on the issue at hand ... or, if you want to say it this way (if there is some deduction from some theoretical position) ... have formed a sensible hypothesis ... and if this takes time ... or we have to revise it till we get something that is reasonable ... then we need to take the time. then and ONLY then should we allow ourselves to ask: how can data analysis help me in this quest for the answer to the questions i have posed ... or, help me to sort out ways in which to test this deduction from the theory that i have made ...

so, to get this ball rolling along SOME line of inquiry ... let's pose the basic question: if we had to opt one way OR the other (there is no middle ground) ... in our instruction related to statistics or analysis ... which way should we go: take a bayesian approach ... or, the way most have been doing it for what seems like a zillion years? (and so no one thinks i have loaded the deck ... i don't really care which way we would go ... my only concern here is that IF we have to make a decision ... how would we decide that?)

this seems like a legitimate question to ask but, certainly, it would take a lot of PRE data collection work (if it ever came to that point) ... to focus in on subparts of this overall question ... and to try to define important issues that would have to be dealt with ... before one could ever be in a position to even conduct some 'study' about this and attempt to arrive at some answer ...

so, i offer a challenge: let's rationally discuss this question (not that this is any better than many others that could be framed) ... and restrict our discussion to NON statistical matters ... and see if we could develop a plan that, if implemented ... would help us answer the question of interest ... if we can do that ... THEN let's see what might be an appropriate way or ways ... to handle any data that might come out of this exercise
Re: hyp testing -Reply
Spot on, Robert.

Alan

Robert Dawson wrote:

> [...] As far as random samples are concerned: it is *very* rare for a true random sample, based on an equal-probability sample of the population to which the inference is intended to extend, to be taken. [...] The implicit "Platonic" population larger than that available for study is a problem that is always with us; a bad sample is one in which this causes bias. The situation in which the entire actual population is available for study is an extreme case, of course.
--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102  Fax: +61 03 9903 2007
Re: hyp testing -Reply
Hi Dennis,

Robert's observation is 'spot on' because it is the way things are, rather than the way we would like to think things are. I (of course) agree that people writing papers should have some sense of proportion about the claims made in their papers. Nevertheless, if you want to study the gag reflex in humans, you simply cannot take a simple random sample of all humans, so you have to use a surrogate population of some sort. In fact I would claim that except for artificial (ie classroom) examples, you pretty well always have to use surrogate populations. This is particularly true when the population is not well defined - in market research, for example, when your population allegedly consists of 'customers of XYZ Store' - but it holds in pretty well any branch of research. It has been my opinion for quite some time now that the uncertainties in conclusions due to the use of surrogate populations, plus those due to measurement (eg uncertainty in interpretation of questions in a questionnaire), far exceed sampling errors - probably even exceed nonsampling errors.

However, this is quite consistent with my observations about models. In applying the results obtained from the surrogate population to the general 'true' population of interest, we apply those results as a model. If the surrogate population was well chosen (and the analysis well done) then the model is likely to be reasonably appropriate. That is, it is likely to 'work'.

It cannot be emphasized too much that the statistical analysis - including the definition of variables, the design, the collection of data and the analysis of the data - is simply a part of a process of investigation. This part provides evidence - to some extent objective, one hopes - which will help the researcher to argue his or her case. It is only part of the evidence. And that evidence may be sufficiently strong to persuade the scientific community that the researcher's argument is valid, or it may not.

Regards, Alan

dennis roberts wrote:

> Robert Dawson wrote:
>
>> As far as random samples are concerned: it is *very* rare for a true random sample, based on an equal-probability sample of the population to which the inference is intended to extend, to be taken. [...] and yet the paper published will claim to be about "Artificially Inducing The Gag Reflex in Humans", not "Artificially Inducing The Gag Reflex in Students Enrolled in Psych 1000 at Miskatonic U. (Fall '00)".
>
> well, perhaps journal editors should INSIST that the author say very clearly ... that this only applies to students enrolled in psy 1000 at miskatonic u ... fall 1999 ... since that is what it is ... the only way we can get around this ... is to REPLICATE investigations and see if we can find comparable results across disparate subject pools ... but, unfortunately ... if you do like benton j underwood did many years ago: studies in the meaningfulness of learning 1 ... then 2 ... then ... 29 ... your tenure would be 'on hold' ... you are not allowed to replicate 10 times ... you must move onward and upward ... we would be MUCH better off ... reducing drastically the NUMBER of things we tried to be unique in (that no one else has done) ... and spending more time replicating work ... that is deemed to be MORE important ... in the long run ... our knowledge base would be better and stronger ...
> rather than relying on some p value to suggest that THIS has been researched and THE answer found ... now we should move on to something else ... just another weight placed on the poor little p ... when its back is already being crushed! the more i think about it ... the more i think our overall effort is misguided ... and this is but one reason why there is so much crappy research ... and believe me ... there is plenty to go around across all the disciplines
>
>> Even if some future world government were to allow researchers access to a list of all humans alive at some moment to use as a sampling frame, most researchers would not disclaim any applicability of their research to those dead or not yet born. The implicit "Platonic" population larger than that available for study is a problem that is always with us; a bad sample is one in which this causes bias. The situation in which the entire actual population is available for study is an extreme case, of course.
>
> i would suggest that inferential statistics ... as we know it ... is not robust to cruddy samples. and if samples are cruddy ... what's the point in using some standard error that is BASED on the assumption that samples are NOT cruddy ... but rather, have some connection to random error ... you can't have it both ways ... either we make a good faith effort to sample in a reasonable way ... such that our standard errors can be expected to be about
Re: hyp testing -Reply
At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:

> There's a chapter in J. Utts' mostly wonderful but flawed low-math intro text "Seeing Through Statistics", in which she does much the same. She presents a case study based on some of her own work in which she looked at the question of gender discrimination in pay at her own university, and fails to reject the null hypothesis [no systemic difference in pay between male and female faculty]. She heads the example "Important, but not significant, differences in salaries"; comments (_perhaps_ technically correctly but misleadingly) that "a statistically naive reader could conclude that there is no problem" and in closing states:

the flaw here is that ... she has population data i presume ... or about as close as one can come to it ... within the institution ... via the budget or comptroller's office ... THE salary data are known ... so, whatever differences are found ... DEMS are it!

the notion of statistical significance in this case seems IRRELEVANT ... the real issue is ... given that there are a variety of factors that might account for such differences (numbers in ranks, time in ranks, etc. etc.) is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...
Re: hyp testing -Reply
- Original Message - From: dennis roberts

> At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:
>
>> There's a chapter in J. Utts' mostly wonderful but flawed low-math intro text "Seeing Through Statistics", in which she does much the same. She presents a case study based on some of her own work in which she looked at the question of gender discrimination in pay at her own university, and fails to reject the null hypothesis [no systemic difference in pay between male and female faculty]. She heads the example "Important, but not significant, differences in salaries"; comments (_perhaps_ technically correctly but misleadingly) that "a statistically naive reader could conclude that there is no problem" and in closing states:

and Dennis Roberts replied:

> the flaw here is that ... she has population data i presume ... or about as close as one can come to it ... within the institution ... via the budget or comptroller's office ... THE salary data are known ... so, whatever differences are found ... DEMS are it! the notion of statistical significance in this case seems IRRELEVANT ... the real issue is ... given that there are a variety of factors that might account for such differences (numbers in ranks, time in ranks, etc. etc.) is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...

If one can totally explain all contributing factors, so that a model with significantly fewer parameters than there are faculty fits everybody to within a practically significant margin of error, then yes, either the model continues to work with gender removed or it doesn't.

If, on the other hand, there are unknown sources of variation (a reasonable assumption in any situation involving people), or more sources of variation than there are data (another good bet if one thought hard enough), one cannot automatically go from the observation

(*) "The average pay of female faculty members here is less than that of male faculty members"

to the apparently desired conclusion

(**) "There is a gender-based _pattern_ of discrimination in faculty salaries"

without considering the study as a pseudo-experiment, and analyzing it as such. One would be trying to decide: is the difference between mean male and female faculty salaries greater than one would expect if one took N1 males and N2 females and assigned factors such as experience, rank, skill/luck at negotiating a first contract, demand for specialties, merit pay actually deserved [as opposed to given on a gender basis], etc. at random?

This is what Utts and her coauthors were, it seems, trying to do. However, when the tests were not significant at the chosen level they seem to have fallen back on inferring (**) directly from (*).

-Robert Dawson
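The "pseudo-experiment" Robert describes is essentially a permutation test. A minimal sketch with placeholder salary figures (nothing here is from Utts' study):

    import numpy as np

    rng = np.random.default_rng(0)
    male = np.array([62.0, 71, 58, 80, 67, 75])   # hypothetical salaries ($k)
    female = np.array([55.0, 60, 64, 52, 59])
    observed = male.mean() - female.mean()

    # Re-assign the same salaries to N1 "male" and N2 "female" labels at random
    pooled = np.concatenate([male, female])
    n1, reps = len(male), 10_000
    gaps = np.empty(reps)
    for i in range(reps):
        rng.shuffle(pooled)
        gaps[i] = pooled[:n1].mean() - pooled[n1:].mean()

    p = np.mean(gaps >= observed)  # one-sided: how often chance matches the gap
    print(f"observed gap {observed:.1f}, permutation p = {p:.3f}")

In a real analysis one would first adjust for rank, experience and the other factors Robert lists and permute what remains, but the logic is the same.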
Re: hyp testing -Reply
Response embedded within message:

In article [EMAIL PROTECTED], [EMAIL PROTECTED] wrote:

> The way this world is --- A master's candidate, or a PhD candidate, or a professor, or a working scientist, has put a lot into his or her project: in terms of time, in terms of money, and more important still, in terms of emotional commitment. (S)he has lived with this project for two years or more. That is a source of subjective bias: (S)he WANTS the data to show something, preferably to support the original idea behind the research, but even failing that, to show something. There needs to be an objective brake on this wish. An hypothesis test is that brake. NOT rejecting the null hypothesis means that the data has no information (about whatever aspect of the data the test was designed to look at); STOP THERE; go no further.

I hope not to get too far off topic here, but sometimes the failure to reject the null hypothesis has more implications than successfully rejecting it. I understand your point here, and certainly have seen it happen both personally and in the literature. However, as long as the experiment has a sufficient sample size to detect a meaningful effect (not necessarily just a null of an effect size of zero), then there is something to say. For example, the literature has been overflowing with reports of "estrogenic compounds" such as DDT/DDE that affect the sexual development of exposed animals. If someone found that DDE has little ability to competitively bind to estrogen receptors (which someone has found), at least to an extent necessary to elicit strong estrogenic activity, this would not only mean that the null hypothesis that DDE is estrogenic was rejected, but that something ELSE must be happening; i.e. that the known alterations to sexual development after exposure to DDE are not due to estrogenic activity. I am sure that this sort of thing must be happening in other fields.

> Without some objective brake, the master's student, etc. will go ahead to say something about the data, even when the test would have told her (him) there is nothing to say.

Failure to reject null hypotheses that have been "successfully rejected" in numerous previous experiments, and thus are generally accepted by the scientific community at large, can have big implications, even if the alternative explanations were not tested and thus remain unknown. It may not happen often, but a failure to reject a null hypothesis, particularly one that was expected to be rejected, may indicate a poorly executed study, or it may signal that the underlying theory on which the experiment is based is wrong. That alone is valuable.

Shane de Solla [EMAIL PROTECTED]

[snip]
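Shane's proviso - "a sufficient sample size to detect a meaningful effect" - is a power calculation. A sketch using conventional illustration values (effect size d = 0.5, power 0.8), not numbers from the DDE literature:

    from statsmodels.stats.power import TTestIndPower

    # Smallest n per group for an independent-samples t test to detect
    # a standardized effect of 0.5 with 80% power at alpha = .05
    n_per_group = TTestIndPower().solve_power(effect_size=0.5, power=0.8, alpha=0.05)
    print(f"need about {n_per_group:.0f} subjects per group")  # roughly 64

Only when a study clears a bar like this does a non-rejection carry the evidential weight Shane describes.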
Re: hyp testing -Reply
At 08:07 PM 4/17/00 +0000, Charles D Madewell wrote:

> As a working engineer and part time graduate student I do not even understand why anyone would want to do away with hypothesis testing. [...] Are you guys telling me that all of this knowledge I am being taught will be worthless? Come on, find something else to say.

some of us find it very difficult ... given how we learned/were taught a subject matter ... AND how we have been practicing it for dozens and dozens of years ... to come to the realization that perhaps ... what we have been taught ... and what we have practiced ... is disproportionate to its benefit and utility ...

if we take all the courses that teach (particularly at the more introductory levels) statistical material ... and try to establish what percent of that deals with hypothesis testing and related matters ... VERSUS time spent on other things ... and then ask: is all that time worth the investment of energy? i think the answer is clearly no ... but, we are so slow to change ... if we change at all ... i grew up like that ... and have spent all these years teaching that (have to fill those students with sufficient statistical info) ... but, the reality is: hypothesis testing the way we do it ... has limited utility ... and is overblown to the nth degree.

now, that does not mean it is not important ... it is ... just not nearly as important as our expenditure of time suggests ... for us AND for students. sure, design is much more important than inferential statistics but we have to share some of the blame ... when we push it so ... and as the ONLY way to go about things ... this is not only using our time unwisely ... but also doing a disservice to students ...
Re: hyp testing -Reply
Hi, Robert and all --

Yes, there occasionally were discussions in our Air Force research whether or not we were working with the POPULATION or a SAMPLE. As Dennis comments:

> the flaw here is that ... she has population data i presume ... or about as close as one can come to it ... within the institution ... via the budget or comptroller's office ... THE salary data are known ... so, whatever differences are found ... DEMS are it!

One of my Professors used to use the Invertebrate Paleontologists as his example of a POPULATION. I think at that time there were fewer than 20 people who were Invertebrate Paleontologists.

-- Joe

Joe Ward, Health Careers High School
167 East Arrowhead Dr, San Antonio, TX 78228-2402 (phone 210-433-6575, fax 210-433-2828)
4646 Hamilton Wolfe, San Antonio, TX 78229 (phone 210-617-5400, fax 210-617-5423)
[EMAIL PROTECTED]  http://www.ijoa.org/joeward/wardindex.html

- Original Message - From: Robert Dawson [EMAIL PROTECTED] To: dennis roberts [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, April 17, 2000 9:54 AM Subject: Re: hyp testing -Reply [...]
Re: hyp testing -Reply
In article [EMAIL PROTECTED], bill knight [EMAIL PROTECTED] wrote:

> dennis roberts [EMAIL PROTECTED] 04/07 2:46 pm ===
>
> The way this world is --- A master's candidate, or a PhD candidate, or a professor, or a working scientist, has put a lot into his or her project: in terms of time, in terms of money, and more important still, in terms of emotional commitment. (S)he has lived with this project for two years or more. That is a source of subjective bias: (S)he WANTS the data to show something, preferably to support the original idea behind the research, but even failing that, to show something. There needs to be an objective brake on this wish. An hypothesis test is that brake. NOT rejecting the null hypothesis means that the data has no information (about whatever aspect of the data the test was designed to look at); STOP THERE; go no further.

It is a brake, but is it a meaningful brake? That data HAS information; ignoring it has risks. And if the sample size is huge, the proper brake has been removed; the result will almost certainly be significant, even if unimportant.

> Without some objective brake, the master's student, etc. will go ahead to say something about the data, even when the test would have told her (him) there is nothing to say.

So 100 investigators look at a problem, and on the average at least 5 will find significance. So we have 5 positive papers published, and the magnitude of the effect is exaggerated. This is the converse. Then some investigator who has had statistical methods courses does a meta-analysis, and gets an "important" effect. For a given experiment, more statistical significance is usually associated with a larger effect. But across experiments, this is not the case. In the British study on Type 2 diabetics, one comparison gave a p value of .052. This was then classed as an unimportant effect. If it had been .048, it would have been called important. If the study did not have other results, this data would have been buried.

> Regard rejecting the null hypothesis as permission to look at the data.

Looking at the data takes much more understanding of probability and statistics than the classical view even permits.

> SUMMARY: Don't be like a certain social sciences graduate student at our university who, after failing to reject her null hypothesis, nevertheless went on to draw conclusions from her data. (Worse than that, her department had her seminar presented as a star example.)
>
> bill knight http://www.math.unb.ca/~knight

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED]  Phone: (765) 494-6054  FAX: (765) 494-0558
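Herman's "5 positive papers out of 100" point is easy to demonstrate by simulation. A sketch with made-up numbers (true effect 0.2 standard deviations, 20 subjects per group), showing that the effects that happen to reach significance are exaggerated on average:

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(1)
    true_d, n, published = 0.2, 20, []
    for _ in range(10_000):
        a = rng.normal(true_d, 1.0, n)  # treatment group, true mean 0.2
        b = rng.normal(0.0, 1.0, n)     # control group
        if ttest_ind(a, b).pvalue < 0.05:
            published.append(a.mean() - b.mean())
    print(f"true effect 0.20; mean 'published' effect {np.mean(published):.2f}")

A meta-analysis restricted to the "published" results would report an effect several times the true one.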
Re: hyp testing -Reply
dennis roberts [EMAIL PROTECTED] 04/07 2:46 pm:

> i was not suggesting taking away from our arsenal of tricks ... but, since i was one of those old guys too ... i am wondering if we were mostly led astray ...? the more i work with statistical methods, the less i see any meaningful (at the level of dominance that we see it) applications of hypothesis testing ... here is a typical problem ... and we teach students this!
>
> 1. we design a new treatment
> 2. we do an experiment
> 3. our null hypothesis is that both 'methods', new and old, produce the same results
> 4. we WANT to reject the null (especially if OUR method is better!)
> 5. we DO a two sample t test (our t was 2.98 with 60 df) and reject the null ... and in our favor!
> 6. what has this told us? if this is ALL you do ... what it has told you AT BEST is that ... the methods probably are not the same ... but, is that the question of interest to us? no ... the real question is: how much difference is there in the two methods? our t test does NOT say anything about that

===

The way this world is --- A master's candidate, or a PhD candidate, or a professor, or a working scientist, has put a lot into his or her project: in terms of time, in terms of money, and more important still, in terms of emotional commitment. (S)he has lived with this project for two years or more. That is a source of subjective bias: (S)he WANTS the data to show something, preferably to support the original idea behind the research, but even failing that, to show something. There needs to be an objective brake on this wish. An hypothesis test is that brake. NOT rejecting the null hypothesis means that the data has no information (about whatever aspect of the data the test was designed to look at); STOP THERE; go no further.

Without some objective brake, the master's student, etc. will go ahead to say something about the data, even when the test would have told her (him) there is nothing to say. Regard rejecting the null hypothesis as permission to look at the data.

SUMMARY: Don't be like a certain social sciences graduate student at our university who, after failing to reject her null hypothesis, nevertheless went on to draw conclusions from her data. (Worse than that, her department had her seminar presented as a star example.)

bill knight http://www.math.unb.ca/~knight
Re: hyp testing -Reply
Then, follow up your t test with a statement of the effect size and its associated confidence interval.

---Jerry Zar

> dennis roberts [EMAIL PROTECTED] 04/07 2:46 pm: [...]
>
> 5. we DO a two sample t test (our t was 2.98 with 60 df) and reject the null ... and in our favor!
> 6. what has this told us? if this is ALL you do ... what it has told you AT BEST is that ... the methods probably are not the same ... but, is that the question of interest to us? no ... the real question is: how much difference is there in the two methods? our t test does NOT say anything about that
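Applied to the numbers in dennis's example, Jerry's suggestion might look like the sketch below. A t of 2.98 with 60 df is consistent with two groups of n = 31 each (an assumption; the thread never states the group sizes), and the standard error of d is the usual large-sample approximation:

    import math
    from scipy.stats import norm

    t, n1, n2 = 2.98, 31, 31               # group sizes assumed, not given
    d = t * math.sqrt(1/n1 + 1/n2)         # Cohen's d recovered from t
    se = math.sqrt((n1 + n2)/(n1*n2) + d**2 / (2*(n1 + n2)))
    zc = norm.ppf(0.975)
    print(f"d = {d:.2f}, 95% CI ({d - zc*se:.2f}, {d + zc*se:.2f})")
    # prints: d = 0.76, 95% CI (0.24, 1.27) - large but imprecisely estimated

This answers "how much difference is there in the two methods?" in a way the bare rejection never can.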