Re: Disadvantage of Non-parametric vs. Parametric Test
- I have a comment on an offhand remark of Glen's, at the start of his interesting posting -

On Tue, 07 Dec 1999 15:58:11 +1100, Glen Barnett [EMAIL PROTECTED] wrote:

> Alex Yu wrote:
> > Disadvantages of non-parametric tests: Losing precision: Edgington
> > (1995) asserted that when more precise measurements are available, it
> > is unwise to degrade the precision by transforming the measurements
> > into ranked data.
>
> So this is an argument against rank-based nonparametric tests rather
> than nonparametric tests in general. In fact, I think you'll find
> Edgington highly supportive of randomization procedures, which are
> nonparametric.

- In my vocabulary, these days, "nonparametric" starts out with data being ranked, or otherwise being placed into categories -- it is the infinitely many parameters involved in that sort of non-reversible re-scoring which earns the label "nonparametric". (I am still trying to get my definition to be complete and concise.)

I know that when *nonparametric* and *distribution-free* were the two alternatives to ANOVA, either of the two labels was slapped onto people's pet procedures fairly indiscriminately; and that lack of discrimination seems to have widened to encompass *robust*, later on.

Okay, I see that exact evaluation by randomization of a fixed sample does not use a t or F distribution for its p-levels. Okay, I see that it is not ANOVA. But, I'm sorry, I don't regard a test as nonparametric which *does* preserve and use the original metric and means. Comparison of means is parametric, and that contrasts with nonparametric.

Similarly, bootstrapping is a method of "robust variance estimation", but it does not change the metric like a power transformation does, or abandon the metric like a rank-order transformation does. If it were proper terminology to say randomization is nonparametric, you would probably want to say bootstrapping is nonparametric, too. (I think some people have done so, but it is not widespread.)

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
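[For concreteness, the kind of randomization test at issue here is a permutation test that keeps the original metric: the statistic is a plain difference of means, and only the reference distribution is assumption-free. A minimal Python sketch, with numpy assumed; the function name and Monte Carlo details are illustrative, not from any of the posts:]

    import numpy as np

    def randomization_test(x, y, n_perm=10000, seed=0):
        # Two-sample randomization test on the difference of means.
        # The statistic uses the original metric; the p-value comes
        # from reshuffling group labels, not from a t or F distribution.
        rng = np.random.default_rng(seed)
        observed = x.mean() - y.mean()
        pooled = np.concatenate([x, y])
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)
            diff = pooled[:len(x)].mean() - pooled[len(x):].mean()
            if abs(diff) >= abs(observed):
                hits += 1
        return (hits + 1) / (n_perm + 1)  # two-sided Monte Carlo p-value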
Re: Disadvantage of Non-parametric vs. Parametric Test
Frank E Harrell Jr wrote:

> Alex Yu wrote:
> > Disadvantages of non-parametric tests: Losing precision: Edgington
> > (1995) asserted that when more precise measurements are available, it
> > is unwise to degrade the precision by transforming the measurements
> > into ranked data.
>
> Edgington's comment is off the mark in most cases. The efficiency of
> the Wilcoxon-Mann-Whitney test is 3/pi (about 0.955) relative to the
> t-test IF THE DATA ARE NORMAL. If they are non-normal, the relative
> efficiency of the Wilcoxon test can be arbitrarily much better than
> that of the t-test. Likewise, Spearman's correlation test is quite
> efficient (I think the efficiency is 9/pi^2, about 0.91) relative to
> the Pearson r test if the data are bivariate normal. Where you lose
> efficiency with nonparametric methods is with estimation of absolute
> quantities, not with comparing groups or testing correlations. The
> sample median has efficiency of only 2/pi against the sample mean if
> the data are from a normal distribution.

Yes, the median is inefficient at the normal. It is the location estimator corresponding to the sign test in the one-sample case. But if you use the location estimator corresponding to the signed-rank test (say) instead -- the Hodges-Lehmann estimator, the median of the pairwise averages -- the efficiency improves substantially, to 3/pi at the normal.

Glen
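[Those efficiency figures are easy to check by simulation. A Python sketch, numpy assumed; the sample size and replication count are arbitrary choices. The variance ratios should come out near the asymptotic values 2/pi = 0.64 for the median and 3/pi = 0.95 for the Hodges-Lehmann estimator, at least roughly for moderate n:]

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 25, 4000
    means, medians, hl = [], [], []
    i, j = np.triu_indices(n)  # pairs i <= j for the Walsh averages
    for _ in range(reps):
        x = rng.standard_normal(n)
        means.append(x.mean())
        medians.append(np.median(x))
        # Hodges-Lehmann: median of all pairwise (Walsh) averages
        hl.append(np.median((x[i] + x[j]) / 2))

    v = np.var(means)
    print("median vs mean:", v / np.var(medians))  # near 2/pi = 0.64
    print("H-L vs mean:   ", v / np.var(hl))       # near 3/pi = 0.95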
Re: Disadvantage of Non-parametric vs. Parametric Test
Rich Ulrich wrote:

> - In my vocabulary, these days, "nonparametric" starts out with data
> being ranked, or otherwise being placed into categories -- it is the
> infinitely many parameters involved in that sort of non-reversible
> re-scoring which earns the label "nonparametric". (I am still trying
> to get my definition to be complete and concise.)

Well, I am happy for you to use this definition of nonparametric now that you've said what you want it to mean, but it isn't exactly what most statisticians -- including those of us who distinguish between the terms "distribution-free" and "nonparametric" -- mean by "nonparametric", so you'll have to excuse my earlier ignorance of your definition.

If my recollection is correct, a parametric procedure is one where the entire distribution is specified up to a finite number of parameters, whereas a nonparametric procedure is one where the distribution can't be/isn't specified with only a finite number of unspecified parameters. This typically includes the usual distribution-free procedures, including many rank-based procedures, but it also includes many other things -- including some that don't transform the data in any way, and even some based on means.

So, for example, ordinary simple linear regression is parametric, because the distribution of y|x is specified up to the values of the parameters giving the intercept and slope of the line, and the variance about the line. Nonparametric regression (as the term is typically used in the literature), by contrast, is effectively infinite-parametric, because the distribution of y|x doesn't depend on only a finite number of parameters. (Often the distribution *about* E[y|x] is parametric -- typically Gaussian -- but E[y|x] itself is where the infinite-parametric part comes from.) Nonparametric regression would not seem to fit your definition of "nonparametric", since your usage seems to require some loss of information through ranking or categorisation.

Once we start using the same terminology, we tend to find the disagreements die down a bit.

Glen
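[The contrast is easy to see in code. A minimal Python sketch, numpy assumed; the data and bandwidth are illustrative. The first fit specifies E[y|x] with two parameters, while the kernel smoother assumes no finite-parameter form for the mean function, even though the noise about it is still Gaussian -- and nothing is ranked or categorised:]

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(0, 10, 200))
    y = np.sin(x) + rng.normal(0, 0.3, 200)   # Gaussian noise about a curve

    # Parametric: E[y|x] = a + b*x, two parameters (plus a variance)
    b, a = np.polyfit(x, y, 1)

    # Nonparametric: Nadaraya-Watson kernel estimate of E[y|x] -- the
    # mean function is not restricted to any finite-parameter family
    def nw(x0, x, y, h=0.5):
        w = np.exp(-0.5 * ((x0 - x) / h) ** 2)   # Gaussian kernel weights
        return (w * y).sum() / w.sum()

    smooth = np.array([nw(x0, x, y) for x0 in x])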
Re: Disadvantage of Non-parametric vs. Parametric Test
Alex Yu wrote:

> Disadvantages of non-parametric tests:
>
> Losing precision: Edgington (1995) asserted that when more precise
> measurements are available, it is unwise to degrade the precision by
> transforming the measurements into ranked data.

So this is an argument against rank-based nonparametric tests rather than nonparametric tests in general. In fact, I think you'll find Edgington highly supportive of randomization procedures, which are nonparametric.

In fact, surprising as it may seem, a lot of the location information in a two-sample problem is in the ranks. Where you really start to lose information is in ignoring ordering when it is present.

> Low power: Generally speaking, the statistical power of non-parametric
> tests is lower than that of their parametric counterparts, except on a
> few occasions (Hodges & Lehmann, 1956; Tanizaki, 1997).

When the parametric assumptions hold, yes -- e.g. if you assume normality and the data really *are* normal. Even then, the loss is frequently remarkably small. When the parametric assumptions are violated, it isn't hard to beat the standard parametric techniques. In cases where both do badly, the parametric test may outperform the nonparametric one by a more substantial margin -- that is, exactly when you should be using something else anyway (for example, a t-test outperforms WMW when the distributions are uniform).

> Inaccuracy in multiple violations: Non-parametric tests tend to produce
> biased results when multiple assumptions are violated (Glass, 1996;
> Zimmerman, 1998).

Sometimes you only need one violation: some nonparametric procedures are even more badly affected by some forms of non-independence than their parametric equivalents.

> Testing distributions only: Further, non-parametric tests are criticized
> for being incapable of answering the focused question. For example, the
> WMW procedure tests whether the two distributions are different in some
> way but does not show how they differ in mean, variance, or shape. Based
> on this limitation, Johnson (1995) preferred robust procedures and data
> transformation to non-parametric tests.

But since WMW is completely insensitive to a change in spread without a change in location, if either were possible, a rejection would imply that there was indeed a location difference of some kind. This objection strikes me as strange indeed. Does Johnson not understand what WMW is doing? Why on earth does he think that a t-test suffers any less from these problems than WMW?

Similarly, a change in shape sufficient to get a rejection from a WMW test would imply a change in location, in the sense that the "middle" had moved -- though the term 'location' becomes somewhat harder to pin down precisely in this case. e.g. (use a monospaced font to see this):

    :.          .:
    ::.    =    .::
    ...         ...
    a b         a b

would imply a different 'location' in some sense, which WMW will pick up. I don't understand the problem -- a t-test will also reject in this case; it suffers from this "drawback" as well. That is, both tests are sensitive to location differences, both are insensitive to spread differences without a corresponding location change, and both pick up a shape change that moves the "middle" of the data. However, if such a change in shape were anticipated, simply testing for a location difference (whether by t-test or not) would be silly.

Nonparametric (notably rank-based) tests do have some problems, but making progress on understanding just what they are is difficult when such seemingly spurious objections are thrown in.
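[Both halves of that claim can be checked directly. A Python sketch, scipy assumed; the sample sizes and distributions are arbitrary illustrations. With a pure spread difference neither test has anything systematic to find, while a location shift tends to move both p-values down together:]

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    a = rng.normal(0, 1, 60)
    b = rng.normal(0, 3, 60)   # same middle, three times the spread

    # Pure spread change: neither WMW nor the t-test is built to detect it
    print(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue,
          stats.ttest_ind(a, b, equal_var=False).pvalue)

    # Add a location shift: both tests now tend to reject
    c = b + 1.5
    print(stats.mannwhitneyu(a, c, alternative="two-sided").pvalue,
          stats.ttest_ind(a, c, equal_var=False).pvalue)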
His preference for robust procedures makes some sense, but the preference for (presumably monotonic) transformation I would see as an argument *for* a rank-based procedure. e.g. let's say we are in a two-sample situation, and we decide to use a t-test after taking logs, because the data are then reasonably normal... in that situation, the WMW procedure gives the same p-value as for the untransformed data. However, let's assume the log transform wasn't quite right... maybe not strong enough. Only when you finally find the "right" transformation to normality do you gain that extra 5% (roughly) efficiency over the WMW you started with. Except, of course, you never know you have the right transformation -- and if the distribution the data come from is still skewed/heavy-tailed after transformation (maybe they were log-gamma to begin with, or something), then you may still be better off using WMW.

Do you have a full reference for Johnson? I'd like to read what the reference actually says.

Glen
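[The invariance Glen appeals to is easy to demonstrate: WMW depends on the data only through the ranks, so any strictly increasing transformation leaves it unchanged. A Python sketch, scipy assumed; the lognormal samples are an illustrative choice:]

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    x = rng.lognormal(0.0, 1.0, 50)
    y = rng.lognormal(0.5, 1.0, 50)

    # log() preserves the ranks, so the WMW p-value is identical...
    p_raw = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    p_log = stats.mannwhitneyu(np.log(x), np.log(y),
                               alternative="two-sided").pvalue
    print(p_raw == p_log)   # True

    # ...whereas the t-test's p-value depends on the scale you picked
    print(stats.ttest_ind(x, y).pvalue,
          stats.ttest_ind(np.log(x), np.log(y)).pvalue)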