On Tue, Mar 21, 2006 at 08:35:52AM +0800, John Darrington wrote:
> On Mon, Mar 20, 2006 at 10:03:27AM -0500, Jason Stover wrote:
> >
> > 3. cat_value_update seems to do nothing for numeric variables. Why is
> > this? A numeric variable can be used as a categorical variable
> > just as easily as an alpha one.
>
> Good point. Encoding numeric data as categorical is usually a mistake
> from a statistical standpoint, but there are circumstances when
> treating a numeric variable as categorical makes perfect sense, so
> maybe cat_value_update() shouldn't care what type of variable it is
> looking at. This is where the question 'should we protect the user?'
> comes up. Someone with a numeric variable that has, say, 10^5 distinct
> values and inadvertently treats that variable as categorical could
> wind up running a procedure with 0 or negative degrees of freedom;
> slowing the machine down to a crawl; or, worst of all, finding bugs
> we'd rather not know about. But users should probably have the ability
> to treat numeric data as categorical if they want to.
>
> I'm not a statistician, so I can't make any comment about whether
> numeric variables "ought" to be used as categorical ones. But I've
> seen *many* examples where this is done. Most demonstrations of
> T-TEST do something like 0 = Male, 1 = Female. I've even seen reports
> telling me that a person's average sex is 0.54. Maybe we could have a
> very mild warning if a categorical variable is numeric.

Yeah. And I don't think the warning is necessary. (I was thinking
users should enter a '0' or '1' but make the type categorical, but
that doesn't happen, and often shouldn't happen, as in the case where
'average sex is .54' just means '54% female.')

-Jason

_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
