On Tue, Mar 21, 2006 at 08:35:52AM +0800, John Darrington wrote:
On Mon, Mar 20, 2006 at 10:03:27AM -0500, Jason Stover wrote:
> 3. cat_value_update seems to do nothing for numeric variables. Why is
> this? A numeric variable can be used as a categorical variable
> just as easily as an alpha one.
Good point. Encoding numeric data as categorical is usually a mistake
from a statistical standpoint, but there are circumstances when
treating a numeric variable as categorical makes perfect sense, so
maybe cat_value_update() shouldn't care what type of variable it is
looking at. This is where the question 'should we protect the user?'
comes up. Someone with a numeric variable that has, say, 10^5 distinct
values and inadvertently treats that variable as categorical could
wind up running a procedure with 0 or negative degrees of freedom;
slowing the machine down to a crawl; or, worst of all, finding bugs
we'd rather not know about. But users should probably have the ability
to treat numeric data as categorical if they want to.
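A minimal sketch of the degrees-of-freedom concern above, assuming a one-way model in which each distinct value of the variable becomes one category. The names residual_df and categorical_use_is_risky are hypothetical and are not part of PSPP; the sketch only shows the arithmetic a caller could use to decide whether to warn.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a PSPP function): residual degrees of
   freedom for a one-way model with N_OBS cases and N_CATEGORIES
   distinct levels, i.e. n - k. */
long
residual_df (size_t n_obs, size_t n_categories)
{
  return (long) n_obs - (long) n_categories;
}

/* Hypothetical check: true when treating the variable as categorical
   would leave zero or negative residual degrees of freedom. */
bool
categorical_use_is_risky (size_t n_obs, size_t n_categories)
{
  return residual_df (n_obs, n_categories) <= 0;
}
```

With 10^5 cases and 10^5 distinct values the check returns true, so a procedure could warn rather than refuse, which still leaves the choice to the user.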
I'm not a statistician, so I can't comment on whether numeric
variables "ought" to be used as categorical ones. But I've
seen *many* examples where this is done. Most demonstrations of
T-TEST do something like 0 = Male, 1 = Female. I've even seen reports
telling me that a person's average sex is 0.54. Maybe we could have a
very mild warning if a categorical variable is numeric.

One other point I thought of: categorical variables should almost
always have a measure attribute of 'Nominal'. A categorical variable
that is 'Continuous' really doesn't make sense, so perhaps this
should raise a warning.

J'

-- 
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.
