Re: teaching statistical methods by rules?

1999-12-19 Thread Herman Rubin

In article [EMAIL PROTECTED],
Robert Frick  [EMAIL PROTECTED] wrote:
Jerry Dallal wrote:

 Robert Frick wrote:

  I know it is hard to make statistics fun, but FOLLOWING RULES IS NEVER
  FUN.  Not in math, not in games, nowhere.

 In math and in games, following rules isn't just fun,
 IT'S THE LAW.  In fact, you can't have fun unless
 you follow them.  :-)

Well, technically, most real rules tell you what not to do -- they
usually don't tell you what to do, because that isn't fun.

This is well put.  The rules describe what is allowed, but 
not which of the allowed possibilities to perform.  

 In bridge,
the language of the bidding is very prescribed, but you almost always
have choices as to what you can bid.  On the other hand, the
prescription to bid 1NT with a balanced hand and 15-17 points tells you
what to do, but is not a real rule of the game.  Instead, it is a rule
the experts constructed so that the game wouldn't be fun.  Ha ha, they
really constructed the rule so that people could play better bridge. 
Destroying the game is an unintended byproduct.

   In math, aren't students often taught algorithms for solving problems? 
Again, no fun.

Yes, this is a mistake.  They are given many rules in
linear algebra, which are all special cases of what is
known in logic as the rule of equality.  But these rules
only state what processes are allowed.  The introduction
of formal algorithms added nothing but glitz to algebra.

The ones often given as rules for solving, such as Cramer's
rule, or inversion of a matrix by determinants, are
essentially useless in most situations.  These are rules,
of the type given for special situations in statistics.  Of
course, most real problems in statistics are not of this
special type.
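
To illustrate the point about Cramer's rule, here is a minimal sketch in
Python using exact fractions.  The function names and the 3x3 system are
made up for illustration.  Both routines return the same solution, but the
determinant is computed here by cofactor expansion, whose work grows
factorially with the size of the system, while elimination grows only
cubically.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (factorial cost)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def cramer(a, b):
    """Solve a x = b by Cramer's rule: x_j = det(a with column j := b) / det(a)."""
    d = det(a)
    if d == 0:
        raise ValueError("singular system")
    solution = []
    for j in range(len(a)):
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(det(a_j) / d)
    return solution

def eliminate(a, b):
    """Solve a x = b by Gaussian elimination with back substitution (cubic cost)."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

# Illustrative system (not from the discussion): solution is x=2, y=3, z=-1.
A = [[Fraction(v) for v in row] for row in [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]]
b = [Fraction(v) for v in [8, -11, -3]]

print([str(v) for v in cramer(A, b)])
print([str(v) for v in eliminate(A, b)])
```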

Bob F.


-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558



Re: Prediction Model Question

1999-12-19 Thread Donald F. Burrill

On Thu, 16 Dec 1999, Burke Johnson wrote:

 A student of mine is getting ready to develop a GLM prediction model
 that will include a mixture of categorical and quantitative predictor
 variables.  We will probably not include interaction terms in the model 
 (i.e., it will be a main effects only model). 

Why would one NOT include interaction terms in the model, at least in the 
exploratory stages of analysis?  As Joe Ward pointed out, somewhat 
obliquely, you can miss a great deal that's going on in the data if you 
rule out of order all the interesting stuff beforehand.
   [If you want an example, there are several rather neat ones around.] 
 
 Here's my question:  Do you suggest using dummy coding (0,1) or effects 
 coding (1,0,-1) for the categorical variables included in the model? 

I was puzzled about your assertion in the next paragraph until I read 
Rich Ulrich's reply.  What you call "effects coding", if limited to that 
(1,0,-1), is what I'd call a "linear effect" for a 3-category variable if 
the categories are at least ordered.  You didn't say how many categories 
there are in any of the categorical variables in question;  apparently 
not binary variables, else (1,0,-1) could not apply.  If you use 
indicator variables of the form (0,1), you'd need more than 1 such 
variable if you have more than 2 categories to be represented, and you 
did not indicate whether you had in mind as many indicator variables as 
there are degrees of freedom in the categorical variable(s) of interest, 
or (for some unexplained and to me unimaginable reason) were planning to 
use only one such variable for each of the categorical variables.
  And of course if you use (1,0,-1) for a ternary variable, you ought 
also to use the complementary (-1,2,-1) to represent the remaining degree 
of freedom.  If you're thinking of variables with more than 3 categories, 
how did you plan to code the 4th, 5th, ... category?
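
A small sketch of the coding schemes under discussion, in Python.  The
helper names dummy_codes and effect_codes are hypothetical, and the
convention for effect coding used here (last category coded -1 in every
column) is one common choice, assumed rather than taken from the original
question.  It checks that (1,0,-1) and (-1,2,-1) are orthogonal contrasts
and shows how either scheme extends to k categories with k-1 columns.

```python
def dummy_codes(k):
    """k-1 indicator (0,1) columns; category 0 is the reference (all zeros)."""
    return [[1 if cat == col + 1 else 0 for col in range(k - 1)]
            for cat in range(k)]

def effect_codes(k):
    """k-1 effect-coded columns; the last category is coded -1 throughout."""
    rows = [[1 if cat == col else 0 for col in range(k - 1)]
            for cat in range(k - 1)]
    rows.append([-1] * (k - 1))
    return rows

# The two contrasts mentioned for a ternary variable.
linear = [1, 0, -1]
quadratic = [-1, 2, -1]

# Each contrast sums to zero, and their dot product is zero (orthogonal),
# so together they carry the variable's 2 degrees of freedom.
dot = sum(a * b for a, b in zip(linear, quadratic))
print(sum(linear), sum(quadratic), dot)

# One row per category, one column per degree of freedom.
for row in effect_codes(5):
    print(row)
```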

 The reason I'm asking is because dummy coding does not always give the
 same result for a factorial design as does ANOVA and effects coding, 

Either you are in error in this assertion, or you mean something 
different from what _I_ have in mind by "dummy coding" and "effects 
coding".  As another respondent has pointed out, the results are 
equivalent whatever the coding, so long as all the degrees of freedom 
implied by the several categories are represented in the codes.
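
The equivalence can be checked numerically.  The following is a sketch
only: the data are invented, the helper names are hypothetical, and the
least-squares fit is done through the normal equations with exact
fractions so that the comparison is exact.  A 3-category factor plus one
quantitative predictor is fitted twice, once with (0,1) indicators and
once with effect codes; the fitted values and the slope agree, while the
intercept and the category coefficients change meaning.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (assumes a is nonsingular)."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def fit(X, y):
    """Least squares via the normal equations; returns (coefficients, fitted)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return beta, [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]

# Invented data: category label, quantitative predictor, response.
group = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1, 2, 3, 2, 3, 5, 1, 4, 6]
y = [Fraction(v) for v in [3, 5, 6, 4, 7, 9, 8, 11, 15]]

DUMMY = {0: [0, 0], 1: [1, 0], 2: [0, 1]}
EFFECT = {0: [1, 0], 1: [0, 1], 2: [-1, -1]}

def design(codes):
    """Columns: intercept, two coded columns for the factor, then x."""
    return [[Fraction(1)] + [Fraction(c) for c in codes[g]] + [Fraction(xi)]
            for g, xi in zip(group, x)]

beta_dummy, fitted_dummy = fit(design(DUMMY), y)
beta_effect, fitted_effect = fit(design(EFFECT), y)

print(fitted_dummy == fitted_effect)     # same predictions
print(beta_dummy[-1] == beta_effect[-1]) # same slope for x
print(beta_dummy[0] == beta_effect[0])   # intercepts differ in meaning
```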

 and, hence, Pedhazur recommends using effects coding rather than dummy
 coding in the factorial case. 

As another respondent has remarked, this seems to me most unlikely.  
Where, precisely, does Pedhazur recommend any such thing?

 Do you know if the choice of dummy or effects coding matters for a main
 effects only model with multiple categorical and quantitatively scaled
 predictor variables? 

As implied above, it ought not to matter so long as all the d.f. are 
properly accounted for.  IF they are, what you describe is equivalent to 
an analysis of covariance constrained to additive effects only.  (Some of 
our colleagues consider "analysis of covariance" an old-fashioned term,  
possibly even a misleading one;  but in the old-fashioned sense that may 
still make sense to some of us, that's what it is.  ANCOVA is, of course, 
a subset of the general linear model, which is what I suppose you mean by 
GLM.)

One would still have to question the reason, if any, for the constraint. 
One is tempted to suspect that your student would really rather not be 
bothered with interactions, because they're less easy to think about than 
a model containing main effects only;  but perhaps that is a base canard. 
Whatever the case, the _best_ way of dealing with interactions one would 
like not to exist is to model them and show that they are in fact not 
detectable in the data at hand.  If they _are_ detectable, well, sorry, 
folks, that's the way the cookie crumbles sometimes, and the universe (as 
represented, however imperfectly, in the data) may be trying to tell you 
something interesting.  Maybe even useful.  If so, 'twould be rude to 
ignore it;  and being rude to the universe is a loser's game.
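
As a sketch of "model them and show they are not detectable": fit the
main-effects model and the model with the interaction term, and compare
residual sums of squares with the usual extra-sum-of-squares F statistic.
The data below are invented (one binary factor g, one quantitative
predictor x), the helper names are hypothetical, and the resulting F has
to be referred to an F table or a statistics package for a p-value.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (assumes a is nonsingular)."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

# Invented data in which the slope of y on x differs between the groups.
g = [0, 0, 0, 0, 1, 1, 1, 1]
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [Fraction(v) for v in [2, 3, 5, 6, 3, 6, 8, 12]]

# Main effects only: intercept, g, x.
X_main = [[Fraction(1), Fraction(gi), Fraction(xi)] for gi, xi in zip(g, x)]
# Full model: add the g*x interaction column.
X_full = [row + [row[1] * row[2]] for row in X_main]

rss_main, rss_full = rss(X_main, y), rss(X_full, y)
df_extra = len(X_full[0]) - len(X_main[0])   # 1 interaction d.f.
df_resid = len(y) - len(X_full[0])           # 8 - 4 = 4 residual d.f.
f_stat = ((rss_main - rss_full) / df_extra) / (rss_full / df_resid)

print(f"F({df_extra}, {df_resid}) = {float(f_stat)}")
```

With these invented numbers the statistic is large relative to the 5%
critical value of F on (1, 4) d.f. (about 7.71), so a main-effects-only
model would be hiding something.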
-- DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  



Re: Factor analysis

1999-12-19 Thread Rich Ulrich

On Sat, 18 Dec 1999 12:00:52 -, "Haider Al-Katem"
[EMAIL PROTECTED] wrote:

 I have conducted a factor analysis on some questionnaire items. The
 dependent variables that I am measuring for example ('Intention To Buy',
 'Attitude towards a product'  and 'Trust in buying the product from a
 merchant' ) seem to load significantly high on two factors which leaves me
 with a NOT SIMPLE FACTOR STRUCTURE.
 
 - Hey, two factors is pretty simple, if you start with a few dozen
items ...

 I am assuming that since 'Intention To Buy', 'Attitude towards a product'
 and 'Trust in buying the product from a merchant'  all seem to be some type
 of an ATTITUDE , the significantly high factor loadings on the two factors
 may be justifiable.
 
 My questions are:
 
 1. Are my above interpretations of the result correct?

Well, if "not simple" is an interpretation, it seems premature or
impossible for us readers to comment, because there is no content
worth commenting on.  If "may be justifiable" is an interpretation, it
is wimpy enough that I wouldn't claim it is incorrect.

 2. If not, is there a statistical method that can help me overcome this
 'non-simple factor structure'?

 And what goal is "overcome" supposed to indicate?  If there are
two factors, you can provide the outcome of your survey as two
composite scores instead of just one.
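
A minimal sketch of reporting two composite scores, in Python.  The item
names, the assignment of items to factors, and the responses are all
hypothetical; unit-weighted means are just one simple way to form a
composite, assumed here for illustration.

```python
from statistics import mean

# Hypothetical assignment of questionnaire items to the two factors.
FACTOR_ITEMS = {
    "factor_1": ["intent_1", "intent_2", "attitude_1"],
    "factor_2": ["trust_1", "trust_2"],
}

def composite_scores(response):
    """Unit-weighted composite: the mean of the items assigned to each factor."""
    return {factor: mean(response[item] for item in items)
            for factor, items in FACTOR_ITEMS.items()}

# One invented respondent, items scored on a 1-5 scale.
respondent = {"intent_1": 4, "intent_2": 5, "attitude_1": 3,
              "trust_1": 2, "trust_2": 4}

scores = composite_scores(respondent)
print(scores)
```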
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html