George has done some interesting work that might interest the group.  I'm sure we can
all think of examples where correlation has no association with causality, but
this is an interesting approach that might force us to rewrite the books.
Ron Blue

----- Original Message -----
From: George Hammond <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, April 22, 1999 4:09 AM
Subject: Mental Speed & causality


> Posted from George Hammond (physics) April 22, 1999
> email:  [EMAIL PROTECTED]
>
> RE:  The formula for causation
>
>
> Ron Blue writes:
>
> >I like your line of reasoning.  One of the few topics that has
> >been pushed by psychologists is that correlation does not prove
> >causality.  But a "possible" causal connection might be a
> >reasonable idea whose time has come.
> >Ron
>
>
> Dear Ron:
>
>    Boy..! You've got that one right.  Now, I suppose there's a historical
> reason for this supercautionary approach to science by psychologists, but
> you're talking to someone here (a physicist) who is completely oblivious to
> it.  My annoyance with 100 years of statistical development may be summed up
> in 4 points:
>
>    1. In 100 statistics books I have seen there is no
>       such thing as a precise verbal definition of a
>       correlation coefficient.  My conclusion is that
>       nobody actually knows what it is.
>
>    2. There is no commonly understood relation between
>       correlation and physical causation in Psychology,
>       despite the fact that an elementary relation exists.
>
>    3. The traditional use of "r-squared" to indicate
>       causality is mathematically erroneous and only holds
>       true for robust correlations (.6-.8) that are usually
>       found in applied science.  It vastly underestimates
>       causality for small correlations (.1-.4) that routinely
>       arise in theoretical research.
>
>    4. The cause of this educational blackout is, in my
>       humble opinion, summed up in Shakespeare's words
>       "methinks they protest too much" about correlation
>       not being necessarily causation.  As anyone but a
>       Boy Scout would know, the only case of interest is
>       when it IS causation.
>
> Ok, so much for complaining about the situation; what are the real "facts of
> life" about the Pearson correlation coefficient and physical causality that
> have been deliberately obfuscated for 100 years?
>
> As a physicist sees it, a correlation coefficient IS a "coefficient of
> causation", pure and simple.  The only complication involved is that it is
> "statistical" causation not "simple" causation.  But don't panic, what I'm
> here to tell you is that the difference amounts to no more than a simple
> factor of the square root of 2.
>    Suppose in a simple physical problem:
>
>                   C = A + B
>
> then obviously the percentage causation of C, caused by A, is:
>
>                                A
>       ordinary percentage = -------
>                              A + B
>
> now, in the case of "variates" this does not hold true (a "variate" being a
> fixed range variable with a standard deviation).  In this case, it is well
> known that if:
>
>                   C = A + B
>
> then:
>                                     A
>     statistical percentage =  ---------------
>                               sqrt(A^2 + B^2)
>
> this quantity is known as the "Pearson correlation coefficient" (correlation
> of A with C).
>    Notice that sqrt(A^2 + B^2) is SMALLER than A+B.  In fact, if A=B, the
> ratio is simply:
>
>    A+B               2A
> ---------      =  ----------  =  sqrt(2)
> sqrt(A^2+B^2)     A sqrt(2)
>
> in other words, percentage physical causality is related to Pearson r as:
>
>                                           Pearson r
>    Percentage Physical Causality  =   ----------------
>                                       square root of 2
>
>
> Now, what is all of this really saying?  All it says is that r differs from
> simple percent causation in that statistical variables "overlap" one
> another, i.e. tend to cancel each other out by about 30% so that A+B is not
> 1, but is actually .7071.  Therefore, on average, Pearson r is slightly
> larger than actual percent physical causation by the square root of 2.
>    In my last post I gave the exact formula:
>
>                                               r
>    Percentage Physical Causation  =    ---------------
>                                        r + sqrt(1-r^2)
>
> and this equation is simply A/(A+B) where A=r and B=sqrt(1-r^2).
>    If you plot this equation on a piece of graph paper, you will find that
> it is almost a straight line between r=0 and r=.95, and the slope of that
> line is very nearly 1/sqrt(2).
>    Finally, we note that for a correlation of .7071 that:
>
>       r/sqrt(2)  =  r^2  =  .5
>
> In other words, the old "r-squared" estimate of causality agrees with the
> exact formula in the neighborhood of r=.707.  This was probably discovered
> by applied scientists years ago, and accounts for the tenacious use of the
> "r-squared" estimate of causality even though it is clearly mathematically
> wrong, and drastically so for small r.
>    Ok, so as a byproduct of this discovery of the true causality formula, we
> have also discovered the correct verbal definition of the Pearson
> correlation coefficient.  It is quite evidently the "visible percentage of
> causation" which differs from the "theoretical percentage of causation" by
> approximately a factor of sqrt(2):
>
>                                   visible causation
>       theoretical causation  =   ------------------  =  .707 r
>                                        sqrt(2)
>
>
> Now, this all began when the question of physical causation arose in
> connection with the celebrated correlation r=-.35 of RT with IQ.  In that
> case, the simple rule of thumb gives us:
>
>                                  .35
>       physical causation  =  ----------- = 25 percent
>                                sqrt(2)
>
> The exact formula (above) gives 27.2 percent, but you can see that sqrt(2)
> gives a very close answer.  Note that this is TWICE AS LARGE as the old
> r-squared estimate of .35^2 = 12%.
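
The arithmetic in the quoted passage can be reproduced with a short Python sketch.  The function names are hypothetical, and taking the magnitude of r for a negative correlation such as -.35 is an assumption.  The code only restates the three formulas as posted; it does not test whether any of them measures causation.

```python
import math


def rule_of_thumb(r):
    """The post's rule of thumb: |r| / sqrt(2)."""
    return abs(r) / math.sqrt(2)


def exact_formula(r):
    """The post's 'exact formula': r / (r + sqrt(1 - r^2)), using |r|."""
    r = abs(r)
    return r / (r + math.sqrt(1.0 - r * r))


def r_squared(r):
    """The traditional r-squared figure the post compares against."""
    return r * r


if __name__ == "__main__":
    # Compare the three figures at a few correlations, including the
    # RT/IQ value (-.35) and the crossover point near .7071.
    for r in (-0.35, 0.1, 0.4, 0.7071, 0.95):
        print(
            f"r={r:+.4f}  r/sqrt(2)={rule_of_thumb(r):.3f}  "
            f"exact={exact_formula(r):.3f}  r^2={r_squared(r):.3f}"
        )
```

For r = -.35 the three columns come out at about 25, 27 and 12 percent, matching the figures quoted above.
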
>    At any rate, we no longer have to listen to all that textbook jive about
> a correlation coefficient being "the strength of the linear relationship" or
> "the degree of association" or "the scalar product of two vectors in factor
> space" etc. etc. etc.  What a correlation coefficient actually is, is the
> "percentage of visible causation" of a variable.  Naturally this is sqrt(2)
> LARGER than the actual physical causation, because the variable is
> statistically "overlapping" with its errors by sqrt(2), on average.
> GH
>
>
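
For anyone who wants to check the "statistical percentage" step in the quoted message numerically, here is a minimal simulation sketch.  It assumes A and B are independent, normally distributed variates, and it reads the A and B in the quoted formula as their standard deviations.  Under those assumptions the correlation of A with C = A + B is sd_A / sqrt(sd_A^2 + sd_B^2).  The function names, sample size, and seed are illustrative choices, not anything from the original post.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_r(sd_a, sd_b, n=100_000, seed=0):
    """Draw independent normal A and B, form C = A + B, return r(A, C)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    a = [rng.gauss(0.0, sd_a) for _ in range(n)]
    b = [rng.gauss(0.0, sd_b) for _ in range(n)]
    c = [x + y for x, y in zip(a, b)]
    return pearson_r(a, c)


def predicted_r(sd_a, sd_b):
    """The quoted 'statistical percentage': A / sqrt(A^2 + B^2)."""
    return sd_a / math.sqrt(sd_a ** 2 + sd_b ** 2)


if __name__ == "__main__":
    for sd_a, sd_b in [(1.0, 1.0), (1.0, 2.0), (3.0, 1.0)]:
        print(
            f"sd_A={sd_a} sd_B={sd_b}  "
            f"simulated r={simulate_r(sd_a, sd_b):.3f}  "
            f"predicted r={predicted_r(sd_a, sd_b):.3f}"
        )
```

With equal standard deviations the predicted value is 1/sqrt(2), or about .707, which is where the sqrt(2) factor in the post comes from.
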




