George has done some interesting work that might interest the group. I'm sure we can
all think of examples where correlation has no association with causality but
this is an interesting approach that might force us to rewrite the books.
Ron Blue
----- Original Message -----
From: George Hammond <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, April 22, 1999 4:09 AM
Subject: Mental Speed & causality
> Posted from George Hammond (physics) April 22, 1999
> email: [EMAIL PROTECTED]
>
> RE: The formula for causation
>
>
> Ron Blue writes:
>
> >I like your line of reasoning. One of the few topics that has
> >been pushed by psychologists is that correlation does not prove
> >causality. But a "possible" causal connection might be a
> >reasonable idea whose time has come.
> >Ron
>
>
> Dear Ron:
>
> Boy..! you've got that one right. Now, I suppose there's a historical
> reason for this supercautionary approach to science by psychologists, but
> you're talking to someone here (a physicist) who is completely oblivious to
> it. My annoyance with 100 years of statistical development may be summed up
> in 4 points:
>
> 1. In 100 statistics books I have seen there is no
> such thing as a precise verbal definition of a
> correlation coefficient. My conclusion is that
> nobody actually knows what it is.
>
> 2. There is no commonly understood relation between
> correlation and physical causation in Psychology,
> despite the fact that an elementary relation exists.
>
> 3. The traditional use of "r-squared" to indicate
> causality is mathematically erroneous and only holds
> true for robust correlations (.6-.8) that are usually
> found in applied science. It vastly underestimates
> causality for small correlations (.1-.4) that routinely
> arise in theoretical research.
>
> 4. The cause of this educational blackout is, in my
> humble opinion, summed up in Shakespeare's words
> "me thinketh they protest too much"; about correlation
> not being necessarily causation. As anyone but a
> Boy Scout would know, the only case of interest is
> when it IS causation.
>
> Ok, so much for complaining about the situation; what are the real "facts of
> life" about the Pearson correlation coefficient and physical causality that
> have been deliberately obfuscated for 100 years?
>
> As a physicist sees it, a correlation coefficient IS a "coefficient of
> causation", pure and simple. The only complication involved is that it is
> "statistical" causation not "simple" causation. But don't panic, what I'm
> here to tell you is that the difference amounts to no more than a simple
> factor of the square root of 2.
> Suppose in a simple physical problem:
>
> C = A + B
>
> then obviously the percentage causation of C, caused by A, is:
>
>                            A
>   ordinary percentage = -------
>                          A + B
>
> now, in the case of "variates" this does not hold true (a "variate" being a
> fixed range variable with a standard deviation). In this case, it is well
> known that if:
>
> C = A + B
>
> then:
>                                   A
>   statistical percentage = ---------------
>                            sqrt(A^2 + B^2)
>
> this quantity is known as the "Pearson correlation coefficient" (correlation
> of A with C).
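
A quick numerical check of the formula above, offered as a sketch and not as part of the original post. It assumes that A and B are independent, zero-mean Gaussian variates and that the A and B in the formula stand for their standard deviations. The function names (pearson_r, simulate, predicted) and the standard deviations 3 and 4 are illustrative choices only.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(sd_a, sd_b, n=100_000, seed=0):
    """Draw independent A and B, form C = A + B, return corr(A, C)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    a = [rng.gauss(0.0, sd_a) for _ in range(n)]
    b = [rng.gauss(0.0, sd_b) for _ in range(n)]
    c = [x + y for x, y in zip(a, b)]
    return pearson_r(a, c)


def predicted(sd_a, sd_b):
    """The formula from the post: A / sqrt(A^2 + B^2)."""
    return sd_a / math.sqrt(sd_a ** 2 + sd_b ** 2)


observed = simulate(3.0, 4.0)
expected = predicted(3.0, 4.0)  # 3 / sqrt(9 + 16) = 0.6
print(f"simulated r = {observed:.3f}, formula = {expected:.3f}")
```

The simulated value should land close to the formula value, with the gap shrinking as n grows. If A and B are not independent, the formula as written no longer applies.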
> Notice that sqrt(A^2 + B^2) is SMALLER than A+B. In fact, if A=B, the
> ratio is simply:
>
>        A+B               2A
>   -------------  =  -----------  =  sqrt(2)
>   sqrt(A^2+B^2)      A sqrt(2)
>
> in other words, percentage physical causality is related to Pearson r as:
>
>                                      Pearson r
>   Percentage Physical Causality = ----------------
>                                   square root of 2
>
>
> Now, what is all of this really saying? All it says is that r differs from
> simple percent causation in that statistical variables "overlap" one
> another, i.e. tend to cancel each other out by about 30% so that A+B is not
> 1, but is actually .7071. Therefore, on average, Pearson r is slightly
> larger than actual percent physical causation by the square root of 2.
> In my last post I gave the exact formula:
>
>                                          r
>   Percentage Physical Causation = ---------------
>                                   r + sqrt(1-r^2)
>
> and this equation is simply A/(A+B) where A=r and B=sqrt(1-r^2).
> If you plot this equation on a piece of graph paper, you will find that
> it is almost a straight line between r=0 and r=.95, and the slope of that
> line is very nearly 1/sqrt(2).
> Finally, we note that for a correlation of .7071 that:
>
> r/sqrt(2) = r^2 = .5
>
> In other words, the old "r-squared" estimate of causality agrees with the
> exact formula in the neighborhood of r=.707. This was probably discovered
> by applied scientists years ago, and accounts for the tenacious use of the
> "r-squared" estimate of causality even though it is clearly mathematically
> wrong, and drastically wrong for small r.
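
To see how the three quantities discussed above compare, here is a small sketch that tabulates the post's exact formula, the r/sqrt(2) rule of thumb, and r-squared for a few values of r. The function names are hypothetical, the list of r values is an arbitrary choice, and the code assumes 0 <= r <= 1 (for a negative correlation, pass the magnitude).

```python
import math


def exact_share(r):
    """The post's exact formula: r / (r + sqrt(1 - r^2)), for 0 <= r <= 1."""
    return r / (r + math.sqrt(1.0 - r * r))


def rule_of_thumb(r):
    """The post's approximation: r / sqrt(2)."""
    return r / math.sqrt(2.0)


def r_squared(r):
    """The traditional r-squared figure."""
    return r * r


print("     r   exact  r/sqrt2    r^2")
for r in (0.1, 0.2, 0.35, 0.5, math.sqrt(0.5), 0.9):
    print(f"{r:6.4f}  {exact_share(r):6.3f}  {rule_of_thumb(r):7.3f}  {r_squared(r):5.3f}")
```

All three columns meet at 0.5 when r = sqrt(0.5), about .7071, which is the agreement point described above. Away from that value the columns separate, and the table shows by how much.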
> Ok, so as a byproduct of this discovery of the true causality formula, we
> have also discovered the correct verbal definition of the Pearson
> correlation coefficient. It is quite evidently the "visible percentage of
> causation" which differs from the "theoretical percentage of causation" by
> approximately a factor of sqrt(2):
>
>                           visible causation
>   theoretical causation = ----------------- = .707 r
>                                sqrt(2)
>
>
> Now, this all began when the question of physical causation arose in
> connection with the celebrated correlation r=-.35 of RT with IQ. In that
> case, the simple rule of thumb gives us:
>
>                          .35
>   physical causation = ------- = 25 percent
>                        sqrt(2)
>
> The exact formula (above) gives 27.2 percent, but you can see that sqrt(2)
> gives a very close answer. Note that this is TWICE AS LARGE as the old
> r-squared estimate of .35^2 = 12%.
> At any rate, we no longer have to listen to all that textbook jive about
> a correlation coefficient being "the strength of the linear relationship" or
> "the degree of association" or "the scalar product of two vectors in factor
> space" etc. etc. etc. What a correlation coefficient actually is, is the
> "percentage of visible causation" of a variable. Naturally this is sqrt(2)
> LARGER than the actual physical causation, because the variable is
> statistically "overlapping" with its errors by sqrt(2); on average.
> GH
>
>