In the mode of addressing the relevance of the trope "correlation doesn't imply causation" to the issue of governments testing social theories on unwilling human subjects:
About 15 years ago, I was working at a well-funded Silicon Valley startup in Palo Alto with about 100 people. During the few years I was there, 5 children of parents working there were diagnosed on the autism spectrum. I contacted the Berkeley epidemiologist who had been studying autism and informed him of the anomaly. His response was simply, "We know such clusters exist in Silicon Valley and we don't know what causes them." Well, DUH! I was outraged.
Several years later I was able to get data from the Dept of Education on the incidence of autism by State. I could not locate data by county. So I did what any _reasonable epidemiologist should_ do with such data: I surveyed the list of current hypotheses about the causes of autism, added a few, and gathered State-level data on other variables related to those hypotheses to look for *gasp* CORRELATIONS.
Now, none of this would be in the _least_ controversial, except that one of the hypotheses was that the recent increase in immigration from India to places like Silicon Valley was bringing in a pathogen -- possibly intestinal -- spread by some route such as restaurants. Even then the project wouldn't have been controversial, because if you look at the rank-order of single-variable correlations, the correlation with immigrants from India doesn't beat mother's age at first live birth (MAAFLB). One hypothesis is that father's age produces errors in the sperm's DNA, and MAAFLB is a proxy for father's age.
However, if we're looking at a population with high susceptibility -- say, genetic background from human ecologies with low population densities -- then you have to construct a composite variable as the conjunction of the susceptible population and the vector population. Lo and behold, when all 2-variable conjunctions were correlated with autism incidence, the pair that came out on top was immigrants from India per capita and Finnish ancestry per capita. NOW we're in serious trouble for obvious political reasons!
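For anyone who wants to see the mechanics of "correlate all 2-variable conjunctions", here is a minimal sketch in Python. It is not the code I ran. It assumes a conjunction is modeled as the elementwise product of two per-capita columns, which is one reasonable reading but still an assumption, and the column names (var_a through var_d) and all the numbers are synthetic placeholders, not the State-level data.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_conjunctions(columns, outcome):
    """Rank every pair of columns by the r of their product with the outcome.

    ASSUMPTION: the "conjunction" of two per-capita variables is modeled
    as their elementwise product.
    """
    ranked = []
    for a, b in itertools.combinations(sorted(columns), 2):
        product = [x * y for x, y in zip(columns[a], columns[b])]
        ranked.append((pearson_r(product, outcome), a, b))
    ranked.sort(reverse=True)  # highest r first
    return ranked


# Synthetic stand-in for 50 State-level rows (hypothetical, not real data).
rng = random.Random(0)
n_states = 50
columns = {
    name: [rng.random() for _ in range(n_states)]
    for name in ("var_a", "var_b", "var_c", "var_d")
}
# Build an outcome that really is driven by the var_a x var_b conjunction.
outcome = [
    a * b + 0.05 * rng.gauss(0, 1)
    for a, b in zip(columns["var_a"], columns["var_b"])
]

ranking = rank_conjunctions(columns, outcome)
for r, a, b in ranking:
    print(f"{a} x {b}: r = {r:+.2f}")
```

With 4 columns there are only 6 pairs. The number of pairs grows as n(n-1)/2, so a few hundred columns already means tens of thousands of candidate pairs.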
So I added hundreds more demographic variables to see if, by chance, I could get some pairs of variables to beat that pair -- not that this would, by itself, invalidate the correlation. Such scatter-shot searches for correlations are notorious as a statistical fallacy called "data-mining": you have no idea what class of correlations you're looking for and, just by pure chance, you can expect to find some that rank higher. So you can't automatically conclude they are significant even though the Pearson's 'r' and degrees of freedom (sample size) -- taken out of the data-mining context -- might indicate high significance.
What I found was that, indeed, there were higher-correlation pairs, but in the scatter plot for the correlation in question there were some data points that appeared to be statistical outliers. This is a common problem in science and it can result from a large number of things -- but usually some kind of measurement error. A common procedure in such scenarios is to throw out the top and bottom measurements -- thereby reducing the sample size but hopefully ending up with a higher quality sample. Doing that, the India immigrant x Finnish ancestry pair once again topped the list, which now included a combinatorial explosion of pairs.
So we're still far from out of the scientific woods (let alone the political woods) with this, since the single-variable correlation with mother's age at first live birth is nipping at the heels of the politically volatile correlation. Moreover, the MAAFLB scatter plot is more 'normal' or 'robust', meaning that the data points spread out relatively evenly around the regression line, whereas the politically volatile correlation is ragged -- far from 'normal'. You can try to discount the ragged scatter and keep the high rank for the politically volatile correlation by invoking confounding variables such as differing standards of autism diagnosis applied across different states, etc.
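To make the data-mining caveat concrete, here is a small simulation sketch in Python. It is purely illustrative and uses no real data: every variable is independent Gaussian noise, the sample size of 50 is chosen only to resemble a by-State table, and the function name is hypothetical. It shows how the best |r| found by chance climbs as the number of candidate variables searched goes up.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_abs_r_by_search_size(n_samples, sizes, seed=0):
    """Best |r| against a noise outcome among the first k noise candidates.

    Every candidate is independent of the outcome by construction, so any
    correlation found here is chance alone.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    abs_rs = []
    for _ in range(max(sizes)):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        abs_rs.append(abs(pearson_r(candidate, outcome)))
    # The searches are nested: the first 10 candidates are among the first 100.
    return {k: max(abs_rs[:k]) for k in sizes}


best = best_abs_r_by_search_size(n_samples=50, sizes=(1, 10, 100, 1000))
for k, r in best.items():
    print(f"best |r| among {k:>4} noise variables: {r:.2f}")
```

Because the searches are nested, the best |r| can only stay the same or rise as more candidates are tried. That is why a top-ranked r from a wide search can't be read off a significance table as if it were the only test performed.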
However, the fact remains that the MAAFLB correlation is less complicated (single variable) and is more robust. OK, so where does this leave us? Well, if I were forced to choose one working hypothesis, I'd say father's age is the correct one -- not because it avoids the nasty politics of immigration -- but simply on standard statistical merits. However, life isn't so kind as to allow us to ignore all alternative hypotheses -- even when those hypotheses might be considered "Hate Data". This is particularly true when something as devastating to families, already struggling with the disappearance of middle class jobs, as autism is mysteriously exploding in incidence.
But it gets worse: Once I had this database of hundreds of by-State demographic variables, I decided -- just out of curiosity -- to do a complete correlation matrix and then compute which of the variables had the greatest statistical power in predicting the other variables by summing their coefficients of determination (you simply square Pearson's 'r' to get a particular correlation's CoD). The variable that came out on top was Jewish percent of Whites in the population, with AIDS prevalence a close second. At this point, you can see how "correlation doesn't imply causation" enters the scientific discourse with the most powerful forces of history behind it.
On Fri, Oct 4, 2013 at 9:15 PM, James Bowery <[email protected]> wrote:
> Well, you will notice that in addition to the preface "Quite aside from
> the fact..." I did place "correlation doesn't imply causation" in scare
> quotes. We needn't belabor this rhetorical and even philosophical morass
> to accept the priority of moral agency in respecting the humanity of others.
>
> On Fri, Oct 4, 2013 at 5:19 PM, Jed Rothwell <[email protected]> wrote:
>
>> James Bowery <[email protected]> wrote:
>>
>>> Quite aside from the fact that "correlation doesn't imply causation",
>>
>> Actually it does, as David Hume pointed out. In natural science, that is
>> pretty much all you have to go on in many cases.
>>
>> - Jed

