Be Skeptical of Both Piketty And His Skeptics 
<http://fivethirtyeight.com/features/be-skeptical-of-both-piketty-and-his-skeptics/>
  


By Nate Silver <http://fivethirtyeight.com/contributors/nate-silver/>  

Data never has a virgin birth. It can be tempting to assume that the 
information contained in a spreadsheet or a database is pure or clean or beyond 
reproach. But this is almost never the case 
<http://ajr.org/2014/04/05/dealing-data-skeptical/>. All data is collected and
compiled by someone — either an individual researcher or a government agency or 
a scientific laboratory or a news organization or someone or something else. 
Sometimes, the data collection process is automated or programmatic. But that 
automation process is initiated by human beings who write code or programs or 
algorithms; those programs can have bugs, which will be faithfully replicated 
by the computers.

This is another way of saying that almost all data is subject to human error. 
It’s important both to reduce the error rate and to develop methods that are 
more robust to the presence of error. And it’s important to keep expectations
in check when a controversy like the one surrounding the French economist
Thomas Piketty arises.

Piketty’s 696-page book “Capital in the Twenty-First Century” has become an 
unlikely best-seller 
<http://www.huffingtonpost.com/2014/04/22/thomas-piketty-amazon_n_5191566.html> 
 in the United States. That’s perhaps because it was published at a time when 
there is rapidly increasing interest 
<http://fivethirtyeight.com/datalab/inequality-booms-on-msnbc-and-fox-news-cnns-looking-for-flight-370/>
in the subject of economic inequality in the U.S. But on Friday, the
Financial Times’ Chris Giles published a list
<http://blogs.ft.com/money-supply/2014/05/23/data-problems-with-capital-in-the-21st-century/>
  of apparent errors and methodological questions in the data underpinning 
Piketty’s work. Piketty has so far responded 
<http://blogs.ft.com/money-supply/2014/05/23/piketty-response-to-ft-data-concerns/>
  to the Financial Times only in general terms.

My goal here is not to litigate the individual claims made by Giles; see The 
New York Times’ Neil Irwin 
<http://www.nytimes.com/2014/05/24/upshot/did-piketty-get-his-math-wrong.html>  
or The Economist’s Ryan Avent  
<http://www.economist.com/blogs/freeexchange/2014/05/inequality-0> for more 
detail on that. Rather, I hope to provide some broad perspective about data 
collection, publication and analysis. A series of disclosures: First, my 
economic priors <http://rationalwiki.org/wiki/Bayesian>  and preferences are 
closer to The Economist’s than to Piketty’s. Second, I haven’t finished
Piketty’s book, although I’ve spent some time
exploring his data. Third, I’m no expert on macroeconomic policy or 
macroeconomic data. Fourth, this comment rather liberally takes advantage of 
our footnote system; there’s a short version (sans footnotes) and a long 
version (avec <http://crosswordtracker.com/clue/sans-opposite/>).

My perspective is that of someone who has spent a lot of time compiling and 
analyzing moderately complex data sets of different kinds. Also, I’m someone 
who, like Piketty, has seen his public profile grow unexpectedly in recent 
years. I consider myself extremely fortunate for this — however, I know that 
attention can sometimes yield disproportionate praise and criticism. 
Throat-clearing aside, here’s what I have to offer.

Piketty’s data sets are very detailed, and they aggregate data from many 
original sources. For instance, the data 
<http://piketty.pse.ens.fr/en/capitalisback>  Piketty and the economist Gabriel 
Zucman compiled on wealth inequality in the United Kingdom for their paper 
<http://piketty.pse.ens.fr/files/PikettyZucman2014QJE.pdf>  “Capital is Back: 
Wealth-Income Ratios in Rich Countries, 1700-2010” contains about 220 data
series for the U.K. alone, which are hard-coded
<http://en.wikipedia.org/wiki/Hard_coding>  into their spreadsheet. These data 
series are compiled from a wide array of original sources, which are reasonably 
well documented in the spreadsheet.

This type of data-collection exercise — many different data series over many 
different years, compiled from many countries and many sources — offers many 
opportunities for error. Part of the reason Piketty’s efforts are potentially
valuable is that data on wealth inequality is lacking. But that also means
his numbers will not have received as much scrutiny as other data sets.

An extreme contrast would be something like Major League Baseball
statistics, almost every detail of which has been scrubbed and scrutinized by
enthusiasts for decades. Even so, they contain errors from time to time
<http://community.seattletimes.nwsource.com/archive/?date=19990623&slug=2968014>.
There are, however, usually larger gains to be had from scrutiny when data or methods or
findings are relatively new — as they are in Piketty’s case. (An analogy is the 
way a vacuum’s first sweep of the living-room floor picks up a lot more dust 
and dirt than the second and third attempts.) Perhaps Piketty is guilty of 
coming to some fairly grand conclusions 
<http://www.bloombergview.com/articles/2014-04-20/the-most-important-book-ever-is-all-wrong>
  based on data that has not yet received all that much scrutiny.

What error rate is acceptable? The right answer is probably not “zero.” If 
researchers kept scrubbing data until it was perfect, they’d never have time
for analysis. There comes a point of diminishing returns; that Hack Wilson had 
191 RBIs during the 1930 season rather than 190 ought not have a material 
impact on any analysis of baseball player performance. At other times, entire 
articles or analyses or theories or paradigms are developed on the basis of 
deeply flawed data.

I don’t know where Piketty sits on this spectrum. However, I think Giles (and 
some of the commentary surrounding his work) could do a better job of 
describing Piketty’s error rate relative to the overall volume of data that was 
examined. If Giles scrutinized all of Piketty’s data and found a handful of 
errors, that would be very different from taking a small subsample of that data 
and finding it rife with mistakes.
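
To make that distinction concrete, here is a minimal sketch in Python. Every
number in it is hypothetical (these are not Giles's actual counts), and the
function name wilson_interval is my own. It assumes the subsample was drawn at
random and uses the Wilson score interval, one standard way to put a plausible
range around a proportion estimated from a small sample.

```python
import math


def wilson_interval(errors, checked, z=1.96):
    """Approximate 95% Wilson score interval for an error rate.

    errors  -- number of entries found to be wrong
    checked -- number of entries examined (assumed to be a random sample)
    """
    if checked <= 0:
        raise ValueError("checked must be positive")
    p = errors / checked
    denom = 1 + z ** 2 / checked
    center = (p + z ** 2 / (2 * checked)) / denom
    half = z * math.sqrt(p * (1 - p) / checked + z ** 2 / (4 * checked ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical scenario A: every entry was checked, so the rate is known.
total_entries = 1000
full_audit_errors = 5
full_audit_rate = full_audit_errors / total_entries

# Hypothetical scenario B: the same five errors, but found in a sample of 20.
sample_errors, sample_size = 5, 20
sample_rate = sample_errors / sample_size
low, high = wilson_interval(sample_errors, sample_size)

print(f"Full audit: {full_audit_rate:.1%} of entries are wrong")
print(f"Subsample:  {sample_rate:.1%} observed, "
      f"plausibly {low:.1%} to {high:.1%} across the whole data set")
```

The same five errors imply very different things in the two scenarios, which
is why the number of entries examined matters as much as the number of
mistakes found.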

All of this is part of the peer-review process. Academics sometimes think of 
peer review as a relatively specific activity 
<http://www.nature.com/nature/peerreview/debate/>  undertaken by other 
academics before academic papers or journal articles are published. This 
process of peer review has been much studied over the years 
<http://scholar.google.com/scholar?hl=en&q=peer+review&btnG=&as_sdt=1%2C33&as_sdtp=>
  (often in peer-reviewed articles, naturally), and scholars have come to 
different conclusions about how effective it is in avoiding various types of 
errors <http://www.ma.utexas.edu/users/mks/statmistakes/errortypes.html>  in 
published research.

I’m not necessarily opposed to this type of peer review. But I think it defines 
peer review too narrowly and confines it too much to the academy. Peer review, 
to my mind, should be thought of as a continuous process: It starts from the 
moment a researcher first describes her result to a colleague over coffee and 
it never ends, even after her work has been published in a peer-reviewed 
journal (or a best-selling book). Many findings are contradicted or even 
retracted <http://retractionwatch.com/>  years after being published, and 
replication rates for peer-reviewed academic studies across a variety of 
disciplines are disturbingly low 
<http://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/308269/>.

I have a dog in this fight, obviously. I think journalistic organizations from 
the Financial Times to FiveThirtyEight should be thought of as prospective 
participants in the peer-review process, meaning both that we provide peer 
review and that our work is subject to peer review.

I can’t speak for the FT, but I know that FiveThirtyEight gets some things 
badly wrong from time to time 
<http://fivethirtyeight.com/datalab/mapping-kidnappings-in-nigeria/>. It’s
helpful to have readers who hold us to a very high standard. (A terrific 
question is whether FiveThirtyEight and other news organizations are 
transparent enough about their research to be full-fledged participants in the 
peer-review process. That’s something I should probably address more completely 
in a separate post, but see the footnotes
<http://fivethirtyeight.com/features/be-skeptical-of-both-piketty-and-his-skeptics/?alcmpid=#fn-4>
for some discussion about it.)

Piketty’s errors would not have been detected so soon had he not published his 
data in detail. That’s not to say that transparency is an absolute defense.
But one should also assume that there are as many problems (probably more)
with unpublished data, or poorly explained methods.

The peer-review process ideally involves both exactly replicating a research 
finding and replicating it in principle. It would be problematic if other 
researchers couldn’t duplicate Piketty’s data. But it would be at least as 
problematic — I’d argue more so — if they could replicate it but found that 
Piketty’s conclusions were not very robust to changes in assumptions or data 
sources.

Some of Giles’s critique of Piketty gets at this problem. For instance, he 
calls into question Piketty’s finding that wealth inequality is rising 
throughout Western Europe, a result which he says depends on a particular 
series of assumptions and choices that Piketty made.

Of course, Giles’s methodological choices can be scrutinized, too. Perhaps 
there’s some reasonable set of assumptions under which wealth inequality is not 
rising at all in Western Europe, another under which it’s increasing modestly, 
and a third under which it’s increasing substantially.

In the medium term, the better test might be one of research that’s built up 
from scratch and largely independently of both Piketty and Giles. How robust 
are their findings to reasonable changes in data and assumptions?

And in the long run, the best test might be whether Piketty’s hypothesis makes 
a good prediction about wealth inequality, i.e. whether wealth inequality 
continues to rise. The prediction won’t be as easy to evaluate as election
forecasts are. Still, Piketty’s book comes closer to making a testable prediction
<http://gregmankiw.blogspot.com/2014/04/first-thoughts-on-piketty.html>  than 
much other macroeconomic work.

Science is messy, and the social sciences are messier than the hard sciences. 
Research findings based on relatively new data sets (like Piketty’s)
are subject to one set of problems: the data itself will have been less well
scrutinized and is more likely to contain errors, small and large. Research on
well-worn data sets is subject to another. Such data is probably in better
shape, but if researchers are coming to some novel conclusions from it,
that may reflect some flaw in their interpretation or analysis.

The closest thing to a solution is to remain appropriately skeptical, perhaps 
especially when the research finding is agreeable to you. A lot of apparently 
damning critiques prove to be less so when you assume from the start that data 
analysis and empirical research, like other forms of intellectual endeavor, are 
not free from human error. Nonetheless, once the dust settles, it seems likely 
that both Piketty and Giles will have moved us toward an improved understanding 
of wealth inequality and its implications.
