Hi Risker,

the researchers' conclusion in their own words (see section 4.1, "Indentation Reliability") is:
*"Incorrect indentation (i.e., indentation that implies a reply-to relation with the wrong post) is quite common in longer discussions in the EWDC [the English Wikipedia Discussion Corpus]."*

Responding below to your concerns about their methodology, and taking the opportunity to clear up some statistical misconceptions, which might be valuable in other contexts too.

On Friday, March 20, 2015, Risker <[email protected]> wrote:

> On 20 March 2015 at 06:13, Tilman Bayer <[email protected]> wrote:
>
> > On Friday, March 20, 2015, Tilman Bayer <[email protected]> wrote:
> >
> > > Just to throw this in here as one data point: "39% of talk page threads
> > > contain wrong indentations
> > > <https://meta.wikimedia.org/wiki/Research:Newsletter/2014/November#39.25_of_talk_page_threads_contain_wrong_indentations>"
> >
> > PS: The result from that paper was actually even worse than that (somewhat
> > sloppy) headline suggests: the researchers "found that 29 of 74 total
> > turns, or 39%±14pp of an average thread, had indentation that misidentified
> > the turn to which they were a reply."
>
> I'm not sure you really read the underlying study, Tilman; the sample size
> is so absurdly small that there is no way it is statistically significant.
> (550 discussions on 83 article talk pages, in case anyone was wondering;

No, the sample size was actually stated right in the sentence I quoted above: "74 total turns" (talk page comments responding to another one), together with a ±14pp confidence interval.

And what exactly did you mean here by "statistically significant"? The term doesn't make mathematical sense when applied to such a measured percentage in isolation, i.e. without a hypothesis or comparison value. Rather, one can talk about confidence intervals - a smaller confidence interval means the estimate is likely to be more precise.
The 550 discussions you quoted refer to a different sample within the same corpus.

> the equivalent of about 10 minutes' worth of discussions on enwiki, except
> that they are looking at talk pages that may have conversations dating back
> 10+ years.)

This 10 minutes/10 years comparison and the "absurdly small"/"no way" rhetoric sound a lot like a common statistical fallacy, namely the erroneous assumption that it is the "size of the sample as a fraction of the population that matters" <http://www.amstat.org/publications/jse/v12n2/smith.html> ("Unless the sample encompasses a substantial portion of the population, the standard error of an estimator depends on the size of the sample, but not the size of the population. This is a crucial statistical insight that students find very counterintuitive").

Granted, if one draws a sample of 74 turns from all turns on all talk pages made in Wikipedia's history, then it's plausible that the overall population numbers hundreds of millions. But at such a large population size (or small sample/population ratio), it is the absolute size of the sample that matters for the size of the confidence interval - not how large the sample is compared to the population.

There may be accessible explanations elsewhere that include more of the math behind this, but perhaps this Khan Academy video <https://www.youtube.com/watch?v=1JT9oODsClE> helps; it walks through a calculation showing how measuring a percentage of 38% in a sample of just 150 US households (out of 100+ million) already allows one to reject the null hypothesis that the real percentage among the entire population of all households is less than 30%. It calls 150 a "large" sample in terms of the approximation regime used there - which I'm sure you must find extremely shocking, as you earlier called a sample of 550 "absurdly small".
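The calculation in that video boils down to a few lines. Here is a minimal sketch of it - the standard one-sided z-test for a proportion under the normal approximation, not the video's exact presentation:

```python
import math

# One-sided z-test for a proportion (normal approximation):
# measured percentage 38% in a sample of n = 150 households;
# null hypothesis: the true population percentage is (at most) 30%.
p_hat, p0, n = 0.38, 0.30, 150

# The standard error under the null involves n, but not the
# population size N (~100 million households).
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

print(round(z, 2))   # z-statistic, about 2.14
print(z > 1.645)     # True: reject the null at the 5% level (one-sided)
```

Note that the population size appears nowhere in the formula - exactly the counterintuitive point above.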
For a more thorough derivation, these online lectures <http://www.stat.berkeley.edu/~stark/SticiGui/Text/confidenceIntervals.htm> are quite useful (see e.g. the "Conservative confidence intervals for percentages" section; the "finite population correction <https://en.wikipedia.org/wiki/Finite_population_correction>" term there is close to 1 for small sample/population ratios, so the resulting formula for the confidence interval no longer depends on N, the population size).

Sure, in the present case the absolute size of the sample (74) wasn't very big either, and there are other things to consider, such as the selection method (e.g. they actually selected only from whole threads "longer than 10 turns each", so that's what the percentage relates to). But the authors did their due diligence and indicated the limitations resulting from the sample size by including that ±14pp confidence interval. Yes, that's quite broad, and for more precise estimates of the real overall percentage of wrong indentations (39% or 32% or 48%? ...) one would need a larger sample. But it already makes it highly unlikely that this real percentage is only 1% or 2%, say. Hence I don't see a valid reason to dismiss the authors' conclusion that incorrect indentation is "quite common", or to deny that it is likely to apply to the English Wikipedia as a whole.

To make it concrete, it appears that this was one of the threads in their sample: https://en.wikipedia.org/wiki/Talk:Grammatical_tense#Gutted - which is certainly rife with wrong indentations.

By the way, the analogous talk page corpus for the Simple English Wikipedia has been published at https://www.ukp.tu-darmstadt.de/data/discourse-analysis/wikipedia-discussion-corpora/ and I'm told that the above corpus for the English Wikipedia is still going to be published too. This might be interesting material for further quantitative studies of how the existing wikitext talk pages are actually used.
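To see numerically why the finite population correction mentioned above changes nothing here, one can plug the paper's 29-of-74 figure into a simple normal-approximation (Wald) interval - which is not necessarily the method the paper or the lectures use, just the simplest sketch:

```python
import math

def ci_halfwidth(p, n, N=None, z=1.96):
    """Approximate 95% half-width for a sample proportion p from a sample
    of size n, optionally multiplied by the finite population correction
    for a population of size N."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return z * se

p, n = 29 / 74, 74
# Without the correction, and with a hypothetical population of
# 100 million turns: the correction factor is ~0.9999996, so the
# interval is unchanged to any reasonable precision.
print(round(100 * ci_halfwidth(p, n), 1))            # about 11.1 pp
print(round(100 * ci_halfwidth(p, n, N=10**8), 1))   # still about 11.1 pp
```

This crude approximation gives about ±11pp rather than the paper's ±14pp - the authors presumably used a different (perhaps more conservative) interval - but the point is only that N drops out of the calculation.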
> And the purpose of the study was to see if this particular
> manner of analysing a discussion ("lexical pairs") was useful in
> identifying who said what to whom; it's a discussion of the analysis
> process, not the actual discussions.

Your point being? I already wrote in the linked summary that this finding about wrong indentations was a side result of the paper. But that doesn't make it go away either.

> Nonetheless, if you were trying to illustrate that there are communication
> benefits in having an easily read flow of discussion,

I actually wasn't talking about that here; being easy on readers is a separate issue. Rather, this was about being easy on contributors. The quoted result strongly indicates that many users who comment on wikitext talk pages struggle to get indentation right, even if it may have become second nature to veteran editors like you and me. That would be an argument for building a discussion system where it's easier for commenters to indicate which statement they are replying to, instead of training them in colon-counting.

-- 
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
