Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-21 Thread Thomas Dalton
2009/8/21 Anthony wikim...@inbox.org: If we are only interested in whether the most recent revision is vandalised then that is a simpler problem but would require a much larger sample to get the same quality of data. How much larger?  Do you know anything about this, or you're just guessing?
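As a rough guide to the "how much larger" question, the textbook normal-approximation formula n = z^2 * p * (1 - p) / e^2 gives the number of randomly chosen pages needed to estimate a yes/no proportion p to within a margin e. The sketch below is only an illustration under stated assumptions: simple random sampling, the 0.4% figure from the thread subject as the expected proportion, and an arbitrarily chosen margin. The function name `sample_size_for_proportion` is hypothetical, and the calculation does not compare the snapshot approach against the history-based one.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Pages to sample so that a proportion near `p` is estimated to
    within +/- `margin` (z=1.96 corresponds to roughly 95% confidence).

    Uses the normal approximation, which is only a rough guide when p
    is very small, as it is here.
    """
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


# Assumed inputs: p = 0.4% (thread subject), margin = +/- 0.1 percentage points.
pages_needed = sample_size_for_proportion(0.004, 0.001)
print(pages_needed)
```

Because the margin enters as a square, halving it roughly quadruples the required sample, which is why a snapshot estimate of a rare event becomes expensive quickly.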

[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Robert Rohde
I am supposed to be taking a wiki-vacation to finish my PhD thesis and find a job for next year. However, this afternoon I decided to take a break and consider an interesting question recently suggested to me by someone else: When one downloads a dump file, what percentage of the pages are

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Sue Gardner
Robert, thanks for this. I have long wanted that number: it is really interesting. -----Original Message----- From: Robert Rohde raro...@gmail.com Date: Thu, 20 Aug 2009 03:06:06 To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org; English

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Gregory Maxwell
On Thu, Aug 20, 2009 at 6:06 AM, Robert Rohde raro...@gmail.com wrote: [snip] When one downloads a dump file, what percentage of the pages are actually in a vandalized state? Although you don't actually answer that question, you answer a different question: [snip] approximations: I considered

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Marco Chiesa
On Thu, Aug 20, 2009 at 12:06 PM, Robert Rohde raro...@gmail.com wrote: Given the nature of the approximations I made in doing this analysis I suspect it is more likely that I have somewhat underestimated the vandalism problem rather than overestimated it, but as I said in the beginning I'd

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Jimmy Wales
Robert Rohde wrote: When one downloads a dump file, what percentage of the pages are actually in a vandalized state? This is equivalent to asking, if one chooses a random page from Wikipedia right now, what is the probability of receiving a vandalized revision? Is there a possibility of

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Jimmy Wales
Gregory Maxwell wrote: If you were using "is gay" as a measure of vandalism over time you might conclude that vandalism is decreasing when in reality cluebot is performing the same kind of analysis for its automatic vandalism suppression and the vandals have responded by vandalizing in forms

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Gregory Kohs
While the time and effort that went into Robert Rohde's analysis is certainly extensive, the outcomes are based on so many flawed assumptions about the nature of vandalism and vandalism reversion, publicize at one's peril the key finding of a 0.4% vandalism rate.

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Nathan
On Thu, Aug 20, 2009 at 12:59 PM, Gregory Kohs thekoh...@gmail.com wrote: While the time and effort that went into Robert Rohde's analysis is certainly extensive, the outcomes are based on so many flawed assumptions about the nature of vandalism and vandalism reversion, publicize at one's

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Gregory Maxwell
On Thu, Aug 20, 2009 at 12:46 PM, Jimmy Wales jwa...@wikia-inc.com wrote: [snip] Greg, I think your email sounded a little negative at the start, but not so much further down. I think you would join me heartily in being super grateful for people doing this kind of analysis. Yes, some of it

[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Erik Zachte
There is another way to detect 100% reverts. It won't catch manual reverts that are not 100% accurate but most vandal patrollers will use undo, and the like. For every revision calculate md5 checksum of content. Then you can easily look back say 100 revisions to see whether this checksum
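The checksum idea described above can be sketched roughly as follows. This is a minimal illustration, not the script Erik Zachte actually used: the function name `find_identity_reverts`, the in-memory list of revision texts, and the choice to skip the immediately preceding revision (so that null edits are not counted as reverts) are all assumptions made for the example.

```python
import hashlib


def find_identity_reverts(revisions, window=100):
    """Find revisions whose content exactly matches an earlier revision.

    `revisions` is a list of revision texts, oldest first. Returns a
    list of (i, j) index pairs meaning: revision i restores the exact
    content of earlier revision j, so revisions j+1 .. i-1 were undone.
    Only the previous `window` revisions are searched.
    """
    digests = []
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        oldest = max(0, i - window)
        # Walk backwards from i-2; i-1 is skipped because an identical
        # neighbour is a null edit rather than a revert.
        for j in range(i - 2, oldest - 1, -1):
            if digests[j] == digest:
                reverts.append((i, j))
                break
        digests.append(digest)
    return reverts


history = ["good", "good plus", "VANDAL", "good plus", "more"]
print(find_identity_reverts(history))
```

As the message notes, this only catches byte-identical restorations; a manual cleanup that leaves even one character different produces a new checksum and goes undetected.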

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Gregory Kohs
Nathan said: ...but certainly its (sic) more informative than a Wikipedia Review analysis of a relatively small group of articles in a specific topic area. And you are certainly entitled to a flawed opinion based on incorrect assumptions, such as ours being a Wikipedia Review analysis. But,

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Andrew Gray
2009/8/20 Erik Zachte erikzac...@infodisiac.com: There is another way to detect 100% reverts. It won't catch manual reverts that are not 100% accurate but most vandal patrollers will use undo, and the like. For every revision calculate md5 checksum of content. Then you can easily look back

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Brian
On Thu, Aug 20, 2009 at 11:23 AM, Erik Zachte erikzac...@infodisiac.com wrote: There is another way to detect 100% reverts. It won't catch manual reverts that are not 100% accurate but most vandal patrollers will use undo, and the like. For every revision calculate md5 checksum of content.

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Nathan
On Thu, Aug 20, 2009 at 1:30 PM, Gregory Kohs thekoh...@gmail.com wrote: Nathan said: ...but certainly its (sic) more informative than a Wikipedia Review analysis of a relatively small group of articles in a specific topic area. And you are certainly entitled to a flawed opinion based on

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Gregory Kohs
Apologies to Nathan regarding the Wikipedia Review description. The analysis team was, indeed, recruited via Wikipedia Review; however, almost all of the participants in the research have now departed or reduced their participation in Wikipedia Review to such a degree, I don't personally consider

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 1:55 PM, Nathan nawr...@gmail.com wrote: My point (which might still be incorrect, of course) was that an analysis based on 30,000 randomly selected pages was more informative about the English Wikipedia than 100 articles about serving United States Senators. Any

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Andrew Gray
2009/8/20 Gregory Maxwell gmaxw...@gmail.com: Going back to your simple study now:  The analysis of vandalism duration and its impact on readers makes an assumption about readership which we know to be invalid. You're assuming a uniform distribution of readership: That readers are just as
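The objection to assuming uniform readership can be made concrete with a small sketch. Everything here is hypothetical: the function name `vandalism_exposure`, the input shape (a vandalized-time fraction and a page-view count per article), and the toy numbers are illustrative assumptions, not data from the analysis under discussion.

```python
def vandalism_exposure(articles):
    """Compare two ways of averaging vandalized time across articles.

    `articles` is an iterable of (vandalized_fraction, page_views).
    Returns (uniform, view_weighted): the plain mean over articles,
    and the mean weighted by how often each article is read.
    """
    articles = list(articles)
    uniform = sum(frac for frac, _ in articles) / len(articles)
    total_views = sum(views for _, views in articles)
    weighted = sum(frac * views for frac, views in articles) / total_views
    return uniform, weighted


# Toy data: a heavily read, clean article and a rarely read, often vandalized one.
uniform, weighted = vandalism_exposure([(0.0, 900), (0.1, 100)])
print(uniform, weighted)
```

The two figures diverge whenever vandalism and readership are correlated, in either direction, so a per-article rate says little about what a typical reader sees unless traffic data is brought in.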

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Robert Rohde
On Thu, Aug 20, 2009 at 2:10 PM, Anthony wikim...@inbox.org wrote: On Thu, Aug 20, 2009 at 1:55 PM, Nathan nawr...@gmail.com wrote: My point (which might still be incorrect, of course) was that an analysis based on 30,000 randomly selected pages was more informative about the English Wikipedia

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Thomas Dalton
2009/8/20 Jimmy Wales jwa...@wikia-inc.com: Robert Rohde wrote: When one downloads a dump file, what percentage of the pages are actually in a vandalized state? This is equivalent to asking, if one chooses a random page from Wikipedia right now, what is the probability of receiving a

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Alex
Robert Rohde wrote: Does anyone have a nice comprehensive set of page traffic aggregated at say a month level? The raw data used by stats.grok.se, etc. is binned hourly which opens one up to issues of short-term fluctuations, but I'm not at all interested in downloading 35 GB of hourly

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 6:36 PM, Robert Rohde raro...@gmail.com wrote: On Thu, Aug 20, 2009 at 2:10 PM, Anthony wikim...@inbox.org wrote: if one chooses a random page from Wikipedia right now, what is the probability of receiving a vandalized revision The best way to answer that question

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Thomas Dalton
2009/8/20 Anthony wikim...@inbox.org: I wouldn't suggest looking at the edit history at all, just the most recent revision as of whatever moment in time is chosen.  If vandalism is found, then and only then would one look through the edit history to find out when it was added. That only works

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 6:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/20 Anthony wikim...@inbox.org: I wouldn't suggest looking at the edit history at all, just the most recent revision as of whatever moment in time is chosen. If vandalism is found, then and only then would

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Robert Rohde
On Thu, Aug 20, 2009 at 3:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/20 Anthony wikim...@inbox.org: I wouldn't suggest looking at the edit history at all, just the most recent revision as of whatever moment in time is chosen. If vandalism is found, then and only then would one

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Thomas Dalton
2009/8/21 Anthony wikim...@inbox.org: On Thu, Aug 20, 2009 at 6:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/20 Anthony wikim...@inbox.org: I wouldn't suggest looking at the edit history at all, just the most recent revision as of whatever moment in time is chosen. If

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 7:13 PM, Robert Rohde raro...@gmail.com wrote: On Thu, Aug 20, 2009 at 3:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/20 Anthony wikim...@inbox.org: I wouldn't suggest looking at the edit history at all, just the most recent revision as of whatever

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 7:20 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/21 Anthony wikim...@inbox.org: On Thu, Aug 20, 2009 at 6:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/20 Anthony wikim...@inbox.org: I wouldn't suggest looking at the edit history at all,

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Thomas Dalton
2009/8/21 Anthony wikim...@inbox.org: My God.  If a few dozen people couldn't easily determine to a relatively high degree of certainty what portion of a mere 0.03% of Wikipedia's articles are *vandalized*, how useless is Wikipedia? I never said they couldn't. I said they couldn't do it by

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Thomas Dalton
2009/8/21 Anthony wikim...@inbox.org: "Is this article vandalized?" is a yes/no question... True, but that isn't actually the question that this research tried to answer. It tried to answer "How much time has this article spent in a vandalised state?". If we are only interested in whether the most

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Robert Rohde
On Thu, Aug 20, 2009 at 4:37 PM, Anthony wikim...@inbox.org wrote: On Thu, Aug 20, 2009 at 7:13 PM, Robert Rohde raro...@gmail.com wrote: On Thu, Aug 20, 2009 at 3:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/20 Anthony wikim...@inbox.org: I wouldn't suggest looking at the edit

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 7:54 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/21 Anthony wikim...@inbox.org: "Is this article vandalized?" is a yes/no question... True, but that isn't actually the question that this research tried to answer. It tried to answer "How much time has this

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 7:58 PM, Robert Rohde raro...@gmail.com wrote: You seem to be identifying all errors with vandalism. How so? Sometimes factual errors are simply unintentional mistakes. Obviously we can't know the intent of the person for sure, but after a mistake is found it's

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Gregory Kohs
Riddle me this... Is the edit below vandalism? http://en.wikipedia.org/w/index.php?title=Arch_Coal&diff=255482597&oldid=255480884 Did the edit take a page and make it worse? Or, did it make the page a better available revision than the version immediately prior to it? Methinks the Wikipedia

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Mark Wagner
On Thu, Aug 20, 2009 at 14:10, Anthony wikim...@inbox.org wrote: On Thu, Aug 20, 2009 at 1:55 PM, Nathan nawr...@gmail.com wrote: My point (which might still be incorrect, of course) was that an analysis based on 30,000 randomly selected pages was more informative about the English Wikipedia

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 9:30 PM, Mark Wagner carni...@gmail.com wrote: On Thu, Aug 20, 2009 at 14:10, Anthony wikim...@inbox.org wrote: if one chooses a random page from Wikipedia right now, what is the probability of receiving a vandalized revision The best way to answer that question

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Gregory Kohs
Phil Nash wrote: Many editors undo and revert on the basis of felicity of language and emphasis, and unless it becomes an issue is an epiphenomenon of the encyclopedia that anyone can edit. so I can't see how this is a good example of anything in particular. And, with point proven, I rest my

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Gregory Kohs
And here is where many of the flaws of the University of Minnesota study were exposed: http://chance.dartmouth.edu/chancewiki/index.php/Chance_News_31#The_Unbreakable_Wikipedia.3F Their methodology of tracking the persistence of words was questionable, to say the least. And here was my favorite

Re: [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

2009-08-20 Thread Anthony
On Thu, Aug 20, 2009 at 11:02 PM, Gregory Kohs thekoh...@gmail.com wrote: And here was my favorite part: *We exclude anonymous editors from some analyses, because IPs are not stable: multiple edits by the same human might be recorded under different IPs, and multiple humans can share an IP.*