Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Anthony
On Fri, Aug 28, 2009 at 12:43 AM, Brion Vibber br...@wikimedia.org wrote: On 8/27/09 9:39 PM, Thomas Dalton wrote: 2009/8/28 Gregory Maxwell gmaxw...@gmail.com: If the results of this kind of study have good agreement with mechanical proxy metrics (such as machine detected vandalism) our

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Anthony
On Fri, Aug 28, 2009 at 10:08 AM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/28 Anthony wikim...@inbox.org: If you're going to do it, maybe we should work on a rough-consensus objective definition of vandalism before you release the file, though... Don't we have a consensus

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Robert Rohde
On Thu, Aug 27, 2009 at 9:43 PM, Brion Vibber br...@wikimedia.org wrote: snip Robert, is it possible to share the source for generating the revert-based stats with other folks who may be interested in pursuing further work on the subject? Thanks! Not as a complete stand-alone entity. The

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Robert Rohde
On Fri, Aug 28, 2009 at 3:55 AM, Anthony wikim...@inbox.org wrote: snip Once we have the list, anyone is free to examine it any way they want, and show their results. But we're talking about probably less than 200 instances of vandalism here, so it'll be quite easy (and fun) to lambaste

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Lars Aronsson
Anthony wrote: Umm...you would count the number of instances of vandalism? Is the question how to objectively *define* vandalism? On one hand, we have a perception, as expressed by media (and by CEO Sue Gardner, I believe), that vandalism (especially of biographies of living people, BLP)

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-28 Thread Anthony
On Fri, Aug 28, 2009 at 3:44 PM, Lars Aronsson l...@aronsson.se wrote: We can try to find out which edits are reverts, assuming that the previous edit was an act of vandalism. But that's a bad assumption. It gives both false positives and false negatives, and it gives a significant number of

[Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Robert Rohde
Recently, I reported on a simple study of how likely one was to encounter recent vandalism in Wikipedia based on selecting articles at random and using revert behavior as a proxy for recent vandalism. http://lists.wikimedia.org/pipermail/foundation-l/2009-August/054171.html One of the key
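The study estimates how likely a reader is to encounter recent vandalism, and this follow-up adds traffic data. A minimal sketch of one way traffic could enter that estimate, assuming (hypothetically) that each article's page views are spread uniformly over the observation window and that the fraction of the window the article spent in a later-reverted state is already known. The names ArticleStats, views and bad_fraction are illustrative only and do not come from Robert's code.

```python
from dataclasses import dataclass


@dataclass
class ArticleStats:
    # Hypothetical per-article inputs: page views during the window and the
    # fraction of the window in which the live version was later reverted
    # (the revert-based proxy for "bad").
    title: str
    views: int
    bad_fraction: float


def unweighted_exposure(articles: list[ArticleStats]) -> float:
    """Chance that a uniformly random article is 'bad' at a random moment."""
    return sum(a.bad_fraction for a in articles) / len(articles)


def traffic_weighted_exposure(articles: list[ArticleStats]) -> float:
    """Chance that a random page view lands on a 'bad' version.

    Assumes views are uniform in time within each article, so each article
    contributes views * bad_fraction 'bad' views.
    """
    total_views = sum(a.views for a in articles)
    bad_views = sum(a.views * a.bad_fraction for a in articles)
    return bad_views / total_views


# Made-up numbers: a little-read stub and a heavily read page.
sample = [
    ArticleStats("Quiet stub", views=10, bad_fraction=0.0),
    ArticleStats("Busy page", views=990, bad_fraction=0.01),
]
print(unweighted_exposure(sample), traffic_weighted_exposure(sample))
```

With these invented inputs the traffic-weighted figure is roughly double the unweighted one, because all of the "bad" time falls on the page that gets nearly all of the views.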

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Andrew Turvey
Subject: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data Recently, I reported on a simple study of how likely one was to encounter recent vandalism in Wikipedia based on selecting articles at random and using revert behavior as a proxy for recent vandalism

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Robert Rohde
I've just read two different news stories on Flagged Revisions that described vandalism as a growing problem for Wikipedia. With that in mind, I would like to highlight one specific point in the analysis I just did. The frequency of reverts to articles -- as a fraction of total edits -- has
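The trend claim rests on counting reverts as a fraction of total edits. The detection code is not shown in this thread, so this is only a hedged sketch of one common mechanical definition, the "identity revert": a revision whose text exactly matches an earlier, non-adjacent revision of the same page. Texts are hashed with SHA-1; a revision identical to the one immediately before it is treated as a null edit, not a revert. The revision texts below are invented.

```python
import hashlib


def text_hash(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_identity_reverts(texts: list[str]) -> list[int]:
    """Indices of revisions whose text matches an earlier, non-adjacent
    revision of the same page (identity reverts)."""
    seen: set[str] = set()
    reverts = []
    prev_hash = None
    for i, text in enumerate(texts):
        h = text_hash(text)
        # Same text as the immediately preceding revision is a null edit.
        if h in seen and h != prev_hash:
            reverts.append(i)
        seen.add(h)
        prev_hash = h
    return reverts


def revert_fraction(texts: list[str]) -> float:
    """Identity reverts as a fraction of all revisions (all count as edits)."""
    return len(find_identity_reverts(texts)) / len(texts) if texts else 0.0


history = ["A", "A B", "A B junk", "A B", "A B C", "A B C"]
print(find_identity_reverts(history), revert_fraction(history))
```

Here revision 3 restores revision 1's text, so it is flagged; revision 5 repeats revision 4 unchanged and is not.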

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
1:00 edit 1:02 revert 1:06 revert 1:14 revert 1:30 revert 2:02 revert How many instances of vandalism does your program count there, and what is the mean and median time to revert?
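The answer to this question depends on how the timeline is read. A short sketch of the arithmetic under two readings, both assumptions rather than what any particular tool does: (a) each revert undoes the event immediately before it, or (b) every revert is timed from the single 1:00 edit. Times are expressed in minutes after 1:00.

```python
from statistics import mean, median

# The timeline from the message, in minutes after 1:00.
edit_time = 0
revert_times = [2, 6, 14, 30, 62]  # 1:02, 1:06, 1:14, 1:30, 2:02

# Reading (a): each revert undoes whatever happened just before it.
prev_times = [edit_time] + revert_times[:-1]
gaps_a = [r - p for r, p in zip(revert_times, prev_times)]  # [2, 4, 8, 16, 32]

# Reading (b): every revert is measured from the single 1:00 edit.
gaps_b = [r - edit_time for r in revert_times]  # [2, 6, 14, 30, 62]

for label, gaps in (("a", gaps_a), ("b", gaps_b)):
    print(label, "instances:", len(gaps), "mean:", mean(gaps), "median:", median(gaps))
```

Both readings count five instances, yet the mean time to revert is 12.4 minutes under (a) and 22.8 under (b), with medians of 8 and 14 minutes.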

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 2:40 PM, Robert Rohde raro...@gmail.com wrote: I've just read two different news stories on Flagged Revisions that described vandalism as a growing problem for Wikipedia. With that in mind, I would like to highlight one specific point in the analysis I just did. The

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/27 Anthony wikim...@inbox.org: Why do you assume that number of reverts has any correlation with amount of vandalism? Has this been studied? It seems to be a sensible assumption, although checking it would be wise. I would put money on a significant majority of reverts being reverts of

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 2:50 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/27 Anthony wikim...@inbox.org: Why do you assume that number of reverts has any correlation with amount of vandalism? Has this been studied? It seems to be a sensible assumption, although checking it

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 2:58 PM, Anthony wikim...@inbox.org wrote: On Thu, Aug 27, 2009 at 2:50 PM, Thomas Dalton thomas.dal...@gmail.com wrote: I would put money on a significant majority of reverts being reverts of vandalism rather than BRD reverts, it may not be an overwhelming majority,

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Chad
On Thu, Aug 27, 2009 at 3:33 PM, Anthony wikim...@inbox.org wrote: On Thu, Aug 27, 2009 at 2:58 PM, Anthony wikim...@inbox.org wrote: On Thu, Aug 27, 2009 at 2:50 PM, Thomas Dalton thomas.dal...@gmail.com wrote: I would put money on a significant majority of reverts being reverts of

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Alex
Anthony wrote: On Thu, Aug 27, 2009 at 2:58 PM, Anthony wikim...@inbox.org wrote: On Thu, Aug 27, 2009 at 2:50 PM, Thomas Dalton thomas.dal...@gmail.com wrote: I would put money on a significant majority of reverts being reverts of vandalism rather than BRD reverts, it may not be an

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 3:45 PM, Chad innocentkil...@gmail.com wrote: /rvv?|revert(ing)?[ ]*(vandal(ism)?)?/ Might give you a slightly wider sample. I'll wait for Robert to release a random sample of edits he actually identified as reverts and/or the actual scripts and data dump he used.
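A quick way to see what Chad's pattern catches is to run it over a few edit summaries. This sketch uses the pattern exactly as posted, plus case-insensitive matching, which is an added assumption so that "Reverting" and "RVV" also match. The summaries are invented. Because re.search matches anywhere, "rv" inside an ordinary word such as "survey" fires as a false positive, while a summary that never uses "rv" or "revert" is missed.

```python
import re

# Chad's pattern as posted; re.IGNORECASE is an added assumption.
REVERT_SUMMARY = re.compile(r"rvv?|revert(ing)?[ ]*(vandal(ism)?)?", re.IGNORECASE)


def looks_like_revert(summary: str) -> bool:
    """True if the edit summary matches the revert pattern anywhere."""
    return REVERT_SUMMARY.search(summary) is not None


summaries = [
    "rvv",
    "Reverting vandalism by 192.0.2.1",
    "Reverted edits by Example",
    "Undid revision 12345 by Example",  # missed: no "rv" or "revert"
    "added survey results",             # false positive: "survey"
    "copyedit",
]
for s in summaries:
    print(looks_like_revert(s), s)
```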

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Stephen Bain
On Fri, Aug 28, 2009 at 4:58 AM, Anthony wikim...@inbox.org wrote: It seems to me to be begging the question. You don't answer the question how bad is vandalism by assuming that vandalism is generally reverted. Can you suggest a better metric then? -- Stephen Bain stephen.b...@gmail.com

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 7:58 PM, Stephen Bain stephen.b...@gmail.com wrote: On Fri, Aug 28, 2009 at 4:58 AM, Anthony wikim...@inbox.org wrote: It seems to me to be begging the question. You don't answer the question how bad is vandalism by assuming that vandalism is generally reverted.

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Anthony wikim...@inbox.org: On Thu, Aug 27, 2009 at 7:58 PM, Stephen Bain stephen.b...@gmail.com wrote: On Fri, Aug 28, 2009 at 4:58 AM, Anthony wikim...@inbox.org wrote: It seems to me to be begging the question. You don't answer the question how bad is vandalism by assuming

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 8:24 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/28 Anthony wikim...@inbox.org: On Thu, Aug 27, 2009 at 7:58 PM, Stephen Bain stephen.b...@gmail.com wrote: On Fri, Aug 28, 2009 at 4:58 AM, Anthony wikim...@inbox.org wrote: It seems to me to be

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Gregory Maxwell
On Thu, Aug 27, 2009 at 8:24 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/28 Anthony wikim...@inbox.org: On Thu, Aug 27, 2009 at 7:58 PM, Stephen Bain stephen.b...@gmail.com wrote: On Fri, Aug 28, 2009 at 4:58 AM, Anthony wikim...@inbox.org wrote: It seems to me to be begging the

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Anthony wikim...@inbox.org: He means what would you measure in order to draw conclusions about the severity of vandalism. Umm...you would count the number of instances of vandalism? That's not practical. That would require a person to go through article histories revision by

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Gregory Maxwell gmaxw...@gmail.com: This is somewhat labor intensive, but only somewhat as it doesn't take an inordinate number of samples to produce representative results. This should be the gold standard for this kind of measurement as it would be much closer to what people
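The claim that a hand-checked sample need not be large can be made concrete with the textbook sample-size formula for estimating a proportion, n = z^2 p (1 - p) / E^2, under simple random sampling and the normal approximation. This sketch is an added illustration, not a method proposed in the thread; p = 0.5 is the conservative worst case when the true rate is unknown.

```python
import math


def sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Items to hand-check so a proportion estimate is within about
    +/- margin at the confidence level implied by z (1.96 is ~95%).

    p is a prior guess at the proportion; 0.5 gives the largest n.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


print(sample_size(0.05), sample_size(0.10))
```

Roughly 385 judged samples give a 5-point margin at 95% confidence. If the rate being measured is small, a tighter absolute margin (and so a larger n) is needed for the estimate to be informative.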

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 8:36 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/28 Anthony wikim...@inbox.org: He means what would you measure in order to draw conclusions about the severity of vandalism. Umm...you would count the number of instances of vandalism? That's not

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Anthony wikim...@inbox.org: On Thu, Aug 27, 2009 at 8:36 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/28 Anthony wikim...@inbox.org: He means what would you measure in order to draw conclusions about the severity of vandalism. Umm...you would count the number of

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 8:41 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/28 Anthony wikim...@inbox.org: On Thu, Aug 27, 2009 at 8:36 PM, Thomas Dalton thomas.dal...@gmail.com wrote: 2009/8/28 Anthony wikim...@inbox.org: He means what would you measure in order to draw

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Thomas Dalton
2009/8/28 Anthony wikim...@inbox.org: I suggested a better approach last time we had this thread: statistical sampling. This research was based on a sample. What are you talking about?

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
Just took a quick sample of 10 instances of vandalism to [[Ted Stevens]]. Of those 10 instances of vandalism, either 2 or 4 would not have been found by the automated tool described. 2 if every edit summary containing the word vandalism is counted as vandalism, and 4 if not. The former would
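This hand sample implies a rough recall for the automated tool: 8 of 10 instances found if summaries containing "vandalism" count, 6 of 10 if not. To show how uncertain a 10-item sample is, this sketch attaches a Wilson score interval to each figure; the choice of the Wilson interval is an added illustration, not something used in the thread.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# 10 sampled instances; 2 or 4 missed depending on how summaries
# containing the word "vandalism" are treated.
for missed in (2, 4):
    found = 10 - missed
    low, high = wilson_interval(found, 10)
    print(f"recall {found / 10:.1f}, 95% interval {low:.2f} to {high:.2f}")
```

Both intervals are wide (about 0.49 to 0.94 for 8 of 10), so a sample of this size mainly shows that the tool misses some vandalism, not how much.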

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Nathan
On Thu, Aug 27, 2009 at 9:47 PM, Anthony wikim...@inbox.org wrote: Just took a quick sample of 10 instances of vandalism to [[Ted Stevens]]. Of those 10 instances of vandalism, either 2 or 4 would not have been found by the automated tool described. 2 if every edit summary containing the

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Anthony
On Thu, Aug 27, 2009 at 10:07 PM, Nathan nawr...@gmail.com wrote: Out of curiosity, Anthony, do you still refrain from editing Wikimedia projects over licensing issues? How long has it been, a year? I guess now is as good a time as any to admit it. I started editing again, without logging

Re: [Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

2009-08-27 Thread Brion Vibber
On 8/27/09 9:39 PM, Thomas Dalton wrote: 2009/8/28 Gregory Maxwell gmaxw...@gmail.com: If the results of this kind of study have good agreement with mechanical proxy metrics (such as machine detected vandalism) our confidence in those proxies will increase, if they disagree it will provide an
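"Good agreement" between a human-judged sample and a mechanical proxy can be quantified. This sketch computes precision, recall, and Cohen's kappa (chance-corrected agreement) over the same sampled edits; kappa is one common choice among several, and the label lists are invented for illustration.

```python
def agreement(human: list[bool], proxy: list[bool]) -> dict[str, float]:
    """Compare a mechanical vandalism proxy with human judgements on the
    same sampled edits (True = judged vandalism / flagged by the proxy)."""
    if len(human) != len(proxy) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    tp = sum(h and p for h, p in zip(human, proxy))
    fp = sum(p and not h for h, p in zip(human, proxy))
    fn = sum(h and not p for h, p in zip(human, proxy))
    tn = n - tp - fp - fn
    observed = (tp + tn) / n
    # Agreement expected by chance from each side's yes/no rates.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Kappa is undefined when expected == 1; report 1.0 by convention.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "kappa": kappa,
    }


human = [True] * 4 + [False] * 6
proxy = [True, True, True, False, True] + [False] * 5
print(agreement(human, proxy))
```

With these made-up labels the proxy has precision and recall of 0.75 and a kappa of about 0.58, i.e. moderate agreement beyond chance.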