2009/8/21 Anthony wikim...@inbox.org:
If we are only interested in whether the most
recent revision is vandalised then that is a simpler problem but would
require a much larger sample to get the same quality of data.
How much larger? Do you know anything about this, or you're just guessing?
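A rough feel for the sample size involved can be had from the usual normal-approximation formula for estimating a proportion. The sketch below is illustrative only: the 0.4% figure is the rate quoted elsewhere in this thread, while the margin of error, the 95% z-value, and the function name are assumptions chosen for the example. It covers only the snapshot approach, not a comparison with the history-based analysis.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Pages to sample so a proportion near p is estimated to within
    +/- margin, using the normal approximation to the binomial."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


# Assumed inputs: p = 0.4% (the rate quoted in this thread),
# margin = +/- 0.1 percentage points, z = 1.96 (about 95% confidence).
n = sample_size_for_proportion(p=0.004, margin=0.001)
print(n)
```

Halving the margin roughly quadruples the required sample, which is why a snapshot-only survey of a rare event needs many pages.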
I am supposed to be taking a wiki-vacation to finish my PhD thesis and
find a job for next year. However, this afternoon I decided to take a
break and consider an interesting question recently suggested to me by
someone else:
When one downloads a dump file, what percentage of the pages are
Robert, thanks for this. I have long wanted that number: it is really
interesting.
-Original Message-
From: Robert Rohde raro...@gmail.com
Date: Thu, 20 Aug 2009 03:06:06
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org;
English
On Thu, Aug 20, 2009 at 6:06 AM, Robert Rohde raro...@gmail.com wrote:
[snip]
When one downloads a dump file, what percentage of the pages are
actually in a vandalized state?
Although you don't actually answer that question, you answer a
different question:
[snip]
approximations: I considered
On Thu, Aug 20, 2009 at 12:06 PM, Robert Rohde raro...@gmail.com wrote:
Given the nature of the approximations I made in doing this analysis I
suspect it is more likely that I have somewhat underestimated the
vandalism problem rather than overestimated it, but as I said in the
beginning I'd
Robert Rohde wrote:
When one downloads a dump file, what percentage of the pages are
actually in a vandalized state?
This is equivalent to asking, if one chooses a random page from
Wikipedia right now, what is the probability of receiving a vandalized
revision?
Is there a possibility of
Gregory Maxwell wrote:
If you were using "is gay" as a measure of vandalism
over time you might conclude that vandalism is decreasing when in
reality cluebot is performing the same kind of analysis for its
automatic vandalism suppression and the vandals have responded by
vandalizing in forms
While the time and effort that went into Robert Rohde's analysis is
certainly extensive, the outcomes are based on so many flawed assumptions
about the nature of vandalism and vandalism reversion, publicize at one's
peril the key finding of a 0.4% vandalism rate.
On Thu, Aug 20, 2009 at 12:59 PM, Gregory Kohs thekoh...@gmail.com wrote:
While the time and effort that went into Robert Rohde's analysis is
certainly extensive, the outcomes are based on so many flawed assumptions
about the nature of vandalism and vandalism reversion, publicize at one's
On Thu, Aug 20, 2009 at 12:46 PM, Jimmy Wales jwa...@wikia-inc.com wrote:
[snip]
Greg, I think your email sounded a little negative at the start, but not
so much further down. I think you would join me heartily in being super
grateful for people doing this kind of analysis. Yes, some of it
There is another way to detect 100% reverts. It won't catch manual reverts
that are not 100% accurate but most vandal patrollers will use undo, and the
like.
For every revision calculate md5 checksum of content. Then you can easily
look back say 100 revisions to see whether this checksum
Nathan said:
...but certainly its (sic) more informative than a Wikipedia Review
analysis of a relatively small group of articles in a specific topic area.
And you are certainly entitled to a flawed opinion based on incorrect
assumptions, such as ours being a Wikipedia Review analysis. But,
2009/8/20 Erik Zachte erikzac...@infodisiac.com:
There is another way to detect 100% reverts. It won't catch manual reverts
that are not 100% accurate but most vandal patrollers will use undo, and the
like.
For every revision calculate md5 checksum of content. Then you can easily
look back
On Thu, Aug 20, 2009 at 11:23 AM, Erik Zachte erikzac...@infodisiac.com wrote:
There is another way to detect 100% reverts. It won't catch manual reverts
that are not 100% accurate but most vandal patrollers will use undo, and the
like.
For every revision calculate md5 checksum of content.
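The checksum idea can be sketched roughly as follows. This is a minimal illustration, not Erik Zachte's actual tooling: the function name, the in-memory list of revision texts, and the handling of null edits are all assumptions made for the example. A revision counts as an exact revert when its MD5 digest matches that of an earlier revision within the look-back window.

```python
import hashlib


def find_identity_reverts(revision_texts, window=100):
    """Return (index, restored_index) pairs for revisions whose text is
    byte-for-byte identical to an earlier revision within the window.

    revision_texts is assumed to be ordered oldest to newest.
    """
    digests = []
    reverts = []
    for i, text in enumerate(revision_texts):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        oldest = max(0, i - window)
        # Scan backwards so the most recent matching revision is found first.
        for j in range(i - 1, oldest - 1, -1):
            if digests[j] == digest:
                # A match with the immediate predecessor is a null edit,
                # not a revert, so it is not recorded.
                if j < i - 1:
                    reverts.append((i, j))
                break
        digests.append(digest)
    return reverts


# Hypothetical four-revision history: revision 3 restores revision 1.
history = ["stub", "stub with more detail", "PAGE BLANKED LOL", "stub with more detail"]
print(find_identity_reverts(history))
```

Every revision strictly between the two indices of a pair was undone by the revert, which gives a candidate set of vandalism edits without inspecting edit summaries. As the message notes, partial manual reverts will not be caught this way.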
On Thu, Aug 20, 2009 at 1:30 PM, Gregory Kohs thekoh...@gmail.com wrote:
Nathan said:
...but certainly its (sic) more informative than a Wikipedia Review
analysis of a relatively small group of articles in a specific topic area.
And you are certainly entitled to a flawed opinion based on
Apologies to Nathan regarding the Wikipedia Review description. The
analysis team was, indeed, recruited via Wikipedia Review; however, almost
all of the participants in the research have now departed or reduced their
participation in Wikipedia Review to such a degree, I don't personally
consider
On Thu, Aug 20, 2009 at 1:55 PM, Nathan nawr...@gmail.com wrote:
My point (which might still be incorrect, of course) was that an analysis
based on 30,000 randomly selected pages was more informative about the
English Wikipedia than 100 articles about serving United States Senators.
Any
2009/8/20 Gregory Maxwell gmaxw...@gmail.com:
Going back to your simple study now: The analysis of vandalism
duration and its impact on readers makes an assumption about
readership which we know to be invalid. You're assuming a uniform
distribution of readership: That readers are just as
On Thu, Aug 20, 2009 at 2:10 PM, Anthony wikim...@inbox.org wrote:
On Thu, Aug 20, 2009 at 1:55 PM, Nathan nawr...@gmail.com wrote:
My point (which might still be incorrect, of course) was that an analysis
based on 30,000 randomly selected pages was more informative about the
English Wikipedia
2009/8/20 Jimmy Wales jwa...@wikia-inc.com:
Robert Rohde wrote:
When one downloads a dump file, what percentage of the pages are
actually in a vandalized state?
This is equivalent to asking, if one chooses a random page from
Wikipedia right now, what is the probability of receiving a
Robert Rohde wrote:
Does anyone have a nice comprehensive set of page traffic aggregated
at say a month level? The raw data used by stats.grok.se, etc. is
binned hourly which opens one up to issues of short-term fluctuations,
but I'm not at all interested in downloading 35 GB of hourly
On Thu, Aug 20, 2009 at 6:36 PM, Robert Rohde raro...@gmail.com wrote:
On Thu, Aug 20, 2009 at 2:10 PM, Anthony wikim...@inbox.org wrote:
"if one chooses a random page from Wikipedia right now, what is the
probability of receiving a vandalized revision" The best way to answer
that question
2009/8/20 Anthony wikim...@inbox.org:
I wouldn't suggest looking at the edit history at all, just the most recent
revision as of whatever moment in time is chosen. If vandalism is found,
then and only then would one look through the edit history to find out when
it was added.
That only works
On Thu, Aug 20, 2009 at 6:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote:
2009/8/20 Anthony wikim...@inbox.org:
I wouldn't suggest looking at the edit history at all, just the most
recent
revision as of whatever moment in time is chosen. If vandalism is found,
then and only then would
On Thu, Aug 20, 2009 at 3:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote:
2009/8/20 Anthony wikim...@inbox.org:
I wouldn't suggest looking at the edit history at all, just the most recent
revision as of whatever moment in time is chosen. If vandalism is found,
then and only then would one
2009/8/21 Anthony wikim...@inbox.org:
On Thu, Aug 20, 2009 at 6:57 PM, Thomas Dalton thomas.dal...@gmail.com wrote:
2009/8/20 Anthony wikim...@inbox.org:
I wouldn't suggest looking at the edit history at all, just the most
recent
revision as of whatever moment in time is chosen. If
On Thu, Aug 20, 2009 at 7:13 PM, Robert Rohde raro...@gmail.com wrote:
On Thu, Aug 20, 2009 at 3:57 PM, Thomas Dalton thomas.dal...@gmail.com
wrote:
2009/8/20 Anthony wikim...@inbox.org:
I wouldn't suggest looking at the edit history at all, just the most
recent
revision as of whatever
On Thu, Aug 20, 2009 at 7:20 PM, Thomas Dalton thomas.dal...@gmail.com wrote:
2009/8/21 Anthony wikim...@inbox.org:
On Thu, Aug 20, 2009 at 6:57 PM, Thomas Dalton thomas.dal...@gmail.com
wrote:
2009/8/20 Anthony wikim...@inbox.org:
I wouldn't suggest looking at the edit history at all,
2009/8/21 Anthony wikim...@inbox.org:
My God. If a few dozen people couldn't easily determine to a relatively
high degree of certainty what portion of a mere 0.03% of Wikipedia's
articles are *vandalized*, how useless is Wikipedia?
I never said they couldn't. I said they couldn't do it by
2009/8/21 Anthony wikim...@inbox.org:
"Is this article vandalized?" is a yes/no question...
True, but that isn't actually the question that this research tried to
answer. It tried to answer "How much time has this article spent in a
vandalised state?". If we are only interested in whether the most
On Thu, Aug 20, 2009 at 4:37 PM, Anthony wikim...@inbox.org wrote:
On Thu, Aug 20, 2009 at 7:13 PM, Robert Rohde raro...@gmail.com wrote:
On Thu, Aug 20, 2009 at 3:57 PM, Thomas Dalton thomas.dal...@gmail.com
wrote:
2009/8/20 Anthony wikim...@inbox.org:
I wouldn't suggest looking at the edit
On Thu, Aug 20, 2009 at 7:54 PM, Thomas Dalton thomas.dal...@gmail.com wrote:
2009/8/21 Anthony wikim...@inbox.org:
"Is this article vandalized?" is a yes/no question...
True, but that isn't actually the question that this research tried to
answer. It tried to answer "How much time has this
On Thu, Aug 20, 2009 at 7:58 PM, Robert Rohde raro...@gmail.com wrote:
You seem to be identifying all errors with vandalism.
How so?
Sometimes factual errors are simply unintentional mistakes.
Obviously we can't know the intent of the person for sure, but after a
mistake is found it's
Riddle me this...
Is the edit below vandalism?
http://en.wikipedia.org/w/index.php?title=Arch_Coal&diff=255482597&oldid=255480884
Did the edit take a page and make it worse? Or, did it make the page a
better available revision than the version immediately prior to it?
Methinks the Wikipedia
On Thu, Aug 20, 2009 at 14:10, Anthony wikim...@inbox.org wrote:
On Thu, Aug 20, 2009 at 1:55 PM, Nathan nawr...@gmail.com wrote:
My point (which might still be incorrect, of course) was that an analysis
based on 30,000 randomly selected pages was more informative about the
English Wikipedia
On Thu, Aug 20, 2009 at 9:30 PM, Mark Wagner carni...@gmail.com wrote:
On Thu, Aug 20, 2009 at 14:10, Anthony wikim...@inbox.org wrote:
"if one chooses a random page from Wikipedia right now, what is the
probability of receiving a vandalized revision" The best way to answer
that question
Phil Nash wrote:
Many editors undo and revert on the basis of felicity of language and
emphasis, and unless it becomes an issue is an epiphenomenon of the
encyclopedia that anyone can edit, so I can't see how this is a good
example of anything in particular.
And, with point proven, I rest my
And here is where many of the flaws of the University of Minnesota study
were exposed:
http://chance.dartmouth.edu/chancewiki/index.php/Chance_News_31#The_Unbreakable_Wikipedia.3F
Their methodology of tracking the persistence of words was questionable, to
say the least.
And here was my favorite
On Thu, Aug 20, 2009 at 11:02 PM, Gregory Kohs thekoh...@gmail.com wrote:
And here was my favorite part:
*We exclude anonymous editors from some analyses, because IPs are not
stable: multiple edits by the same human might be recorded under different
IPs, and multiple humans can share an IP.*