Hi all & thanks, Tilman!

We did not analyze Arabic WP, but the tools released alongside the paper
could be used to produce the analysis.

One challenge with working with redirects (and a motivation for the short
paper) is that redirects are actually quite dynamic. They may exist for a
time and then be re-routed as the coverage of a topic changes and moves.
Hence the discussion in the paper of redirect "spells" and all the work to
do something more than just drop them from the analysis.

Also relevant to Reem's original concern and the subsequent discussion
here: page views accrue to redirects even when the *content* that is viewed
exists on the page that is the target of the redirect. Thus even a page
that gets few/zero edits may be viewed via a redirect in a way that the
usual page view data does not account for very precisely.
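
To make the accounting point concrete, here is a minimal sketch of folding
redirect views into their targets. It assumes a single static snapshot of
the redirect map, which is a simplification given how dynamic redirects are.
The function name, the titles, and the counts are all invented for
illustration; none of this comes from the paper's tools.

```python
from collections import defaultdict


def fold_redirect_views(views, redirects):
    """Attribute views of redirect titles to the pages they point at.

    views:     dict of title -> view count (hypothetical input)
    redirects: dict of redirect title -> target title, taken as one
               static snapshot (an assumption; real redirects change)
    """
    totals = defaultdict(int)
    for title, count in views.items():
        target = title
        seen = set()
        # Follow chains of redirects, stopping if a loop is detected.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        totals[target] += count
    return dict(totals)


# Invented toy data: two redirects (one chained) pointing at "Apple".
views = {"Apple": 1000, "Apples": 250, "Appel": 40, "Other": 5}
redirects = {"Apples": "Apple", "Appel": "Apples"}

folded = fold_redirect_views(views, redirects)
print(folded)
```

In this toy case "Apple" ends up with the views of both redirects on top of
its own, which is the part that per-title view counts miss.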

Not surprisingly, I agree with Tilman that it would be very interesting to
see how some of the comparisons/analyses discussed in this thread might
change with more precise accounting of redirects :)

later,
Aaron





On Thu, Sep 15, 2016 at 11:55 PM, Tilman Bayer <tba...@wikimedia.org> wrote:

> To Andrew's point about excluding redirects, see also this paper by
> Benjamin Mako Hill and Aaron Shaw (CCed):
> https://mako.cc/copyrighteous/consider-the-redirect
> (don't know if they have data for Arabic Wikipedia too)
>
> In short, the distribution of edits is very different for redirects and
> articles. In light of this, and to address Reem's original question, it's
> probably worth looking at the actual histogram before relying on the
> average or other statistical moments.
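
A small sketch of why the histogram matters more than the moments here. The
edit counts below are invented toy data, not enwiki figures, and the
power-of-two bucketing is just one convenient choice for a heavy-tailed
distribution.

```python
import statistics
from collections import Counter


def log2_histogram(edit_counts):
    """Count pages per power-of-two bucket of edits: [1,2), [2,4), [4,8), ..."""
    buckets = Counter()
    for n in edit_counts:
        # For n >= 1, bit_length() - 1 equals floor(log2(n)).
        buckets[n.bit_length() - 1] += 1
    return dict(sorted(buckets.items()))


# Invented toy data: many rarely edited pages, a few heavily edited ones.
edits = [1] * 60 + [3] * 25 + [20] * 10 + [500] * 4 + [5000]

mean = statistics.mean(edits)
median = statistics.median(edits)
stdev = statistics.pstdev(edits)
hist = log2_histogram(edits)

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
for exponent, pages in hist.items():
    print(f"{2 ** exponent:>5}+ edits: {pages} pages")
```

With this toy sample the mean is about 73 edits while the median page has a
single edit, so the average alone says little about a typical page.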
>
> Also interesting in this regard, although the data is not current:
> https://meta.wikimedia.org/wiki/Wikipedia_article_depth
>
> On Thu, Sep 15, 2016 at 7:00 AM, Dan Andreescu <dandree...@wikimedia.org>
> wrote:
>
>> Good point, updated to *exclude redirects* and rerun:
>>
>> total_namespace_0_revisions: 457,574,404
>> total_namespace_0_pages: 5,236,104
>>
>> per namespace 0 non-redirect article:
>>
>> standard deviation of edits: *324.45*
>> *average* edits: *87.54*
>> standard deviation of days between first and last edit: *1360.16*
>> *average* days between first and last edit: *2316.37*
>>
>> So you were right, Andrew, the numbers change, but I think the nature of
>> the data is roughly the same.  It's interesting that the average difference
>> between first and last edit is smaller than two standard deviations.  That
>> suggests that the curve is also slightly lopsided, with perhaps lots of more
>> recently created articles and few long-lived ones.  But that "recent" could
>> be the spike in the 2007-2011 period.  It may be interesting to play with these
>> metrics more, and I'll keep this in mind as we build the new infrastructure
>> (making these queries as fast as possible and easy to dig into).
>>
>> On Wed, Sep 14, 2016 at 6:18 PM, Andrew Gray <andrew.g...@dunelm.org.uk>
>> wrote:
>>
>>> Hi Dan,
>>>
>>> Thanks for running these!
>>>
>>> I'm struck by the figure of 12.8m pages in ns0 - it looks like this
>>> includes redirects (there are ~7.6m ns0 redirects on enwiki, and ~5.2m
>>> articles). This will probably skew things a lot, as the majority of
>>> those will probably be edited once and never touched again, barring
>>> the target page being moved. Given they're ~60% of the pages, this
>>> will introduce a lot of extra weight for "articles with very few
>>> edits" and "articles that get edited very infrequently".
>>>
>>> It might be worth trying to filter out redirects - I suspect this
>>> would have a noticeable effect on both the distribution and the mean
>>> time between edits.
>>>
>>> Andrew.
>>>
>>> On 14 September 2016 at 22:01, Dan Andreescu <dandree...@wikimedia.org>
>>> wrote:
>>> > Quick follow up 'cause I was curious.  I calculated the average and
>>> > standard deviation for edits per namespace 0 article on enwiki.  I tried
>>> > to do it on the research db replicas but it took forever so I did it on
>>> > the hadoop cluster.  Including archived pages isn't useful; it hardly
>>> > changes the results at all.  Including pages outside namespace 0
>>> > increases the standard deviation and decreases the average.  Here are
>>> > the results:
>>> >
>>> > 484,170,218 edits on namespace 0
>>> > 12,756,342 pages in namespace 0
>>> >
>>> > standard deviation for edits per page: 213.58
>>> > average edits per page: 38.02
>>> > average days between first and last edit per page: 1215.27
>>> >
>>> > So considering the standard deviation is much larger than the mean, I'm
>>> > pretty confident the answer is yes: I think the vast majority of
>>> > articles in namespace 0 on enwiki get very few edits.  The dataset we're
>>> > working on releasing as part of wikistats 2.0 will allow these kinds of
>>> > questions to be answered really easily and really quickly.  Stay tuned
>>> > over the next few quarters :)
>>> >
>>> > And the queries:
>>> > https://gist.github.com/milimetric/8b5f447e3ef09b6fe4384e0f75cc0b34
>>> >
>>> > If you want to edit those queries to find something else out, I'm happy
>>> > to run them one or two more times, but then I really have to get back to
>>> > my real job :)
>>> >
>>> > On Wed, Sep 7, 2016 at 12:42 PM, Andrew Gray <andrew.g...@dunelm.org.uk>
>>> > wrote:
>>> >>
>>> >> Hi Reem,
>>> >>
>>> >> Here's some rough estimates.
>>> >>
>>> >> English - https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
>>> >>
>>> >> English has ~5.2 million articles, with an average of ~92 edits per
>>> >> article, not counting deleted edits (or deleted articles). Note that
>>> >> 80% of those articles are more than three years old, so they've had
>>> >> plenty of time to build up the 92 edits.
>>> >>
>>> >> [The page does not explicitly say that only article edits are counted
>>> >> in the tables, but this is easy to confirm -
>>> >> https://en.wikipedia.org/wiki/Wikipedia:Statistics has 847m edits]
>>> >>
>>> >> Arabic - https://stats.wikimedia.org/EN/TablesWikipediaAR.htm
>>> >>
>>> >> Arabic has ~437k articles, ~31 edits/article - but only half of these
>>> >> are more than three years old, so they're on average a lot younger than
>>> >> the English ones.
>>> >>
>>> >> As of July there are 3.3m edits/month in English - this is equal to an
>>> >> average of 0.63 edits/article/month - and 226k edits/month in Arabic,
>>> >> equal to 0.52 edits/article/month. July was a slow month for Arabic,
>>> >> and March had more than twice as many edits, 487k, across 415k articles.
>>> >>
>>> >> These are plain averages. The distribution is going to be very skewed,
>>> >> so high-edit articles get most of the attention, and the other articles
>>> >> easily go months without attention. If we assume an 80:20 distribution -
>>> >> which is a wild guess but sounds plausible - then the "long tail" of 80%
>>> >> of articles would get 20% of the edits. In this case, a plausible
>>> >> average would be:
>>> >>
>>> >> * English long tail, 4.16m articles and 660k edits/month = average of
>>> >>   six months between each edit
>>> >> * Arabic (July) long tail, 350k articles and 45k edits/month = average
>>> >>   of seven or eight months between each edit
>>> >> * Arabic (March) long tail, 332k articles and 97k edits/month = average
>>> >>   of three and a half months between each edit
>>> >>
>>> >> This is a broad range, but it feels more or less right for all those
>>> >> unloved pages...
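
The long-tail arithmetic above can be reproduced with a few lines. This is
only a sketch of the same back-of-the-envelope estimate; the 80:20 split is
the guess stated above, and the function and parameter names are made up
for illustration.

```python
def months_between_edits(articles, edits_per_month,
                         tail_share=0.8, tail_edit_share=0.2):
    """Average months between edits for a page in the long tail.

    Assumes the least-edited `tail_share` of articles receives
    `tail_edit_share` of all edits (the 80:20 guess, not a measurement).
    """
    tail_articles = articles * tail_share
    tail_edits = edits_per_month * tail_edit_share
    return tail_articles / tail_edits


# Article and monthly edit counts as quoted in the message above.
english = months_between_edits(5_200_000, 3_300_000)
arabic_july = months_between_edits(437_000, 226_000)
arabic_march = months_between_edits(415_000, 487_000)

print(f"English:        {english:.1f} months")       # about 6.3
print(f"Arabic (July):  {arabic_july:.1f} months")   # about 7.7
print(f"Arabic (March): {arabic_march:.1f} months")  # about 3.4
```

Changing `tail_share` and `tail_edit_share` shows how sensitive the estimate
is to the assumed split.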
>>> >>
>>> >> Andrew.
>>> >>
>>> >>
>>> >> On 7 September 2016 at 14:52, Reem Al-Kashif <reemalkas...@gmail.com>
>>> >> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I always hear people saying that most of the articles usually
>>> >> > receive little to no edits (and that is used to encourage
>>> >> > participants to make sure their articles are good enough). I would
>>> >> > like to know if there are statistics that support this for the
>>> >> > English and Arabic Wikipedia.
>>> >> >
>>> >> > Best,
>>> >> > Reem
>>> >> >
>>> >> > --
>>> >> > Kind regards,
>>> >> > Reem Al-Kashif
>>> >> >
>>> >> > _______________________________________________
>>> >> > Analytics mailing list
>>> >> > Analytics@lists.wikimedia.org
>>> >> > https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> - Andrew Gray
>>> >>   andrew.g...@dunelm.org.uk
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> - Andrew Gray
>>>   andrew.g...@dunelm.org.uk
>>>
>>>
>>
>>
>>
>>
>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
