I think this conversation isn't going anywhere useful because everyone is using 
the same words but with different meanings. In particular "quality" ...

Edits can do a range of things (and often more than 1). The edits might relate 
to:
* the information content of an article (add/edit/remove propositions -- facts 
if you prefer -- about the topic, e.g. Fred Smith was born in 1770)
* the references (add/edit/delete the external sources that support a 
proposition)
* the presentation of the article, e.g. structure of the article, spelling, 
grammar, appropriately nuanced selection among synonyms, clear prose, 
conformance to the Manual of Style, wikifying, etc.

Each of these can have some kind of quality metrics attached (although most 
will be somewhat subjective -- "an article in the New York post is a better 
quality source than ...").

At the moment many in this conversation are using GA as the only quality 
metric. But I think we should see this as a goal not a binary "quality / not 
quality" metric. To achieve GA, clearly you need facts, verification, and 
presentation all in both quantity and quality.

Next, who is an IP? Well, we know that IPs don't necessarily map to individual 
people and individual people do not map to a single IP. An IP edit might be 
done by someone who is a registered user (but too lazy to log in -- I'm guilty 
of that), who may later become a registered user, or who may never be a 
registered user.

I postulate that good-faith IP edits are predominantly small edits of facts or 
localized edits of presentation (e.g. spelling). I postulate that edits by 
logged-in users would be both large and small and involve facts, references 
and presentation, although clearly individual users may have their own 
particular profiles of edit behavior.

In particular, to get an article to GA, you need one person (or just a few 
people) to polish the writing (presentation). Getting a super-readable 
document with many voices is very difficult. Therefore I would expect that the 
final push to achieve GA would inevitably involve registered users and not IPs.

Also, GA status is a concept and process that is very much "insider" knowledge 
about WP. Anonymous editors and low-activity editors are unlikely to have even 
heard about GA status and so are not going to be working toward it. Only the 
very active insiders would see it as their goal and therefore work towards it. 
So I think it is pointless to discuss contribution to quality in terms of who 
gets an article to GA status.

I think we do better to ask the question about the quality of an edit (or the 
set of edits done by a particular user) in terms of whether it adds "correct" 
information, adds references that support information, or improves 
presentation. If someone adds a "fact" and that edit is later obliterated by a 
rewrite of a section but the information is retained (albeit in a different 
presentation), the original edit was still good quality even if it doesn't 
survive as a string of characters. I think the use of "edit survival" to 
measure the quality of an edit fails to distinguish between information 
content and presentation. I acknowledge that "edit survival" is easily 
measured and "information content survival" is not, but we should be cautious 
about using one as the proxy for the other.
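
To make the distinction concrete, here is a rough Python sketch. It is only an 
illustration: the function names, the rewritten sentence and the hand-picked 
key terms are all my own assumptions, and matching key terms is a very crude 
stand-in for real "information content survival", which needs human judgement.

```python
import re

def tokens(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())

def string_survives(added_text, later_revision):
    """Strict 'edit survival': the added characters still appear verbatim."""
    return added_text in later_revision

def proposition_survives(key_terms, later_revision):
    """Crude proxy for 'information content survival' (an assumption, not an
    established metric): every hand-picked key term of the proposition still
    appears somewhere in the later text."""
    present = set(tokens(later_revision))
    return all(term.lower() in present for term in key_terms)

# The original edit adds one proposition.
added = "Fred Smith was born in 1770."

# A later rewrite keeps the fact but not the string (invented example text).
rewritten = "Born in 1770, Smith (first name Fred) grew up in a farming family."

edit_survival = string_survives(added, rewritten)
info_survival = proposition_survives(["Fred", "Smith", "born", "1770"], rewritten)
```

Here the string test says the edit did not survive while the key-term test 
says the information did, which is exactly the gap between the two measures.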

I think a qualitative assessment of a set of randomly selected articles, 
analysing the contribution made by each individual edit in terms of:
* the quality of the article as it was immediately before and after the edit 
(immediate contribution)
* the quality of the article as it is today (overall contribution)

is more likely to produce better answers about the contributions of anonymous 
editors relative to low-activity registered editors and high-activity 
registered editors.
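
As a minimal sketch of how such an assessment might be tabulated, assume a 
human rater scores the article just before and just after each edit and also 
judges whether the edit's contribution is still reflected in today's article. 
The record fields, the scoring scale and the sample numbers below are all 
hypothetical and invented purely for illustration; they are not results.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RatedEdit:
    editor_class: str      # "anonymous", "low_activity" or "high_activity"
    quality_before: float  # rater's score just before the edit (hypothetical scale)
    quality_after: float   # rater's score just after the edit
    retained_today: bool   # rater's judgement: still reflected in today's article?

def summarise(edits):
    """Group rated edits by editor class and report the mean immediate
    contribution and the share judged to be retained today."""
    groups = defaultdict(list)
    for e in edits:
        groups[e.editor_class].append(e)
    summary = {}
    for cls, items in groups.items():
        n = len(items)
        summary[cls] = {
            "edits": n,
            "mean_immediate": sum(e.quality_after - e.quality_before for e in items) / n,
            "share_retained": sum(e.retained_today for e in items) / n,
        }
    return summary

# Invented sample ratings, for illustration only.
sample = [
    RatedEdit("anonymous", 2.0, 2.5, True),
    RatedEdit("anonymous", 2.5, 2.5, False),
    RatedEdit("high_activity", 2.5, 4.0, True),
]
result = summarise(sample)
```

Treating "overall contribution" as a yes/no rater judgement is just one 
possible reading; a finer-grained score would slot into the same structure.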

For the purposes of this conversation, I am ignoring vandalism (and other bad 
faith behaviour) and edits to reverse them. 


On 01/11/2012, at 10:08 AM, Laura Hale <[email protected]> wrote:

> 
> 
> On Thu, Nov 1, 2012 at 9:14 AM, Piotr Konieczny <[email protected]> wrote:
> I agree, having a high number of edits does not signify creating high quality 
> content - it may only attest to the high use of semi-automated tools for 
> minor edits.
> 
> I also don't dispute that anons can contribute high quality content, and 
> they do a lot of edits. My point was:
> * anons don't contribute significantly to most content on Wikipedia that 
> gets peer reviewed (as Pierre noted, by that time they've probably registered 
> anyway);
> * hence the majority of Wikipedia's GA+ content is not written by anonymous 
> editors (but the GA+ content is only a small percentage of Wikipedia's total 
> content);
> 
> Do you have any evidence that anons don't contribute significantly to content 
> that gets peer reviewed?  The reason it would appear they are not involved in 
> processes is that more often than not they are expressly prohibited from 
> doing so. 
>  The implication here could be: IP addresses are contributing GA level 
> content but regular contributors are not monitoring articles where IP 
> addresses are doing lots of work and regular contributors are not supporting 
> taking of the work to the highest level.
> 
> http://toolserver.org/~daniel/WikiSense/Contributors.php?wikilang=en&wikifam=.wikipedia.org&grouped=on&page=Samantha_Stosur
>  is one of the more active articles (which is admittedly crap) with a high IP 
> address ratio.  There are several highly active Wikipedia editors 
> contributing to it. 463 of the 749 editors are IP addresses.  Still, total 
> edits by registered editors outnumber those by unregistered editors, with 
> 1,150 total edits to 1,175.  Despite this, the volume of contributors is not 
> actually resulting in edits that work towards improving assessment.
> 
> A better analysis could be something like this: IP addresses are more likely 
> to represent a large editing population on an article that has higher 
> visibility and more traffic.  The quality of the contributions to these 
> articles is universally poor for registered and unregistered users.  At the 
> same time, Wikipedia processes favour articles that have less visibility and 
> where there is less inherent conflict.  The necessity of covering a topic 
> comprehensively also serves as a barrier to taking these higher visibility 
> articles to GA as this is a challenge, and serves as a discouraging factor 
> for taking an article through processes.  GA, Peer Review and FAC favour more 
> narrow topics that are less visible and get less traffic.  This type of 
> article is likely to have a much smaller editing pool, and less likely to be 
> found by IP address editors.  (Example: Tennis articles have more IP address 
> edits than articles about sport shooting.)  This means IP addresses are less 
> likely to be actively contributing to these articles.  As processes 
> implicitly lock them out, there is little reason for these users to improve 
> per guidelines on these less visible articles.
> 
> Sincerely,
> Laura Hale
> 
> -- 
> twitter: purplepopple
> blog: ozziesport.com
> 
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l