Hi Morten,

Thanks a lot for your advice and code!!!

Shiyue

2016-06-16 0:08 GMT+08:00 Morten Wang <[email protected]>:

> Hi Shiyue,
>
> Whether you choose to use a set time period (e.g. 6 months like Kittur &
> Kraut) or use assessment changes as your criteria, there are additional
> factors you'll have to consider. If you use a set time period there are at
> least three issues you'll need to consider: 1) do articles change quality
> at the same pace?  2) how long before the start of your time period did an
> article get its assessment?  3) what happened to your article between its
> assessment and the start of your time period?
>
> If you instead choose to use rating changes, you have the issue that those
> happen at different times, so you'll have to control for the time elapsed
> between them if you're comparing articles to each other, as well as perhaps
> trying to figure out if an article has an inherent probability for change.
> As long as you consider these types of related issues and control for them,
> your approach should be sane.
>
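One way to make the "time elapsed between rating changes" control concrete is to compute, per article, the number of days between consecutive rating changes. This is only a minimal sketch: the article names, ratings, and timestamps are made up, and `days_between_changes` is a hypothetical helper, not something from the repository.

```python
from datetime import datetime

# Hypothetical rating-change history: (article, new rating, ISO timestamp).
CHANGES = [
    ("Article A", "Start", "2014-01-10T00:00:00"),
    ("Article A", "B", "2014-07-09T00:00:00"),
    ("Article B", "Stub", "2013-03-01T00:00:00"),
    ("Article B", "C", "2013-03-31T00:00:00"),
]


def days_between_changes(changes):
    """Return {article: [days elapsed between consecutive rating changes]}."""
    by_article = {}
    for article, _rating, stamp in changes:
        by_article.setdefault(article, []).append(datetime.fromisoformat(stamp))
    gaps = {}
    for article, stamps in by_article.items():
        stamps.sort()
        # Pair each change with the next one and measure the gap in whole days.
        gaps[article] = [(b - a).days for a, b in zip(stamps, stamps[1:])]
    return gaps


GAPS = days_between_changes(CHANGES)
```

The resulting gaps could then be used as a covariate when comparing articles to each other.
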
> I've put the code up on Github:
> https://github.com/nettrom/assessments/blob/master/clean-training-set.py
> It uses a few support files that are all in the same repository (
> https://github.com/nettrom/assessments): assessment.py, db.py, and
> revisions.py
>
> Since I have a Tool Labs[1] account and Pywikibot[2] already set up, the
> code is written to use the replicated databases for fetching revisions and
> such, and Pywikibot as the library to interact with Wikipedia's API.
> Neither of those is a hard requirement; you can use the API instead of the
> database access, and switch Pywikibot out with your favourite way of
> accessing the API :)  It also uses mwparserfromhell[3] to parse the
> wikitext.  I don't know of a better parser to use, but if you have one feel
> free to use that instead.
>
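For anyone taking the API route instead of the replicated databases, a revision listing can be requested through the MediaWiki Action API. The sketch below only builds the request URL and makes no network call; the page title is a placeholder, `revisions_query_url` is a hypothetical helper, and this is not how the code in the repository does it.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title, limit=50):
    """Build a MediaWiki API URL listing revisions of `title`, newest first."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",  # revision id, time, and wikitext
        "rvslots": "main",
        "rvlimit": limit,
        "rvdir": "older",  # start at the newest revision and go back in time
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


# Placeholder title; a real run would use the talk page of the article studied.
URL = revisions_query_url("Talk:Example")
```

The URL can be fetched with any HTTP client; longer histories need the API's continuation mechanism, which is left out here.
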
>
> References:
> 1: https://tools.wmflabs.org
> 2: https://www.mediawiki.org/wiki/Manual:Pywikibot
> 3: http://mwparserfromhell.readthedocs.io/en/latest/
>
>
> Cheers,
> Morten
>
>
> On 13 June 2016 at 09:03, Shiyue Zhang <[email protected]> wrote:
>
>> Hi Morten,
>>
>> Thanks a lot for your reply!!! I have read your paper, "Tell me more: An
>> actionable quality model for Wikipedia". Thanks for pointing me to your
>> other work from CSCW 2015; I will read it later.
>>
>> I saw your data. As you mentioned, it only has the revisions at which the
>> assessment changed. However, I would prefer to get all of the revisions
>> between 2 assessment changes, since I want to study what makes the quality
>> change and to predict the quality change. Previously, I considered adopting
>> Kittur and Kraut's formalization of quality changes over 6 months [1]. The
>> problem is that I cannot get the precise quality at the start and end
>> points of a 6-month period. Now I think I can take the period between 2
>> assessment changes, though that is also not a perfect answer if articles
>> are not regularly assessed, as Kerry and Andrew mentioned.
>>
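Selecting all revisions between two assessment changes amounts to a timestamp filter once the two change timestamps are known. A minimal sketch, assuming revisions are available as (timestamp, revision id) pairs with ISO 8601 UTC timestamps in one consistent format; the data and the `revisions_between` helper are made up for illustration.

```python
def revisions_between(revisions, start, end):
    """Return revisions with start <= timestamp < end, oldest first.

    Timestamps are ISO 8601 strings in the same format, so plain string
    comparison orders them chronologically.
    """
    selected = [rev for rev in revisions if start <= rev[0] < end]
    return sorted(selected)


# Made-up article history: (timestamp, revision id).
ARTICLE_REVISIONS = [
    ("2015-01-05T10:00:00Z", 101),
    ("2015-03-12T14:20:00Z", 102),
    ("2015-06-30T09:00:00Z", 103),
    ("2015-09-01T18:45:00Z", 104),
]

# Hypothetical timestamps of two consecutive assessment changes.
WINDOW = revisions_between(
    ARTICLE_REVISIONS, "2015-03-01T00:00:00Z", "2015-09-01T00:00:00Z"
)
```

Whether the end point should be inclusive is a design choice; here the revision at the second change is left out so that consecutive windows do not overlap.
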
>> I know you have a lot of experience in Wikipedia quality research. Could
>> you give me some advice or references on studying quality change? It would
>> also be great if you could share your Python code for getting the data, so
>> that I can modify it to get the data I need. Thanks a lot!
>>
>> References:
>> 1: Kittur A, Kraut R E. Harnessing the wisdom of crowds in Wikipedia:
>> quality through coordination. ACM Conference on Computer Supported
>> Cooperative Work. ACM, 2008: 37-46.
>>
>> Cheers,
>> Shiyue
>>
>>
>>
>>
>> 2016-06-10 23:20 GMT+08:00 Morten Wang <[email protected]>:
>>
>>> Hi Shiyue,
>>>
>>> The issues around assessments that have been brought up are valid and
>>> useful to keep in mind when trying to build machine learners that do
>>> quality predictions. That being said, the ORES quality classifier[1] is (AFAIK)
>>> trained on a dataset[2] that I've gathered based on the method I used to
>>> get a dataset to train the classifier used in our CSCW 2015 paper[3]. The
>>> revisions that are in that dataset were gathered by taking a snapshot of
>>> the quality assessment classes and then walking backwards through the talk
>>> page revision history to find the time when the assessment changed, and
>>> then grabbing the revision of the article at that timestamp. If you want
>>> Python code instead of the dataset, let me know.
>>>
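The "walk backwards through the talk page history" idea can be sketched roughly as follows. This is not the code behind the dataset: it assumes the talk page revisions are already available newest first as (timestamp, wikitext) pairs, uses a deliberately simplified regular expression instead of a proper wikitext parser, and all names and data are hypothetical.

```python
import re

# Matches e.g. "|class=B" inside a WikiProject banner (simplified on purpose).
CLASS_RE = re.compile(r"\|\s*class\s*=\s*([A-Za-z]+)", re.IGNORECASE)


def rating_of(wikitext):
    """Return the first class= value found in the wikitext, or None."""
    match = CLASS_RE.search(wikitext)
    return match.group(1).upper() if match else None


def find_rating_change(revisions):
    """Given talk page revisions ordered newest first as (timestamp, wikitext),
    return the current rating and the timestamp of the oldest revision in the
    unbroken run of revisions that carry that rating."""
    if not revisions:
        return None, None
    current = rating_of(revisions[0][1])
    changed_at = revisions[0][0]
    for timestamp, wikitext in revisions[1:]:
        if rating_of(wikitext) != current:
            break  # this older revision has a different rating
        changed_at = timestamp
    return current, changed_at


# Made-up talk page history, newest first.
HISTORY = [
    ("2016-05-01T12:00:00Z", "{{WikiProject Example|class=B|importance=low}}"),
    ("2015-11-20T08:30:00Z", "{{WikiProject Example|class=B}}"),
    ("2015-02-03T17:45:00Z", "{{WikiProject Example|class=Start}}"),
    ("2014-09-14T09:10:00Z", "{{WikiProject Example|class=Stub}}"),
]

RESULT = find_rating_change(HISTORY)
```

The returned timestamp is the point at which the article revision would then be looked up. Talk pages with several banners carrying different ratings would need more careful handling than this.
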
>>> The team behind ORES has also been working on writing scripts that'll do
>>> assessment extractions (see for instance [4]), in case you want to process
>>> a dump and get all of them. So far our experience with that is that it
>>> leads to slightly lower performance. Although we're uncertain as to why, my
>>> guess is that the dataset is noisier, perhaps due to changing quality
>>> criteria as Andrew points to.
>>>
>>> Please do get in touch if you have any questions!
>>>
>>> References:
>>> 1: https://meta.wikimedia.org/wiki/ORES/wp10
>>> 2:
>>> https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Dataset/1375406
>>> 3:
>>> http://www-users.cs.umn.edu/~morten/publications/cscw2015-improvementprojects.pdf,
>>> see Appendix A for info on the classifier
>>> 4:
>>> https://github.com/wiki-ai/wikiclass/blob/master/wikiclass/extractors/enwiki.py
>>>
>>> Cheers,
>>> Morten
>>>
>>>
>>> On 10 June 2016 at 00:59, Andrew Gray <[email protected]> wrote:
>>>
>>>> Hi Shiyue,
>>>>
>>>> I agree with Kerry - these ratings probably won't do what you need, in
>>>> that case. Sorry!
>>>>
>>>> We simply don't have the people (or the enthusiasm) required to do
>>>> regular updates and I'd guess many are well over five years 'stale' since
>>>> last rating - and most will only ever have been rated once.
>>>>
>>>> There's a second complicating factor for old ratings - not only are
>>>> they stale, but the general standards for that rating might have changed.
>>>> (See eg
>>>> http://www.generalist.org.uk/blog/2010/quality-versus-age-of-wikipedias-featured-articles/
>>>> for a demonstration of that last point - it would be interesting to use
>>>> ORES to do a bigger sample)
>>>>
>>>> Andrew.
>>>> On 10 Jun 2016 07:13, "Shiyue Zhang" <[email protected]> wrote:
>>>>
>>>>> Hi Kerry,
>>>>>
>>>>> Thanks a lot for your reply! Honestly, I was not aware of the problem
>>>>> you mentioned, that many wikiprojects don't do regular quality assessment.
>>>>> This problem really matters to me, because I want to get a reasonably
>>>>> accurate quality rating for each revision of an article. I know of Aaron's
>>>>> automated quality assessment tool, but it is based on a machine learning
>>>>> classifier, and automatically predicting quality, especially quality
>>>>> change, is my own goal. So I can't take the results of this tool as my
>>>>> ground truth.
>>>>>
>>>>> 2016-06-10 12:16 GMT+08:00 Kerry Raymond <[email protected]>:
>>>>>
>>>>>> If you are not aware of it, many wikiprojects don’t do any kind of
>>>>>> regular quality assessment. Often an article is project-tagged and
>>>>>> assessed when it’s new (which generally means the quality is assessed
>>>>>> stub/start/C) and then it’s never re-assessed unless someone working on it
>>>>>> is trying to get it to GA or similar and hence actively requests
>>>>>> assessment.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So it’s easy for an article to be much better quality (or even much
>>>>>> worse quality, although that’s probably less likely) than its current
>>>>>> assessment.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I think you might do better to use Aaron’s automated quality
>>>>>> assessment tool and apply it to different versions of a set of articles
>>>>>> and see how that changes over time. Whatever the deficiencies of an
>>>>>> automated tool, I suspect it’s still more reliable than the human
>>>>>> processes that we actually have. But I guess it depends on whether the
>>>>>> focus of your study is the quality of articles or the process of
>>>>>> assessing the quality of articles. My sense is that you are interested in
>>>>>> the former rather than the latter.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Kerry
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Wiki-research-l [mailto:
>>>>>> [email protected]] *On Behalf Of *Shiyue
>>>>>> Zhang
>>>>>> *Sent:* Friday, 10 June 2016 12:42 PM
>>>>>> *To:* Research into Wikimedia content and communities <
>>>>>> [email protected]>
>>>>>> *Subject:* Re: [Wiki-research-l] How to get the exact date when an
>>>>>> article get a quality promotion?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Pine,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks for your reply. Yes, it is English Wikipedia. What I want is
>>>>>> exactly the timestamp of an article's quality rating change. I understand
>>>>>> that particular diffs shouldn't be treated as the reason the quality
>>>>>> rating changed. I'm trying to predict quality change over a certain time
>>>>>> period, so I need the quality at the start and end of that period.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I hope anyone with experience of this problem can give me some
>>>>>> advice. Thanks a lot!!!
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2016-06-10 9:47 GMT+08:00 Pine W <[email protected]>:
>>>>>>
>>>>>> Hi Zhang,
>>>>>>
>>>>>> Is this for English Wikipedia?
>>>>>>
>>>>>> You can probably use automation to find the timestamp of an article's
>>>>>> quality rating change on English Wikipedia. Other people on this list
>>>>>> probably know how to do this, and they may comment here.
>>>>>>
>>>>>> However, that does not imply that any particular diffs should be
>>>>>> considered to have a quality that is equivalent to the quality of the
>>>>>> article. Measuring the quality of diffs is an inexact science, but you
>>>>>> might want to take a look at Revision Scoring. Aaron Halfaker can tell
>>>>>> you more about how useful, or not, Revision Scoring is for measuring the
>>>>>> quality of diffs. Hopefully he will respond to this email.
>>>>>>
>>>>>> Pine
>>>>>>
>>>>>> On Jun 9, 2016 18:29, "Shiyue Zhang" <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm doing research on Wikipedia article quality, and I take advantage
>>>>>> of WikiProject Assessments. But I can only get the latest quality level
>>>>>> of an article. I wonder how to get the quality of each revision, or how to
>>>>>> get the exact date when an article gets a quality promotion, for example,
>>>>>> from A-class to FA-class.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I really need your help! Thanks!
>>>>>>
>>>>>>
>>>>>>
>>>>>> Zhang Shiyue
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Zhang Shiyue
>>>>>>
>>>>>> *Tel*: +86 18801167900
>>>>>>
>>>>>> *E-mail*: [email protected], [email protected]
>>>>>>
>>>>>> State Key Laboratory of Networking and Switching Technology
>>>>>>
>>>>>> No.10 Xitucheng Road, Haidian District
>>>>>>
>>>>>> Beijing University of Posts and Telecommunications
>>>>>>
>>>>>> Beijing, China.
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wiki-research-l mailing list
>>>>>> [email protected]
>>>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

