One missing item is:

 

Submit an application to the IRB.

 

Kerry

 

 

From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of siddhartha banerjee
Sent: Monday, 15 August 2016 8:17 AM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] Research on automatically created articles

 

Hello,

 

Based on the discussion and suggestions on the Administrators' incidents noticeboard (https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incidents#their_results), I have gone through each of the articles that still exist and made the necessary corrections and changes, both to the written content and to the unreliable sources. I have asked administrators to check whether my edits still have issues, and I will go back and change anything else that is required. My advisor will probably post to this thread later this week, so before that I wanted to summarize everything I learnt from the discussion here and on the incidents page.

 

1. Multiple accounts policy: Do not use multiple user accounts to post content. 

2. Research ethics: There was a serious problem with the assumptions we made (assumptions that other researchers in this area have also made, as the multiple papers mentioned show). Furthermore, when our previous work (https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-01-28/Recent_research) was featured in the Wikimedia newsletter, nothing in that coverage indicated to us that the legitimacy of this kind of research was in question. Even so, our assumptions were inappropriate. It is better to involve the WMF community by letting them know about any project before it starts, and to engage with them so that the best decisions can be made and similar situations do not arise.

As an administrator pointed out in the discussion, and I think this is very important to note: 'you not only denied the community the opportunity to decide whether we wish to allow/participate in this research, you precluded any efforts we might have made to minimize the disruption and affect a quick clean-up'.

Based on the last few emails, it seems that IRB review was waived; however, that waiver should be formally stamped, and only after the community has been informed of the task. If research might cause some disruption, it should not be done under any circumstances. It would also be better to create articles in a different namespace. The problem here was that clicking on red links led directly to the article creation page, when the articles should have been put into draft space. Even so, creating drafts implies that other editors will look at them, which should not happen without prior consent. Any testing of content should be done offline, not on Wikipedia, since it can be disruptive. Even moderate-quality content wastes editors' time. I plan to bring all of this to the attention of the research committee that approved this work, so that similar issues do not happen in the future. I also plan to write about this and share it with the wider community of researchers who have worked or are working on similar problems (I am not sure whether someone from the WMF has already contacted them). I think it would be good if they could also be brought into the discussion.

One more quote from the discussion on the incidents page that I think is also very important: "Because researchers and institutions need to realize that this project is not a laboratory for their work, not unless they make an effort to work with the community".

My apologies for the extra work that numerous editors had to do to edit and clean up the content. That cannot be undone now, but it can definitely be prevented in the future. We have not added any content since February of this year, and we promised in that discussion not to create anything more. If we want to do further analysis, we plan to use other crowdsourcing platforms (such as Amazon Mechanical Turk) to assess the quality of the generated content.

 

Please add anything you think I have missed, including anything about the clean-up; I have tried to remove the irrelevant material from all the articles edited using those usernames.

 

 

Thanks,
Sidd

 

On Fri, Aug 12, 2016 at 10:02 AM, siddhartha banerjee <sidd2...@gmail.com> wrote:

Hi,

 

My advisor, Prof. Mitra, is traveling this week. He said he will post his thoughts to this thread later next week.

 

He also wanted me to mention the following:

Although the content of the articles was generated by an algorithm, a human (I) took those articles and posted them online. We randomly chose a few articles and checked whether any objectionable content had been collected from the web, planning to remove any such content before posting on Wikipedia. We did not create a bot that went around creating articles at random. We generated the content offline and then copy-pasted the content of randomly selected articles. Apart from removing objectionable content, we made no changes to any sentences, because that would have defeated the check for linguistic consistency, which was our sole purpose. It was also done in good faith: we worked on only the bare minimum number of articles needed to get an idea, rather than letting a bot create random junk. Our algorithm cannot judge whether the cited references (found through Google searches) are reliable, but we assumed that Wikipedia reviewers would remove unreliable references along with the content drawn from them. While some references were removed for this reason (e.g. https://en.wikipedia.org/wiki/Atripliceae), some articles were deleted as promotional content, which our algorithm also cannot really detect.

 

Thanks for the comments here; we will keep them in mind if we do anything similar in the future, and I will try to inform other researchers who work in this area.

 

Thanks,

Sidd

 

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
