Re: [Wikimedia-l] Copy and Paste Detection Bot

2015-04-04 Thread Rui Correia
Thanks, James. Just out of curiosity, the other day I found two articles with a long section of identical wording; only names and numbers had been changed. Example: The town of ... has a population of .. . The town is known for its challenges in fighting poverty. According to local

[Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread James Heilman
The new and improved version of the copy and paste detection bot that we at [[WP:MED]] have been using for nearly a year [https://en.wikipedia.org/wiki/User:EranBot/Copyright here] is nearly ready to be expanded to other topic areas. It can be found here [

Re: [Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread rubin.happy
Hi, James. Is the source code available anywhere? If you want to try your bot in other languages, I could help you with testing on Russian Wikipedia :) Best regards. rubin16 2015-04-03 12:07 GMT+03:00 James Heilman jmh...@gmail.com: The new and improved version of the copy and paste detection bot

Re: [Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread Rui Correia
Hi James, I often suspect copy-paste and find exact matches of the text elsewhere. However, whereas one can painstakingly (unless there is a trick that I am not aware of) ascertain when text was entered into an article, it is not always possible to know when the other text first appeared on the

[Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread James Heilman
1) Yes, the source code is available. User:Eran has posted it here: https://github.com/valhallasw/plagiabot 2) This bot ONLY works on new edits within a couple of hours of them occurring. This reduces the number of false positives. It DOES NOT look at old edits. 3) This requires human follow-up

Re: [Wikimedia-l] Copy and paste

2012-10-20 Thread Ray Saintonge
On 10/17/12 10:26 PM, James Heilman wrote: We really need a plagiarism detection tool so that we can make sure our sources are not simply copy and pastes of older versions of Wikipedia. Today I was happily improving our article on pneumonia as I have a day off. I came across a recommendation

Re: [Wikimedia-l] Copy and paste

2012-10-18 Thread Federico Leva (Nemo)
How hard would it be to set up a tool like the software that, as far as I know, MIT uses to automatically check for plagiarism among theses etc. submitted to their digital library, checking the text of all Wikimedia projects against e.g. newspaper websites and Google Books, and then publishing

Re: [Wikimedia-l] Copy and paste

2012-10-18 Thread Tom Morris
On Thu, Oct 18, 2012 at 6:26 AM, James Heilman jmh...@gmail.com wrote: We really need a plagiarism detection tool so that we can make sure our sources are not simply copy and pastes of older versions of Wikipedia. Today I was happily improving our article on pneumonia as I have a day off. I