Re: [Wikimedia-l] Copy and Paste Detection Bot

2015-04-04 Thread Rui Correia
Thanks, James. Just out of curiosity: the other day I found two articles with a long section of identical wording; only the names and numbers had been changed. Example: The town of ... has a population of ... . The town is known for its challenges in fighting poverty. According to local

[Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread James Heilman
The new and improved version of the copy and paste detection bot that we at [[WP:MED]] have been using for nearly a year [https://en.wikipedia.org/wiki/User:EranBot/Copyright here] is nearly ready to be expanded to other topic areas. It can be found here [

Re: [Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread rubin.happy
Hi, James. Is the source code available anywhere? If you want to try your bot in other languages, I could help you with testing on Russian Wikipedia :) Best regards. rubin16 2015-04-03 12:07 GMT+03:00 James Heilman jmh...@gmail.com: The new and improved version of the copy and detection bot

Re: [Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread Rui Correia
Hi James, I often suspect copy-paste and find exact matches of the text elsewhere. However, whereas one can painstakingly (unless there is a trick that I am not aware of) ascertain when text was entered into an article, it is not always possible to know when the other text first appeared on the

[Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread James Heilman
1) Yes, the source code is available. User:Eran has posted it here: https://github.com/valhallasw/plagiabot 2) This bot ONLY works on new edits within a couple of hours of them occurring. This reduces the number of false positives. It DOES NOT look at old edits. 3) This requires human follow-up
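
To illustrate point 2, here is a minimal sketch of a recency filter, assuming each edit is available as a (title, timestamp) pair. The names `MAX_EDIT_AGE`, `is_recent` and `select_recent_edits` are hypothetical, and the two-hour cutoff is only a guess at "a couple of hours"; this is not taken from the plagiabot source.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoff: the post only says "a couple of hours".
MAX_EDIT_AGE = timedelta(hours=2)


def is_recent(edit_time, now, max_age=MAX_EDIT_AGE):
    """Return True if the edit happened no more than max_age before now."""
    age = now - edit_time
    return timedelta(0) <= age <= max_age


def select_recent_edits(edits, now):
    """Keep only the (title, timestamp) pairs recent enough to be checked."""
    return [(title, ts) for title, ts in edits if is_recent(ts, now)]


if __name__ == "__main__":
    now = datetime(2015, 4, 3, 12, 0, tzinfo=timezone.utc)
    edits = [
        ("Article A", now - timedelta(minutes=30)),  # new edit: checked
        ("Article B", now - timedelta(hours=5)),     # old edit: skipped
    ]
    for title, ts in select_recent_edits(edits, now):
        print(title, ts.isoformat())
```

Restricting the check to fresh edits means that any matching text found elsewhere most likely predates the edit, which is one plausible reason the false-positive rate drops.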