[Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread James Heilman
The new and improved version of the copy and paste detection bot that we at
[[WP:MED]] have been using for nearly a year
[https://en.wikipedia.org/wiki/User:EranBot/Copyright here] is nearly ready
to be expanded to other topic areas.

It can be found here [
https://en.wikipedia.org/wiki/User:EranBot/Copyright/rc]. If you install
the common.js code, it will give you buttons to click to indicate follow-up
on concerns. Additionally, one can sort the edits in question by
WikiProject. We are working to set up auto-archiving so that once
concerns are dealt with, they are removed from the main list.

We also want automatic compilation of data such as the frequency of
true positives and false positives generated by the bot. A blacklist of
sites that are known mirrors of Wikipedia is here [
https://en.wikipedia.org/wiki/User:EranBot/Copyright/Blacklist]. As this
list is improved and expanded, the accuracy of the bot will improve. Many
thanks to [[User:ערן]] for his amazing work.
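The blacklist improves accuracy because a match against a site that mirrors Wikipedia is not a copyright concern: the mirror copied Wikipedia, not the other way round. A minimal Python sketch of that filtering step, assuming the blacklist is a set of domain names and that subdomains of a listed domain should also be excluded. The domains and function names here are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical sample entries; the real list lives on the wiki page linked above.
MIRROR_BLACKLIST = {"wikimirror.example", "copy-of-wiki.example"}


def is_blacklisted(url, blacklist=MIRROR_BLACKLIST):
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)


def real_matches(match_urls, blacklist=MIRROR_BLACKLIST):
    """Keep only external matches that are not known Wikipedia mirrors."""
    return [url for url in match_urls if not is_blacklisted(url, blacklist)]
```

Matching on the parsed hostname rather than on a raw substring avoids excluding unrelated sites whose names merely contain a blacklisted domain.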

The bot also has the potential to work in other languages.

-- 
James Heilman
MD, CCFP-EM, Wikipedian

The Wikipedia Open Textbook of Medicine
www.opentextbookofmedicine.com
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe

Re: [Wikimedia-l] Introducing Kourosh Karimkhany, Vice President of Strategic Partnerships

2015-04-03 Thread Jens Best
Hi Peter,

The complete quote goes: "There must be another way to work for the value
of free knowledge for the people than one that, at the same time, destroys
net neutrality and the experience of an open web from the very beginning."

When it comes to schools and other educational organisations in developing
countries, the Wikipedia on a USB stick project was a good idea to start
from, I think. Something equally usable on mobile phones could be one
direction to think about. But of course, like the walled Wikipedia of WP0,
this project doesn't really give the full experience of an open and free
Wikipedia. So it would be a practical alternative to WP0 (without having to
deal with the access providers), but nothing more.

Apart from this existing project, I described in a former discussion (and in
talks with e.g. Jan-Bart and others) how a more political initiative, a
public open knowledge project starting with the development of a first
framework, could be a midterm approach. In short: the public knowledge
project would define a standard framework for content that has to be
provided free of charge to everybody for free use.
This could include different knowledge-providing entities, from public via
civil-society to even free content from commercial providers. All content
could be checked against the standards for open knowledge, and in different
countries different content providers would create the mix. The system
would be open, and so it would be independent of the access providers.

It could be mandatory or non-mandatory for the access providers to offer
access to the public open knowledge project (which in essence would be a
list of registered websites you have fully functional access to), according
to what would be more appropriate for the actual market situation in the
country or area. The government could subsidize the costs the access
providers incur; this would be seen as a cost of the cultural and
intellectual infrastructure of your country (like libraries, museums,
schools etc. today). It would be a mixture of public service and
voluntary engagement by civil and commercial players, framed by standards
which are regularly discussed in a possibly multi-stakeholder forum. Then
Wikipedia could be an important node in a free public knowledge network
secured by laws, international cooperation and civil engagement.

This, of course, would first make the access providers cry out loud,
because of what they would describe as unbearable duties for individual
telecoms. And surely it would need support from the international community,
governments and cooperation between the individual access providers.
Also, in an absolutist sense this would be a violation of net neutrality, but
it would be a violation that isn't driven by the intent to develop a market
with customers used to paying different prices for different data types,
which is the clear intent for which WP0 is misused in reality. The market
isn't a solution for everything. An open public knowledge project would
establish an area in the web which could be experienced as true publicness,
as a truly public place, created, operated and sustained by the triangle
which makes up the public (state-people-business). It would be like a public
web inside the internet. Considering the commercialisation of the internet
and of access to it, that could be an important counterbalance to the
ongoing development.

Well, this is just a quick thought and surely as ambitious as WP0 is in its
way, but it's not always only about the ambition, but also about the path
you walk to reach the eventual version of what you thought was right in the
beginning. This project would be a real piece of work in strategic
multi-partnership and not some cheap play with a few access providers
looking to enrich their marketing bouquet with the beautiful Wikipedia
flower. It would truly mean taking all our values seriously and working on a
partnership that puts Wikipedia at the center of a network of free
knowledge that would deserve that name. It would mean becoming a grown-up
organisation taking strategic, professional care of the field it works and
leads in: free knowledge.

Apart from that quick idea, I'm also not the only one who should be asked
this question. And apart from all possible answers, WP0 still remains the
wrong path. Some things are already wrong even before you learn that their
numbers also don't work out. In the end, WP0 is a small example of the
ethos of the WMF. Do you believe the market and entrepreneurship are always
good for your common goal (e.g. free knowledge), or does even something
anarchistic like the web have some structural framework, even if
unrecognized in its beginnings, that makes sure that openness is possible?
Net neutrality isn't a religion (as some people here who have no good
arguments of their own try to phrase it), but net neutrality could be an
important piece of the framework which is needed to balance a network
structure which is ruled by the governments, by the companies and,
happily, by the people at the same time.

So far some 

Re: [Wikimedia-l] [Wikimedia Announcements] New Wikimedia Foundation report on activities in 2014

2015-04-03 Thread Aleksey Bilogur
Still, in my assessment it is lacking in concrete details. There are many
terms that are coined and movements cited which are not definitively
explained, in some cases with hints that the departments doing the
reporting have not themselves yet arrived at a precise meaning. I suppose
that, like the entree to the full-course meal, this is the limitation
of the medium: something to digest ahead of the full-course annual plan. The
overall sense is one of transition.

On Thu, Apr 2, 2015 at 8:19 PM, Risker risker...@gmail.com wrote:

 On 2 April 2015 at 17:48, Andreas Kolbe jayen...@gmail.com wrote:

  On Thu, Apr 2, 2015 at 8:31 PM, Katherine Maher kma...@wikimedia.org
  wrote:
 
   Hi all,
  
   Today the Wikimedia Foundation published a report on its activities in
   calendar year 2014. [...]
  
   Although the information in the report was originally gathered in
   response to an internal Foundation need, we planned to make it public
   as a report from the very beginning. It is intended to be relatively
   candid, sharing insight into where teams feel they have strengths and
   where they feel there are development areas. [...]
  
   We hope you find it interesting, and welcome your feedback.
  
   Thanks,
  
   Katherine
  
   [1] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant
   [2] Thanks to everyone at the Foundation who contributed so much great
   information to their various teams' sections. And a special thanks to
   Juliet Barbara and Heather Walls who wrote and produced the whole thing!
  
  
 
 
  Thanks. This looks indeed like a candid report. If it's an indication of
  a change in communication style, I like it.

  Good to have it available on Meta as well as in pdf format (I think the
  pdf is very nicely done).
 
 

 I agree, pretty much. This is probably the best 'big picture' look at the
 WMF I have seen: accomplishments, plans, honest assessments of
 challenges. Thanks very much!

 Risker/Anne

Re: [Wikimedia-l] Announcement: WMF to file suit against the NSA

2015-04-03 Thread Austin Hair
Okay, but seriously, please stop resurrecting this thread. If you
think it's important that something be done, start a new one, and
*actually suggest something* rather than just copying articles from
somewhere else.

Austin

On Fri, Apr 3, 2015 at 1:58 AM, Andreas Kolbe jayen...@gmail.com wrote:
 Article in Eurasianet today: Wikipedia Founder Distances Himself from
 Kazakhstan PR Machine

 http://www.eurasianet.org/node/72831

 ---o0o---

 [...]

 On March 20, Wikipedia founder Jimmy Wales hosted an Ask Me Anything (AMA)
 conversation
 (http://www.reddit.com/r/IAmA/comments/2zpkxx/we_are_jameel_jaffer_of_the_aclu_wikipedia/cpl4maq)
 on Reddit, a social-networking platform. Before long the audience was
 questioning Wales’s and Wikipedia’s roles in helping to improve
 Kazakhstan’s image. Back in 2011, Wales awarded a once-and-future Kazakh
 government employee, Rauan Kenzhekhanuly, the inaugural “Wikipedian of the
 Year” (http://www.eurasianet.org/node/66343) for his work with WikiBilim,
 a Kazakh-language platform criticized both for
 receiving state funds and for publishing multiple articles toeing the
 authoritarian government’s line. At the time, Wales told EurasiaNet.org,
 “As far as I know, the WikiBilim organization is not politicized.”

 But during the AMA, Wales backpedaled on his decision to name Kenzhekhanuly
 the first Wikipedian of the Year.

 Wales was on the receiving end of a fresh round of criticism last year when
 Kenzhekhanuly was named deputy governor of Kazakhstan’s Kyzylorda
 region. During the AMA, a commenter asked Wales if he would have bestowed
 the award had he known Kenzhekhanuly would go on to serve as deputy
 governor. “If I had known in 2011 that someone would get a job that I
 disapprove of in 2014, would I refuse to give them an award in 2011?” Wales
 responded. “Yes, I would have refused to give that award.”

 Wales also clarified that Kenzhekhanuly “was not a government official” at
 the time of the award – which is, technically, true. However, according to
 Kenzhekhanuly’s LinkedIn profile
 (https://www.linkedin.com/pub/rauan-kenzhekhanuly/24/8b7/b16), before
 receiving the award he had served both as a policy adviser to the governor
 in Kazakhstan’s Mangystau region, as well as first secretary at
 Kazakhstan’s embassy in Moscow. After the AMA, Wales said by email that he
 was “not aware” Kenzhekhanuly had held those positions.

 [...]

 ---o0o---

Re: [Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread rubin.happy
Hi, James.

Is the source code available anywhere?
If you want to try your bot in other languages, I could help you with
testing on the Russian Wikipedia :)

Best regards.
rubin16

2015-04-03 12:07 GMT+03:00 James Heilman jmh...@gmail.com:

 [...]

Re: [Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread Rui Correia
Hi James

I often suspect copy-paste and find exact matches of the text
elsewhere. However, whereas one can painstakingly (unless there is a
trick that I am not aware of) ascertain when text was entered into
an article, it is not always possible to know when the other text
first appeared on the internet, and so to know for sure who copied whom.
From my limited knowledge, I believe that some trace of the date of upload
must be retained somewhere in the code. Will this bot be able to pick
up on that and provide a date?

Thanks and congratulations to all involved and for sharing.

Regards,

Rui

2015-04-03 11:07 GMT+02:00 James Heilman jmh...@gmail.com:
 [...]



-- 
_
Rui Correia
Advocacy, Human Rights, Media and Language Work Consultant
Bridge to Angola - Angola Liaison Consultant

Mobile Number in South Africa +27 74 425 4186

[Wikimedia-l] [Wikimedia Announcements] The Signpost -- Volume 11, Issue 13 -- 01 April 2015

2015-04-03 Thread Wikipedia Signpost
In focus: WMF's latest strategy document shows successes, vagueness, and the 
need for better data
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-04-01/In_focus

In the media: Wiki-PR duo bulldoze a piñata store; Wifione arbitration case; 
French parliamentary plagiarism
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-04-01/In_the_media

Traffic report: All over the place
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-04-01/Traffic_report

Featured content: Stop Press. ''Marie Celeste'' Mystery Solved. Crew Found 
Hiding In Wardrobe.
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-04-01/Featured_content


Single page view
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/Single/2015-04-01

PDF version
http://en.wikipedia.org/wiki/Book:Wikipedia_Signpost/2015-04-01


https://www.facebook.com/wikisignpost / https://twitter.com/wikisignpost
--
Wikipedia Signpost Staff
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost

___
Please note: all replies sent to this mailing list will be immediately directed 
to Wikimedia-l, the public mailing list of the Wikimedia community. For more 
information about Wikimedia-l:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
___
WikimediaAnnounce-l mailing list
wikimediaannounc...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaannounce-l

[Wikimedia-l] Copy and Paste Detection Bot

2015-04-03 Thread James Heilman
1) Yes, the source code is available. User:Eran has posted it here:
https://github.com/valhallasw/plagiabot

2) This bot ONLY works on new edits within a couple of hours of them
occurring. This reduces the number of false positives. It DOES NOT look at
old edits.

3) This requires human follow-up and common sense. One needs to make sure
that a) the source is not PD/CC BY-SA, b) it is not wiki text that has
been moved around, c) the authors of both are not the same, etc.

4) The true positive rate is around 50%, which from my perspective is good /
useful. This bot has flagged a lot of copyright issues that would have been
missed otherwise.
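Points 2) and 4) can be made concrete with a small Python sketch: a check that only edits younger than a cutoff are examined, and the share of reviewed flags that humans confirmed. The two-hour cutoff is an assumption standing in for "a couple of hours", and the function names are hypothetical, not taken from the plagiabot repository. Note that the "true positive rate" as used in point 4) is the fraction of flags confirmed on review, which statisticians usually call precision.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoff for "a couple of hours"; the bot's real setting may differ.
MAX_AGE = timedelta(hours=2)


def is_recent(edit_time, now, max_age=MAX_AGE):
    """True if the edit happened no more than max_age before `now`."""
    return timedelta(0) <= now - edit_time <= max_age


def true_positive_rate(verdicts):
    """Fraction of reviewed flags confirmed as real copyright problems.

    `verdicts` holds True (confirmed) or False (false positive) for each
    flag a human has reviewed. Returns None when nothing has been reviewed.
    """
    if not verdicts:
        return None
    return sum(verdicts) / len(verdicts)


# Usage: an edit from an hour ago is checked; one from three days ago is not.
now = datetime(2015, 4, 3, 12, 0, tzinfo=timezone.utc)
print(is_recent(now - timedelta(hours=1), now), is_recent(now - timedelta(days=3), now))
```

Returning None instead of 0.0 for an empty review list keeps "no data yet" distinct from "every flag was wrong".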

-- 
James Heilman
MD, CCFP-EM, Wikipedian

The Wikipedia Open Textbook of Medicine
www.opentextbookofmedicine.com

Re: [Wikimedia-l] Announcing: The Wikipedia Prize!

2015-04-03 Thread Cristian Consonni
Hi Brian,

2015-03-30 0:25 GMT+02:00 Brian reflect...@gmail.com:
 Although the initial goal of the Netflix Prize was to design a
 collaborative filtering algorithm, it became notorious when the data was
 used to de-anonymize Netflix users. Researchers proved that given just a
 user's movie ratings on one site, you can plug those ratings into another
 site, such as the IMDB. You can then take that information, and with some
 Google searches and optionally a bit of cash (for websites that sell user
 information, including, in some cases, their SSN) figure out who they are.
 You could even drive up to their house and take a selfie with them, or
 follow them to work and meet their boss and tell them about their views on
 the topics they were editing.

Somewhat tangentially, and to bring this topic back to a more
scientific setting, I would like to point out that there has already
been research on this topic.

I highly recommend reading the following paper:

Lieberman, Michael D., and Jimmy Lin. You Are Where You Edit:
Locating Wikipedia Contributors through Edit Histories. ICWSM. 2009.
(PDF 
http://www.pensivepuffin.com/dwmcphd/syllabi/infx598_wi12/papers/wikipedia/lieberman-lin.YouAreWhereYouEdit.ICWSM09.pdf)

For those of you that don't want to read the whole paper, you can find
a recap of the most relevant findings in this presentation by Maurizio
Napolitano:
http://www.slideshare.net/napo/social-geography-wikipedia-a-quick-overwiew

The main idea is to associate spatial coordinates with Wikipedia
articles where possible; these articles are called geopages. Then you
extract from the history of articles the users who have edited a
geopage. If you plot the geopages edited by a given contributor, you
can see that they tend to cluster, so you can define an edit area.
The study finds that 30-35% of contributors concentrate their edits in
an edit area smaller than 1 deg^2 (~12,362 km^2, approximately the
area of Connecticut or Northern Ireland[1] (thanks, Wikipedia!)).
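A rough Python sketch of the edit-area idea: take the coordinates of the geopages a contributor edited and check whether they fit within 1 square degree. As a simplifying assumption, the area is measured as the latitude/longitude bounding box; the paper may define the edit area differently, and the `Geopage` record and function names are hypothetical. The sample coordinates are approximate.

```python
from dataclasses import dataclass


@dataclass
class Geopage:
    """A Wikipedia article with coordinates (hypothetical record layout)."""
    title: str
    lat: float  # degrees
    lon: float  # degrees


def edit_area_sq_deg(pages):
    """Area, in square degrees, of the lat/lon bounding box around the pages.

    Assumes a non-empty list and ignores wrap-around at the 180th meridian.
    """
    lats = [p.lat for p in pages]
    lons = [p.lon for p in pages]
    return (max(lats) - min(lats)) * (max(lons) - min(lons))


def is_concentrated(pages, threshold=1.0):
    """True if the contributor's geopages fit inside `threshold` square degrees."""
    return len(pages) > 0 and edit_area_sq_deg(pages) < threshold


local = [Geopage("Milan", 45.46, 9.19), Geopage("Monza", 45.58, 9.27),
         Geopage("Bergamo", 45.70, 9.67)]
spread = [Geopage("Milan", 45.46, 9.19), Geopage("Rome", 41.90, 12.50)]
```

Here `local` spans roughly 0.24 by 0.48 degrees and counts as concentrated, while `spread` covers several square degrees. Since meridians converge toward the poles, one square degree covers fewer km^2 away from the equator, so the km^2 figure quoted above is only a rough guide.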

For another free/libre project with a geographic focus, like
OpenStreetMap, this is even more marked; check out, for example, this
tool «“Your OSM Heat Map” (aka Where did you contribute?)»[2] by
Pascal Neis.

This, of course, is not a straightforward de-anonymization, but these
methods work in principle for every contributor even if you obfuscate
their IP or username (provided that you can still assign all the edits
from a given user to a unique and unambiguous identifier).
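The proviso in parentheses is the crux: replacing usernames with pseudonyms hides the names but keeps each user's edits linked together, which is all a location-inference method needs. A minimal Python sketch, assuming a salted SHA-256 digest as the pseudonym; the salt and function names are hypothetical illustrations.

```python
import hashlib
from collections import defaultdict


def pseudonym(username, salt="demo-salt"):
    """Replace a username with a short salted SHA-256 digest (hypothetical salt)."""
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()[:12]


def group_by_pseudonym(edits):
    """Group (username, page) edits under each user's pseudonym.

    The names disappear, but every edit by the same user still lands in the
    same group, so an edit area can be computed per group just as before.
    """
    groups = defaultdict(list)
    for username, page in edits:
        groups[pseudonym(username)].append(page)
    return dict(groups)


edits = [("Alice", "Milan"), ("Bob", "Rome"), ("Alice", "Monza")]
```

Grouping `edits` yields two pseudonymous users, one of whom edited both Milan and Monza: exactly the clustering signal described above, with no usernames involved.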

C
[1] https://en.wikipedia.org/wiki/Square_degree
[2a] http://yosmhm.neis-one.org/
[2b] http://neis-one.org/2011/08/yosmhm/
