Re: [Wikimedia-l] Siamese networks and image classification

2018-01-30 Thread Pine W
Hi John,

I think that these questions might be best asked on the AI mailing list, so
I'm copying this thread to that list. I would suggest that further
discussion should take place on the AI mailing list, because that's where
the WMF experts on AI subjects are likely to participate.

Thanks for your interest in these questions. I too would be interested in
automating image categorization for Commons. I think that full automation
of image categorization would be a long term project, perhaps over the
course of decades, but there may be ways to make incremental progress. :)

Pine 


On Wed, Jan 24, 2018 at 1:20 PM, John Erling Blad  wrote:

> I had a plan to do a more thorough description on meta, but my plans are
> always too optimistic… ;)
>
> Question 1: I wonder if anyone has done any work on categorization of
> images on Commons by using neural nets.
> Question 2: I wonder if anyone has used Siamese networks to do more general
> categorization of images.
>
> A Siamese network is a kind of network used for face recognition. It is
> called a Siamese network because, in its simplest form, it is two parallel
> networks. One of the networks is precomputed, so in effect you only
> evaluate one of them. Usually you have one face provided by a camera and
> run through one network, and a number of other precomputed faces for the
> other network. When the difference between the outputs is small, you have
> a face match.
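
A minimal sketch of the matching step described above, assuming the network has already turned each face into a vector. The names `best_match` and `gallery`, the Euclidean distance, and the threshold value are illustrative assumptions, not something specified in the thread.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(query, gallery, threshold):
    """Return (name, distance) of the closest precomputed vector,
    or None when even the closest one is further away than threshold."""
    name, dist = min(
        ((n, euclidean(query, v)) for n, v in gallery.items()),
        key=lambda pair: pair[1],
    )
    return (name, dist) if dist <= threshold else None

# Hypothetical precomputed vectors; in practice the trained network produces them.
gallery = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 0.8, 0.6],
}
query = [0.85, 0.15, 0.05]  # vector for the face coming from the camera
print(best_match(query, gallery, threshold=0.5))
```

Only the query has to go through the network at lookup time; the gallery side is computed once and stored.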
>
> In general you can have any kind of image as long as you can train a
> fingerprint-like vector. This fingerprint will then act like
> locality-sensitive hashing [1]. Because you can create this LSH from the
> vector, you can store alternate models in a database, so you can later
> search it and find usable models. Those models can then interpret the
> vector form of the fingerprint, and by evaluating those models with actual
> input (i.e. the fingerprint vector) you can get an estimate of how probable
> it is that the image belongs to a specific category.
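
One possible reading of the lookup described above, as a rough sketch. The thread does not say which LSH scheme is meant, so random-hyperplane hashing is used here as a stand-in. The dict `index` stands in for the database, and all names are hypothetical.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Random hyperplanes through the origin, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

planes = make_hyperplanes(n_bits=8, dim=3)
index = {}  # stand-in for the database: LSH key -> names of stored models

def register(model_name, example_fingerprint):
    """File a model under the key of a fingerprint it is meant to handle."""
    index.setdefault(lsh_key(example_fingerprint, planes), []).append(model_name)

def candidates(fingerprint):
    """Models filed under the same key as this fingerprint."""
    return index.get(lsh_key(fingerprint, planes), [])

example = [0.9, 0.1, 0.0]
register("Category:Cats", example)
# A vector pointing in the same direction lands in the same bucket.
print(candidates([2 * x for x in example]))
```

Vectors that point in similar directions tend to share a key, so the lookup returns a short list of candidate models instead of scanning all of them.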
>
> What you would get from the previous process is a list of probable
> categorizations. That list can be sorted, and a user trying to categorize
> an image can then choose some of the categories.
>
> Note that the output of the network is not just a few categories; it can be
> a variable number of models that each output a probability for a specific
> category. It is a bit like a map-reduce, where you find (filter) the
> possible models and then evaluate (map) those models with the fingerprint
> vector.
>
> It is perhaps not obvious, but although the idea behind Siamese networks is
> two parallel networks, the implementation uses a single active network.
> Also, the implemented network has a first part that computes the
> fingerprint vector (a kind of bottleneck network), and the second part is
> the stored models that take this fingerprint vector and each calculate a
> single output for the probability of a single category.
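
The two-part structure and the sorted suggestion list could be sketched roughly as follows. Everything here is an assumption made for illustration: the `fingerprint` function only normalises its input instead of running a trained network, and each stored model is a single logistic unit with made-up weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fingerprint(image_features):
    """Stand-in for the shared bottleneck network: L2-normalises the input."""
    norm = math.sqrt(sum(x * x for x in image_features)) or 1.0
    return [x / norm for x in image_features]

# Hypothetical stored per-category models: (weights, bias) of one logistic unit.
models = {
    "Category:Cats": ([4.0, -2.0, 0.0], -1.0),
    "Category:Dogs": ([-2.0, 4.0, 0.0], -1.0),
    "Category:Bridges": ([0.0, 0.0, 5.0], -2.0),
}

def rank_categories(image_features, candidate_names):
    """Evaluate each candidate model on the fingerprint, most probable first."""
    fp = fingerprint(image_features)  # computed once, shared by all models
    scored = []
    for name in candidate_names:
        weights, bias = models[name]
        z = sum(w * x for w, x in zip(weights, fp)) + bias
        scored.append((name, sigmoid(z)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

for name, p in rank_categories([3.0, 1.0, 0.0], list(models)):
    print(f"{name}: {p:.2f}")
```

The expensive part (the fingerprint) runs once per image, while the per-category models are small and can be added or replaced without retraining the first part.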
>
> [1] https://en.wikipedia.org/wiki/Locality-sensitive_hashing
>
> On Mon, Jan 22, 2018 at 6:47 AM, Pine W  wrote:
>
> > Hi John,
> >
> > I am having a little trouble with understanding your email from January
> > 15th. Could you perhaps state your question or point in a different way?
> >
> > Pine 
> > 
> >
> > On Mon, Jan 15, 2018 at 5:55 AM, John Erling Blad 
> > wrote:
> >
> > > This is the same as the entry on the wishlist for 2016, but describes the
> > > actual method.
> > > https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Commons#Use_computer_vision_to_propose_categories
> > >
> > > Both contrastive and triplet loss can be used while learning, but neither
> > > is described on Wikipedia.
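
For readers unfamiliar with the two losses mentioned above, here is a small sketch of commonly used formulations. Conventions vary (some versions add a factor of 1/2 or use squared distances in the triplet loss), and the margin values are arbitrary assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two fingerprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs
    apart until they are at least `margin` away from each other."""
    d = distance(a, b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Require the anchor to be closer to the positive than to
    the negative by at least `margin`; zero loss once that holds."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)

print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same=True))
print(triplet_loss([0.0, 0.0], [0.3, 0.4], [3.0, 4.0]))
```

Contrastive loss works on pairs with a same/different label, while triplet loss works on (anchor, positive, negative) triples and only constrains relative distances.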
> > >
> > > On Sun, Jan 14, 2018 at 8:16 PM, Pine W  wrote:
> > >
> > > > Hi John,
> > > >
> > > > I have not heard of an initiative to use Siamese neural networks for image
> > > > classification on Commons. You might make a suggestion on the AI,
> > > > Research, and/or Commons mailing lists regarding this idea. You might also
> > > > make a suggestion in IdeaLab.
> > > >
> > > > Pine 
> > > > 
> > > >
> > > > On Sun, Jan 14, 2018 at 3:46 AM, John Erling Blad 
> > > > wrote:
> > > >
> > > > > Has anyone tried to use a Siamese neural network for image classification
> > > > > at Commons? I don't know if it will be good enough to run in autonomous
> > > > > mode, but it will probably be a huge help for those that do manual
> > > > > classification.
> > > > >
> > > > > Imagine a network 

[Wikimedia-l] RfC about account creation logs (was "Welcome messages at arwiki")

2018-01-30 Thread Pine W
Hello colleagues,

I have started an RfC here:
https://meta.wikimedia.org/wiki/Requests_for_comment/Account_creation_logs

This RfC is not intended to cast a negative light on communities and
individuals who are making good-faith efforts to welcome new users by
placing welcome messages on the new users' talk pages. The emphasis of the
RfC is on whether changes should be made to the privacy of account creation
logs, which appears to affect the privacy of logged-in users who are
reading Wikimedia wikis which they have not previously visited while logged
in. I made proposals in the RfC regarding how new users may be welcomed.

Regards,

Pine 

___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 


Re: [Wikimedia-l] Update on Wikimedia vs. NSA

2018-01-30 Thread Nadine Le Lirzin
Thank you, Katherine, both for such a precise update and for going
ahead.
Also many thanks to all who dedicate time and work to this case.

Nadine Le Lirzin
Wikimédia France


On Tue, 30 Jan 2018 at 01:42, Philippe Beaudette 
wrote:

> Great update, thank you.
>
> On Mon, Jan 29, 2018 at 4:14 PM, Katherine Maher 
> wrote:
>
> > Hi all,
> >
> > I’d like to share an update and next steps in our lawsuit against the U.S.
> > National Security Agency (NSA), Wikimedia Foundation v. NSA.[1] As you’ll
> > recall, in March 2015, the Wikimedia Foundation joined eight other
> > plaintiffs in filing a suit in United States Federal District Court against
> > the NSA[2] and the Department of Justice,[3] among others. We have been
> > represented pro bono[4] by the American Civil Liberties Union (ACLU)[5] and
> > the Knight First Amendment Institute at Columbia University.[6] The law
> > firm Cooley LLP[7] has also been serving as pro bono co-counsel for the
> > Foundation.
> >
> > Since we’re coming up on the three-year anniversary, I wanted to offer a
> > reminder of why we filed this suit. Our challenge supports the foundational
> > values of our movement: the right to freedom of expression and access to
> > information. Free knowledge requires freedom of inquiry, particularly in
> > the case of challenging and unpopular truths. Each day people around the
> > world engage with difficult and controversial subjects on Wikipedia and
> > other Wikimedia projects. Pervasive mass surveillance brings the threat of
> > reprisal, creates a chilling effect, and undermines the freedoms upon which
> > our projects and communities are founded. In bringing this suit, we joined
> > a tradition of knowledge stewards who have fought to preserve the integrity
> > of intellectual inquiry.
> >
> > Our lawsuit challenges dragnet surveillance by the NSA, specifically the
> > large-scale seizing and searching of Internet communications frequently
> > referred to as “Upstream” surveillance.[8] The U.S. government is tapping
> > directly into the internet’s “backbone”[9]—the network of high-capacity
> > cables, switches, and routers that carry domestic and international
> > communications—and seizing and searching virtually all text-based
> > internet communications flowing into and out of the United States. It’s this
> > backbone that connects the global Wikimedia community to our projects.
> > These communications are being seized and searched without any
> > requirement that there be suspicion, for example, that the communications have a
> > connection to terrorism or national security threats.
> >
> > Last May, we reached an important milestone: a Federal Court of Appeals[10]
> > in the United States ruled[11] that the Foundation alone had plausibly
> > alleged “standing”[12] to proceed with our claims that Upstream mass
> > surveillance seizes and searches the online communications of Wikimedia
> > users, contributors and Foundation staff in violation of the U.S.
> > Constitution and other laws. The Court of Appeals’ ruling means that we are
> > the sole remaining plaintiff among the nine original co-plaintiffs. There
> > is still a long road ahead, but this intermediate victory makes this case
> > one of the most important vehicles for challenging the legality of this
> > particular NSA surveillance practice.
> >
> > As a result of our win in the appellate court, we are now proceeding to the
> > next stage of the case: discovery.[13] In the U.S. court system, parties
> > use the discovery stage to exchange evidence and ask each other questions
> > about their claims. We have requested information and documents from the
> > government, and they have made similar requests of us. The entire phase,
> > which will also involve research and reports from experts, is expected to
> > last the next few months.
> >
> > As part of our commitment to privacy, I want you to know about what this
> > stage of the case means for our data retention practices. Our goal in
> > bringing this lawsuit was to protect user information. In this case, like
> > other litigation in which we engage, we may sometimes be legally required
> > to preserve some information longer than the standard 90-day period in our
> > data retention guidelines. These special cases are acknowledged and
> > permitted by our privacy and data retention policies.[14]
> >
> > As always, however, we remain committed to keeping user data no longer than
> > legally necessary. We never publish the exact details of litigation-related
> > data retention, as part of our legal strategy to keep personal data safe.
> > And we defend any personal data from disclosure to the maximum extent,
> > taking both legal and technical measures to do so. We are keeping sensitive
> > material encrypted and offline, and we have the support of the experienced
> > legal teams at the ACLU and Cooley in ensuring its safety and