Re: [Wikimedia-l] Solve legal uncertainty of Wikidata

2018-05-19 Thread Rob Speer
I would like to not limit the discussion to interwiki links; it also
applies to Wikipedia infoboxes and Wiktionary tables, for example.

On Thu, 17 May 2018 at 20:55 Denny Vrandečić wrote:

> Rob Speer wrote:
> > The result of this, by the way, is that commercial entities sell modified
> > versions of Wikidata with impunity. It undermines the terms of other
> > resources such as DBPedia, which also contains facts extracted from
> > Wikipedia and respects its Share-Alike terms. Why would anyone use DBPedia
> > and have to agree to share alike, when they can get similar data from
> > Wikidata which promises them it's CC-0?
>
> The comparison to DBpedia is interesting: the terms for DBpedia state
> "Attribution in this case means keep DBpedia URIs visible and active
> through at least one (preferably all) of @href, , or "Link:". If
> live links are impossible (e.g., when printed on paper), a textual
> blurb-based attribution is acceptable."
> http://wiki.dbpedia.org/terms-imprint
>
> So according to these terms, when someone displays data from DBpedia, it is
> entirely sufficient to attribute DBpedia.
>
> What that means is that DBpedia follows exactly the same theory as
> Wikidata: it is OK to extract data from Wikipedia and republish it as your
> own dataset under your own copyright without requiring attribution to the
> original source of the extraction.
>
> (A bit more problematic might be the fact that DBpedia also republishes
> whole paragraphs of text under these terms, but that's another story.)
>
> My understanding is that all that Wikidata has extracted from Wikipedia is
> non-copyrightable in the first place and thus republishing it under a
> different license (or, as in the case of DBpedia for simple triples, with a
> different attribution) is legally sound.
>
> If there is disagreement with that, I would be interested to know which
> content exactly is considered to be under copyright and where the license
> has not been followed on Wikidata.
>
> For completeness: the discussion is going on in parallel on the Wikidata
> project chat and in Phabricator:
>
> https://phabricator.wikimedia.org/T193728#4212728
>
> https://www.wikidata.org/wiki/Wikidata:Project_chat#Wikipedia_and_other_Wikimedia_projects
>
> I would appreciate it if we could keep the discussion in a single place.
>
> Gnom1 on Phabricator has offered to actually answer legal questions, but we
> need to come up with the questions that we want to ask. If it should be,
> for example, as Rob Speer states on the bug, "has the copyright of
> interwiki links been breached by having them be moved to Wikidata?", I'd be
> quite happy with that question. If that's the disagreement, let us ask
> Legal for help and see whether my understanding or yours is correct.
>
> Does this sound like a reasonable question? Or which other question would
> you like to ask instead?
>
>

Re: [Wikimedia-l] Solve legal uncertainty of Wikidata

2018-05-17 Thread Rob Speer
> As always, copyright is predatory. As we can prove that copyright is the
> enemy of science and knowledge

Well, this kind of gets to the heart of the issue, doesn't it.

I support the Creative Commons license, including the share-alike term,
which requires copyright in order to work, and I've contributed to multiple
Wikimedia projects with the understanding that my work would be protected
by CC-By-SA.

Wikidata is engaged in a project-wide act of disobedience against CC-By-SA.
I would say that GerardM has provided an excellent summary of the attitude
toward Creative Commons that I've encountered on Wikidata: "it's holding us
back", "it's the enemy", "you can't copyright knowledge", "you can't make
us follow it", etc.

The result of this, by the way, is that commercial entities sell modified
versions of Wikidata with impunity. It undermines the terms of other
resources such as DBPedia, which also contains facts extracted from
Wikipedia and respects its Share-Alike terms. Why would anyone use DBPedia
and have to agree to share alike, when they can get similar data from
Wikidata which promises them it's CC-0?

On Wed, 16 May 2018 at 21:43 Gerard Meijssen wrote:

> Hoi,
> Thank you for the overly broad misrepresentation. As always, copyright is
> predatory. As we can prove that copyright is the enemy of science and
> knowledge, we should not be upset that *copyright* is abused; we should
> welcome it, as it proves the point. Also, when we use texts from everywhere
> and rephrase them in Wikipedia articles, "we" are not lily white either.
>
> In "them old days" we generally felt that when people used Wikipedia,
> it would only serve our purpose: to share the sum of all knowledge. I still
> feel really good about that. And it has been shown that what we do
> (maintaining, curating, and updating that data) is not easy for others to
> do as well as "we" do it.
>
> If we are to be more precise with our copyright, there are a few things
> we could do to make copyright more transparent. When data is to be uploaded
> (to Commons, Wikipedia, or Wikidata), we should use a user account that is
> OWNED and operated by the copyright holder. The operation may be by proxy,
> and as a consequence there is no longer a question about copyright, as the
> copyright holder can do as they want. This makes any future noises just
> that, annoying.
>
> As to copyright on Wikidata, when you consider copyright on data taken from
> Wikipedia, the question is: "Which Wikipedia?" I have copied a lot of data
> from several Wikipedias and, believe me, from a quality point of view there
> is much to be gained by using Wikidata as an instrument for good, because it
> is really strong in identifying friends and false friends. It is superior
> as a tool for disambiguation.
>
> About the copyright on data, the overriding question with data is: do you
> copy data wholesale into Wikidata? That is what a database copyright is
> about. As I wrote on my blog [1], the best data to include is data that is
> corroborated by the fact that it is present in multiple sources. This
> negates the notion of a single source; it also underscores that much of the
> data everywhere is replicated a lot. It also underscores, again, the notion
> that data that is only present in single sources is what needs attention.
> It needs tender loving care; it needs other sources to establish
> credentials. That in its own right is what makes any claim of copyright
> moot. It is in this process that it becomes a "creative" process, negating
> the copyright held on databases.
>
> I welcome the attention that is given to copyright in Wikidata. However, our
> attention to copyright is predatory in two ways: how can we get
> around existing copyright, and how can we protect our own? As argued,
> Wikidata shines when it is used for what it is intended to be: the place
> that brings data, from Wikipedias first and elsewhere second, together to be
> used as a repository of quality, open, and linked data.
> Thanks,
> GerardM
>
> [1]
>
> https://ultimategerardm.blogspot.nl/2018/05/wikidata-copyright-and-linked-data.html
>

Re: [Wikimedia-l] Solve legal uncertainty of Wikidata

2018-05-13 Thread Rob Speer
Wow, thanks for the heads up. When I was getting upset about projects that
change the license on Wikimedia content and commercialize it, I had no idea
that Wikidata was providing them the cover to do so. The Creative Commons
violation is coming from inside the house!

On Tue, 8 May 2018 at 03:48 mathieu stumpf guntz <
psychosl...@culture-libre.org> wrote:

> Hello everybody,
>
> There is a Phabricator ticket, "Solve legal uncertainty of Wikidata",
> that you might be interested to look at and participate in.
>
> As Denny suggested in the ticket that it be given more visibility through
> the discussion on the Wikidata chat
> <https://www.wikidata.org/wiki/Wikidata:Project_chat#Importing_datasets_under_incompatible_licenses>,
> I thought it was worth highlighting it a bit more.
>
> Cheers
>
> ___
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> 

Re: [Wikimedia-l] Amazon Echo's use of Wikipedia; CC license compliance?

2018-04-18 Thread Rob Speer
Right, this worries me too.

I know that Wikimedia doesn't enforce the copyright on the content
themselves, because they don't hold the relevant copyrights, the authors
do. But there seems to be no guidance for what _anyone_ can do to address
and correct large-scale violations. The guides on Wikipedia meta-pages are
about "here's what to do if someone copies content without following the
license", but not "here's what to do if someone copies _all_ the content
without following the license". Asking for takedowns of particular pages
that I was directly involved in, one at a time, would be silly and less
than effective.

Here I'm thinking of things more brazen than the Google Knowledge Graph --
projects that combine multiple CC-By-SA resources together, claim ownership
over the content, and sell it.

I'm not asking Wikimedia to do all the work. But I'd at least like to hear
what has worked and what hasn't worked in enforcing copyright on Wikimedia
projects. If the answer is "nothing works", that doesn't bode well for
Creative Commons data.

On Sun, 15 Apr 2018 at 19:53 Anthony Cole wrote:

> Is someone from WMF monitoring wikimedia-l and notifying relevant employees
> when an issue arises under their remit? This issue - big companies using
> our writing without attribution and like-licensing - has been hanging with
> no word from the WMF for six months.
>
> Anthony Cole
>
>
> On Thu, Apr 5, 2018 at 6:22 PM, Anthony Cole wrote:
>
> > I see this from Brian Heater at TechCrunch on 25 March:
> >
> > "In a conversation earlier this week, Wikimedia’s Chief Revenue Officer,
> > Lisa Gruwell told TechCrunch that this sort of usage doesn’t constitute
> > any sort of formal relationship. Most companies more or less hook into an
> > API to utilize that breadth of knowledge. It’s handy for sure, and *it’s
> > all well within Wikimedia’s fair use rules*, but as with Maher’s letter,
> > the CRO expressed some concerns about seemingly one-sided relationships
> > ... *Smart assistants are certainly playing by the applicable rules when
> > it comes to leveraging that information base.*"[1]
> >
> > That article I link to has both Katherine (WMF ED) and Lisa (Chief
> > Revenue Officer) asking the companies who use our work for free to "give
> > back." I want them to give back too, but I don't absolve them of their
> > obligation to meaningfully attribute my work and share it with the same
> > rights attached. If it is the opinion of the WMF that these smart
> > assistants are not breaching my rights, I'd like to see the legal advice
> > that opinion is based on.
> >
> > 1. https://techcrunch.com/2018/03/24/are-corporations-that-use-wikipedia-giving-back/
> >
> > Anthony Cole
> >
> >
> > On Thu, Apr 5, 2018 at 5:47 PM, WereSpielChequers <
> > werespielchequ...@gmail.com> wrote:
> >
> >> Yes of course the WMF can contact those who are detected reusing our
> >> content without fully complying with licenses and encourage them to
> >> comply.
> >>
> >> If a case were to go to court it would need to have one or more
> >> contributors who were willing to cooperate with WMF legal in the case.
> >> But I doubt there would be a shortage of contributors who were keen to
> >> do so.
> >>
> >> As for why the WMF should do so, here are three reasons:
> >>
> >> Each of our wikis is a crowd sourced project. Crowd sourcing requires a
> >> crowd; if a crowd settles down and stabilises it becomes a community.
> >> The community is broadly stable, but we need a steady flow of new
> >> Wikimedians, and our only really effective way of recruiting new
> >> Wikimedians is for them to see the edit button on our sites. An
> >> increasing shift to our content being used without attribution is an
> >> existential threat to the project and hence to the WMF.
> >>
> >> Our communities are made up of volunteers with diverse motivations. For
> >> some of us the BY-SA part of the licensing is important; personally I
> >> feel good when I see one of my photos used by someone else but attributed
> >> to me. If the de facto policy of the WMF was to treat volunteer
> >> contributions as effectively CC0, this would be demotivating for some
> >> members of our community. I'm also active on another site where every
> >> member regularly gets stats on their readership, something I very much
> >> doubt would happen if it wasn't an effective mechanism to encourage
> >> continued participation.
> >>
> >> Every organisation needs money; the WMF gets most of its money by asking
> >> for it on Wikipedia and other sites. Again, encouraging attribution back
> >> to Wikipedia etc. tackles the existential threat of other sites treating
> >> Wikipedia et al. as CC0.
> >>
> >>
> >> WSC
> >>
> >> On 5 April 2018 at 08:04, 
> >> wrote:
> >>
> >> >
> >> > Hi,
> >> >
> >> > On 04/04/2018 08:36 PM, Anthony Cole wrote:
> >> > > I'm curious also. I release my articles under "attribution, share
> >> > > alike" and rely on WM

Re: [Wikimedia-l] BabelNet is remixing Wikimedia content without following CC-By-SA terms

2018-04-13 Thread Rob Speer
Everipedia sounds even worse, because they sound like the kind of
move-fast-and-break-laws blockchain startup that thinks the legal system is
something that happens to other people. But Roberto Navigli is a respected
academic and presumably has some interest in following the law, if he can
be convinced that his self-serving interpretation of the law will not hold
up.

Again, there has to be a process that's been followed before, right?
BabelNet and Everipedia can't be the first instances of people dumping all
the data from Wikimedia projects into their own projects without following
the license.

Another interesting twist: the CC-By-NC-SA download they offered to "people
wanting to use BabelNet for research purposes" has been taken offline "for
the Easter holiday", which approximately coincides with when Navigli
responded to my e-mail, but unless Easter is a very long holiday in Italy I
suspect that it's gone for the indefinite future. So they aren't sharing
_anything_ anymore.

I believe that what BabelNet needs to do is:

- Change the license of BabelNet from CC-By-NC-SA 3.0 to CC-By-SA 4.0
- Add attribution and license information to their images (or remove the
image galleries)
- Relicense or remove the dependencies of BabelNet that have non-commercial
licenses (they use a toolkit called JLTUtils that is developed at the same
university, under a CC-By-NC-SA license, which is strange because it
appears to be software and not content)
- Reinstate the downloadable version of the data, with no academic-only
restrictions

I don't want to end up issuing some sort of copyright takedown against
BabelNet. It's a project that should keep existing, but under the correct
license.


On Wed, 11 Apr 2018 at 09:49 Michael Peel wrote:

> They also appear to be using photos from Wikimedia Commons without paying
> attention to the license. I can find photos of mine that are CC-BY-SA-4.0
> licensed that are being used without any metadata at all, let alone
> attribution and the correct CC license info…
>
> The same is also true for Everipedia, BTW.
>
> Thanks,
> Mike

[Wikimedia-l] BabelNet is remixing Wikimedia content without following CC-By-SA terms

2018-04-10 Thread Rob Speer
BabelNet (http://babelnet.org) is a multilingual knowledge resource that
defines words and phrases in many languages. I've noticed that it copies
large amounts of content from Wikimedia projects, including Wikipedia,
Wiktionary, and Wikiquote, while violating Wikimedia's CC-By-SA license by
placing the content under an incompatible CC-By-NC-SA license.

As one example, I can search BabelNet for "Timsort", a Wikipedia article
whose first sentence is one I wrote:
http://live.babelnet.org/synset?word=Timsort&lang=EN&details=1&orig=Timsort

The sentence I wrote appears at the top of the page (with credit to
Wikipedia). The rest of the page is also content remixed from Wikipedia,
including a gallery of images that are presented without credit. A scrolly
box in the footer of the page says the content is under the CC-By-NC-SA 3.0
license. Other pages, such as http://babelnet.org/synset?word=bn:00852566n,
combine data from multiple different resources.

The BabelNet creators are aware of the CC-By-SA licenses of the resources
they use (see http://babelnet.org/licenses/). In addition to the
non-commercial license they offer, their company, Babelscape (
http://babelscape.com/), sells commercial licenses to BabelNet.

I reached out to Roberto Navigli, who runs BabelNet and Babelscape, over
e-mail on March 23. I asked if the non-commercial license clause was simply
a mistake. In his reply, Navigli stated that BabelNet is not a derived
work, but is a CC-By-NC-SA-licensed collection made of several different
works. I responded that BabelNet doesn't meet the Creative Commons
definition of a "Collective Work", which would be necessary for it to not
be a derived work. Navigli responded:

"actually it is a collection of derivative work of several resources with
heterogeneous licenses, each of which clearly separated with separate
licenses and bundles. By transitivity derivative work is work with a
certain license, so it is work. Therefore, it is a collection of works with
different licenses and it can keep a separate license."

I believe this is nonsense on multiple levels. BabelNet is a derived work,
and if someone could disregard their obligation to share-alike their
derived work simply because they derived it from multiple resources, there
would be no point to putting ShareAlike clauses on data resources at all.

As a Wikipedia contributor (and a lapsed admin), I am sad to see BabelNet
appropriating the hard work of Wikimedians and others, placing a more
restrictive license on it, and selling it. This is also relevant for me
because I run ConceptNet (http://www.conceptnet.io/), a similar knowledge
resource, and I have made sure to follow Creative Commons license
requirements and to release all its data as CC-By-SA.

In a way I see BabelNet as a competitor, but ConceptNet is an open data
project and this space shouldn't have "competitors". If the Creative
Commons license were being used appropriately, then all of us working with
this kind of data would be collaborators in the world of Linked Open Data.
My preferred outcome would be to get BabelNet to change the copyright
notices and Creative Commons links on their site to remove the
"non-commercial" requirement, and to be able to download and use their data
under the CC-By-SA license that it should be under.

I'm sure Wikimedia has dealt with similar situations to this. What would be
the most effective next step to ensure that BabelNet follows the CC-By-SA
license?

-- Rob Speer