> You might enumerate the position of each occurrence of a word in Harry Potter; that's all pure facts, after all. But publishing an extensive set of that kind of factual statement would let anyone rebuild the book.
This is just a representation of the artwork, and the artwork is protected as a creative work. So you can't do that without infringing that protection. (I guess a court won't buy the argument "but this was not the ebook of Harry Potter, this was the zip file of an ebook of Harry Potter".) You can't "hack" the law that way: it has been robust enough to protect both digital and paper versions of books without substantial change, and a publisher doesn't have to protect the little-endian version of the file as well as the big-endian one :).

What is not protected is the idea: you can write a story about a school for sorcerers. What counts as a "work of the mind" is defined by law; for France, see http://www.bnf.fr/fr/professionnels/principes_droit_auteur.html

One criterion that is relevant for databases is "originality": another author would not produce the same work. With purely factual material, such as a list of all works ever published by a specific publisher, any author would eventually produce the same list. Only the specific presentation of the data can qualify for "droit d'auteur" (author's rights).

However, databases are also subject to a specific law that aims to protect an organisation that invests a "substantial" amount of resources in building a specific dataset. An example is the French organisation IGN (https://www.wikidata.org/wiki/Special:EntityPage/Q1665102), which collected, updated and published detailed geographic maps of France. Such a publisher is allowed to forbid the extraction of a "substantial" amount of data from its dataset … and this lasts 15 years from the point at which the publisher stops updating the data.

2017-11-30 13:38 GMT+01:00 mathieu stumpf guntz <psychosl...@culture-libre.org>:

>
> On 30/11/2017 at 10:14, John Erling Blad wrote:
>
> A single property licensing scheme would allow storage of data; it might or might not allow reuse of the licensed data together with other data.
> Remember that all entries in the servers might be part of a mashup with all other entries.
>
> That's a very interesting point. Does anyone know of a clear, extensive report on what is legal or not regarding massive imports of data extracted from some source?
>
> Indeed, if there were really no limit on using "factual statement" data, that would be a huge loophole in copyright. For example, you might enumerate the position of each occurrence of a word in Harry Potter; that's all pure facts, after all. But publishing an extensive set of that kind of factual statement would let anyone rebuild the book.
>
> The same might happen with an extensive extraction of data stored initially in Wikipedia under CC-by-sa and imported into Wikidata. There is already the ArticlePlaceholder[1] extension, which is a first step towards generating whole, complete encyclopaedic prose articles, which should then logically be publishable under CC0. Hence the concerns about license laundering.
>
> Not having a way to track sources and their corresponding licenses doesn't automagically make the licensing issues disappear. An integrated license tracking system should make it possible to detect possible infringements in remixes. Users should be informed whether what they are trying to mix is legally authorized by the various ultimate sources from which Wikidata gathered the data, or not. Until some solid legal report points in this direction, it's not accurate to claim unilaterally that they can do whatever they want regardless of the sources from which Wikidata gathered the data in the first place, even if it's a massive import from a differently licensed source.
>
> [1] https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder
>
> On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad <jeb...@gmail.com> wrote:
>
>> Please keep this civil and on topic!
>>
>> Licensing was discussed at the start of the project, as in the start of developing code for the project, and as I recall the arguments for CC0 were valid and sound. That was long before Denny started working for Google.
>>
>> As I recall it was mentioned during the first week of the project (first week of April), and the discussion re-emerged during the first week of development. That must have been week 4 or 5 (first week of May), as the delivery of the laptops was delayed. I was against CC0 as I expected problems with reuse of external data. The arguments for CC0 convinced me.
>>
>> And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens did too.
>>
>> The argument is pretty simple: Party A has some data A and claims license A. Party B has some data B and claims license B. Both license A and license B are sticky, thus later data C that uses an aggregation of A and B must satisfy both license A and license B. That is not viable.
>>
>> Moving forward to a safe, non-sticky license seems to be the only viable solution, and this leads to CC0.
>>
>> Feel free to discuss the merit of our choice, but do not use personal attacks. Thank you.
>>
>> On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <martinellil...@gmail.com> wrote:
>>
>>> Oh, and by the way, ODbL was considered as a potential license, but I recall that that license could have been incompatible for reuse with CC BY-SA 3.0. It was actually a point of discussion with the Italian OpenStreetMap community back in 2013, when I first presented at the OSM-IT meeting the possibility of a collaboration between WD and OSM.
>>>
>>> L.
>>>
>>> On 30 Nov 2017 at 08:57, "Luca Martinelli" <martinellil...@gmail.com> wrote:
>>>
>>>> I basically stopped reading this email after the first attack on Denny.
>>>>
>>>> I was there since the beginning, and I do recall the *extensive* discussion about what license to use.
>>>> CC0 was chosen, among other things, because of the moronic EU rule about database rights, which the CC 3.0 licenses didn't allow us to counter. Please remember that the 4.0 licenses were still under discussion, and we couldn't afford the luxury of waiting for 4.0 to come out before publishing Wikidata.
>>>>
>>>> And possibly next time provide a TL;DR version of your email at the top.
>>>>
>>>> Cheers,
>>>>
>>>> L.
>>>>
>>>> On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz" <psychosl...@culture-libre.org> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I am forwarding here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page <https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0>, because I'm interested in wider feedback from the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know, so please be bold.
>>>>>
>>>>> Before you consider digging further into this, keep in mind that I remain convinced that Wikidata is a wonderful project, and I wish it a bright future full of even more amazing things than what it has already brought so far. My sole concern is really a license issue.
>>>>>
>>>>> Below is a copy/paste of the above-linked message:
>>>>>
>>>>> Thank you Lydia Pintscher <https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29> for taking the time to answer. Unfortunately this answer <https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0> misses too many important points to resolve all the concerns which have been raised.
>>>>>
>>>>> Notably, there is still not the beginning of a hint in it about where the decision to use CC0 exclusively for Wikidata came from.
>>>>> But as this inquiry on the topic <https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive> advances, an answer is emerging from it. It seems that Wikidata's choice of CC0 was heavily influenced by Denny Vrandečić, who, to make it short, is now working in the Google Knowledge Graph team. It is also worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel's co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1]. To state it briefly in a conspiratorial fashion, Wikidata is the puppet Trojan horse of hegemonic big tech companies into the realm of Wikimedia. For a less tragic, more argued version, please see the research project (work in progress; only chapter 1 is in good enough shape, and it's only available in French so far). Proof that this claim is completely wrong is welcome, as it would be great if in fact the community was the driving force behind this single-license choice, and if it is the best choice for the community's future, not the future of giant tech companies. It would be a great contribution to bring such a happy light to this subject, so we can all leave this issue alone and go back to contributing on more interesting topics.
>>>>>
>>>>> Now let's examine the thoughts proposed by Lydia.
>>>>>
>>>>> "Wikidata is here to give more people more access to more knowledge."
>>>>>
>>>>> So far, this matches the Wikimedia movement's stated goal.
>>>>> "This means we want our data to be used as widely as possible."
>>>>>
>>>>> Sure, as long as it rhymes with equity, as in *Our strategic direction: Service and Equity* <https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity>. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each other's freedom. Because below this level, the freedom of one is the murder and slavery of others.
>>>>>
>>>>> "CC-0 is one step towards that."
>>>>>
>>>>> That's a thesis; you can propose to defend it, but no one has to agree without some convincing proof.
>>>>>
>>>>> "Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively."
>>>>>
>>>>> No, it's not. From a data processing point of view, everything is data. Whether it's stored in wikisyntax, in a relational database or engraved in stone is only a matter of convenience. Whether it's a random stream of bits generated by a dumb chipset or some encoded prose of Shakespeare makes no difference. So from this point of view, no, what Wikidata stores is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does make many things much easier. But this is not because it's data whereas elsewhere there would be no data. It's because it forces data to be stored in a way that eases aggregation, combination, mashing-up, filtering and so on.
>>>>>
>>>>> "Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more."
>>>>>
>>>>> Sure.
>>>>> It also lives from being curated by millions[2] of volunteer contributors, or it would be just a useless pile of random bytes.
>>>>>
>>>>> "This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)."
>>>>>
>>>>> No, it doesn't mean that. First let's recall a few basics, as the whole answer seems to confuse attribution with distributing contributions under the same license as the original. Attribution is crucial for traceability, and so for the reliable and trusted knowledge that we are aiming for within the Wikimedia movement. The "same license" requirement is the sole legal guarantee of equity that contributors have. That's it: trusted knowledge and equity are requirements for the Wikimedia movement's goals. That means withdrawing these requirements is withdrawing these goals.
>>>>>
>>>>> Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem; you don't have to put it in front of your derived work. Just look at a Wikipedia article: until you go to the history, you have zero attribution visible, and that's OK. It probably also has zero or negligible computing cost, as it doesn't have to be included in all computations; it just needs to be retrievable on demand.
>>>>>
>>>>> What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might come with a "license" statement, so zero additional cost.
>>>>> Now, letting users specify under which free licenses they publish their work would just require an additional attribute, a ridiculously small weight when balanced against the equity concerns it resolves. Could that prevent some uses by some actors? Yes, that's actually the point: preventing abuse by those who don't want to act equitably. For all other actors, "distribute under the same conditions" is fine.
>>>>>
>>>>> "This is potentially computationally hard to do and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app)."
>>>>>
>>>>> OpenStreetMap, which uses ODbL, a copyleft license requiring attribution, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0 single-license policy. Too bad, it could be so useful to have this data accessible to Wikimedia projects, but who cares?
>>>>>
>>>>> "This is a burden on our re-users that I do not want to impose on them."
>>>>>
>>>>> Wait, which re-users? Surely one might expect that Wikidata would care first about re-users who are in step with the Wikimedia goal, so surely the needs of the Wikimedia community in particular and of Free/Libre Culture in general should be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use it as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to bother with questions of equity and traceability?
>>>>>
>>>>> "It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge."
>>>>>
>>>>> No, technically it would be just as easy as pressing one button on a computer rather than another.
>>>>> What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. You propose to discard both to satisfy exogenous demands which should have next to no weight in a decision that so deeply affects the future of our community.
>>>>>
>>>>> "Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights <https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights> for more details."
>>>>>
>>>>> It basically says that it's applicable in the United States and Europe on different legal bases and to different extents. And for the rest of the world, it doesn't say that nothing can apply; it states nothing.
>>>>>
>>>>> "So even if we would have decided to require attribution it would only be enforceable in some jurisdictions."
>>>>>
>>>>> What kind of logic is that? It might not be applicable in some countries, so let's give up the few rights we have.
>>>>>
>>>>> "Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge."
>>>>>
>>>>> Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to counterbalance the growing asymmetry that social structures are concomitantly building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge.
>>>>>
>>>>> Also, this statement seems to suggest that releasing our contributions only under CC0 is the sole solution for reducing legal doubts.
>>>>> Actually, any well-written license would do an equal job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed reduce legal uncertainty, it's not an argument at all for enforcing CC0 as the sole license available to contributors.
>>>>>
>>>>> Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained, a set of items and statements, is under CC0. That is a hugely doubtful statement, and it looks alarmingly like license laundering <https://en.wikipedia.org/wiki/license_laundering>. This is true for Wikipedia, but it's also true for any source on which large-scale extraction and import are performed, whether through bots or crowdsourcing. So the Wikidata project is currently extremely badly placed to give lessons on legal ambiguity, as it plays heavily on legal blur and the hope that its shady practices won't fall under too much scrutiny.
>>>>>
>>>>> "Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources."
>>>>>
>>>>> No, they are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that lowers the socio-economic inequalities which disproportionately advantage the former.
>>>>> "The thing is there seems to be no indication of this working."
>>>>>
>>>>> Because it's not trying to enforce what you claim, so of course it's not working for this goal. But for the goal that copyleft licenses aim at, there is clear evidence that yes, it works.
>>>>>
>>>>> "Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily."
>>>>>
>>>>> There is no pitfall in copyleft licenses. Using a war-materiel analogy is disrespectful. It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that's the price of fostering equity. And it's a low price that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand using the free work is an immensely vast gain that is worth it. In Why you shouldn't use the Lesser GPL for your next library <https://www.gnu.org/licenses/why-not-lgpl.html>, it is stated that *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So, contrary to what these fallacious claims against copyleft licenses assert, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more, let's recall who financed the initial development of Wikidata: only actors which are related to big companies.
>>>>>
>>>>> "Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it."
>>>>>
>>>>> If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
>>>>> "With Wikidata we are making structured data about the world available for everyone."
>>>>>
>>>>> And that's great. But achieving that doesn't require CC0 as the sole license.
>>>>>
>>>>> "We are leveling the playing field to give those who currently don't have access to the knowledge graphs of the big companies a chance to build something amazing."
>>>>>
>>>>> And that's great. But that doesn't require CC0 as the sole license. Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting added value that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution. And, ta-da, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And when it has performed license laundering of all Wikimedia projects' works with shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.
>>>>>
>>>>> "Thereby we are helping more people get access to knowledge from more places than just the few big ones."
>>>>>
>>>>> No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, without regard for traceability and equity.
>>>>> Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
>>>>>
>>>>> "CC-0 is becoming more and more common."
>>>>>
>>>>> Just like economic inequality <https://en.wikipedia.org/wiki/economic_inequality>. But that is not what we are aiming to foster in the Wikimedia movement.
>>>>>
>>>>> "Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts."
>>>>>
>>>>> Good for them. But they are not the Wikimedia community; they have their own goals and their own plans for sustainability, which do not necessarily match what our community can follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up in public domain works, that's great and seems fair. States are rarely threatened by companies; they have legal levers to pressure that kind of entity, although conflicts of interest and lobbying can of course temper this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic surrender of all their rights as the only option for contributing.
>>>>>
>>>>> "All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
>>>>>
>>>>> That's it, zero legal hope of equity.
>>>>>
>>>>> "And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
>>>>> Experience also shows that some prominent actors like Google won't credit the Wikimedia community anymore when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
>>>>>
>>>>> "Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
>>>>>
>>>>> This should at least be backed with some solid statistics showing that it had a positive impact in terms of audience and contribution across Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the evolution of the total number of contributors, or maybe so far it has no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings of benefits and hindrances mean nothing here, mine included of course. Plus, there is not even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that there are some people satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a free license set for their publications. All the more, this is all about the sustainability and fostering of our community and reaching its goals, not some people's immediate feeling of satisfaction.
>>>>>
>>>>> [1] Wikipedia Signpost, 2 December 2015 <https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed>
>>>>>
>>>>> [2] according to the next statement of Lydia
>>>>>
>>>>> Once again, I recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
>>>>>
>>>>> With much wiki-love,
>>>>> mathieu
>>>>>
>>>>> _______________________________________________
>>>>> Wikidata mailing list
>>>>> Wikidata@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata