Re: [Rd] R datasets ownership(copyright) and license
On 4/5/2012 6:08 AM, Hadley Wickham wrote:
>> And you need not look so far afield for that particular lack of rationality. In the US, databases are covered by the Database and Collections of Information Misappropriation Act of 2003 (http://www.govtrack.us/congress/bills/108/hr3261), which says almost exactly the same thing: that a database that took a lot of time and effort to collate is protected against reproduction 'in commerce' without authorisation.
>
> That bill died: http://www.govtrack.us/congress/bills/108/hr3261

Having already said that copyright law is out of control (and that further efforts to strengthen enforcement were being sold by the Motion Picture Association of America to the US and other governments on the grounds that it would make it easier for tyrants to stifle dissent; http://en.wikipedia.org/wiki/Anti-Counterfeiting_Trade_Agreement#Motion_Picture_Association_of_America), let me now strengthen my support for Hadley's position: I think we should vigorously claim fair use wherever plausible (http://en.wikipedia.org/wiki/Fair_use), with a contingency plan to sabotage CRAN (including all mirrors) once per week if we are challenged.

This is crudely analogous to what happened when the American Society of Composers, Authors and Publishers (ASCAP) sued the Girl Scouts for failing to pay for the songs that girls sang around Girl Scout campfires (http://en.wikipedia.org/wiki/Free_Culture_(book)). If this happens, we should also appeal for help from the Electronic Frontier Foundation, the American Library Association, the American Civil Liberties Union and others who are working to challenge the industry's abuse of power. In 2006, Stanford initiated a Fair Use project to fight this abuse of power, and other initiatives are ongoing, as documented in the Wikipedia Fair Use article.
The major media conglomerates in the US and internationally (ABC-Disney, CBS-Westinghouse, NBC-GE, CNN-TimeWarner, Fox-NewsCorp / Rupert Murdoch) have distorted the political process in the US, Great Britain and elsewhere to favor themselves and the major international corporate advertisers. (See also my article on Gateway Problems in US Politics Economics at http://occupy.pbworks.com/w/page/52167684/Gateway%20Problems;.)

Spencer

--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R datasets ownership(copyright) and license
Yaroslav,

coming from an experimental field, I use options 4 and 4a:

4. I measure the data myself, so I am the copyright holder.
4a. I publish data sets that are given to me for publication by the person(s) who did the measurement. This is properly annotated in the authors field.

So far, the data sets I have put into packages as example data are small subsets of real studies or data collected in pre-tests, so they are not that sensitive/valuable. I plan to publish at least one real data set (as its own package) eventually, but we're not there yet.

Claudia

On 03.04.2012 00:06, Yaroslav Halchenko wrote:
> Dear R Developers,
>
> The recently filed (and dismissed ;) ) lawsuit by Astrolabe against the tz database developers caused a lot of press coverage and discussion, and set some kind of precedent in the USA [3]. But IMHO it also showed that similar attacks might happen in the future, possibly against data sets which are not so obviously factual and thus might after all fall under copyright or IP protection, if not in the States then in some other jurisdictions.
>
> Since the 'data copyright/license' question comes up over and over again, I just wanted to ask which policies or advisories guided the selection of datasets shipped with R.
>
> From a very, very brief look at the datasets, many of them appear to be factual data and thus, at least at the moment, are probably not copyrightable in the States. But is there a guarantee that they are not protected by copyright elsewhere if their origin is abroad? And some seem to come from published works (still) under copyright with "All rights reserved", e.g. datasets Harman23 and Harman74 [4].
>
> Although questions similar to mine were raised before [e.g. 1,2], I have not found a straight answer, e.g. one of the following or a mix of them:
>
> 1. we simply did not look into it and adopted them with the idea that if someone complains, we remove the corresponding pieces
> 2. we considered all datasets factual data thus not copyrightable (in USA? around the globe?)
> 3. for each (or some, or the majority) of the datasets we collected information on the possible copyright+license/IP holder and contacted them, where unclear, about permission for reuse in a project under the GPL license
>
> Thank you in advance for the clarification!
>
> P.S. Please do not take me wrong -- I am not trying to pick at anyone. I just wanted to get a better sense of the procedures/assumptions R developers use while adopting data for the R package, so that it could be of help for other projects.
>
> [1] https://stat.ethz.ch/pipermail/r-help/2007-April/130422.html
> [2] http://www.mail-archive.com/r-help@r-project.org/msg62486.html
> [3] http://en.wikipedia.org/wiki/Tz_database
> [4] it is interesting there that the actual data come from an unpublished PhD thesis, but once again from the U of Chicago, who holds the copyright for the book itself.

--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany
email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399
Re: [Rd] R datasets ownership(copyright) and license
> 2. we considered all datasets factual data thus not copyrightable (in USA? around the globe?)

This is definitely true in the US, but not true globally. I have no idea under which jurisdiction a lawsuit would apply.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Re: [Rd] R datasets ownership(copyright) and license
On 4/3/2012 2:00 PM, Hadley Wickham wrote:
>> 2. we considered all datasets factual data thus not copyrightable (in USA? around the globe?)
>
> This is definitely true in the US, but not true globally. I have no idea under which jurisdiction a lawsuit would apply.

I'd be careful with the word "definitely". The major media conglomerates and their industry associations have successfully destroyed competition to their hegemony in many areas. For example, they sued college students for close to $100 billion, because their improvements of search engines made it easier for people in a university intranet to find copyrighted music placed by others in their public folder. They successfully sued lawyers who advised MP3 that they had reasonable grounds to believe what they did would be legal, and venture capitalists who funded Napster. In each case, they won not on the law but on the fact that they had larger budgets for lawyers.

See Lessig (2004) Free Culture [book available from Amazon and also for free under the Creative Commons license; see Wikipedia, Free Culture (book), http://en.wikipedia.org/wiki/Free_Culture_(book)].

Spencer Graves

--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com
Re: [Rd] R datasets ownership(copyright) and license
I somewhat agree with Spencer. As I have mentioned, the recent precedent with the tz database shows that such claims would not be dismissed as groundless right away and things could easily go all the way to court, and that might be a really costly endeavor regardless of who is right or wrong. Proving that data is factual, and not fictional/creative/original, might be another challenge in quite a few cases, I bet.

While searching for more information, I found what is IMHO a very nice (although a bit dated) summary: http://www.bitlaw.com/copyright/database.html which, for jurisdictions outside the USA, summarizes nicely:

    sui generis right that prohibits the extraction or reutilization of any database in which there has been a substantial investment in either obtaining, verification, or presentation of the data contents. Under this second right, there is no requirement for creativity or originality.

So I would be especially careful with data from the EU ;-)

On the other hand, the link above clarifies for me that it is OK to claim a copyright (e.g. as is done in R) on a collection of factual, unprotected (I am still unsure whether that is the case with the R datasets) data.

On Tue, 03 Apr 2012, Spencer Graves wrote:
> On 4/3/2012 2:00 PM, Hadley Wickham wrote:
>>> 2. we considered all datasets factual data thus not copyrightable (in USA? around the globe?)
>>
>> This is definitely true in the US, but not true globally. I have no idea under which jurisdiction a lawsuit would apply.
>
> I'd be careful with the word "definitely". The major media conglomerates and their industry associations have successfully destroyed competition to their hegemony in many areas. For example, they sued college students for close to $100 billion, because their improvements of search engines made it easier for people in a university intranet to find copyrighted music placed by others in their public folder. They successfully sued lawyers who advised MP3 that they had reasonable grounds to believe what they did would be legal, and venture capitalists who funded Napster. In each case, they won not on the law but on the fact that they had larger budgets for lawyers.
>
> See Lessig (2004) Free Culture [book available from Amazon and also for free under the Creative Commons license; see Wikipedia, Free Culture (book), http://en.wikipedia.org/wiki/Free_Culture_(book)].
>
> Spencer Graves

--
=--=
Keep in touch    www.onerussian.com
Yaroslav Halchenko    www.ohloh.net/accounts/yarikoptic
Re: [Rd] R datasets ownership(copyright) and license
> I somewhat agree with Spencer. As I have mentioned, the recent precedent with the tz database shows that such claims would not be dismissed as groundless right away and things could easily go all the way to court, and that might be a really costly endeavor regardless of who is right or wrong. Proving that data is factual, and not fictional/creative/original, might be another challenge in quite a few cases, I bet.

I think it's generally easy to tell if something is a fact or not, and I doubt any of the datasets in R are fictional.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Re: [Rd] R datasets ownership(copyright) and license
;-) Let's check where factual ends and fictional/personal/etc. starts, and how easy it is to tell. Are survey data that ask for answers to specifically crafted original questions (i.e. not just age/race/etc.) factual? E.g.:

    \title{The Chatterjee--Price Attitude Data}
    \description{
      From a survey of the clerical employees of a large financial organization, the data are aggregated from the questionnaires of the approximately 35 employees for each of 30 (randomly selected) departments. The numbers give the percent proportion of favourable responses to seven questions in each department.
    }
    \usage{attitude}

On Tue, 03 Apr 2012, Hadley Wickham wrote:
>> I somewhat agree with Spencer. As I have mentioned, the recent precedent with the tz database shows that such claims would not be dismissed as groundless right away and things could easily go all the way to court, and that might be a really costly endeavor regardless of who is right or wrong. Proving that data is factual, and not fictional/creative/original, might be another challenge in quite a few cases, I bet.
>
> I think it's generally easy to tell if something is a fact or not, and I doubt any of the datasets in R are fictional.
>
> Hadley

--
=--=
Keep in touch    www.onerussian.com
Yaroslav Halchenko    www.ohloh.net/accounts/yarikoptic