Re: [Rd] R datasets ownership(copyright) and license

2012-04-05 Thread Spencer Graves

On 4/5/2012 6:08 AM, Hadley Wickham wrote:

And you need not look so far afield for that particular lack of rationality. In 
the US, databases are covered by the Database and Collections of Information 
Misappropriation Act of 2003 (http://www.govtrack.us/congress/bills/108/hr3261) 
which says almost exactly the same thing; that a database that took a lot of 
time and effort to collate is protected against reproduction 'in commerce' 
without authorisation.

That bill died: http://www.govtrack.us/congress/bills/108/hr3261



  After having expressed how copyright law is out of control (and 
further efforts to strengthen enforcement were being sold by the Motion 
Picture Association of America to the US and other governments on the 
grounds that it would make it easier for tyrants to stifle dissent; 
http://en.wikipedia.org/wiki/Anti-Counterfeiting_Trade_Agreement#Motion_Picture_Association_of_America), 
now let me strengthen my support for Hadley's position:




  I think we should vigorously claim fair use wherever plausible 
(http://en.wikipedia.org/wiki/Fair_use) with a contingency plan to 
sabotage CRAN (including all mirrors) once per week if we are 
challenged.  This is crudely analogous to what happened when the  
American Society of Composers, Authors and Publishers (ASCAP) sued the 
Girl Scouts for failing to pay for the songs that girls sang around Girl 
Scout campfires (http://en.wikipedia.org/wiki/Free_Culture_%28book%29).  If this happens, 
we should also appeal for help from the Electronic Frontier Foundation, 
the American Library Association, the American Civil Liberties Union and 
others, who are working to challenge the industry's abuse of power.  In 
2006, Stanford initiated a Fair Use project to fight this abuse of 
power, and other initiatives are on-going, as documented in the 
Wikipedia Fair Use article.  The major media conglomerates in the US 
and internationally (ABC-Disney, CBS-Westinghouse, NBC-GE, 
CNN-TimeWarner, Fox-NewsCorp / Rupert Murdoch) have distorted the 
political process in the US, Great Britain and elsewhere to favor them 
and the major international corporate advertisers.  (See also my article 
on Gateway Problems in US Politics & Economics at 
http://occupy.pbworks.com/w/page/52167684/Gateway%20Problems.)



  Spencer


Hadley




--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R datasets ownership(copyright) and license

2012-04-03 Thread Claudia Beleites
Yaroslav,

coming from an experimental field, I use options 4 and 4a:

4. I measure the data myself, so I am the copyright holder.
4a. I publish data sets that were given to me for publication by the
person(s) who did the measurements. This is properly annotated in the
authors field.

So far, the data sets I have put into packages as example data are small
subsets of real studies or data collected in pre-tests, so they are not
that sensitive/valuable. I plan to publish at least one real data set
(as its own package) eventually. But we're not there yet.

Claudia




On 03.04.2012 00:06, Yaroslav Halchenko wrote:
 Dear R Developers,
 
 The recently filed (and dismissed ;) ) lawsuit by Astrolabe against the tz
 database developers caused a lot of press coverage and discussion and
 created some kind of precedent in the USA [3].  But it also, imho, showed
 that similar attacks might happen in the future, possibly against
 data sets which are not that obviously factual and thus might after all
 fall under copyright or IP protection, if not in the States then in
 some other jurisdictions.
 
 Since the 'data copyright/license' question comes up over and over again, I just
 wanted to ask based on what policies or advisories datasets were
 selected to be shipped with R.   From a very brief look at the
 datasets, many of them appear to be factual data, and thus at least at the
 moment are probably not copyrightable in the States -- but is there a
 guarantee that they are not protected by copyright elsewhere if their
 origin is abroad?   But some seem to come from published works (still)
 under copyright with "All rights reserved", e.g. datasets Harman23
 and Harman74 [4].
 
 Although similar questions to mine have been raised before [e.g. 1,2], I
 have not found a straight answer, e.g. one from the list below or a mix of
 them:
 
 1. we simply did not look into it and adopted them with the idea that if
someone complains, we remove the corresponding pieces
 
 2. we considered all datasets factual data thus not copyrightable (in
USA? around the globe?)
 
 3. for each dataset (or some, or the majority) we collected information
on the possible copyright+license/IP holder and contacted them where the
permission for reuse in a project under the GPL license was unclear
 
 Thank you in advance for the clarification!
 
 P.S. Please do not get me wrong -- I am not trying to pick on
 anyone.  I just wanted to get a better sense of the
 procedures/assumptions R developers use when adopting data for the R
 package, so that it could be of help for other projects.
 
 [1] https://stat.ethz.ch/pipermail/r-help/2007-April/130422.html
 [2] http://www.mail-archive.com/r-help@r-project.org/msg62486.html
 [3] http://en.wikipedia.org/wiki/Tz_database
 [4] it is interesting that the actual data comes from an unpublished PhD
 thesis, but once again from the U of Chicago, which holds the copyright
 for the book itself.
 


-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399



Re: [Rd] R datasets ownership(copyright) and license

2012-04-03 Thread Hadley Wickham
 2. we considered all datasets factual data thus not copyrightable (in
   USA? around the globe?)

This is definitely true in the US, but not true globally.  I have no
idea under which jurisdiction a lawsuit would apply.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [Rd] R datasets ownership(copyright) and license

2012-04-03 Thread Spencer Graves

On 4/3/2012 2:00 PM, Hadley Wickham wrote:

2. we considered all datasets factual data thus not copyrightable (in
   USA? around the globe?)

This is definitely true in the US, but not true globally.  I have no
idea under which jurisdiction a lawsuit would apply.



  I'd be careful with the word "definitely".  The major media 
conglomerates and their industry associations have successfully 
destroyed competition to their hegemony in many areas.  For example, 
they sued college students for close to $100 billion, because their 
improvements of search engines made it easier for people in a university 
intranet to find copyrighted music placed by others in their public 
folder.  They successfully sued lawyers who advised MP3.com that it had 
reasonable grounds to believe what it did would be legal, and venture 
capitalists who funded Napster.  In each case, they won not on the law 
but on the fact that they had larger budgets for lawyers.  See Lessig 
(2004) Free Culture [book available from Amazon and also for free under 
a Creative Commons license;  see Wikipedia, Free Culture (book), 
http://en.wikipedia.org/wiki/Free_Culture_%28book%29].



  Spencer Graves


Hadley



--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com



Re: [Rd] R datasets ownership(copyright) and license

2012-04-03 Thread Yaroslav Halchenko
I somewhat agree with Spencer -- as I have mentioned, the recent precedent
with the tz database shows that such claims would not be dismissed as groundless right
away and things could easily go all the way to court -- and that might be a
really costly endeavor regardless of who is right or wrong.  Proving that
data is factual, and not fictional/creative/original, might be another challenge
in quite a few cases, I bet.

While searching for more information, I found what is IMHO a very nice (although a
bit dated) summary: http://www.bitlaw.com/copyright/database.html which,
for jurisdictions outside the USA, summarizes it nicely:

sui generis right that prohibits the extraction or reutilization of any
database in which there has been a substantial investment in either obtaining,
verification, or presentation of the data contents. Under this second right,
there is no requirement for creativity or originality.

so -- I would be especially careful with data from EU ;-)

On the other hand, the above link clarifies for me that it is OK to claim a copyright
(e.g. as is done in R) on a collection of factual, unprotected data (I am still unsure
whether that is the case with the R datasets).



-- 
=--=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic



Re: [Rd] R datasets ownership(copyright) and license

2012-04-03 Thread Hadley Wickham
 I somewhat agree with Spencer -- as I have mentioned, the recent precedent
 with the tz database shows that such claims would not be dismissed as groundless right
 away and things could easily go all the way to court -- and that might be a
 really costly endeavor regardless of who is right or wrong.  Proving that
 data is factual, and not fictional/creative/original, might be another challenge
 in quite a few cases, I bet.

I think it's generally easy to tell if something is a fact or not, and
I doubt any of the datasets in R are fictional.

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [Rd] R datasets ownership(copyright) and license

2012-04-03 Thread Yaroslav Halchenko
;-)  Let's check where factual ends and fictional/personal/etc. starts,
and how easy it is to tell.

Are survey data asking for answers to specifically crafted original
questions (i.e. not just age/race/etc) factual? e.g.

\title{The Chatterjee--Price Attitude Data}
\description{
  From a survey of the clerical employees of a large financial
  organization, the data are aggregated from the questionnaires of the
  approximately 35 employees for each of 30 (randomly selected)
  departments.  The numbers give the percent proportion of favourable
  responses to seven questions in each department.}
\usage{attitude}

?

-- 
=--=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic
