On Wed, Jan 30, 2013 at 11:00 AM, Karen Coyle <[email protected]> wrote:

>
> However, the Internet Archive does have copyright declarations AND
> evidence fields for its digitized materials, so you can view those by
> following the link to the full text. I don't know if there's a
> reasonable way to pull those into the OL database.
>

The Internet Archive calls their field "possible copyright status" which
doesn't sound very authoritative or useful to me.  Certainly any date of
publication or date of copyright would be useful information to bring over.



> I understand when people are nervous about recording copyright
> information, but my approach at U of C was that it is a disservice to
> users to not at least tell them what you *do* know about materials you
> are making available.


I agree that more information is better.  The only caveat that I'd add is
that OL shouldn't replicate dynamic data it isn't committed to keeping up
to date.  It'd be better to direct people back to the original source.


> At the same time, with a very few exceptions, actually determining
> copyright status is a lengthy, expensive process. There is some evidence
> that HathiTrust will be going through that process, and US libraries are
> talking about sharing among them their determinations.


HathiTrust is definitely doing this research on an active basis and
publishing the results of that research.  They publish daily update files
for the changes they've made to their database and those updates include
fields for Access, Rights, and Reason for Rights Determination.  Access is
the high order allow/deny bit while Rights gives what they believe the
status is and the third column is the reason they believe that.

They describe the layout of their rights database along with the process
they use here: http://www.hathitrust.org/rights_database
The files themselves are at: http://www.hathitrust.org/hathifiles
The layout of those files is described at:
http://www.hathitrust.org/hathifiles_description

For example, the Feb. 1 update includes over 17,000 records that they
updated in some way during that day (not necessarily all for rights
reasons).

Access:
  allow        6935
  deny        10678

Rights:
  ic           7919  in-copyright
  pdus         3649  public domain in the U.S.
  pd           3284  public domain
  und          2759  <== These "undetermined" ones are the ones they're researching
  cc-by-nc-nd     2

Reason for Rights Determination:
  bib   17415  bibliographically-derived by automatic processes
  ren     103  copyright renewal research was conducted
  nfi      60  needs further investigation (copyright research partially
               complete; an ambiguous, unclear, or other time-consuming
               situation was encountered)
  cdpp     12  title page or verso contain copyright date and/or place of
               publication information not in bib record
  crms      8  derived from multiple reviews in the Copyright Review
               Management System (CRMS) via an internal resolution policy;
               consult CRMS records for details
  con       2  contractual agreement with copyright holder on file
  ncn       1  no printed copyright notice
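
For anyone who wants to reproduce tallies like the ones above, here is a
minimal sketch in Python. It assumes the update file is tab-separated text
with one record per line. The column positions in COLUMNS and the rows in
SAMPLE are made up for illustration; check the hathifiles description page
for the real layout before pointing this at an actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical column positions (zero-based); not taken from the real layout.
COLUMNS = {"access": 1, "rights": 2, "reason": 3}

# Made-up sample rows standing in for a downloaded update file.
SAMPLE = (
    "vol1\tallow\tpd\tbib\n"
    "vol2\tdeny\tic\tbib\n"
    "vol3\tdeny\tund\tnfi\n"
)


def tally_columns(lines, columns):
    """Count how often each value appears in the selected columns.

    lines   -- any iterable of tab-separated text lines (e.g. an open file)
    columns -- dict mapping a label to a zero-based column index
    """
    counts = {name: Counter() for name in columns}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for name, idx in columns.items():
            if idx < len(row):  # skip short or malformed rows
                counts[name][row[idx]] += 1
    return counts


if __name__ == "__main__":
    counts = tally_columns(io.StringIO(SAMPLE), COLUMNS)
    for name, counter in counts.items():
        print(name)
        for value, n in counter.most_common():
            print(f"  {value} {n}")
```

For a real file you would pass an open file handle instead of the
io.StringIO wrapper (via gzip.open(path, "rt") if the file is compressed).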

As you can see, they are doing a ton of work and keeping pretty extensive
records.  It might be useful to replicate some of the top level info,
especially for things which have reliable determinations of being in the
public domain, but I'd be wary of including too much of the fine detail.
Because their records include ISBNs, LCCNs, OCLC numbers, etc., it should be
pretty easy to do strong matching with OL records.
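
As a rough illustration of what I mean by matching on identifiers, here is
a sketch. The record shapes (dicts with "isbn", "lccn", and "oclc" lists
plus a "key" or "rights" entry), the sample values, and the function names
are all hypothetical; neither the OL nor the HathiTrust data actually comes
in this form, so treat it as an outline of the approach only.

```python
from collections import defaultdict

# Identifier types to match on; the field names are assumptions.
ID_FIELDS = ("isbn", "lccn", "oclc")


def normalize(value):
    """Drop hyphens, spaces, and other punctuation; uppercase the rest."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


def build_index(ol_records):
    """Map (identifier type, normalized value) to the set of OL keys."""
    index = defaultdict(set)
    for rec in ol_records:
        for field in ID_FIELDS:
            for value in rec.get(field, []):
                index[(field, normalize(value))].add(rec["key"])
    return index


def match(rights_record, index):
    """Return the OL keys sharing at least one identifier with the record."""
    matches = set()
    for field in ID_FIELDS:
        for value in rights_record.get(field, []):
            matches |= index.get((field, normalize(value)), set())
    return matches


# Made-up records for illustration only.
ol_records = [
    {"key": "/books/OL1M", "isbn": ["0-00-000000-0"], "oclc": ["111"]},
    {"key": "/books/OL2M", "lccn": ["00-000002"]},
]
rights_record = {"isbn": ["0000000000"], "rights": "pd"}

index = build_index(ol_records)
matched_keys = match(rights_record, index)
```

A match on a single identifier can still be wrong (reused ISBNs, for
example), so a stricter version might require agreement on two identifier
types before copying any rights information over.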

Tom
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]