Re: [CODE4LIB] LOC Authority Data
I appreciate your attention to this stuff Roy, but I'm afraid that doesn't really work either. I think MOST libraries that use OCLC Worldcat for the bulk of their cataloging do NOT in fact contribute all cataloging or holdings back to worldcat. Many libraries have particular items that for reasons of institutional policy (which I admit I find byzantine) keep some holdings out of Worldcat. And/or do not contribute some 'original cataloging' to Worldcat, even if they contribute most---perhaps because some of their 'original cataloging' is not up to AACR2 and/or Worldcat standards, so they can't/don't want to/are embarrassed to share it. I'm afraid those new terms may have just excluded my library! I'm not really sure what OCLC is actually trying to accomplish with these terms, what's the goal? But I don't think you're doing it yet. I hope my library isn't now excluded from Worldcat API use---or that I'd need to get our cataloging unit to make fundamental changes in what they do, that they are resistant to, in order to use it. Jonathan --- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 [EMAIL PROTECTED] Roy Tennant [EMAIL PROTECTED] 10/3/2008 10:33 PM On 10/2/08 2:39 PM, Jenn Riley [EMAIL PROTECTED] wrote: Thanks for the link, Roy. I hadn't taken the time to look this far into the Grid Services terms of use. One thing stuck out to me, though. What does "Library members that do ***all*** their cataloging with an OCLC subscription" mean? The "all" part is what doesn't make sense to me on first read. Jenn, Thanks for asking. We agreed that the wording is perhaps not the best, so we changed it to "Library members that contribute all current cataloging and holdings to WorldCat" which we think gets more at what we mean. That is, the important thing is that you contribute information about what you have to the common pool. Thanks for spurring us to make this change and we hope that clarifies our intent. 
Thanks, Roy
Re: [CODE4LIB] LOC Authority Data
- Jonathan Rochkind [EMAIL PROTECTED] wrote: I appreciate your attention to this stuff Roy, but I'm afraid that doesn't really work either. I think MOST libraries that use OCLC Worldcat for the bulk of their cataloging do NOT in fact contribute all cataloging or holdings back to worldcat. Many libraries have particular items that for reasons of institutional policy (which I admit I find byzantine) keep some holdings out of Worldcat. For example, in our case, we exclude batches of records accompanying certain ebook collections, as per the vendors' license terms. A misguided practice, but one that exists nonetheless. Mark Mark Jordan Head of Library Systems W.A.C. Bennett Library, Simon Fraser University Burnaby, British Columbia, V5A 1S6, Canada Voice: 778.782.5753 / Fax: 778.782.3023 [EMAIL PROTECTED]
Re: [CODE4LIB] LOC Authority Data
Ah, yes, that's much clearer, thanks! Jenn -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Roy Tennant Sent: Friday, October 03, 2008 10:33 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data On 10/2/08 2:39 PM, Jenn Riley [EMAIL PROTECTED] wrote: Thanks for the link, Roy. I hadn't taken the time to look this far into the Grid Services terms of use. One thing stuck out to me, though. What does "Library members that do ***all*** their cataloging with an OCLC subscription" mean? The "all" part is what doesn't make sense to me on first read. Jenn, Thanks for asking. We agreed that the wording is perhaps not the best, so we changed it to "Library members that contribute all current cataloging and holdings to WorldCat" which we think gets more at what we mean. That is, the important thing is that you contribute information about what you have to the common pool. Thanks for spurring us to make this change and we hope that clarifies our intent. Thanks, Roy
Re: [CODE4LIB] LOC Authority Data
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ya'aqov Ziso Sent: Thursday, October 02, 2008 5:39 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data

Andrew Houghton, kindly explain: 1. LC names/subjects authority files, current with 2008-09-17, are available on your SRW server http://tspilot.oclc.org/lcsh/ for us (a consortium) to harvest and load on our server for our consortial authority maintenance?

The SRW server URI endpoint at /lcsh only contains LC subjects. If we are able to provide access to the name authority file it will most likely have the endpoint /naf in keeping with LC's code list. However, harvesting and loading is not permitted under the ResearchWorks license, since we are not licensed to redistribute the works of the vocabulary owners we are providing service to. If you need to load the vocabularies that we are making available in your local system, then you should speak directly with the vocabulary owner for their licensing terms and conditions.

2. Weekly updates to these name/subject files are also available on that SRW server?

Since the Terminology service at tspilot.oclc.org is a research project we update vocabularies between other research activities. Most of the vocabularies are fairly static in nature or are updated every 6 or 12 months by the vocabulary owner and can be worked in with other research activities. LC does provide OCLC with weekly updates for the production WorldCat, but we do not have the time to update our research server that frequently. However, we do try to update LCSH every couple of months. You can visit: http://tspilot.oclc.org/resources/index.html to see when each vocabulary was last updated.

Andy.
Re: [CODE4LIB] LOC Authority Data
On 10/2/08 2:39 PM, Jenn Riley [EMAIL PROTECTED] wrote: Thanks for the link, Roy. I hadn't taken the time to look this far into the Grid Services terms of use. One thing stuck out to me, though. What does "Library members that do ***all*** their cataloging with an OCLC subscription" mean? The "all" part is what doesn't make sense to me on first read. Jenn, Thanks for asking. We agreed that the wording is perhaps not the best, so we changed it to "Library members that contribute all current cataloging and holdings to WorldCat" which we think gets more at what we mean. That is, the important thing is that you contribute information about what you have to the common pool. Thanks for spurring us to make this change and we hope that clarifies our intent. Thanks, Roy
Re: [CODE4LIB] LOC Authority Data
Roy Tennant, corollary to the question below: can OCLC provide its members with a list of the 010 fields for the NAME authority records in each specific weekly update? This is a simple grep from the NAF weekly update, not infringing any copyrights. You are not distributing any data, just pointers to it, a simple notification service. We, OCLC members, can take it from there. -- Ya'aqov Ziso, eResources, Rowan University On 10/1/08 9:21 AM, Andrew Nagy [EMAIL PROTECTED] wrote: If only we knew someone who worked in the LOC that we could tell this information to From: Code for Libraries [EMAIL PROTECTED] On Behalf Of Ed Summers [EMAIL PROTECTED] Sent: Monday, September 29, 2008 7:02 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data On Mon, Sep 29, 2008 at 6:01 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: I thought I remembered something about Casey Bisson doing exactly that with a grant/award he received? I forget what happened to it. A snapshot would just be a snapshot of course, it wouldn't include records created or modified after the snapshot. That was the bibliographic records which he purchased and donated to the Internet Archive: http://www.archive.org/details/marc_records_scriblio_net They are also available via a torrent: http://torrents.code4lib.org/ It definitely would be nice to do the same thing for the authority data. It's kind of absurd to me that this data isn't already in the public domain, since it's uh in the public domain. But what do I know, I'm not a lawyer. //Ed
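For anyone wondering what the "simple grep" for 010 fields might look like in practice, here is a minimal sketch in Python. It assumes the weekly update arrives as a file of binary MARC 21 (ISO 2709) records; the function names and the file name in the usage note are made up for illustration, and nothing here reflects how LC or OCLC actually package or process the updates.

```python
# Sketch only: pull 010 $a (LCCN) values out of binary MARC 21 records
# using nothing but the record structure (leader, directory, fields).

SUBFIELD_DELIMITER = b"\x1f"


def iter_records(data: bytes):
    """Yield each raw record; leader bytes 0-4 hold the record length."""
    pos = 0
    while pos + 24 <= len(data):
        length = int(data[pos:pos + 5])
        yield data[pos:pos + length]
        pos += length


def lccns(record: bytes):
    """Return the 010 $a values found in one raw MARC record."""
    base = int(record[12:17])          # leader bytes 12-16: base address of data
    directory = record[24:base - 1]    # directory ends with a field terminator
    found = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]    # tag (3) + field length (4) + start (5)
        if entry[0:3] != b"010":
            continue
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Drop the trailing field terminator before splitting into subfields.
        field = record[base + start:base + start + length - 1]
        for sub in field.split(SUBFIELD_DELIMITER)[1:]:
            if sub[:1] == b"a":
                found.append(sub[1:].decode("ascii", errors="replace").strip())
    return found
```

Usage would be along the lines of reading a hypothetical `naf_weekly.mrc` with `open(path, "rb").read()`, then calling `lccns(rec)` for each `rec` in `iter_records(data)` to build the notification list.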
Re: [CODE4LIB] LOC Authority Data
Andrew Houghton, kindly explain: 1. LC names/subjects authority files, current with 2008-09-17, are available on your SRW server http://tspilot.oclc.org/lcsh/ for us (a consortium) to harvest and load on our server for our consortial authority maintenance? 2. Weekly updates to these name/subject files are also available on that SRW server? -- Ya'aqov On 9/30/08 3:01 PM, Houghton,Andrew [EMAIL PROTECTED] wrote: From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Monday, September 29, 2008 7:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Also, I noticed another dump on the IA of Library of Congress updates since the initial Bisson load. http://www.archive.org/details/marc_loc_updates In typical IA fashion, it's incredibly difficult to know what the hell this stuff is, though. -Ross. If you are just looking for access to the LCSH authority data, you can access it through our Terminology Services project. The data in our SRW server was updated to the 2008-09-17 weekly update from LC. The SRW server is located at the URI: http://tspilot.oclc.org/lcsh/ Looking for access to other authority files: FAST http://tspilot.oclc.org/fast/ GSAFD http://tspilot.oclc.org/gsafd/ MeSH http://tspilot.oclc.org/mesh/ TGM I http://tspilot.oclc.org/lctgm/ TGM II http://tspilot.oclc.org/gmgpc/ Andy.
Re: [CODE4LIB] LOC Authority Data
The NAF (Name/National Authority File) is still one important database that we are missing any kind of good machine access to, I believe.

Agreed. As part of our research project we have enhanced some of the vocabulary data in the service to provide mappings and links between vocabularies. One issue we noticed with FAST was that many of the mapped terms were not being linked. We tracked this back to the term being in NAF rather than in LCSH. So to make the FAST data more usable we would have to include the entire LC authority file, both names and subjects. It is something we are looking into at the moment... Andy.

===

Question 1: who OWNS the NAF/LCSH files and needs to be reimbursed?
Question 2: does OCLC or FAST (etc.) pay that owner for NAF and LCSH, and their updates?
Assumption: OCLC gets NAF and LCSH, and their updates, from LC/NACO for free (Roy, Andy, correct me if I'm wrong).
Proposal: on the same basis on which OCLC gets these files, CODE4LIB could get them as well.
Rationale: given open source technology and software (for ex. Apache Solr 1.3), CODE4LIB could also explore options for controlled vocabularies.

Ya'aqov
Re: [CODE4LIB] LOC Authority Data
Thanks for the link, Roy. I hadn't taken the time to look this far into the Grid Services terms of use. One thing stuck out to me, though. What does "Library members that do ***all*** their cataloging with an OCLC subscription" mean? The "all" part is what doesn't make sense to me on first read. Thanks, Jenn Jenn Riley Metadata Librarian Digital Library Program Indiana University - Bloomington Wells Library W501 (812) 856-5759 www.dlib.indiana.edu Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Roy Tennant Sent: Tuesday, September 30, 2008 4:30 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Actually, member is more appropriate, and it is not presently behind any sort of wall in its current experimental mode, but it could become part of the WorldCat Grid Services which are free to the folks listed here: http://worldcat.org/devnet/wiki/SearchAPIWhoCanUse With other audience groups yet to be determined (could still be free for some groups/purposes, we don't know yet). Actually distributing the data is another issue, since in most cases it is not ours. Roy On 9/30/08 12:23 PM, Ross Singer [EMAIL PROTECTED] wrote: s/customer/partner/ Also, in the case of what the thread was initially calling for, what would be the legalities of redistributing this data? -Ross. On Tue, Sep 30, 2008 at 3:22 PM, Ross Singer [EMAIL PROTECTED] wrote: I'm going to go out on a limb here and assume you need to be an OCLC customer to benefit from this? -Ross. On Tue, Sep 30, 2008 at 3:01 PM, Houghton,Andrew [EMAIL PROTECTED] wrote: From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Monday, September 29, 2008 7:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Also, I noticed another dump on the IA of Library of Congress updates since the initial Bisson load. http://www.archive.org/details/marc_loc_updates In typical IA fashion, it's incredibly difficult to know what the hell this stuff is, though. -Ross. If you are just looking for access to the LCSH authority data, you can access it through our Terminology Services project. The data in our SRW server was updated to the 2008-09-17 weekly update from LC. The SRW server is located at the URI: http://tspilot.oclc.org/lcsh/ Looking for access to other authority files: FAST http://tspilot.oclc.org/fast/ GSAFD http://tspilot.oclc.org/gsafd/ MeSH http://tspilot.oclc.org/mesh/ TGM I http://tspilot.oclc.org/lctgm/ TGM II http://tspilot.oclc.org/gmgpc/ Andy. --
Re: [CODE4LIB] LOC Authority Data
John Hostage at Harvard could probably tell you the person to contact John Hostage [EMAIL PROTECTED] -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Nagy Sent: Wednesday, October 01, 2008 9:22 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data If only we knew someone who worked in the LOC that we could tell this information to From: Code for Libraries [EMAIL PROTECTED] On Behalf Of Ed Summers [EMAIL PROTECTED] Sent: Monday, September 29, 2008 7:02 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data On Mon, Sep 29, 2008 at 6:01 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: I thought I remembered something about Casey Bisson doing exactly that with a grant/award he received? I forget what happened to it. A snapshot would just be a snapshot of course, it wouldn't include records created or modified after the snapshot. That was the bibliographic records which he purchased and donated to the Internet Archive: http://www.archive.org/details/marc_records_scriblio_net They are also available via a torrent: http://torrents.code4lib.org/ It definitely would be nice to do the same thing for the authority data. It's kind of absurd to me that this data isn't already in the public domain, since it's uh in the public domain. But what do I know, I'm not a lawyer. //Ed
Re: [CODE4LIB] LOC Authority Data
If only we knew someone who worked in the LOC that we could tell this information to From: Code for Libraries [EMAIL PROTECTED] On Behalf Of Ed Summers [EMAIL PROTECTED] Sent: Monday, September 29, 2008 7:02 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data On Mon, Sep 29, 2008 at 6:01 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: I thought I remembered something about Casey Bisson doing exactly that with a grant/award he received? I forget what happened to it. A snapshot would just be a snapshot of course, it wouldn't include records created or modified after the snapshot. That was the bibliographic records which he purchased and donated to the Internet Archive: http://www.archive.org/details/marc_records_scriblio_net They are also available via a torrent: http://torrents.code4lib.org/ It definitely would be nice to do the same thing for the authority data. It's kind of absurd to me that this data isn't already in the public domain, since it's uh in the public domain. But what do I know, I'm not a lawyer. //Ed
Re: [CODE4LIB] LOC Authority Data
On Mon, Sep 29, 2008 at 6:01 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: I thought I remembered something about Casey Bisson doing exactly that with a grant/award he received? I forget what happened to it. A snapshot would just be a snapshot of course, it wouldn't include records created or modified after the snapshot. They are available from the Internet Archive: http://www.archive.org/details/marc_records_scriblio_net Mark Matienzo Applications Developer, NYPL Labs The New York Public Library
Re: [CODE4LIB] LOC Authority Data
On 29 September 2008, Jonathan Rochkind wrote: I thought I remembered something about Casey Bisson doing exactly that with a grant/award he received? I forget what happened to it. A snapshot would just be a snapshot of course, it wouldn't include records created or modified after the snapshot. http://www.archive.org/details/marc_records_scriblio_net Ed Summers said on this list in April: On a whim I created a bittorrent of the concatenated MARC files donated to the Internet Archive by Scriblio (7,030,372 records): http://inkdroid.org/torrents/lc-bib.torrent My share ratio is 9.538. :) Bill -- William Denton, Toronto : www.miskatonic.org www.frbr.org www.openfrbr.org
Re: [CODE4LIB] LOC Authority Data
Also, I noticed another dump on the IA of Library of Congress updates since the initial Bisson load. http://www.archive.org/details/marc_loc_updates In typical IA fashion, it's incredibly difficult to know what the hell this stuff is, though. -Ross. On Mon, Sep 29, 2008 at 7:02 PM, Ed Summers [EMAIL PROTECTED] wrote: On Mon, Sep 29, 2008 at 6:01 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: I thought I remembered something about Casey Bisson doing exactly that with a grant/award he received? I forget what happened to it. A snapshot would just be a snapshot of course, it wouldn't include records created or modified after the snapshot. That was the bibliographic records which he purchased and donated to the Internet Archive: http://www.archive.org/details/marc_records_scriblio_net They are also available via a torrent: http://torrents.code4lib.org/ It definitely would be nice to do the same thing for the authority data. It's kind of absurd to me that this data isn't already in the public domain, since it's uh in the public domain. But what do I know, I'm not a lawyer. //Ed
Re: [CODE4LIB] LOC Authority Data
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Monday, September 29, 2008 7:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Also, I noticed another dump on the IA of Library of Congress updates since the initial Bisson load. http://www.archive.org/details/marc_loc_updates In typical IA fashion, it's incredibly difficult to know what the hell this stuff is, though. -Ross. If you are just looking for access to the LCSH authority data, you can access it through our Terminology Services project. The data in our SRW server was updated to the 2008-09-17 weekly update from LC. The SRW server is located at the URI: http://tspilot.oclc.org/lcsh/ Looking for access to other authority files: FAST http://tspilot.oclc.org/fast/ GSAFD http://tspilot.oclc.org/gsafd/ MeSH http://tspilot.oclc.org/mesh/ TGM I http://tspilot.oclc.org/lctgm/ TGM II http://tspilot.oclc.org/gmgpc/ Andy.
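As a rough illustration of querying one of the endpoints listed above, the following Python sketch builds a standard SRU 1.1 searchRetrieve URL. It assumes the pilot server answers SRU-style HTTP GET requests; the `cql.any` index in the usage note is also an assumption, so check the server's explain response before relying on either.

```python
from urllib.parse import urlencode


def sru_search_url(base, cql_query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for the given endpoint.

    Only standard SRU request parameters are used; whether a given
    server supports them all is not guaranteed.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # a CQL query string
        "startRecord": start,        # SRU record positions are 1-based
        "maximumRecords": maximum,
    }
    return base + "?" + urlencode(params)
```

For example, `sru_search_url("http://tspilot.oclc.org/lcsh/", 'cql.any = "hydrology"', maximum=5)` gives a URL you could paste into a browser or fetch with `urllib.request.urlopen`.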
Re: [CODE4LIB] LOC Authority Data
s/customer/partner/ Also, in the case of what the thread was initially calling for, what would be the legalities of redistributing this data? -Ross. On Tue, Sep 30, 2008 at 3:22 PM, Ross Singer [EMAIL PROTECTED] wrote: I'm going to go out on a limb here and assume you need to be an OCLC customer to benefit from this? -Ross. On Tue, Sep 30, 2008 at 3:01 PM, Houghton,Andrew [EMAIL PROTECTED] wrote: From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Monday, September 29, 2008 7:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Also, I noticed another dump on the IA of Library of Congress updates since the initial Bisson load. http://www.archive.org/details/marc_loc_updates In typical IA fashion, it's incredibly difficult to know what the hell this stuff is, though. -Ross. If you are just looking for access to the LCSH authority data, you can access it through our Terminology Services project. The data in our SRW server was updated to the 2008-09-17 weekly update from LC. The SRW server is located at the URI: http://tspilot.oclc.org/lcsh/ Looking for access to other authority files: FAST http://tspilot.oclc.org/fast/ GSAFD http://tspilot.oclc.org/gsafd/ MeSH http://tspilot.oclc.org/mesh/ TGM I http://tspilot.oclc.org/lctgm/ TGM II http://tspilot.oclc.org/gmgpc/ Andy.
Re: [CODE4LIB] LOC Authority Data
I'm going to go out on a limb here and assume you need to be an OCLC customer to benefit from this? -Ross. On Tue, Sep 30, 2008 at 3:01 PM, Houghton,Andrew [EMAIL PROTECTED] wrote: From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Monday, September 29, 2008 7:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Also, I noticed another dump on the IA of Library of Congress updates since the initial Bisson load. http://www.archive.org/details/marc_loc_updates In typical IA fashion, it's incredibly difficult to know what the hell this stuff is, though. -Ross. If you are just looking for access to the LCSH authority data, you can access it through our Terminology Services project. The data in our SRW server was updated to the 2008-09-17 weekly update from LC. The SRW server is located at the URI: http://tspilot.oclc.org/lcsh/ Looking for access to other authority files: FAST http://tspilot.oclc.org/fast/ GSAFD http://tspilot.oclc.org/gsafd/ MeSH http://tspilot.oclc.org/mesh/ TGM I http://tspilot.oclc.org/lctgm/ TGM II http://tspilot.oclc.org/gmgpc/ Andy.
Re: [CODE4LIB] LOC Authority Data
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Tuesday, September 30, 2008 3:23 PM I'm going to go out on a limb here and assume you need to be an OCLC customer to benefit from this?

From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Tuesday, September 30, 2008 3:23 PM s/customer/partner/ Also, in the case of what the thread was initially calling for, what would be the legalities of redistributing this data?

You do not need to be a customer/member/partner to access the authority files. It's an ongoing research project [1] which is publicly accessible to anyone over the web. The research project is covered by the OCLC ResearchWorks Terms and Conditions: http://www.oclc.org/research/researchworks/terms.htm

From a quick reading of this license, it looks like redistribution of the data is prohibited, but most of the data in the system, except LCSH, is from public sources. So if you wanted to redistribute the data for a vocabulary, you can get permission from the vocabulary maintainer, just as we did. We merely consolidated freely available public controlled vocabularies into a service that other people, including OCLC Research, could build upon.

BTW, our research project should not be confused with OCLC's production Terminology Service [2] which is only available to members with a cataloging authorization. Actually, I created the prototype for the production service, so people do get confused sometimes. If you have a cataloging authorization you can access the production service as a web service. I posted a how-to on the OCLC developer network listserv a while ago. The production service allows access to AAT, DCT, TGN, GSAFD, Maori Subject Headings (Nga Upoko Tukutuku), MeSH, NGL, TGM I, TGM II and ULAN. Obviously, the Getty vocabularies will never make it into our research project due to licensing restrictions :(

Andy.

[1] http://tspilot.oclc.org/resources/index.html
[2] http://www.oclc.org/terminologies/default.htm
Re: [CODE4LIB] LOC Authority Data
Actually, member is more appropriate, and it is not presently behind any sort of wall in its current experimental mode, but it could become part of the WorldCat Grid Services which are free to the folks listed here: http://worldcat.org/devnet/wiki/SearchAPIWhoCanUse With other audience groups yet to be determined (could still be free for some groups/purposes, we don't know yet). Actually distributing the data is another issue, since in most cases it is not ours. Roy On 9/30/08 12:23 PM, Ross Singer [EMAIL PROTECTED] wrote: s/customer/partner/ Also, in the case of what the thread was initially calling for, what would be the legalities of redistributing this data? -Ross. On Tue, Sep 30, 2008 at 3:22 PM, Ross Singer [EMAIL PROTECTED] wrote: I'm going to go out on a limb here and assume you need to be an OCLC customer to benefit from this? -Ross. On Tue, Sep 30, 2008 at 3:01 PM, Houghton,Andrew [EMAIL PROTECTED] wrote: From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Monday, September 29, 2008 7:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Also, I noticed another dump on the IA of Library of Congress updates since the initial Bisson load. http://www.archive.org/details/marc_loc_updates In typical IA fashion, it's incredibly difficult to know what the hell this stuff is, though. -Ross. If you are just looking for access to the LCSH authority data, you can access it through our Terminology Services project. The data in our SRW server was updated to the 2008-09-17 weekly update from LC. The SRW server is located at the URI: http://tspilot.oclc.org/lcsh/ Looking for access to other authority files: FAST http://tspilot.oclc.org/fast/ GSAFD http://tspilot.oclc.org/gsafd/ MeSH http://tspilot.oclc.org/mesh/ TGM I http://tspilot.oclc.org/lctgm/ TGM II http://tspilot.oclc.org/gmgpc/ Andy. --
Re: [CODE4LIB] LOC Authority Data
And I'd add it's a huge step forward that OCLC is now making such information available through this manner. I'm pretty sure there is no additional charge for Terminologies use, but I forget if that applies to non-OCLC members as well as OCLC members? The NAF (Name/National Authority File) is still one important database that we are missing any kind of good machine access to, I believe. Jonathan Houghton,Andrew wrote: From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Ross Singer Sent: Monday, September 29, 2008 7:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Also, I noticed another dump on the IA of Library of Congress updates since the initial Bisson load. http://www.archive.org/details/marc_loc_updates In typical IA fashion, it's incredibly difficult to know what the hell this stuff is, though. -Ross. If you are just looking for access to the LCSH authority data, you can access it through our Terminology Services project. The data in our SRW server was updated to the 2008-09-17 weekly update from LC. The SRW server is located at the URI: http://tspilot.oclc.org/lcsh/ Looking for access to other authority files: FAST http://tspilot.oclc.org/fast/ GSAFD http://tspilot.oclc.org/gsafd/ MeSH http://tspilot.oclc.org/mesh/ TGM I http://tspilot.oclc.org/lctgm/ TGM II http://tspilot.oclc.org/gmgpc/ Andy. -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
Re: [CODE4LIB] LOC Authority Data
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Rochkind Sent: Tuesday, September 30, 2008 4:35 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data The NAF (Name/National Authority File) is still one important database that we are missing any kind of good machine access to, I believe. Agreed. As part of our research project we have enhanced some of the vocabulary data in the service to provide mappings and links between vocabularies. One issue we noticed with FAST was that many of the mapped terms were not being linked. We tracked this back to the term being in NAF rather than in LCSH. So to make the FAST data more usable we would have to include the entire LC authority file, both names and subjects. It is something we are looking into at the moment... Andy.
Re: [CODE4LIB] LOC Authority Data
Of course there is always the version of the LC/NACO file we've had up for years at http://errol.oclc.org/ (e.g. http://errol.oclc.org/laf/n+90602202.html). People might also be interested in http://viaf.org which doesn't have the original authority records, but does have 'enhanced' versions. VIAF is going to change quite a bit over the next few months, but the expectation is that much of the information is going to be publicly available. --Th -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Houghton,Andrew Sent: Tuesday, September 30, 2008 4:43 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Rochkind Sent: Tuesday, September 30, 2008 4:35 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data The NAF (Name/National Authority File) is still one important database that we are missing any kind of good machine access to, I believe. Agreed. As part of our research project we have enhanced some of the vocabulary data in the service to provide mappings and links between vocabularies. One issue we noticed with FAST was that many of the mapped terms were not being linked. We tracked this back to the term being in NAF rather than in LCSH. So to make the FAST data more usable we would have to include the entire LC authority file, both names and subjects. It is something we are looking into at the moment... Andy.
Re: [CODE4LIB] LOC Authority Data
Socialized medicine? Sure. *We* have authority files! -t On Tue, 23 Sep 2008, David Fiander wrote: One of the most important pages in the print volumes of the Library of Congress Subject Headings (LCSH), is the title page verso, which includes publication and copyright details. The folks at LC very clearly understand US copyright law, since on that page you can see that they claim that the LCSH is copyright LC _outside of the United States of America_. The same probably holds true for the copyright claim on the name authority files. You folks in the United States can do what you will with impunity, but us unwashed masses beyond your shores are likely to get in trouble. Probably the next time we attempt to cross the border. - David On Tue, Sep 23, 2008 at 5:21 PM, Jason Griffey [EMAIL PROTECTED] wrote: As I mentioned, they are available from Ibiblio on the link above. The copyright claim is...well...specious at best. But no one really wants to be the one to go to court and prove it. They've been publicly available for more than a year now on the Fred 2.0 site, and they haven't been sued, to my knowledge. Jason On Tue, Sep 23, 2008 at 5:17 PM, Nate Vack [EMAIL PROTECTED] wrote: On Tue, Sep 23, 2008 at 3:49 PM, Bryan Baldus [EMAIL PROTECTED] wrote: One way (as you likely know) (official, expensive) is via The Library of Congress Cataloging Distribution Service: Huh. They claim copyright of these records. I'd somehow thought: 1: The federal government can't hold copyrights 2: As purely factual data, catalog records are conceptually uncopyrightable Anyone who knows more about this than I do know if they're *really* copyrighted, or if it's more of a we're gonna try and say they're copyrighted and hope no one ignores us? Curious, -Nate
Re: [CODE4LIB] LOC Authority Data
I was aware of this data - but I'm really curious if anyone has ever heard of or seen a scraping process that is run frequently to get updates. The data on the fred2.0 site is from 2006. I'd like to try to keep an up to date copy - especially since us Americans are entitled to free access to the data. Andrew -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jason Griffey Sent: Tuesday, September 23, 2008 5:06 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LOC Authority Data Simon Spero at UNC did a scrape of the entirety of the LoC Authority files in Dec of 2006. They are available at Fred 2.0: http://www.ibiblio.org/fred2.0/wordpress/?page_id=10 Jason On Tue, Sep 23, 2008 at 4:35 PM, Andrew Nagy [EMAIL PROTECTED] wrote: Hello - I am curious if anyone knows of a way to access the entire collection of authority records from the LOC. It seems that the only way to access them now is one record at a time. Feel free to email me off line if you are uncomfortable posting a response to the list. Thanks Andrew
Re: [CODE4LIB] LOC Authority Data
Although note that these are only *subject* authorities. Andrew, I think you may also be looking for name authorities (since I assume this inquiry came from a suspiciously topically similar thread on vufind-tech). Yes - I would love to be able to obtain all authority files. Also, Ed's SKOS data lumps all of the subfields into one string literal, so: Yeah - the marc record has much more data than the rdf file. I haven't explored the indexing process of authority records in detail enough yet to determine if this string munging is a problem or not. Andrew
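To make the "one string literal" point concrete, here is a minimal sketch of what is lost when subfields are lumped together. The heading, the subfield codes, and the "--" separator are illustrative assumptions, not taken from Ed's actual SKOS output.

```python
from typing import List, Tuple

# A subfield is a (code, value) pair, as in a MARC field such as a 150.
Subfield = Tuple[str, str]


def lump(subfields: List[Subfield], sep: str = "--") -> str:
    """Join subfield values into one display string, discarding the codes."""
    return sep.join(value for _code, value in subfields)


def split_lumped(label: str, sep: str = "--") -> List[str]:
    """Split a lumped label back into parts; the subfield codes are gone."""
    return label.split(sep)


# Hypothetical heading, for illustration only.
heading = [("a", "Libraries"), ("z", "United States"), ("x", "History")]
label = lump(heading)        # one string literal
parts = split_lumped(label)  # values only, with no $a/$z/$x distinction
```

Splitting on the separator gets the values back, but not whether a given part was a geographic ($z) or general ($x) subdivision. Whether that matters depends on what the index needs.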
Re: [CODE4LIB] LOC Authority Data
Individual facts or datum are not copyrightable, but collections of facts -- particular expressions of data -- are. This is what makes phone books, databases, and the like subject to copyright. P.S. N.B. IANAL On Wed, Sep 24, 2008 at 9:59 AM, Jonathan Rochkind [EMAIL PROTECTED] wrote: Interestingly, outside the US it's somewhat more possible to claim copyright on factual data than inside the US, Europe for instance has types of IP and copyright protection for databases that the US does not. But basically, the answer is that nobody knows for sure, not even the lawyers. Jonathan Bryan Baldus wrote: On Tuesday, September 23, 2008 4:17 PM, Nate Vack wrote: Huh. They claim copyright of these records. I'd somehow thought: 1: The federal government can't hold copyrights The page [1] states: Copyright Records in the MARC Distribution Services originating with the Library of Congress are copyrighted by the Library of Congress for use outside the United States. Subscribers are granted copyright permission to selectively redistribute records outside the United States; contact LC prior to any distribution. So, in the U.S., they are not copyrightable, but outside the U.S. some copyright claim might be justified. 2: As purely factual data, catalog records are conceptually uncopyrightable For the most part, personally I would agree with this, at least for individual records (though some parts of the record, like the 520 summaries, might contain enough original creativity that could be considered copyrightable). Others might believe otherwise, at least as it pertains to the collection of the records as a whole--for example, OCLC's copyright claims on their database of records. ## On the Fred 2.0 records, aside from their age, I wish they were available in MARC 21 format rather than XML with NFC encoding. When I tried to use MarcEdit to convert the files from XML to MARC 21 (January 2007), I ran into issues with character encodings. 
The files also seemed to lack header lines like: <?xml version="1.0" encoding="UTF-8"?> <collection xmlns="http://www.loc.gov/MARC21/slim"> [1] http://www.loc.gov/cds/mds.html#lcaf Thank you for your assistance, Bryan Baldus Cataloger Quality Books Inc. The Best of America's Independent Presses 1-800-323-4241x402 [EMAIL PROTECTED] -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu -- Shawn Boyette [EMAIL PROTECTED]
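For anyone hitting the same two problems Bryan describes (no header lines, NFC-composed characters), a rough sketch of a pre-processing step might look like the following. It assumes the file is a bare run of record elements with no wrapper, and that the downstream converter prefers decomposed characters. Both are assumptions, and wrap_records is a hypothetical helper, not part of MarcEdit or any MARC library.

```python
import unicodedata

XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'
COLLECTION_OPEN = '<collection xmlns="http://www.loc.gov/MARC21/slim">'
COLLECTION_CLOSE = "</collection>"


def wrap_records(records_xml: str, form: str = "NFD") -> str:
    """Normalize the text to the given Unicode form and add the
    XML declaration plus a MARCXML collection wrapper around it."""
    body = unicodedata.normalize(form, records_xml.strip())
    return "\n".join([XML_DECL, COLLECTION_OPEN, body, COLLECTION_CLOSE])


# Tiny made-up record fragment containing a precomposed e-with-diaeresis.
sample = (
    '<record><datafield tag="100" ind1="1" ind2=" ">'
    '<subfield code="a">Bront\u00eb</subfield></datafield></record>'
)
wrapped = wrap_records(sample)
```

This treats the file as plain text rather than parsing it, so it only helps when the missing wrapper and the normalization form are the whole problem. Pass form="NFC" to leave composed characters alone.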
Re: [CODE4LIB] LOC Authority Data
Oh! You're right, they're clear about that on their web page, as well. As Bryan points out. So, wait: A bunch of libraries could pool together, buy the Whole Enchilada for $28k, and put up a torrent? Or, put another way, for less than the base salary of a starting developer, *everyone* in the US could have access to this *massive* store of authority data and build Awesome Things? Think we could find a consortium that'd pony up? ;-) Cheers, -Nate PS - Dear rest of the world: you're on the honor system, OK? On Tue, Sep 23, 2008 at 4:36 PM, David Fiander [EMAIL PROTECTED] wrote: One of the most important pages in the print volumes of the Library of Congress Subject Headings (LCSH), is the title page verso, which includes publication and copyright details. The folks at LC very clearly understand US copyright law, since on that page you can see that they claim that the LCSH is copyright LC _outside of the United States of America_. The same probably holds true for the copyright claim on the name authority files. You folks in the United States can do what you will with impunity, but us unwashed masses beyond your shores are likely to get in trouble. Probably the next time we attempt to cross the border. - David On Tue, Sep 23, 2008 at 5:21 PM, Jason Griffey [EMAIL PROTECTED] wrote: As I mentioned, they are available from Ibiblio on the link above. The copyright claim is...well...specious at best. But no one really wants to be the one to go to court and prove it. They've been publicly available for more than a year now on the Fred 2.0 site, and they haven't been sued, to my knowledge. Jason On Tue, Sep 23, 2008 at 5:17 PM, Nate Vack [EMAIL PROTECTED] wrote: On Tue, Sep 23, 2008 at 3:49 PM, Bryan Baldus [EMAIL PROTECTED] wrote: One way (as you likely know) (official, expensive) is via The Library of Congress Cataloging Distribution Service: Huh. They claim copyright of these records. 
I'd somehow thought: 1: The federal government can't hold copyrights 2: As purely factual data, catalog records are conceptually uncopyrightable Anyone who knows more about this than I do know if they're *really* copyrighted, or if it's more of a we're gonna try and say they're copyrighted and hope no one ignores us? Curious, -Nate
Re: [CODE4LIB] LOC Authority Data
As of the last update of the LOC authority files, 08-11-2008: Name authority files total 7,161,713 records Subject authority files total 339,144 records http://www.loc.gov/cds/PDFdownloads/csb/index.html informs us American citizens of the quarterly updates for New Subjects, and Replacement Subjects. These Subjects can all be then batch searched and retrieved in OCLC, but that is convoluted, and doesn't cover the Names, New or Replacements. Does anyone know of a way of scraping the UPDATES (for both Names and Subjects) for the LC authority files? -- Ya'aqov Ziso, eResources-Serials, Rowan University 856 256 4804 [EMAIL PROTECTED] On 9/29/08 5:01 PM, Andrew Nagy [EMAIL PROTECTED] wrote: Although note that these are only *subject* authorities. Andrew, I think you may also be looking for name authorities (since I assume this inquiry came from a suspiciously topically similar thread on vufind-tech). Yes - I would love to be able to obtain all authority files. Also, Ed's SKOS data lumps all of the subfields into one string literal, so: Yeah - the marc record has much more data than the rdf file. I haven't explored the indexing process of authority records in detail enough yet to determine if this string munging is a problem or not. Andrew
Re: [CODE4LIB] LOC Authority Data
Actually, I'm pretty sure a phone book is not, in the US, in general, copyrightable. I don't believe US law has any special protection for collections of facts. The canonical introductory intellectual property class example, which happens to be about a phone book in fact, is Feist v. Rural Telephone Service. Which in fact even has its own Wikipedia page: http://en.wikipedia.org/wiki/Feist_v._Rural Jonathan Shawn Boyette wrote: Individual facts or datum are not copyrightable, but collections of facts -- particular expressions of data -- are. This is what makes phone books, databases, and the like subject to copyright. P.S. N.B. IANAL On Wed, Sep 24, 2008 at 9:59 AM, Jonathan Rochkind [EMAIL PROTECTED] wrote: Interestingly, outside the US it's somewhat more possible to claim copyright on factual data than inside the US, Europe for instance has types of IP and copyright protection for databases that the US does not. But basically, the answer is that nobody knows for sure, not even the lawyers. Jonathan Bryan Baldus wrote: On Tuesday, September 23, 2008 4:17 PM, Nate Vack wrote: Huh. They claim copyright of these records. I'd somehow thought: 1: The federal government can't hold copyrights The page [1] states: Copyright Records in the MARC Distribution Services originating with the Library of Congress are copyrighted by the Library of Congress for use outside the United States. Subscribers are granted copyright permission to selectively redistribute records outside the United States; contact LC prior to any distribution. So, in the U.S., they are not copyrightable, but outside the U.S. some copyright claim might be justified. 2: As purely factual data, catalog records are conceptually uncopyrightable For the most part, personally I would agree with this, at least for individual records (though some parts of the record, like the 520 summaries, might contain enough original creativity that could be considered copyrightable). 
Others might believe otherwise, at least as it pertains to the collection of the records as a whole--for example, OCLC's copyright claims on their database of records. ## On the Fred 2.0 records, aside from their age, I wish they were available in MARC 21 format rather than XML with NFC encoding. When I tried to use MarcEdit to convert the files from XML to MARC 21 (January 2007), I ran into issues with character encodings. The files also seemed to lack header lines like: <?xml version="1.0" encoding="UTF-8"?> <collection xmlns="http://www.loc.gov/MARC21/slim"> [1] http://www.loc.gov/cds/mds.html#lcaf Thank you for your assistance, Bryan Baldus Cataloger Quality Books Inc. The Best of America's Independent Presses 1-800-323-4241x402 [EMAIL PROTECTED] -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
Re: [CODE4LIB] LOC Authority Data
Nathan Vack wrote: So, wait: A bunch of libraries could pool together, buy the Whole Enchilada for $28k, and put up a torrent? I thought I remembered something about Casey Bisson doing exactly that with a grant/award he received? I forget what happened to it. A snapshot would just be a snapshot of course, it wouldn't include records created or modified after the snapshot. Jonathan
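To illustrate why a snapshot alone goes stale, here is a minimal sketch of folding a later batch of new and changed records into a snapshot. It assumes each record can be keyed by some control number and that an update batch carries full replacement records. The keys and values below are made up, and deletions are not handled.

```python
from typing import Dict


def apply_updates(snapshot: Dict[str, str], updates: Dict[str, str]) -> Dict[str, str]:
    """Return a new mapping where updated records replace snapshot
    records with the same key and new records are added."""
    merged = dict(snapshot)  # copy so the original snapshot is untouched
    merged.update(updates)
    return merged


# Hypothetical control numbers and headings, for illustration only.
snapshot = {"rec0001": "Old heading A", "rec0002": "Heading B"}
updates = {"rec0001": "Revised heading A", "rec0003": "New heading C"}
current = apply_updates(snapshot, updates)
```

Without some feed of the later batches, the snapshot stays exactly as it was on the day it was taken.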
Re: [CODE4LIB] LOC Authority Data
On Mon, Sep 29, 2008 at 6:01 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: I thought I remembered something about Casey Bisson doing exactly that with a grant/award he received? I forget what happened to it. A snapshot would just be a snapshot of course, it wouldn't include records created or modified after the snapshot. That was the bibliographic records which he purchased and donated to the Internet Archive: http://www.archive.org/details/marc_records_scriblio_net They are also available via a torrent: http://torrents.code4lib.org/ It definitely would be nice to do the same thing for the authority data. It's kind of absurd to me that this data isn't already in the public domain, since it's uh in the public domain. But what do I know, I'm not a lawyer. //Ed
Re: [CODE4LIB] LOC Authority Data
I think they're available on the Internet Archive somewhere too? But I can never remember where. Jonathan Jason Griffey wrote: As I mentioned, they are available from Ibiblio on the link above. The copyright claim is...well...specious at best. But no one really wants to be the one to go to court and prove it. They've been publicly available for more than a year now on the Fred 2.0 site, and they haven't been sued, to my knowledge. Jason On Tue, Sep 23, 2008 at 5:17 PM, Nate Vack [EMAIL PROTECTED] wrote: On Tue, Sep 23, 2008 at 3:49 PM, Bryan Baldus [EMAIL PROTECTED] wrote: One way (as you likely know) (official, expensive) is via The Library of Congress Cataloging Distribution Service: Huh. They claim copyright of these records. I'd somehow thought: 1: The federal government can't hold copyrights 2: As purely factual data, catalog records are conceptually uncopyrightable Anyone who knows more about this than I do know if they're *really* copyrighted, or if it's more of a we're gonna try and say they're copyrighted and hope no one ignores us? Curious, -Nate -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
Re: [CODE4LIB] LOC Authority Data
Interestingly, outside the US it's somewhat more possible to claim copyright on factual data than inside the US, Europe for instance has types of IP and copyright protection for databases that the US does not. But basically, the answer is that nobody knows for sure, not even the lawyers. Jonathan Bryan Baldus wrote: On Tuesday, September 23, 2008 4:17 PM, Nate Vack wrote: Huh. They claim copyright of these records. I'd somehow thought: 1: The federal government can't hold copyrights The page [1] states: Copyright Records in the MARC Distribution Services originating with the Library of Congress are copyrighted by the Library of Congress for use outside the United States. Subscribers are granted copyright permission to selectively redistribute records outside the United States; contact LC prior to any distribution. So, in the U.S., they are not copyrightable, but outside the U.S. some copyright claim might be justified. 2: As purely factual data, catalog records are conceptually uncopyrightable For the most part, personally I would agree with this, at least for individual records (though some parts of the record, like the 520 summaries, might contain enough original creativity that could be considered copyrightable). Others might believe otherwise, at least as it pertains to the collection of the records as a whole--for example, OCLC's copyright claims on their database of records. ## On the Fred 2.0 records, aside from their age, I wish they were available in MARC 21 format rather than XML with NFC encoding. When I tried to use MarcEdit to convert the files from XML to MARC 21 (January 2007), I ran into issues with character encodings. The files also seemed to lack header lines like: <?xml version="1.0" encoding="UTF-8"?> <collection xmlns="http://www.loc.gov/MARC21/slim"> [1] http://www.loc.gov/cds/mds.html#lcaf Thank you for your assistance, Bryan Baldus Cataloger Quality Books Inc. 
The Best of America's Independent Presses 1-800-323-4241x402 [EMAIL PROTECTED] -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
Re: [CODE4LIB] LOC Authority Data
On Tuesday, September 23, 2008 3:35 PM, Andrew Nagy wrote: Hello - I am curious if anyone knows of a way to access the entire collection of authority records from the LOC. It seems that the only way to access them now is one record at a time. Feel free to email me off line if you are uncomfortable posting a response to the list. One way (as you likely know) (official, expensive) is via The Library of Congress Cataloging Distribution Service: http://www.loc.gov/cds/mds.html#lcaf: LC Authority Files Name Authorities MARC records for personal, corporate, conference, and geographical name headings, uniform titles, and series established by LC and cooperating libraries under the National Coordinated Cataloging Operations (NACO) program. Names written in non-roman script appear in romanized form only. Available in MARC 21 and MARCXML formats. 2008 Subscription: Available weekly. Approximately 450,000 records; including 250,000 new records. 2008 Price: $10,565 Retrospective: 1977-2007. 7,000,000 records. File size: 3,350 MB. Avg. record length: 479 bytes. 2008 Price: $10,675 Otherwise, as far as I am aware, the files that are available (for free) are less than current. I hope this helps, Bryan Baldus Cataloger Quality Books Inc. The Best of America's Independent Presses 1-800-323-4241x402 [EMAIL PROTECTED]
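As a quick sanity check on the CDS figures quoted above, the arithmetic can be worked through as follows. It uses only the numbers in the listing and assumes "MB" means decimal megabytes (1,000,000 bytes). It covers the name authority products only, since subject authority pricing is not quoted here.

```python
# Figures copied from the CDS listing quoted above.
retrospective_records = 7_000_000
avg_record_bytes = 479

# Estimated retrospective file size, assuming 1 MB = 1,000,000 bytes.
estimated_mb = retrospective_records * avg_record_bytes / 1_000_000

# 2008 prices in US dollars for the name authority products.
subscription_usd = 10_565
retrospective_usd = 10_675
names_total_usd = subscription_usd + retrospective_usd

print(f"Estimated retrospective size: {estimated_mb:,.0f} MB")
print(f"Name authorities, subscription plus retrospective: ${names_total_usd:,}")
```

The size estimate comes out within a few megabytes of the listed 3,350 MB, so the record count and average record length are consistent with each other.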
Re: [CODE4LIB] LOC Authority Data
Simon Spero at UNC did a scrape of the entirety of the LoC Authority files in Dec of 2006. They are available at Fred 2.0: http://www.ibiblio.org/fred2.0/wordpress/?page_id=10 Jason On Tue, Sep 23, 2008 at 4:35 PM, Andrew Nagy [EMAIL PROTECTED] wrote: Hello - I am curious if anyone knows of a way to access the entire collection of authority records from the LOC. It seems that the only way to access them now is one record at a time. Feel free to email me off line if you are uncomfortable posting a response to the list. Thanks Andrew
Re: [CODE4LIB] LOC Authority Data
One of the most important pages in the print volumes of the Library of Congress Subject Headings (LCSH), is the title page verso, which includes publication and copyright details. The folks at LC very clearly understand US copyright law, since on that page you can see that they claim that the LCSH is copyright LC _outside of the United States of America_. The same probably holds true for the copyright claim on the name authority files. You folks in the United States can do what you will with impunity, but us unwashed masses beyond your shores are likely to get in trouble. Probably the next time we attempt to cross the border. - David On Tue, Sep 23, 2008 at 5:21 PM, Jason Griffey [EMAIL PROTECTED] wrote: As I mentioned, they are available from Ibiblio on the link above. The copyright claim is...well...specious at best. But no one really wants to be the one to go to court and prove it. They've been publicly available for more than a year now on the Fred 2.0 site, and they haven't been sued, to my knowledge. Jason On Tue, Sep 23, 2008 at 5:17 PM, Nate Vack [EMAIL PROTECTED] wrote: On Tue, Sep 23, 2008 at 3:49 PM, Bryan Baldus [EMAIL PROTECTED] wrote: One way (as you likely know) (official, expensive) is via The Library of Congress Cataloging Distribution Service: Huh. They claim copyright of these records. I'd somehow thought: 1: The federal government can't hold copyrights 2: As purely factual data, catalog records are conceptually uncopyrightable Anyone who knows more about this than I do know if they're *really* copyrighted, or if it's more of a we're gonna try and say they're copyrighted and hope no one ignores us? Curious, -Nate
Re: [CODE4LIB] LOC Authority Data
Andrew Nagy wrote: |Hello - I am curious if anyone knows of a way to access the |entire collection of authority records from the LOC. It seems |that the only way to access them now is one record at a time. | Feel free to email me off line if you are uncomfortable |posting a response to the list. See Ed Summers' (who's on this list) LCSH/SKOS project at http://lcsh.info/ Harvey -- === Harvey E. Hahn, Manager, Technical Services Department Arlington Heights (Illinois) Memorial Library 847/506-2644 - FX: 847/506-2650 - Email: hhahn(at)ahml(dot)info OML Scripts web pages: http://www.ahml.info/oml/ Personal web pages: http://users.anet.com/~packrat