Re: [CODE4LIB] Works API
Quoting Emily Lynema:

> Karen, Is it just Open Library that is excluding serials, or is that the entire OCA project?

I think the OCA was focused on monographs but did allow in some serials, possibly because it wasn't clear what they were (as it can be with bound or reprinted serials). I have warned the OL folks that handling serials is quite complex; I think it's a good thing that they are cutting their "bibliographic teeth" on monographs, which are complex enough.

> So what is OL's vision for work presentation of multi-volume monographs in the future?

I don't think it's fixed in stone, but as your example below shows, there will probably be use made of the table of contents area for multi-volume works that have distinct titles or distinct contents. That information will not always be available. As your example also shows, the volume numbers may be embedded in the archive.org name for the item, but I don't know how reliable those are. It doesn't appear that there is a clear statement of volume number that could be displayed, e.g. "v. 1 [link] / v. 2 [link]". If that can be derived from the volume number in the data, then the OL folks are probably clever enough to pull that off. My fear is that those numbers may not have been applied consistently during the scanning process (e.g. I believe that numbers are also used when a work is being scanned that has already been scanned... and I do mean work, not manifestation, although it could be either, because of how the names are derived).

kc

> When we load OCA records back into our local ILS, we label the URLs with volume numbers; I believe these volume numbers are pulled out of the URL to the text itself that OCA gives back to us.
>
> Here's an example of one of these records in our catalog:
> http://www2.lib.ncsu.edu/catalog/record/NCSU2218397
>
> Here's the same record in Open Library:
> http://openlibrary.org/b/OL23299490M/ferns_%28Filicales%29
>
> So hopefully the volume numbers have indeed been retained, even if just part of the link to the digitized text. I'd be happy to have landing pages like this available in Open Library for multi-volume works (including serials, of course), even if the links to each volume aren't labeled with the volume number! And, of course, I'd need a reliable way to link to these landing pages from external systems (this could maybe be accomplished with identifiers if I thought about it a little).
>
> This one record in Open Library is already a success for me, since it aggregates 3 individual records on the Internet Archive site (one for each digitized volume):
> http://www.archive.org/search.php?query=The%20ferns%20%28Filicales%29%20treated%20comparatively%20with%20a%20view%20to%20their%20natural%20classification%20AND%20mediatype%3Atexts
>
> -emily
>
> Karen Coyle wrote:
>> Quoting Emily Lynema:
>>> What seems like would make more sense for us is to link to a Work record in Open Library or Internet Archive which can then direct users to all volumes digitized for that Work. I searched this title in Open Library and found individual results for the various years of the journal, so it didn't seem like that kind of aggregated record was being exposed to users at this point. See here for an example:
>>> http://openlibrary.org/search?q=polytechnisches+Journal
>>
>> Interesting idea, Emily. In general it makes sense, but a few caveats:
>>
>> 1) Open Library does not *consciously* take in non-monographs. Some do slip in, but it is intended to be a Books database
>> 2) Multi-volume items are a general problem because they end up looking like duplicate entries (each is represented by the same bibliographic data), and I fear that some may be lost during de-duping. OL has it on its list of "things to fix". Right now, the record format doesn't have a place to link a digital file to a volume number within a "Manifestation" level record. (And I fear that in some cases the volume numbers may not have been retained in the metadata. *sigh*)
>>
>> kc
>>
>>> Do you think a Work record page in Open Library that we could link to from our local systems would be an effective solution to this problem? Anybody have other ideas?
>>>
>>> -emiliy
>>>
>>> CODE4LIB automatic digest system wrote:
>>>> --
>>>> Date: Tue, 30 Mar 2010 10:22:41 -0700
>>>> From: Karen Coyle
>>>> Subject: Works API
>>>>
>>>> Open Library now has Works defined, and is looking to develop an API for their retrieval. It makes obvious sense that when a Work is retrieved via the API, that the data output would include links to the Editions that link to that Work. Here are a few possible options:
>>>>
>>>> 1) Retrieve Work information (author, title, subjects, possibly reviews, descriptions, first lines) alone
>>>> 2) Retrieve Work information + OL identifiers for all related Editions
>>>> 3) Retrieve Work information + OL identifiers + any other identifie
Re: [CODE4LIB] Works API
Karen,

Is it just Open Library that is excluding serials, or is that the entire OCA project? I'm assuming it's the former; however I think it's the Open Library work surrounding user access to digitized content that's really going to make these materials accessible. It seems much more advanced than access on the archive.org site.

So what is OL's vision for work presentation of multi-volume monographs in the future? When we load OCA records back into our local ILS, we label the URLs with volume numbers; I believe these volume numbers are pulled out of the URL to the text itself that OCA gives back to us.

Here's an example of one of these records in our catalog:
http://www2.lib.ncsu.edu/catalog/record/NCSU2218397

Here's the same record in Open Library:
http://openlibrary.org/b/OL23299490M/ferns_%28Filicales%29

So hopefully the volume numbers have indeed been retained, even if just part of the link to the digitized text. I'd be happy to have landing pages like this available in Open Library for multi-volume works (including serials, of course), even if the links to each volume aren't labeled with the volume number! And, of course, I'd need a reliable way to link to these landing pages from external systems (this could maybe be accomplished with identifiers if I thought about it a little).

This one record in Open Library is already a success for me, since it aggregates 3 individual records on the Internet Archive site (one for each digitized volume):
http://www.archive.org/search.php?query=The%20ferns%20%28Filicales%29%20treated%20comparatively%20with%20a%20view%20to%20their%20natural%20classification%20AND%20mediatype%3Atexts

-emily

Karen Coyle wrote:
> Quoting Emily Lynema:
>> What seems like would make more sense for us is to link to a Work record in Open Library or Internet Archive which can then direct users to all volumes digitized for that Work. I searched this title in Open Library and found individual results for the various years of the journal, so it didn't seem like that kind of aggregated record was being exposed to users at this point. See here for an example:
>> http://openlibrary.org/search?q=polytechnisches+Journal
>
> Interesting idea, Emily. In general it makes sense, but a few caveats:
>
> 1) Open Library does not *consciously* take in non-monographs. Some do slip in, but it is intended to be a Books database
> 2) Multi-volume items are a general problem because they end up looking like duplicate entries (each is represented by the same bibliographic data), and I fear that some may be lost during de-duping. OL has it on its list of "things to fix". Right now, the record format doesn't have a place to link a digital file to a volume number within a "Manifestation" level record. (And I fear that in some cases the volume numbers may not have been retained in the metadata. *sigh*)
>
> kc
>
>> Do you think a Work record page in Open Library that we could link to from our local systems would be an effective solution to this problem? Anybody have other ideas?
>>
>> -emiliy
>>
>> CODE4LIB automatic digest system wrote:
>>> --
>>> Date: Tue, 30 Mar 2010 10:22:41 -0700
>>> From: Karen Coyle
>>> Subject: Works API
>>>
>>> Open Library now has Works defined, and is looking to develop an API for their retrieval. It makes obvious sense that when a Work is retrieved via the API, that the data output would include links to the Editions that link to that Work. Here are a few possible options:
>>>
>>> 1) Retrieve Work information (author, title, subjects, possibly reviews, descriptions, first lines) alone
>>> 2) Retrieve Work information + OL identifiers for all related Editions
>>> 3) Retrieve Work information + OL identifiers + any other identifiers related to the Edition (ISBN, OCLC#, LCCN)
>>> 4) Retrieve Work information and links to Editions with full text / scans
>>>
>>> Well, you can see where I'm going with this. What would be useful?
>>>
>>> kc

--
Emily Lynema
Associate Department Head
Information Technology, NCSU Libraries
919-513-8031
emily_lyn...@ncsu.edu
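The volume labelling described above (pulling a volume number out of the URL to the digitized text) could be sketched roughly as follows. This is only an illustration: the identifier shape assumed here (letters, a two-digit number, then up to four letters) is a guess, the example identifiers are made up, and, as noted in the thread, the digits may not always be volume numbers.

```python
import re
from typing import List, Optional, Tuple

# ASSUMPTION: an item identifier looks like letters + two digits + up to four
# letters, e.g. "exampletitle02auth". This shape is hypothetical.
_ID_PATTERN = re.compile(r"(?P<title>[a-z]+)(?P<num>\d{2})(?P<tail>[a-z]{1,4})")


def volume_from_url(url: str) -> Optional[int]:
    """Return the number embedded in the last path segment, or None."""
    identifier = url.rstrip("/").rsplit("/", 1)[-1].lower()
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        return None
    number = int(match.group("num"))
    # Treat "00" as "no volume information" rather than volume zero.
    return number if number > 0 else None


def label_links(urls: List[str]) -> List[Tuple[str, str]]:
    """Pair each URL with a display label, falling back to a generic one."""
    labelled = []
    for url in urls:
        volume = volume_from_url(url)
        label = f"v. {volume}" if volume is not None else "full text"
        labelled.append((label, url))
    return labelled
```

Because the numbers may also be used for rescans, a catalog would probably want to treat the derived label as a hint and keep the generic "full text" fallback.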
Re: [CODE4LIB] Works API
On Wed, 31 Mar 2010, stuart yeates wrote:

> Jonathan Rochkind wrote:
>> Karen Coyle wrote:
>>> The OL only has full text links, but the link goes to a page at the Internet Archive that lists all of the available formats. I would prefer that the link go directly to a display of the book, and offer other formats from there (having to click twice really turns people off, especially when they are browsing). So unfortunately, other than "full text" there won't be more to say.
>>
>> In an API, it would be _optimal_ if you'd reveal all these links, tagged with a controlled vocabulary of some kind letting us know what they are, so the client can decide for itself what to do with them (which may not even be immediately showing them to any user at all, but may be analyzing them for some other purpose).
>
> Even better, for those of us who have multiple formats of full text (TEI XML, HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs to the full text, differentiated using the mime-type.

Would different forms of processing have different mime-types? (ie, we can tell it's a PDF, but can we tell what's actually in it?)

Personally, for the different packaging formats, if you're going to be selecting using mime-type, I'd be inclined to hide it all behind a single URL -- the user agent could set the appropriate Accept header, so long as it's being served by HTTP.

... I admit, it's possible that this works better for APIs than user browsing; they might prefer a PDF for digital library objects, but prefer HTML for other purposes.

We were hoping to allow users to set cookies to set their preferences on processing & packaging for our system, but I'm still waiting for a response to the paperwork that I filed to be allowed to use them.

(little known fact -- OMB M-00-13 outlaws cookies on all government websites; OMB M-03-22 spells out some of the procedures for being allowed around it, but I've given up trying to let them know, when they're set up so bad you can't even report them [3])

-Joe

[OMB M-00-13] http://www.whitehouse.gov/omb/memoranda_m00-13/
[OMB M-03-22] http://www.whitehouse.gov/omb/memoranda_m03-22/
[3] http://politics.slashdot.org/comments.pl?sid=1021887&cid=25678129
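The single-URL idea above, where the user agent states its preference in the Accept header, might look something like this on the server side. It is a simplified sketch, not a full implementation of HTTP content negotiation: the function names are made up for illustration, and only exact types, `type/*`, `*/*` and the `q` parameter are handled.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q value) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    return prefs


def quality_for(media: str, prefs: List[Tuple[str, float]]) -> float:
    """Look up q for one media type, most specific range first."""
    major = media.split("/")[0]
    for pattern in (media, major + "/*", "*/*"):
        for candidate, q in prefs:
            if candidate == pattern:
                return q
    return 0.0


def choose_format(header: str, available: List[str]) -> Optional[str]:
    """Pick the offered media type the client rates highest, if any."""
    prefs = parse_accept(header)
    best, best_q = None, 0.0
    for offered in available:
        q = quality_for(offered, prefs)
        if q > best_q:  # ties keep the earlier (server-preferred) entry
            best, best_q = offered, q
    return best
```

This also illustrates the limit raised above: an original PDF and a reflowed PDF share the media type `application/pdf`, so the Accept header alone cannot select between them.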
Re: [CODE4LIB] Works API
I will just add (again) to the request for all links. As Jonathan says the client can then decide what to show, how to group them, and so on. I had rather sloppily elided things like format of full text into my "structural" information about the link.

And I second the request that some simple coding (controlled vocabulary anyone?) is used for these values so that we clients can determine what we are seeing.

Thanks - Peter

> -----Original Message-----
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> stuart yeates
> Sent: Tuesday, March 30, 2010 18:20
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Works API
>
> Jonathan Rochkind wrote:
> > Karen Coyle wrote:
> >> The OL only has full text links, but the link goes to a page at the
> >> Internet Archive that lists all of the available formats. I would
> >> prefer that the link go directly to a display of the book, and offer
> >> other formats from there (having to click twice really turns people
> >> off, especially when they are browsing). So unfortunately, other than
> >> "full text" there won't be more to say.
> >
> > In an API, it would be _optimal_ if you'd reveal all these links, tagged
> > with a controlled vocabulary of some kind letting us know what they are,
> > so the client can decide for itself what to do with them (which may not
> > even be immediately showing them to any user at all, but may be
> > analyzing them for some other purpose).
>
> Even better, for those of us who have multiple formats of full text (TEI
> XML, HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs
> to the full text, differentiated using the mime-type.
>
> cheers
> stuart
> --
> Stuart Yeates
> http://www.nzetc.org/ New Zealand Electronic Text Centre
> http://researcharchive.vuw.ac.nz/ Institutional Repository
Re: [CODE4LIB] Works API
Jonathan Rochkind wrote:
> Karen Coyle wrote:
>> The OL only has full text links, but the link goes to a page at the Internet Archive that lists all of the available formats. I would prefer that the link go directly to a display of the book, and offer other formats from there (having to click twice really turns people off, especially when they are browsing). So unfortunately, other than "full text" there won't be more to say.
>
> In an API, it would be _optimal_ if you'd reveal all these links, tagged with a controlled vocabulary of some kind letting us know what they are, so the client can decide for itself what to do with them (which may not even be immediately showing them to any user at all, but may be analyzing them for some other purpose).

Even better, for those of us who have multiple formats of full text (TEI XML, HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs to the full text, differentiated using the mime-type.

cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/ New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository
Re: [CODE4LIB] Works API
Karen Coyle wrote:
> The OL only has full text links, but the link goes to a page at the Internet Archive that lists all of the available formats. I would prefer that the link go directly to a display of the book, and offer other formats from there (having to click twice really turns people off, especially when they are browsing). So unfortunately, other than "full text" there won't be more to say.

In an API, it would be _optimal_ if you'd reveal all these links, tagged with a controlled vocabulary of some kind letting us know what they are, so the client can decide for itself what to do with them (which may not even be immediately showing them to any user at all, but may be analyzing them for some other purpose).

But the full text link, I agree, should be the first or default link, and if you CAN only supply one in an API, I agree that is the right one -- unless a particular record is not available in full text. (Which hopefully should be apparent from the API response!).

Jonathan
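To make the "tagged links" request concrete, here is one way a client might consume such a response. Everything about the data shape is an assumption made for illustration: the `kind` values ("fulltext", "formats") are an invented vocabulary, the `type` field is taken to hold a MIME type, and the URLs are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Link = Dict[str, str]


def group_by_kind(links: List[Link]) -> Dict[str, List[Link]]:
    """Bucket links by their vocabulary term so a client can pick and choose."""
    grouped = defaultdict(list)
    for link in links:
        grouped[link.get("kind", "unknown")].append(link)
    return dict(grouped)


def default_link(links: List[Link]) -> Optional[Link]:
    """Return the first full-text link, or None when the record has none."""
    for link in links:
        if link.get("kind") == "fulltext":
            return link
    return None


# Hypothetical response fragment for one edition.
sample_links = [
    {"kind": "formats", "type": "text/html", "href": "http://example.org/item/1"},
    {"kind": "fulltext", "type": "application/pdf", "href": "http://example.org/item/1.pdf"},
    {"kind": "fulltext", "type": "application/epub+zip", "href": "http://example.org/item/1.epub"},
]
```

With a shape like this, `default_link` returning `None` is how a client would learn that a record has no full text, which is the case flagged at the end of the message above.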
Re: [CODE4LIB] Works API
Quoting Peter Noerr:

> For our purposes (federated search) it would be most useful to have as many of the available links (OL or other) as possible, and as much information about the link as possible. Obvious "structural" stuff like the type of identifier, but also the nature of the linked object (as you suggest "full text", "scan", etc.) This enables the links to be "categorized" in the user display so they can eliminate the ones not of interest, or focus on those that are.

The OL only has full text links, but the link goes to a page at the Internet Archive that lists all of the available formats. I would prefer that the link go directly to a display of the book, and offer other formats from there (having to click twice really turns people off, especially when they are browsing). So unfortunately, other than "full text" there won't be more to say.

> Anything which differentiates the links from the perspective of the user is generally useful. In this regard some information about the editions at the ends of the links (even just a number and/or date) would be useful, and stop systems coming back to OL multiple times for all the linked records only to extract and display one or two bits of information.

If you want to link from your bib records (Manifestations) to full texts of books, then you'll probably prefer to retrieve Editions, not Works. There is a plan afoot to produce a file, possibly of MARC records, for all of the full text works that the Internet Archive has. Those are at the Manifestation level, naturally. I'll ask about adding the publication date to the output.

kc

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Re: [CODE4LIB] Works API
For our purposes (federated search) it would be most useful to have as many of the available links (OL or other) as possible, and as much information about the link as possible. Obvious "structural" stuff like the type of identifier, but also the nature of the linked object (as you suggest "full text", "scan", etc.) This enables the links to be "categorized" in the user display so they can eliminate the ones not of interest, or focus on those that are.

Anything which differentiates the links from the perspective of the user is generally useful. In this regard some information about the editions at the ends of the links (even just a number and/or date) would be useful, and stop systems coming back to OL multiple times for all the linked records only to extract and display one or two bits of information. This has got to be the worst case for user response time, and almost certainly for load on the OL system. So if a certain amount of this information can be statically pre-coordinated with the links, or gathered by OL at request time, it has got to be more efficient.

For us the format of the records is of little importance as we convert them anyway.

Peter

> -----Original Message-----
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Karen Coyle
> Sent: Tuesday, March 30, 2010 10:23
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Works API
>
> Open Library now has Works defined, and is looking to develop an API
> for their retrieval. It makes obvious sense that when a Work is
> retrieved via the API, that the data output would include links to the
> Editions that link to that Work. Here are a few possible options:
>
> 1) Retrieve Work information (author, title, subjects, possibly
> reviews, descriptions, first lines) alone
> 2) Retrieve Work information + OL identifiers for all related Editions
> 3) Retrieve Work information + OL identifiers + any other identifiers
> related to the Edition (ISBN, OCLC#, LCCN)
> 4) Retrieve Work information and links to Editions with full text / scans
>
> Well, you can see where I'm going with this. What would be useful?
>
> kc
>
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
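The efficiency argument above (a client coming back once per linked edition versus edition details being pre-coordinated with the links) can be illustrated with a toy in-memory store that counts requests. The class, its methods and the record fields are all invented for this sketch and do not describe any real Open Library interface.

```python
from typing import Dict, List


class CountingStore:
    """Stand-in for a remote API; counts how many requests a client makes."""

    def __init__(self, works: Dict[str, dict], editions: Dict[str, dict]):
        self.works = works
        self.editions = editions
        self.requests = 0

    def get_work(self, key: str) -> dict:
        self.requests += 1
        return self.works[key]

    def get_edition(self, key: str) -> dict:
        self.requests += 1
        return self.editions[key]

    def get_work_with_editions(self, key: str) -> dict:
        """One request that returns the work plus a short summary per edition."""
        self.requests += 1
        work = self.works[key]
        summaries = [
            {"key": k, "publish_date": self.editions[k]["publish_date"]}
            for k in work["editions"]
        ]
        return {"title": work["title"], "editions": summaries}


def dates_naive(store: CountingStore, work_key: str) -> List[str]:
    """Fetch the work, then every edition, just to read one field each."""
    work = store.get_work(work_key)
    return [store.get_edition(k)["publish_date"] for k in work["editions"]]


def dates_precoordinated(store: CountingStore, work_key: str) -> List[str]:
    """Read the same field from a single pre-coordinated response."""
    response = store.get_work_with_editions(work_key)
    return [e["publish_date"] for e in response["editions"]]
```

For a work with N editions the first approach costs 1 + N requests and the second costs one, which is the response-time and server-load difference described above.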
Re: [CODE4LIB] Works API
On Tue, Mar 30, 2010 at 1:52 PM, Karen Coyle wrote:
> Ed, thanks. I'll need you to be a bit more -v on this one: are you asking
> for an RDF option on the API, or that Works as a whole be represented as
> linked data? The Open Library doesn't present itself as linked data, as you
> know, and although that would be very interesting I don't think that's on
> their production schedule for the near future.

Well you do have a nice start at some Linked Data views already in Open Library, e.g.

http://openlibrary.org/b/OL8123073M.rdf

I guess what I was suggesting is that you link these Expressions up with their respective Works where you know the relations, perhaps using Ian Davis' FRBR vocabulary [1]?

I don't think this precludes a handy web2.0 API like what OCLC and LibraryThing offer already ... but there's an opportunity to make the Linked Data views you have already quite a bit richer I think.

That being said, I'm probably in a minority view here thinking that the Linked Data pattern has something to offer. Cue the Tim Spalding rendition of Don't Believe the Semantic Web Hype :-)

//Ed

[1] http://vocab.org/frbr/
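As a rough illustration of linking an existing RDF view to its Work, the statement could be serialized as an N-Triples line along these lines. Two things here are assumptions that should be checked against the vocabulary at [1]: the namespace `http://purl.org/vocab/frbr/core#` and the property name `realizationOf`. The Work URI in the usage below is a placeholder, since the thread does not give one.

```python
from typing import Iterable, List, Tuple

# ASSUMPTION: namespace and property name as recalled from the FRBR Core
# vocabulary; verify both before relying on them.
FRBR_CORE = "http://purl.org/vocab/frbr/core#"


def realization_triple(expression_uri: str, work_uri: str) -> str:
    """Serialize 'expression is a realization of work' as one N-Triples line."""
    return f"<{expression_uri}> <{FRBR_CORE}realizationOf> <{work_uri}> ."


def link_all(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one triple per known (expression, work) relation."""
    return [realization_triple(expr, work) for expr, work in pairs]
```

For example, `realization_triple("http://openlibrary.org/b/OL8123073M", "http://example.org/work/1")` yields a single line that could be appended to the existing RDF view, without affecting any separate web API.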
Re: [CODE4LIB] Works API
Quoting Ed Summers:

> I realize it's of limited utility compared to yet another web2.0 API, but I think it would be good to see Works represented somehow in the RDF Linked Data views...assuming they're not already.
>
> //Ed

Ed, thanks. I'll need you to be a bit more -v on this one: are you asking for an RDF option on the API, or that Works as a whole be represented as linked data? The Open Library doesn't present itself as linked data, as you know, and although that would be very interesting I don't think that's on their production schedule for the near future.

kc

> On Tue, Mar 30, 2010 at 1:22 PM, Karen Coyle wrote:
>> Open Library now has Works defined, and is looking to develop an API for their retrieval. It makes obvious sense that when a Work is retrieved via the API, that the data output would include links to the Editions that link to that Work. Here are a few possible options:
>>
>> 1) Retrieve Work information (author, title, subjects, possibly reviews, descriptions, first lines) alone
>> 2) Retrieve Work information + OL identifiers for all related Editions
>> 3) Retrieve Work information + OL identifiers + any other identifiers related to the Edition (ISBN, OCLC#, LCCN)
>> 4) Retrieve Work information and links to Editions with full text / scans
>>
>> Well, you can see where I'm going with this. What would be useful?
>>
>> kc
>>
>> --
>> Karen Coyle
>> kco...@kcoyle.net http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Re: [CODE4LIB] Works API
I realize it's of limited utility compared to yet another web2.0 API, but I think it would be good to see Works represented somehow in the RDF Linked Data views...assuming they're not already.

//Ed

On Tue, Mar 30, 2010 at 1:22 PM, Karen Coyle wrote:
> Open Library now has Works defined, and is looking to develop an API for
> their retrieval. It makes obvious sense that when a Work is retrieved via
> the API, that the data output would include links to the Editions that link
> to that Work. Here are a few possible options:
>
> 1) Retrieve Work information (author, title, subjects, possibly reviews,
> descriptions, first lines) alone
> 2) Retrieve Work information + OL identifiers for all related Editions
> 3) Retrieve Work information + OL identifiers + any other identifiers
> related to the Edition (ISBN, OCLC#, LCCN)
> 4) Retrieve Work information and links to Editions with full text / scans
>
> Well, you can see where I'm going with this. What would be useful?
>
> kc
>
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
>
[CODE4LIB] Works API
Open Library now has Works defined, and is looking to develop an API for their retrieval. It makes obvious sense that when a Work is retrieved via the API, that the data output would include links to the Editions that link to that Work. Here are a few possible options:

1) Retrieve Work information (author, title, subjects, possibly reviews, descriptions, first lines) alone
2) Retrieve Work information + OL identifiers for all related Editions
3) Retrieve Work information + OL identifiers + any other identifiers related to the Edition (ISBN, OCLC#, LCCN)
4) Retrieve Work information and links to Editions with full text / scans

Well, you can see where I'm going with this. What would be useful?

kc

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
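For readers comparing the four options, they can be pictured as increasing levels of detail in a single response. The sketch below is purely illustrative: the class names, field names and output keys are invented here, and nothing in it reflects an actual or planned Open Library API.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class Detail(IntEnum):
    WORK_ONLY = 1        # option 1: Work information alone
    EDITION_IDS = 2      # option 2: plus OL identifiers of Editions
    EDITION_ALL_IDS = 3  # option 3: plus ISBN, OCLC#, LCCN per Edition
    FULLTEXT_LINKS = 4   # option 4: links to Editions with full text


@dataclass
class Edition:
    olid: str
    identifiers: Dict[str, List[str]] = field(default_factory=dict)
    fulltext_url: Optional[str] = None


@dataclass
class Work:
    olid: str
    title: str
    authors: List[str]
    subjects: List[str] = field(default_factory=list)
    editions: List[Edition] = field(default_factory=list)


def render_work(work: Work, detail: Detail) -> dict:
    """Build a response dict whose contents depend on the requested option."""
    out = {"key": work.olid, "title": work.title,
           "authors": work.authors, "subjects": work.subjects}
    if detail == Detail.WORK_ONLY:
        return out
    if detail == Detail.FULLTEXT_LINKS:
        # Only Editions that actually have a scan or full text are listed.
        out["editions"] = [{"key": e.olid, "fulltext": e.fulltext_url}
                           for e in work.editions if e.fulltext_url]
        return out
    editions = []
    for e in work.editions:
        entry = {"key": e.olid}
        if detail == Detail.EDITION_ALL_IDS:
            entry["identifiers"] = e.identifiers
        editions.append(entry)
    out["editions"] = editions
    return out
```

A caller would ask for the level it needs, e.g. `render_work(work, Detail.FULLTEXT_LINKS)`, so a client that only wants Work metadata does not pay for Edition data.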