Re: [CODE4LIB] Works API

2010-04-02 Thread Karen Coyle

Quoting Emily Lynema :


Karen,

Is it just Open Library that is excluding serials, or is that the
entire OCA project?


I think the OCA was focused on monographs but did allow in some  
serials, possibly because it wasn't clear what they were (as it can be  
with bound or reprinted serials). I have warned the OL folks that  
handling serials is quite complex; I think it's a good thing that they  
are cutting their "bibliographic teeth" on monographs, which are  
complex enough.





So what is OL's vision for work presentation of multi-volume monographs
in the future?


I don't think it's fixed in stone, but as your example below shows,  
there will probably be use made of the table of contents area for  
multi-volume works that have distinct titles or distinct contents.  
That information will not always be available. As your example also  
shows, the volume numbers may be embedded in the archive.org name for  
the item, but I don't know how reliable those are. It doesn't appear  
that there is a clear statement of volume number that could be  
displayed, e.g. "v. 1 [link] / v. 2 [link]". If that can be derived  
from the volume number in the data, then the OL folks are probably  
clever enough to pull that off. My fear is that those numbers may not  
have been applied consistently during the scanning process (e.g. I  
believe that numbers are also used when a work is being scanned that  
has already been scanned... and I do mean work, not manifestation,  
although it could be either, because of how the names are derived).
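
A minimal sketch of the "v. 1 [link] / v. 2 [link]" display described above. It assumes the archive.org item name ends in a two-digit number followed by a four-letter author fragment; that shape, the names volume_number and volume_links, and the identifiers in the usage note are all hypothetical. As noted above, the number may not always mean "volume" (it may also mark a rescan), so the labels are only as reliable as the names.

```python
import re

# Assumed (not confirmed) item-name shape: <title chars><2 digits><4 letters>,
# e.g. "examplebook01auth". Real item names may not follow this pattern.
_VOLUME_RE = re.compile(r"^(?P<stem>.+?)(?P<vol>\d{2})(?P<author>[a-z]{4})$")


def volume_number(identifier):
    """Return the embedded number as an int, or None if none can be found."""
    match = _VOLUME_RE.match(identifier)
    if match is None:
        return None
    vol = int(match.group("vol"))
    # Assumption: "00" means the item is not part of a numbered set.
    return vol if vol > 0 else None


def volume_links(identifiers, base="http://www.archive.org/details/"):
    """Build a display string of volume labels and links, sorted by number."""
    labeled = []
    for ident in identifiers:
        vol = volume_number(ident)
        if vol is not None:
            labeled.append((vol, f"v. {vol} [{base}{ident}]"))
    return " / ".join(label for _, label in sorted(labeled))
```

Calling volume_links(["examplebook02auth", "examplebook01auth"]) would list v. 1 before v. 2; items whose names carry no usable number are left out rather than guessed at.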


kc




When we load OCA records back into our local ILS, we label the URLs
with volume numbers; I believe these volume numbers are pulled out of
the URL to the text itself that OCA gives back to us.

Here's an example of one of these records in our catalog:
http://www2.lib.ncsu.edu/catalog/record/NCSU2218397

Here's the same record in Open Library:
http://openlibrary.org/b/OL23299490M/ferns_%28Filicales%29

So hopefully the volume numbers have indeed been retained, even if just
part of the link to the digitized text. I'd be happy to have landing
pages like this available in Open Library for multi-volume works
(including serials, of course), even if the links to each volume aren't
labeled with the volume number! And, of course, I'd need a reliable way
to link to these landing pages from external systems (this could maybe
be accomplished with identifiers if I thought about it a little).

This one record in Open Library is already a success for me, since it
aggregates 3 individual records on the Internet Archive site (one for
each digitized volume):
http://www.archive.org/search.php?query=The%20ferns%20%28Filicales%29%20treated%20comparatively%20with%20a%20view%20to%20their%20natural%20classification%20AND%20mediatype%3Atexts

-emily



Karen Coyle wrote:

Quoting Emily Lynema :



What seems like it would make more sense for us is to link to a Work
record in Open Library or Internet Archive which can then direct users
to all volumes digitized for that Work. I searched this title in Open
Library and found individual results for the various years of the
journal, so it didn't seem like that kind of aggregated record was
being exposed to users at this point.

See here for an example:
http://openlibrary.org/search?q=polytechnisches+Journal


Interesting idea, Emily. In general it makes sense, but a few caveats:

1) Open Library does not *consciously* take in non-monographs. Some do
slip in, but it is intended to be a Books database
2) Multi-volume items are a general problem because they end up looking
like duplicate entries (each is represented by the same bibliographic
data), and I fear that some may be lost during de-duping. OL has it on
its list of "things to fix". Right now, the record format doesn't have a
place to link a digital file to a volume number within a "Manifestation"
level record. (And I fear that in some cases the volume numbers may not
have been retained in the metadata. *sigh*)
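
To illustrate the de-duping risk in point 2, here is a rough sketch of a duplicate-detection key that includes a volume number, so that volumes sharing the same bibliographic data are not merged. The field names (title, publish_date, volume) and both function names are hypothetical and are not Open Library's record format.

```python
from collections import defaultdict


def dedup_key(record):
    """Key for duplicate detection that keeps volumes of one set apart."""
    return (
        record["title"].strip().lower(),
        record.get("publish_date", ""),
        record.get("volume"),  # None when no volume number survived
    )


def group_duplicates(records):
    """Group records that share a key; only those groups are merge candidates."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedup_key(rec)].append(rec)
    return dict(groups)
```

This only helps when the volume number is actually present: records that have lost it all map to the same key and would still look like duplicates, which is the case feared above.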


kc




Do you think a Work record page in Open Library that we could link to
from our local systems would be an effective solution to this problem?
Anybody have other ideas?

-emily

CODE4LIB automatic digest system wrote:

--

Date: Tue, 30 Mar 2010 10:22:41 -0700
From: Karen Coyle
Subject: Works API

Open Library now has Works defined, and is looking to develop an
API for their retrieval. It makes obvious sense that when a Work
is retrieved via the API, that the data output would include links
to the Editions that link to that Work. Here are a few possible
options:


1) Retrieve Work information (author, title, subjects, possibly
 reviews, descriptions, first lines) alone

2) Retrieve Work information + OL identifiers for all related Editions
3) Retrieve Work information + OL identifiers + any other
identifie

Re: [CODE4LIB] Works API

2010-04-01 Thread Emily Lynema

Karen,

Is it just Open Library that is excluding serials, or is that the entire 
OCA project? I'm assuming it's the former; however I think it's the Open 
Library work surrounding user access to digitized content that's really 
going to make these materials accessible. It seems much more advanced 
than access on the archive.org site.


So what is OL's vision for work presentation of multi-volume monographs 
in the future?


When we load OCA records back into our local ILS, we label the URLs with 
volume numbers; I believe these volume numbers are pulled out of the URL 
to the text itself that OCA gives back to us.


Here's an example of one of these records in our catalog:
http://www2.lib.ncsu.edu/catalog/record/NCSU2218397

Here's the same record in Open Library:
http://openlibrary.org/b/OL23299490M/ferns_%28Filicales%29

So hopefully the volume numbers have indeed been retained, even if just 
part of the link to the digitized text. I'd be happy to have landing 
pages like this available in Open Library for multi-volume works 
(including serials, of course), even if the links to each volume aren't 
labeled with the volume number! And, of course, I'd need a reliable way 
to link to these landing pages from external systems (this could maybe 
be accomplished with identifiers if I thought about it a little).


This one record in Open Library is already a success for me, since it 
aggregates 3 individual records on the Internet Archive site (one for 
each digitized volume):

http://www.archive.org/search.php?query=The%20ferns%20%28Filicales%29%20treated%20comparatively%20with%20a%20view%20to%20their%20natural%20classification%20AND%20mediatype%3Atexts

-emily



Karen Coyle wrote:

Quoting Emily Lynema :



What seems like it would make more sense for us is to link to a Work
record in Open Library or Internet Archive which can then direct users
to all volumes digitized for that Work. I searched this title in Open
Library and found individual results for the various years of the
journal, so it didn't seem like that kind of aggregated record was
being exposed to users at this point.

See here for an example:
http://openlibrary.org/search?q=polytechnisches+Journal


Interesting idea, Emily. In general it makes sense, but a few caveats:

1) Open Library does not *consciously* take in non-monographs. Some do 
slip in, but it is intended to be a Books database
2) Multi-volume items are a general problem because they end up looking 
like duplicate entries (each is represented by the same bibliographic 
data), and I fear that some may be lost during de-duping. OL has it on 
its list of "things to fix". Right now, the record format doesn't have a 
place to link a digital file to a volume number within a "Manifestation" 
level record. (And I fear that in some cases the volume numbers may not 
have been retained in the metadata. *sigh*)


kc




Do you think a Work record page in Open Library that we could link to
from our local systems would be an effective solution to this problem?
Anybody have other ideas?

-emily

CODE4LIB automatic digest system wrote:

--

Date: Tue, 30 Mar 2010 10:22:41 -0700
From: Karen Coyle
Subject: Works API

Open Library now has Works defined, and is looking to develop an
API for their retrieval. It makes obvious sense that when a Work
is retrieved via the API, that the data output would include links
to the Editions that link to that Work. Here are a few possible
options:


1) Retrieve Work information (author, title, subjects, possibly
reviews, descriptions, first lines) alone
2) Retrieve Work information + OL identifiers for all related Editions
3) Retrieve Work information + OL identifiers + any other
identifiers related to the Edition (ISBN, OCLC#, LCCN)
4) Retrieve Work information and links to Editions with full text /
scans
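
To make the four options concrete, here is one possible shape for the output, sketched as plain dictionaries. Nothing here is an actual Open Library API: the function name work_response, the keys (olid, isbn, oclc, lccn, full_text_url), and the option numbering as a parameter are assumptions for illustration only.

```python
def work_response(work, editions, option):
    """Build a response dict for one of the four options listed above (1-4)."""
    response = {"work": work}
    if option == 1:
        # Option 1: Work information alone.
        return response
    if option == 4:
        # Option 4: only Editions that have full text / scans.
        editions = [e for e in editions if e.get("full_text_url")]
    entries = []
    for edition in editions:
        entry = {"olid": edition["olid"]}  # options 2-4 all carry the OL id
        if option == 3:
            # Option 3: add whatever other identifiers the Edition has.
            entry["identifiers"] = {
                key: edition[key]
                for key in ("isbn", "oclc", "lccn")
                if key in edition
            }
        if option == 4:
            entry["full_text_url"] = edition["full_text_url"]
        entries.append(entry)
    response["editions"] = entries
    return response
```

A client that only needs to link out could ask for option 2, while one matching against a local catalog would probably want option 3.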


Well, you can see where I'm going with this. What would be useful?

kc




--
Emily Lynema
Associate Department Head
Information Technology, NCSU Libraries
919-513-8031
emily_lyn...@ncsu.edu






Re: [CODE4LIB] Works API

2010-03-31 Thread Joe Hourcle

On Wed, 31 Mar 2010, stuart yeates wrote:


Jonathan Rochkind wrote:

Karen Coyle wrote:
The OL only has full text links, but the link goes to a page at the
Internet Archive that lists all of the available formats. I would prefer
that the link go directly to a display of the book, and offer other
formats from there (having to click twice really turns people off,
especially when they are browsing). So unfortunately, other than "full
text" there won't be more to say.


In an API, it would be _optimal_ if you'd reveal all these links, tagged 
with a controlled vocabulary of some kind letting us know what they are, so 
the client can decide for itself what to do with them (which may not even 
be immediately showing them to any user at all, but may be analyzing them 
for some other purpose). 


Even better, for those of us who have multiple formats of full text (TEI XML, 
HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs to the full 
text, differentiated using the mime-type.


Would different forms of processing have different mime-types?  (ie, we 
can tell it's a PDF, but can we tell what's actually in it?)


Personally, for the different packaging formats, if you're going to be 
selecting using mime-type, I'd be inclined to hide it all behind a single 
URL -- the user agent could set the appropriate Accept header, so long as 
it's being served by HTTP.
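
As a rough illustration of the single-URL approach, a client could state its format preferences like this. The function name request_for and the example URL are hypothetical; the sketch only builds the request and does not send it.

```python
import urllib.request


def request_for(url, mime_types):
    """Build (but do not send) a GET request with format preferences in Accept."""
    parts = []
    for position, mime_type in enumerate(mime_types):
        if position == 0:
            parts.append(mime_type)  # most preferred, implicit q=1
        else:
            # Later entries get lower q-values, with a floor of 0.1.
            quality = max(1.0 - 0.1 * position, 0.1)
            parts.append(f"{mime_type};q={quality:.1f}")
    return urllib.request.Request(url, headers={"Accept": ", ".join(parts)})


req = request_for(
    "http://example.org/text/123",
    ["application/epub+zip", "application/pdf", "text/html"],
)
```

This covers the packaging format only. It does not answer the question above about processing: an original PDF and a reflowed PDF are both application/pdf, so Accept alone cannot tell them apart.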


...

I admit, it's possible that this works better for APIs than user browsing; 
they might prefer a PDF for digital library objects, but prefer HTML for 
other purposes.  We were hoping to allow users to set cookies to set their 
preferences on processing & packaging for our system, but I'm still 
waiting for a response to the paperwork that I filed to be allowed to use 
them.


(little-known fact -- OMB M-00-13 outlaws cookies on all government 
websites; OMB M-03-22 spells out some of the procedures for being allowed 
around it, but I've given up trying to let them know, when they're set up 
so badly you can't even report them [3])


-Joe


[OMB M-00-13] http://www.whitehouse.gov/omb/memoranda_m00-13/
[OMB M-03-22] http://www.whitehouse.gov/omb/memoranda_m03-22/
[3] http://politics.slashdot.org/comments.pl?sid=1021887&cid=25678129


Re: [CODE4LIB] Works API

2010-03-30 Thread Peter Noerr
I will just add (again) to the request for all links. As Jonathan says, the 
client can then decide what to show, how to group them, and so on. 

I had rather sloppily elided things like format of full text into my 
"structural" information about the link. 

And I second the request that some simple coding (controlled vocabulary, anyone?) 
be used for these values so that we clients can determine what we are seeing.

Thanks  -  Peter


> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> stuart yeates
> Sent: Tuesday, March 30, 2010 18:20
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Works API
> 
> Jonathan Rochkind wrote:
> > Karen Coyle wrote:
> >> The OL only has full text links, but the link goes to a page at the
> >> Internet Archive that lists all of the available formats. I would
> >> prefer that the link go directly to a display of the book, and offer
> >> other formats from there (having to click twice really turns people
> >> off, especially when they are browsing). So unfortunately, other than
> >> "full text" there won't be more to say.
> >
> > In an API, it would be _optimal_ if you'd reveal all these links, tagged
> > with a controlled vocabulary of some kind letting us know what they are,
> > so the client can decide for itself what to do with them (which may not
> > even be immediately showing them to any user at all, but may be
> > analyzing them for some other purpose).
> 
> Even better, for those of us who have multiple formats of full text (TEI
> XML, HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs
> to the full text, differentiated using the mime-type.
> 
> cheers
> stuart
> --
> Stuart Yeates
> http://www.nzetc.org/   New Zealand Electronic Text Centre
> http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: [CODE4LIB] Works API

2010-03-30 Thread stuart yeates

Jonathan Rochkind wrote:

Karen Coyle wrote:
The OL only has full text links, but the link goes to a page at the  
Internet Archive that lists all of the available formats. I would  
prefer that the link go directly to a display of the book, and offer  
other formats from there (having to click twice really turns people  
off, especially when they are browsing). So unfortunately, other than  
"full text" there won't be more to say.


In an API, it would be _optimal_ if you'd reveal all these links, tagged 
with a controlled vocabulary of some kind letting us know what they are, 
so the client can decide for itself what to do with them (which may not 
even be immediately showing them to any user at all, but may be 
analyzing them for some other purpose). 


Even better, for those of us who have multiple formats of full text (TEI 
XML, HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs 
to the full text, differentiated using the mime-type.


cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/   New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: [CODE4LIB] Works API

2010-03-30 Thread Jonathan Rochkind

Karen Coyle wrote:


The OL only has full text links, but the link goes to a page at the  
Internet Archive that lists all of the available formats. I would  
prefer that the link go directly to a display of the book, and offer  
other formats from there (having to click twice really turns people  
off, especially when they are browsing). So unfortunately, other than  
"full text" there won't be more to say.
  


In an API, it would be _optimal_ if you'd reveal all these links, tagged 
with a controlled vocabulary of some kind letting us know what they are, 
so the client can decide for itself what to do with them (which may not 
even be immediately showing them to any user at all, but may be 
analyzing them for some other purpose). 

But the full text link, I agree, should be the first or default link, 
and if you CAN only supply one in an API, I agree that is the right one 
-- unless a particular record is not available in full text.  (Which 
hopefully should be apparent from the API response!).
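
A small sketch of what tagged links might look like on the client side. The vocabulary terms in LINK_KINDS, the Link class, and the helper names are all invented for illustration; no such vocabulary has been agreed in this thread.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for what a link points at.
LINK_KINDS = {"full_text", "scan", "formats_page"}


@dataclass(frozen=True)
class Link:
    url: str
    kind: str       # one of LINK_KINDS
    mime_type: str  # e.g. "application/pdf"

    def __post_init__(self):
        # Reject values outside the vocabulary so clients can rely on it.
        if self.kind not in LINK_KINDS:
            raise ValueError(f"unknown link kind: {self.kind!r}")


def default_link(links):
    """Return the first full-text link, or None if the record has no full text."""
    return next((link for link in links if link.kind == "full_text"), None)


def links_with_mime_type(links, mime_type):
    """Return every link whose mime-type matches, in the order given."""
    return [link for link in links if link.mime_type == mime_type]
```

A None result from default_link is how a client would notice that a record is not available in full text, without having to follow any link first.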


Jonathan

  


Re: [CODE4LIB] Works API

2010-03-30 Thread Karen Coyle

Quoting Peter Noerr :

For our purposes (federated search) it would be most useful to have
as many of the available links (OL or other) as possible, and as
much information about the link as possible. Obvious "structural"
stuff like the type of identifier, but also the nature of the linked
object (as you suggest "full text", "scan", etc.) This enables the
links to be "categorized" in the user display so they can eliminate
the ones not of interest, or focus on those that are.


The OL only has full text links, but the link goes to a page at the  
Internet Archive that lists all of the available formats. I would  
prefer that the link go directly to a display of the book, and offer  
other formats from there (having to click twice really turns people  
off, especially when they are browsing). So unfortunately, other than  
"full text" there won't be more to say.




Anything which differentiates the links from the perspective of the   
user is generally useful. In this regard some information about the   
editions at the ends of the links (even just a number and/or date)   
would be useful, and stop systems coming back to OL multiple times   
for all the linked records only to extract and display one or two   
bits of information.


If you want to link from your bib records (Manifestations) to full  
texts of books, then you'll probably prefer to retrieve Editions, not  
Works. There is a plan afoot to produce a file, possibly of MARC  
records, for all of the full text works that the Internet Archive has.  
Those are at the Manifestation level, naturally.


I'll ask about adding the publication date to the output.

kc

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Works API

2010-03-30 Thread Peter Noerr
For our purposes (federated search) it would be most useful to have as many of 
the available links (OL or other) as possible, and as much information about 
the link as possible. Obvious "structural" stuff like the type of identifier, 
but also the nature of the linked object (as you suggest "full text", "scan", 
etc.) This enables the links to be "categorized" in the user display so they 
can eliminate the ones not of interest, or focus on those that are.

Anything which differentiates the links from the perspective of the user is 
generally useful. In this regard some information about the editions at the 
ends of the links (even just a number and/or date) would be useful, and stop 
systems coming back to OL multiple times for all the linked records only to 
extract and display one or two bits of information. This has got to be the 
worst case for user response time, and almost certainly for load on the OL 
system. So if a certain amount of this information can be statically 
pre-coordinated with the links, or gathered by OL at request time, it has got 
to be more efficient.

For us the format of the records is of little importance as we convert them 
anyway.

Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Karen Coyle
> Sent: Tuesday, March 30, 2010 10:23
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Works API
> 
> Open Library now has Works defined, and is looking to develop an API
> for their retrieval. It makes obvious sense that when a Work is
> retrieved via the API, that the data output would include links to the
> Editions that link to that Work. Here are a few possible options:
> 
> 1) Retrieve Work information (author, title, subjects, possibly
> reviews, descriptions, first lines) alone
> 2) Retrieve Work information + OL identifiers for all related Editions
> 3) Retrieve Work information + OL identifiers + any other identifiers
> related to the Edition (ISBN, OCLC#, LCCN)
> 4) Retrieve Work information and links to Editions with full text / scans
> 
> Well, you can see where I'm going with this. What would be useful?
> 
> kc
> 
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet


Re: [CODE4LIB] Works API

2010-03-30 Thread Ed Summers
On Tue, Mar 30, 2010 at 1:52 PM, Karen Coyle  wrote:
> Ed, thanks. I'll need you to be a bit more -v on this one: are you asking
> for an RDF option on the API, or that Works as a whole be represented as
> linked data? The Open Library doesn't present itself as linked data, as you
> know, and although that would be very interesting I don't think that's on
> their production schedule for the near future.

Well you do have a nice start at some Linked Data views already in
Open Library, e.g.

  http://openlibrary.org/b/OL8123073M.rdf

I guess what I was suggesting is that you link these Expressions up
with their respective Works where you know the relations, perhaps
using Ian Davis' FRBR vocabulary [1]? I don't think this precludes a handy
web2.0 API like what OCLC and LibraryThing offer already ... but
there's an opportunity to make the Linked Data views you have already
quite a bit richer I think.
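
For what it's worth, the link being suggested could be as small as one triple per record. The sketch below writes it as an N-Triples line. The namespace and the realizationOf property are my understanding of the FRBR Core vocabulary and should be checked against it; the Work URI is made up, since I don't know what Work URIs will look like.

```python
# Assumed namespace of the FRBR Core vocabulary; verify before relying on it.
FRBR = "http://purl.org/vocab/frbr/core#"


def realization_triple(expression_uri, work_uri):
    """Return one N-Triples line saying the expression realizes the work."""
    return f"<{expression_uri}> <{FRBR}realizationOf> <{work_uri}> ."


line = realization_triple(
    "http://openlibrary.org/b/OL8123073M",
    "http://example.org/works/W1",  # hypothetical Work URI
)
```

The same statement could just as well be added to the existing .rdf view as RDF/XML; the point is only that the Work relation becomes a followable link.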

That being said, I'm probably in a minority view here thinking that
the Linked Data pattern has something to offer. Cue the Tim Spalding
rendition of Don't Believe the Semantic Web Hype :-)

//Ed

[1] http://vocab.org/frbr/


Re: [CODE4LIB] Works API

2010-03-30 Thread Karen Coyle

Quoting Ed Summers :


I realize it's of limited utility compared to yet another web2.0 API,
but I think it would be good to see Works represented somehow in the
RDF Linked Data views...assuming they're not already.
//Ed


Ed, thanks. I'll need you to be a bit more -v on this one: are you  
asking for an RDF option on the API, or that Works as a whole be  
represented as linked data? The Open Library doesn't present itself as  
linked data, as you know, and although that would be very interesting  
I don't think that's on their production schedule for the near future.


kc



On Tue, Mar 30, 2010 at 1:22 PM, Karen Coyle  wrote:

Open Library now has Works defined, and is looking to develop an API for
their retrieval. It makes obvious sense that when a Work is retrieved via
the API, that the data output would include links to the Editions that link
to that Work. Here are a few possible options:

1) Retrieve Work information (author, title, subjects, possibly reviews,
descriptions, first lines) alone
2) Retrieve Work information + OL identifiers for all related Editions
3) Retrieve Work information + OL identifiers + any other identifiers
related to the Edition (ISBN, OCLC#, LCCN)
4) Retrieve Work information and links to Editions with full text / scans

Well, you can see where I'm going with this. What would be useful?

kc

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet







--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Works API

2010-03-30 Thread Ed Summers
I realize it's of limited utility compared to yet another web2.0 API,
but I think it would be good to see Works represented somehow in the
RDF Linked Data views...assuming they're not already.
//Ed

On Tue, Mar 30, 2010 at 1:22 PM, Karen Coyle  wrote:
> Open Library now has Works defined, and is looking to develop an API for
> their retrieval. It makes obvious sense that when a Work is retrieved via
> the API, that the data output would include links to the Editions that link
> to that Work. Here are a few possible options:
>
> 1) Retrieve Work information (author, title, subjects, possibly reviews,
> descriptions, first lines) alone
> 2) Retrieve Work information + OL identifiers for all related Editions
> 3) Retrieve Work information + OL identifiers + any other identifiers
> related to the Edition (ISBN, OCLC#, LCCN)
> 4) Retrieve Work information and links to Editions with full text / scans
>
> Well, you can see where I'm going with this. What would be useful?
>
> kc
>
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
>