Re: [CODE4LIB] what's friendlier & less powerful than phpMyAdmin?
Hi All, It ain't free, but there's a lovely client for MySQL called Navicat (http://www.navicat.com/) that we've been using. And even though I *can* do command-line queries, gotta say I love pulling lines between tables to set them up. It's not too expensive, and I find that for light- to medium-weight stuff it's fun and easy to use. -t On Wed, 30 Jul 2008, Eric Lease Morgan wrote: On Jul 30, 2008, at 1:47 PM, Cloutman, David wrote: Perhaps you should put together some MySQL training materials for librarians. A webinar, perhaps. I'd love it if my colleagues had those skills. I don't think there is that much interest, but I could be wrong. There are at least 101 ways enterprise-level database skills could be put to work in my library. I'm pretty sick of our core technical solutions being Excel spreadsheets and the occasional Access database. Blech. Tell me about it, and besides, basic SQL is not any more difficult than CCL: SELECT this FROM that WHERE field LIKE "%foo%" Moreover, IMHO, relational databases are the technological bread & butter of librarianship these days. Blissful ignorance does the profession little good. -- Eric Lease Morgan Hesburgh Libraries, University of Notre Dame
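For anyone who wants to see how small that really is, here is a minimal sketch of the same sort of query run from PHP with PDO; the catalog database, titles table, and connection details are invented for illustration:

  <?php
  // Hedged sketch: a CCL-simple catalog search via PDO.
  // The `catalog` database and `titles` table are hypothetical.
  $db = new PDO('mysql:host=localhost;dbname=catalog', 'user', 'pass');
  $stmt = $db->prepare('SELECT title, call_number FROM titles WHERE title LIKE ?');
  $stmt->execute(array('%foo%'));  // parameterized, so no quoting headaches
  foreach ($stmt as $row) {
      echo $row['title'], ' | ', $row['call_number'], "\n";
  }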
Re: [CODE4LIB] creating call number browse
Hi, One approach to the UI might be to use Cooliris (was PicLens) and generate a Media RSS file in call number order. It's limited (to people who have installed Cooliris) but it's essentially a coverflow. You can do other things within the browser, but few are going to feel as immediate and transparent to the user. Again, maybe not for all users, but maybe a cool enhanced version for a subset. Generating that Media RSS file may get tricky (you need URIs to thumbs and "fulls") depending on the API from, and agreements with, Syndetics. -t On Wed, 17 Sep 2008, Charles Antoine Julien, Mr wrote: I've done some work on this. "What I don't know is whether there are any indexing / SQL / query techniques that could be used to browse forward and backward in an index like this." Depending on what you want to do exactly, yes. Look at "Querying Ontologies in Relational Database Systems" (S. Trissl and U. Leser, Lecture Notes in Computer Science, Springer, 2005). If you need more, you're looking at CS literature concerning treatment of graphs, directed graphs, cyclical, transitive closure, etc. This can all be done without too much difficulty, but as Nate pointed out, updating the data is a problem... I've not tackled that part but there is much literature on dynamic graphs and I'm assuming this could also be adequately solved. "a decent UI is probably going to be a bigger job" Yes, that's the real issue. Could call numbers be placed within a hierarchy? Then display this in an outline view (Windows Explorer) that is also item searchable? Seems to me there is structure in the call numbers that is hidden in current UIs. I also think the actual "call number" should disappear and be replaced by a textual label describing what the numbers mean. Fun stuff to think about... Charles-Antoine -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Emily Lynema Sent: September 17, 2008 11:46 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] creating call number browse Hey all, I would love to tackle the issue of creating a really cool call number browse tool that utilizes book covers, etc. However, I'd like to do this outside of my ILS/OPAC. What I don't know is whether there are any indexing / SQL / query techniques that could be used to browse forward and backward in an index like this. Has anyone else worked on developing a tool like this outside of the OPAC? I guess I would be perfectly happy even if it was something I could build directly on top of the ILS database and its indexes (we use SirsiDynix Unicorn). I wanted to throw a feeler out there before trying to dream up some wild scheme on my own. -emily P.S. The version of BiblioCommons released at Oakville Public Library has a sweet call number browse function accessible from the full record page. I would love to know how that was accomplished. http://opl.bibliocommons.com/item/show/1413841_mars -- Emily Lynema Systems Librarian for Digital Projects Information Technology, NCSU Libraries 919-513-8031 [EMAIL PROTECTED]
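To make the Media RSS idea above concrete, here is a rough PHP sketch; the items table, the normalized sort column, and especially the cover URLs are placeholders (real thumb/full URIs depend on your Syndetics agreement):

  <?php
  // Hedged sketch: emit a Media RSS feed in call number order for Cooliris.
  // Table, columns, and cover-art URL pattern are all hypothetical.
  $db = new PDO('sqlite:catalog.db');
  header('Content-Type: application/rss+xml');
  echo '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">';
  echo '<channel><title>Shelf browse</title>';
  $rows = $db->query('SELECT title, isbn FROM items ORDER BY call_number_sort LIMIT 50');
  foreach ($rows as $r) {
      $title = htmlspecialchars($r['title']);
      $thumb = 'http://covers.example.org/' . $r['isbn'] . '/small.gif';  // placeholder
      $full  = 'http://covers.example.org/' . $r['isbn'] . '/large.gif';  // placeholder
      echo "<item><title>$title</title>";
      echo "<media:thumbnail url=\"$thumb\"/><media:content url=\"$full\"/></item>";
  }
  echo '</channel></rss>';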
Re: [CODE4LIB] LOC Authority Data
Socialized medicine? Sure. *We* have authority files! -t On Tue, 23 Sep 2008, David Fiander wrote: One of the most important pages in the print volumes of the Library of Congress Subject Headings (LCSH) is the title page verso, which includes publication and copyright details. The folks at LC very clearly understand US copyright law, since on that page you can see that they claim that the LCSH is copyright LC _outside of the United States of America_. The same probably holds true for the copyright claim on the name authority files. You folks in the United States can do what you will with impunity, but us unwashed masses beyond your shores are likely to get in trouble. Probably the next time we attempt to cross the border. - David On Tue, Sep 23, 2008 at 5:21 PM, Jason Griffey <[EMAIL PROTECTED]> wrote: As I mentioned, they are available from Ibiblio on the link above. The copyright claim is...well...specious at best. But no one really wants to be the one to go to court and prove it. They've been publicly available for more than a year now on the Fred 2.0 site, and they haven't been sued, to my knowledge. Jason On Tue, Sep 23, 2008 at 5:17 PM, Nate Vack <[EMAIL PROTECTED]> wrote: On Tue, Sep 23, 2008 at 3:49 PM, Bryan Baldus <[EMAIL PROTECTED]> wrote: One way (as you likely know) (official, expensive) is via The Library of Congress Cataloging Distribution Service: Huh. They claim copyright on these records. I'd somehow thought: 1: The federal government can't hold copyrights 2: As purely factual data, catalog records are conceptually uncopyrightable Does anyone who knows more about this than I do know if they're *really* copyrighted, or if it's more of a "we're gonna try and say they're copyrighted and hope no one ignores us"? Curious, -Nate
Re: [CODE4LIB] creating call number browse
Owen, Unless I'm misunderstanding, what's being asked for is a visualization tool for the *classification*. Faceted browsing by subject is dandy, but is not at all the same thing (though arguments can be made that the lines are blurring). Books that sit next to each other in a classification (DC or LC, or whatever) may not share a majority of subject terms. That collocation via classification is yet another (and occasionally more useful) way of saying that this item is like that item. One that is not necessarily captured in any other way than call number. -t On Tue, 30 Sep 2008, Stephens, Owen wrote: I'd second Steve's comments - replicating an inherently limited physical browse system seems an odd thing to do in the virtual world. I would have thought that the 'faceted browse' function we are now seeing appearing in library systems (of course, the Endeca implementation is a leader here) is potentially the virtual equivalent of 'browsing the shelves', but hopefully without the limitations that the physical environment brings? Is it the UI rather than the functionality that is lacking here? Perhaps we need to look more carefully at the 'browsing' experience. Thinking about examples outside the library world, I personally like the 'coverflow' browse in iTunes, but I'm able to sort tracks by several criteria and still see a coverflow view. I have to admit that in general I prefer the 'album' order when using coverflow, because otherwise it doesn't make sense (to me, that is). It would be interesting to look at what an 'artistflow' might look like, or a 'genreflow'. However, as far as I know I can't actually replicate the experience that I would have with my (now in boxes somewhere) physical CD collection, which was divided by genre, then sorted by artist surname (ok, I admit it, I'm a librarian through and through). Perhaps a better understanding of the 'browse' experience is needed? Some questions, for when we browse:
- When and why do people browse rather than search?
- How do people make decisions about useful items as they browse?
- Browsing stacks suggests that items have been 'ordered' - is there something about this that appeals? Does it convey 'authority' in some way that the 'any order you want' doesn't?
Owen Owen Stephens Assistant Director: eStrategy and Information Resources Central Library Imperial College London South Kensington Campus London SW7 2AZ t: +44 (0)20 7594 8829 e: [EMAIL PROTECTED] -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Steve Meyer Sent: 29 September 2008 21:45 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] creating call number browse one counter argument that i would make to this is that we consistently hear from faculty that they absolutely adore browsing the stacks--there is something that they have learned to love about the experience regardless of whether they understand that it is made possible by the work of catalogers assigning call numbers and then using them for ordering the stacks. at uw-madison we have a faculty lecture series where we invite professors to talk about their use of library materials and their research and one historian said outright, the one thing that is missing in the online environment is the experience of browsing the stacks. he seemed to understand that with all the mass digitization efforts, we could be on the edge of accomplishing it. that said, i agree that we should do what you say also, just that we should not throw the baby out w/ the bath water. 
if faculty somehow understand that browsing the stacks is a good experience then we can use it as a metaphor in the online environment. in an unofficial project i have experimented w/ primitive interface tests using both subject heading 'more like this' and a link to a stack browse based on a call number sort: http://j2ee-dev.library.wisc.edu/sanecat/item.html?resourceId=951506 (please, ignore the sloppy import problems, i just didn't care that much for the interface test) as for the original question, this has about a million records and 900,000 w/ item numbers and a simple btree index in the database sorts at an acceptable speed for a development test. -sm Walker, David wrote: a decent UI is probably going to be a bigger job I've always felt that the call number browse was a really useful option, but the most disastrously implemented feature in most ILS catalog interfaces. I think the problem is that we're focusing on the task -- browsing the shelf -- as opposed to the *goal*, which is, I think, simply to show users books that are related to the one they are looking at. If you treat it like that (here are books that are related to this book) and dispense with the notion of call numbers and shelves in the interface (even if what you're doing behind the scenes is in fact a call number browse) then I think you can arrive at a much simpler and straight-forward UI for users. I would treat it little different tha
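As for Emily's indexing/SQL question, the approach Steve describes really is just one btree index on a normalized call number plus keyset queries to page in either direction; here is a hedged PDO/SQLite sketch with hypothetical table and column names (building the normalized sort key is its own problem):

  <?php
  // Hedged sketch: browse forward/backward from a given call number.
  // `items` and `call_number_sort` are hypothetical names.
  $db = new PDO('sqlite:catalog.db');
  $db->exec('CREATE INDEX IF NOT EXISTS ix_callnum ON items (call_number_sort)');
  // Forward: the index turns this into a seek, not a scan.
  $fwd = $db->prepare('SELECT call_number, title FROM items
                       WHERE call_number_sort > ? ORDER BY call_number_sort LIMIT 10');
  // Backward: flip the comparison and the sort, then reverse the page for display.
  $bwd = $db->prepare('SELECT call_number, title FROM items
                       WHERE call_number_sort < ? ORDER BY call_number_sort DESC LIMIT 10');
  $fwd->execute(array('PS3515 H16 W4'));  // made-up normalized key
  foreach ($fwd as $row) {
      echo $row['call_number'], '  ', $row['title'], "\n";
  }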
[CODE4LIB] amazon s3?
Hi Folks, Anybody doing mass storage for their library/consortium on Amazon S3? Anybody rejected it as an idea? Willing to share? Please do. Tim +++ Tim Shearer Web Development Coordinator The University Library University of North Carolina at Chapel Hill [EMAIL PROTECTED] 919-962-1288 +++
[CODE4LIB] OCA API
Hi Folks, The University Library at UNC-Chapel Hill has created an OCA API. We have harvested (and continue to harvest) standard bibliographic identifiers and link them to OCA identifiers. The API is deliberately modeled after Google's for ease of implementation. Here is a subject search in UNC's catalog for "North Carolina" limited to the 19th century: http://search.lib.unc.edu/search?Ntk=Subject&Ne=2+200043+206475+206590+11&N=206596&Ntt=north%20carolina You will see links to OCA as well as Google. (The full record has an OCA icon if you want to look.) Right now we are only banging against the API with OCLC numbers, but ISSNs, ISBNs and LC numbers are in there. We are looking for a couple of partners to work with to take it beyond our local OPAC. You would be ideal if: you are interested, you already use the Google API, and you have a significant corpus of pre-1923 works in your catalog. As the Google API is familiar to many of you, it would be easy to figure out how to implement UNC's without working with us. Please hold off until we are ready to open it up all the way; this is why we've not yet put up documentation. Caveats and other notes (feel free to skip):
* We realize that Open Library has an API, but we had already gone a goodly distance and we are finding relatively meaningful differences in coverage and utility.
* We collect the data from OCA as it comes in (the data should be up to date within a half hour or so)...but they occasionally need to correct/remove works. Right now we are actively working on this issue, but do not yet have a great mechanism to pull deletes and update corrected identifiers.
* The data is only as good as the data we harvest. There are a small number of bad links. See above.
* Excerpt from a developer on UNC's holdings (we are an OCA Scribe site): ...I decided to run the same script against the [production] database as well to see how much the matching is changing over time with continual updates: - 429311 OCLC's tested - 72350 matched - 2599 of the matches were scanned by UNC So that's 808 new matches since the end of March, not too bad for one month. Effectively we are now linking to ~72K digitized works that we were not previously able to provide (though as Google-digitized books are added to OCA, there is significant overlap).
* When we do open it up, it is the API we are offering; we are not prepared to be crawled for data. If you want the data, get in touch and we will see what we can do.
If you are interested in being an early partner, please drop me a line and I will be in touch. Tim +++ Tim Shearer Web Development Coordinator The University Library University of North Carolina at Chapel Hill sh...@ils.unc.edu 919-962-1288 +++
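Since the documentation isn't up yet, the following is only a guess at what a client might look like if the API really does mirror Google's bibkeys request/response shape; the endpoint URL, the OCLC number, and the response fields below are hypothetical, not UNC's actual interface:

  <?php
  // Hypothetical client sketch, modeled on the Google Book Search
  // Dynamic Links convention. Endpoint and field names are guesses.
  $endpoint = 'http://search.lib.unc.edu/oca-api';  // made-up URL
  $bibkey   = 'OCLC:12345678';                      // made-up OCLC number
  $json = file_get_contents($endpoint . '?jscmd=viewapi&bibkeys=' . urlencode($bibkey));
  $info = json_decode($json, true);
  if (isset($info[$bibkey])) {
      echo $info[$bibkey]['oca_url'], "\n";  // e.g. the target for an OCA icon
  }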
Re: [CODE4LIB] Setting users google scholar settings
FWIW, this is what we do as well... for the few people who think "I want to use Google Scholar, I'll go to the UNC Library website." Still, folks have learned that it's worth that first hoop to get easy access. -t +++++++ Tim Shearer Web Development Coordinator The University Library University of North Carolina at Chapel Hill sh...@ils.unc.edu 919-962-1288 +++ On Wed, 15 Jul 2009, John Wynstra wrote: We have tested routing off-campus users through our local proxy (WAM) when they link to Google Scholar from our library website. Not recommending it, just saying it works since the proxy is in the correct IP range. It has the benefit of leaving folks authenticated to the documents they will eventually click to--assuming you are using a rewrite proxy. It doesn't actually set preferences, so if users happen to go directly to Google Scholar they may be confused by the missing OpenURL links. At first it seemed like a crazy idea, but then the thought was that users are using Google Scholar as an extension of our website if they click on our link. Jonathan Rochkind wrote: When I've experimented with this, I haven't been able to figure out a way to set my institution in their preferences without _removing_ any existing institutions they may have already chosen. I don't want to over-write their existing preferences, if any; I just want to add my institution to them! If anyone figures out a way to do this, I'm interested too. I actually didn't know about the &inst=X feature allowing you to set the institution even just for the current session; that's awfully helpful and better than nothing, thanks! Jonathan Godmar Back wrote: It used to be you could just GET the corresponding form, e.g.: http://scholar.google.com/scholar_setprefs?num=10&instq=&inst=sfx-f7e167eec5dde9063b5a8770ec3aaba7&q=einstein&inststart=0&submit=Save+Preferences - Godmar On Wed, Jul 15, 2009 at 3:17 AM, Stuart Yeates wrote: It's possible to send users to Google Scholar using URLs such as: http://scholar.google.co.nz/schhp?hl=en&inst=8862113006238551395 where the institution is obtained using the standard preference setting mechanism. Has anyone found a way of persisting this setting in the user's browser, so when they start a new session this is the default? Yes, I know they can go "Scholar Preferences" -> "Save" to persist it, but I'm looking for a more automated way of doing it... cheers stuart -- <><><><><><><><><><><><><><><><><><><> John Wynstra Library Information Systems Specialist Rod Library University of Northern Iowa Cedar Falls, IA 50613 wyns...@uni.edu (319)273-6399 <><><><><><><><><><><><><><><><><><><>
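For anyone wiring this up, the approaches in the thread boil down to a link target like the following minimal sketch (the filename is hypothetical, the inst value is the one from Stuart's example, and Google's parameters may change without notice):

  <?php
  // scholar.php (hypothetical): send the user's browser to Google Scholar
  // with an institutional preference applied for the session via &inst=.
  $inst = '8862113006238551395';  // example value from this thread
  header('Location: http://scholar.google.com/schhp?hl=en&inst=' . urlencode($inst));
  exit;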
[CODE4LIB] find in page, diacritics, etc
Hi Folks, Looking for help/perspectives. Anyone got any clever solutions for allowing folks to find a word with diacritics in a rendered web page, regardless of whether the user tries with or without the diacritics? In indexes this is usually solved by indexing the word both with and without, so the user gets what they want regardless of how they search. Thanks in advance for any ideas/enlightenment, Tim
Re: [CODE4LIB] find in page, diacritics, etc
Are you referring to a "find in page", where a user presses CTRL-F in the browser? Yes, sorry to be unclear. If so, it will depend on the browser. Google Chrome 2.0 will find matches regardless of the diacritics (i.e. the user can type "placa" and it matches "plaça", and vice versa). This doesn't seem to work in Firefox 3.0.13 or IE8. Exactly, and FF and IE are the most common browsers we're seeing. I was wondering if someone (I know this sounds crazy) has explored the idea of marking up the non-diacritic inline version of the word in a span styled in such a way as to make it findable but not intrusive. -t Keith On Fri, Aug 28, 2009 at 12:17 PM, Tim Shearer wrote: Hi Folks, Looking for help/perspectives. Anyone got any clever solutions for allowing folks to find a word with diacritics in a rendered web page, regardless of whether the user tries with or without the diacritics? In indexes this is usually solved by indexing the word both with and without, so the user gets what they want regardless of how they search. Thanks in advance for any ideas/enlightenment, Tim
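A sketch of that span idea, assuming PHP's iconv transliteration does the folding; the CSS class is hypothetical, and since browsers tend to skip display:none text in find-in-page, something like font-size:0 or color:transparent would need testing (as would screen reader behavior):

  <?php
  // Hedged sketch: emit the ASCII-folded form next to the original so
  // Ctrl-F can match either. Folding quality depends on locale settings.
  function findable($word) {
      $folded = iconv('UTF-8', 'ASCII//TRANSLIT', $word);  // "plaça" -> "placa"
      if ($folded === false || $folded === $word) {
          return htmlspecialchars($word);
      }
      return htmlspecialchars($word)
           . '<span class="fold">' . htmlspecialchars($folded) . '</span>';
  }
  echo findable('plaça');  // plaça<span class="fold">placa</span>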
Re: [CODE4LIB] Code4Lib 2011 Proposals
A big old thank you to OCLC for the support! It is deeply appreciated. -t On 3/3/10 10:34 AM, Roy Tennant wrote: On 3/3/10 7:22 AM, "Ross Singer" wrote: On Wed, Mar 3, 2010 at 9:55 AM, Paul Joseph wrote: No need to be concerned about the vendors: they're the same suspects who sponsored C4L10. Just to be clear on this -- the same suspects actually shelled out far less for C4L10 than they had in the past. Just to clarify the clarification, OCLC continued our support at the highest level this year, as we have since the conference began. Roy
[CODE4LIB] PREMIS question
Hi folks, Ignoring, for the moment, the utility of doing so...has anyone written (or does anyone know of) an XSL transform from PREMIS to HTML? I'm finding PREMIS transforms, but nothing that produces output for consumption in a webpage. The idea is to let folks more easily parse the PREMIS information for objects in an IR; that is to say, not make them parse the XML directly. I do have a request in to LC but no response yet. Even pointers to a suspect/contact would be welcome. [As an aside, if you've not attended the conference, I just went to my first and it was pound for pound the best one I've attended. So you need to find a way to go.] Thanks, Tim
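Not a real PREMIS stylesheet, but a minimal sketch of the plumbing with PHP's XSLTProcessor, pulling event type and date out of a PREMIS 2.x document into an HTML table; the input filename is hypothetical, and a real transform would cover objects, agents, and rights too:

  <?php
  // Hedged sketch: PREMIS events -> HTML rows via a trivial inline XSL.
  $xsl = new DOMDocument();
  $xsl->loadXML('<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:premis="info:lc/xmlns/premis-v2">
    <xsl:template match="/">
      <table><xsl:for-each select="//premis:event">
        <tr><td><xsl:value-of select="premis:eventType"/></td>
            <td><xsl:value-of select="premis:eventDateTime"/></td></tr>
      </xsl:for-each></table>
    </xsl:template>
  </xsl:stylesheet>');
  $doc = new DOMDocument();
  $doc->load('premis.xml');  // hypothetical input file
  $proc = new XSLTProcessor();
  $proc->importStylesheet($xsl);
  echo $proc->transformToXML($doc);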
Re: [CODE4LIB] newbie
Warning: regular expressions can become addictive. And, for some of us, batch manipulation of large text sets can provide a whole lot of satisfaction. Finally, I never would have put the strings "PHP" and "sexiness" in a sentence together (though I guess I just did). -t On 3/25/10 4:46 PM, Ethan Gruber wrote: If one's interests were digital library data curation and migration, the most useful things to know would be XSLT, bash scripting, Perl, and knowledge of regular expressions. I've done a lot of migration with bash scripting, regular expressions, and XSLT alone, without the need for Perl, but Perl or SAX would be useful in migrating non-XML or invalid XML/SGML. I used simple, iterative scripts to migrate thousands of TEI files from TEI Lite to a more consistent schema. I've done similar things to go from a 500-page HTML thumbnail gallery of manuscripts to an EAD guide. Roy is right in stating there is more to programming than web pages. A lot of dirty work behind the scenes in libraries is done without the sexiness of PHP or Ruby on Rails applications. Ethan On Thu, Mar 25, 2010 at 4:36 PM, Genny Engel wrote: Agreed -- I coded up many nice SQL injection vulnerabilities before I ever learned PHP. As for Perl, anyone remember the notorious formmail.cgi from Matt's Script Archive? For **web** programming specifically, it's critically important for newbies to get a grounding in security issues, regardless of the language being used. Also, in usability issues, accessibility issues, etc. for anything that's actually going to get used by the public. But really, that mainly applies if you're going to be developing a whole app complete with web-accessible front end. If your interests aren't particularly in web development, you have a whole other set of potential issues to learn about, and I'm probably ignorant of most of them. My first language was C, which according to langpop.com [1] is still the most popular language around! If you don't want to get bogged down in the web security issues, etc., then you might lean toward learning a general-purpose language like C or Java, rather than one designed for a specific purpose as PHP is for web development. [1] http://www.langpop.com/ On 03/25/10 07:56 AM, yitzchak.schaf...@gmx.com wrote: On 3/24/2010 17:43, Joe Hourcle wrote: I know there's a lot of stuff written in it, but *please* don't recommend PHP to beginners. Yes, you can get a lot of stuff done with it, but I've had way too many incidents where newbie coders didn't check their inputs, and we've had to clean up after them. Another way of looking at this: part of learning a language is learning its vulnerabilities and how to deal with them. And how to avoid security holes in web code in general.
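In that spirit, here is a tiny example of the kind of batch manipulation being described, with an invented directory, element name, and pattern (and the usual caveat: back up before running anything like this):

  <?php
  // Hedged sketch: one regex rewrite applied across a directory of files.
  // The tei/ path and the element rename are made up for illustration.
  foreach (glob('tei/*.xml') as $file) {
      $text = file_get_contents($file);
      // e.g. rename <oldHeader>...</oldHeader> to <teiHeader>...</teiHeader>
      $text = preg_replace('{<(/?)oldHeader\b}', '<$1teiHeader', $text);
      file_put_contents($file, $text);
  }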
Re: [CODE4LIB] parse an OAI-PHM response
Depending on how locked down the php.ini file is (there are lots of good reasons to lock it down, e.g. disabling allow_url_fopen, which keeps file_get_contents and friends from fetching URLs), you might look into curl. http://curl.haxx.se/ curl works from PHP: http://us.php.net/curl It talks lots of protocols (including https, which is how I got on board), including gopher for any killer apps you have planned. -t +++ Tim Shearer Web Development Coordinator The University Library University of North Carolina at Chapel Hill [EMAIL PROTECTED] 919-962-1288 +++ On Fri, 27 Jul 2007, John McGrath wrote: You could use either the PEAR HTTP_Request package, or the built-in fopen/fread commands, which can make HTTP calls in addition to opening local files. The easiest way, though, in my opinion, is file_get_contents, which automatically dumps the response into a string. And it's fast, apparently: http://us.php.net/manual/en/function.file-get-contents.php http://pear.php.net/package/HTTP_Request http://us.php.net/fopen http://us.php.net/manual/en/function.fread.php Best, John On Jul 27, 2007, at 9:31 PM, Andrew Hankinson wrote: Hi folks, I'm wanting to implement a PHP parser for an OAI-PMH response from our DSpace installation. I'm a bit stuck on one point: how do I get the PHP script to send a request to the OAI-PMH server, and get the XML response in return so I can then parse it? Any thoughts or pointers would be appreciated! Andrew
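For Andrew's original question, a minimal sketch: fetch ListRecords with curl and walk the Dublin Core titles with SimpleXML (the endpoint URL is hypothetical; file_get_contents also works where allow_url_fopen is on):

  <?php
  // Hedged sketch: request an OAI-PMH ListRecords response and parse it.
  $url = 'http://dspace.example.edu/oai/request?verb=ListRecords&metadataPrefix=oai_dc';
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  $xml = curl_exec($ch);
  curl_close($ch);
  $oai = new SimpleXMLElement($xml);
  $oai->registerXPathNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
  $oai->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');
  foreach ($oai->xpath('//dc:title') as $title) {
      echo (string) $title, "\n";
  }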
[CODE4LIB] library find and bibliographic citation export?
Hi, I'm interested to know if anyone working with LibraryFind has begun work on a tool for bibliographic export to citation management tools like RefWorks, etc. Thanks! Tim +++++++ Tim Shearer Web Development Coordinator The University Library University of North Carolina at Chapel Hill [EMAIL PROTECTED] 919-962-1288 +++
Re: [CODE4LIB] OpenContent SRU search of OAISter, weirdness?
Dumb question, no experience with the syntax, but should there be a wildcard, or use of something other than equals? (Sorry if I'm way off base, but most query syntax I use requires "like" or wildcarding.) -t On Thu, 25 Oct 2007, Jonathan Rochkind wrote: That was a typo in my problem report, I'm afraid. I was actually searching Jansen, and that still exhibits the problems I mentioned. I've also moved this conversation to Index Data's own list for this service, at http://lists.indexdata.dk/cgi-bin/mailman/listinfo/oclist (thanks to Jason Ronallo for bringing that list to my attention). Jonathan Joshua Santelli wrote: You're not getting any hits because the name is not Jensen, it's Jansen. I'm not sure where Jensen came from, but the OAIster indexes here have Jansen. josh On 10/24/07 6:04 PM, "Jonathan Rochkind" <[EMAIL PROTECTED]> wrote: I'm messing with SRU search of http://indexdata.dk/opencontent/oaister I have some behavior I can't explain. There's this article that is in OAIster, called "Resurrection and Appropriation: Reputational Trajectories, Memory Work, and the Political Use of Historical Figures" by "Robert S. Jensen". I do an SRU search with query: dc.title = "Resurrection and Appropriation: Reputational Trajectories, Memory Work, and the Political Use of Historical Figures" And I find the record, one hit. Good. You too could try, and see what the DC returned looks like. It does have a dc:creator of Robert S. Jensen. But I try a search that includes the author: dc.title = "Resurrection and Appropriation: Reputational Trajectories, Memory Work, and the Political Use of Historical Figures" and dc.creator = "Jensen" => 0 hits. And cql.serverChoice = "Jensen" => 0 hits. Same using the full name "Robert S. Jensen" (just as it appears in the record), with cql.serverChoice or dc.creator. Is this just a bad index, or is something else going on, or what? As I try sample searches on title and author, I keep running into false negatives for things that ought to be in the OAIster index. Sometimes I can figure out why (title not quite right; title has curly quotes, index does not, etc.), but in this case I have no idea. But the net result is it's hard to actually find your known item this way, via an automated search on known item metadata. Jonathan -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
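For reference, here is a sketch of how those CQL searches go over the wire in SRU, using the base URL from Jonathan's post; the version and parameter names are the common SRU 1.1 ones, so treat this as illustrative rather than a tested client:

  <?php
  // Hedged sketch: send a CQL query via SRU searchRetrieve and report hits.
  $base = 'http://indexdata.dk/opencontent/oaister';
  $cql  = 'dc.title = "Resurrection and Appropriation: Reputational Trajectories, '
        . 'Memory Work, and the Political Use of Historical Figures" '
        . 'and dc.creator = "Jensen"';
  $url  = $base . '?version=1.1&operation=searchRetrieve&maximumRecords=1'
        . '&query=' . urlencode($cql);
  $xml = file_get_contents($url);
  $sru = new SimpleXMLElement($xml);
  $sru->registerXPathNamespace('srw', 'http://www.loc.gov/zing/srw/');
  $hits = $sru->xpath('//srw:numberOfRecords');
  echo 'hits: ', $hits ? (string) $hits[0] : '?', "\n";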
Re: [CODE4LIB] Library Software Manifesto
Hi Roy, Not sure how to make this succinct enough to be elegant (i.e. a bullet point) but... We have a large enough staff to "break into" software when necessary. A typical scenario is:
- We need a feature added (or a bug removed) to make workflow tenable
- We request the feature (bug fix)
- We hear "ok, thanks for mentioning it" or "known problem" but have absolutely no idea the extent of the need or where it will be prioritized
- We wait a long while, give up, and develop a workaround
- Two weeks after our success the company releases the feature/fix
Essentially, I'd like to know the extent of the issue. If 90% of their customers have reported/requested it, I would like to know, so we can avoid doing the development locally to replicate something that will be coming (soon?). Many times the local development makes life possible for a year or two, so it isn't fruitless. It's only frustrating when it turns out the whole world has been screaming, but the company doesn't want to acknowledge where it is on the list o'priorities. Maybe:
- I have a right to access a prioritized list of what the developers are working toward.
(except more elegantly phrased) Tim Roy Tennant wrote: I have a presentation coming up and I'm considering doing what I'm calling a "Library Software Manifesto". Some of the following may not be completely understandable on the face of it, and I would be explaining the meaning during the presentation, but this is what I have so far and I'd be interested in other ideas this group has or comments on this. Thanks, Roy Consumer Rights
- I have a right to use what I buy
- I have a right to the API if I've bought the product
- I have a right to accurate, complete documentation
- I have a right to my data
- I have a right to not have simple things needlessly complicated
Consumer Responsibilities
- I have a responsibility to communicate my needs clearly and specifically
- I have a responsibility to report reproducible bugs in a way as to facilitate reproducing it
- I have a responsibility to report irreproducible bugs with as much detail as I can provide
- I have a responsibility to request new features responsibly
- I have a responsibility to view any adjustments to default settings critically
[CODE4LIB] oca api?
Hi Folks, I'm looking into tapping the texts in the Open Content Alliance. A few questions... As near as I can tell, they don't expose (perhaps even store?) any common unique identifiers (OCLC number, ISSN, ISBN, LC number). We're a contributor, so I can use curl to grab our records via HTTP (and regexp my way to our local catalog identifiers, which they do store/expose). I've played a bit with the Z39.50 interface at Index Data (http://www.indexdata.dk/opencontent/), but I'm not confident about the content behind it. I get very limited results; for instance I can't find any UNC records, and we're fairly new to the game. Again, I'm looking for unique identifiers in what I can get back and it's slim pickings. Anyone cracked this nut? Got any life lessons for me? Thanks! Tim +++ Tim Shearer Web Development Coordinator The University Library University of North Carolina at Chapel Hill [EMAIL PROTECTED] 919-962-1288 +++
Re: [CODE4LIB] oca api?
Yup, Chris's email was exactly what I was hoping for. Now if only there were a nice way to pre-screen for records that don't have an empty (isbn|issn|oclc#) without all the work of looking per record (and the overhead for the server, and the overhead if more than one organization starts to do this). I guess I want to search for uniqueID != NULL and only get their unique id back, and script from there. Still and all, this now seems a very doable thing. Chris, many thanks! -t On Mon, 25 Feb 2008, Tennant, Roy wrote: Well, from where Chris left off it would be fairly easy to check for a file in the directory with a "marc.xml" filename extension, then XSLT out the identifier field (e.g. 39004822). If such exists, then you'll have the ISBN. To sweeten it further, send that into xISBN or ThingISBN and get other ISBNs for the same work. This seems completely scriptable to me. Perhaps someone at c4l will have it done before the conference is over. And Tim, the example above is one that's in your catalog. Roy -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Chris Freeland Sent: Monday, February 25, 2008 11:51 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] oca api? Steve & Tim, I'm the tech director for the Biodiversity Heritage Library (BHL), which is a consortium of 10 natural history libraries who have partnered with Internet Archive (IA)/OCA for scanning our collections. We've just launched our revamped portal, complete with more than 7,500 books & 2.8 million pages scanned by IA & other digitization partners, at: http://www.biodiversitylibrary.org To build this portal we ingest metadata from IA. We found their OAI interface to pull scanned items inconsistently based on date of scanning, so we switched to using their custom query interface. Here's an example of a query we fire off: http://www.archive.org/services/search.php?query=collection:(biodiversity)+AND+updatedate:%5b2007-10-31+TO+2007-11-30%5d+AND+-contributor:(MBLWHOI%20Library)&limit=10&submit=submit This is returning scanned items from the "biodiversity" collection, updated between 10/31/2007 - 11/30/2007, restricted to one of our contributing libraries (MBLWHOI Library), and limited to 10 results. The results are styled in the browser; view source to see the good stuff. We use this list to grab the identifiers we've yet to ingest. Some background: When a book is scanned through IA/OCA scanning, they create their own unique identifier (like "annalesacademiae21univ") and grab a MARC record from the contributing library's catalog. All of the scanned files, derivatives, and metadata files are stored on IA's clusters in a directory named with the identifier. Steve mentioned using their /details/ directive, then sniffing the page to get the cluster location and the files for downloading. An easier method is to use their /download/ directive, as in: http://www.archive.org/download/$ID, or in the example above: http://www.archive.org/download/annalesacademiae21univ That automatically does a lookup on the cluster, which means you don't have to scrape info off pages. You can also address any files within that directory, as in: http://www.archive.org/download/annalesacademiae21univ/annalesacademiae21univ_marc.xml The only way to get standard identifiers (ISBN, ISSN, OCLC, LCCN) for these scanned books is to grab them out of the MARC record. So the long-winded answer to your question, Tim, is no, there's no simple way to crossref what IA has scanned with your catalog - THAT I KNOW OF. Big caveat on that last part. 
Happy to help with any other questions I can, Chris Freeland -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Steve Toub Sent: Sunday, February 24, 2008 11:20 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] oca api? --- Tim Shearer <[EMAIL PROTECTED]> wrote: Hi Folks, I'm looking into tapping the texts in the Open Content Alliance. A few questions... As near as I can tell, they don't expose (perhaps even store?) any common unique identifiers (OCLC number, ISSN, ISBN, LC number). I poked around in this world a few months ago in my previous job at California Digital Library, also an OCA partner. The unique key seems to be a text string identifier (one that seems to be completely different from the text string identifier in Open Library). Apparently there was talk at the last partner meeting about moving to ISBNs: http://dilettantes.code4lib.org/2007/10/22/tales-from-the-open-content-alliance/ To obtain identifiers in bulk, I think the recommended approach is the OAI-PMH interface, which seems more reliable in recent months: http://www.archive.org/services/oai.php?verb=Identify http://www.archive.org/services/oai.php?verb=ListIdentifiers&metadataPrefix=oai_dc&set=collection:cdl etc. Additional instructions if you want to grab the content files: From any book's metadata page (e.g., http://www.archive.org/details/chemicallecturee00newtrich) click through on the "Usage Rights: See Terms" link; the rights are on a pane on the left-hand side. Once you know the identifier, you can grab the content files, using this syntax: http://www.archive.org/details/$ID Like so: http://www.archive.org/details/chemicallecturee00newtrich And then sniff the page to find the FTP link: ftp://ia340915.us.archive.org/2/items/chemicallecturee00newtrich But I think they prefer to use HTTP for these, not the FTP, so switch this to: http://ia340915.us.archive.org/2/items/chemicallecturee00newtrich Hope this helps! --SET
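Putting Chris's /download/ tip and Roy's suggestion together, here is a hedged sketch that pulls one item's MARCXML and reads out the 020/035 subfield a values (XPath instead of XSLT, same idea; the identifier is the one from this thread):

  <?php
  // Hedged sketch: IA/OCA identifier -> MARCXML -> standard numbers.
  $id  = 'annalesacademiae21univ';
  $xml = file_get_contents("http://www.archive.org/download/$id/{$id}_marc.xml");
  $marc = new SimpleXMLElement($xml);
  $marc->registerXPathNamespace('m', 'http://www.loc.gov/MARC21/slim');
  // Note: if a given marc.xml omits the namespace declaration,
  // drop the m: prefix from the path below.
  $path = '//m:datafield[@tag="020" or @tag="035"]/m:subfield[@code="a"]';
  foreach ($marc->xpath($path) as $sf) {
      echo (string) $sf, "\n";  // ISBNs and 035 (often OCLC) numbers
  }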
[CODE4LIB] more musing/clarification on oca api Re: [CODE4LIB] oca api?
How would you like it? I've not built a queryable webservice and am going on record as ignorant. Is there a query language I should lean toward? A return data structure that I should adopt? All this stems from my belief that what I can see of the architecture indicates a split between what the participating library sends and what the oca system uses. It appears that their index/record of record simply ignores all those hooks people have been adding to bib records. If both parts were wrapped and offered up with a Solr interface I could get on with putting links into my catalog. Still, they make both their record (with their identifier) available, and my record (with the rest) available. So, like I said in an earlier post, I'm glad for the opportunity to be frustrated. Whew. If I'd been hacking instead of writing I'd have something to show and y'all would be less bored. Thanks! -t Like Karen and Bess and others have said, I recommend that you coordinate this with the Open Library project. At the meeting last Friday, it did sound like they would be interested in providing identifier disambiguation types of service - give them an ISBN, and they'll give you the records associated with it. Also, there was discussion about building an Open Library API (to enable some cool integration with Wikipedia), and I suggested that libraries using an API would want the search results to include information about whether the title has a digitized copy. So I would hope the service that you're envisioning is something that would be provided by an Open Library API (but we don't know when that might come about). As OCA moves forward, folks may well be digitizing identical books. So there may not be a one-to-one relationship between unique catalog identifier, unique oca identifier, and isbn/lccn/oclc number. -emily -- Date: Thu, 6 Mar 2008 08:47:04 -0500 From: Tim Shearer <[EMAIL PROTECTED]> Subject: musing on oca api Re: [CODE4LIB] oca api? Howdy folks, I've been playing and thinking. I'd like to have what amounts to a unique identifier index to oca digitized texts. I want to be able to pull all the records that have oclc numbers, issns, isbns, etc. I want it to be lightweight, fast, searchable. Would anyone else want/use such a thing? I'm thinking about building something like this. If I do, it would be ideal if it wouldn't be a duplication of effort, so anyone got this in the works? And whether it would meet the needs of others. My basic notion is to crawl the site (starting with "americana", the American Libraries collection). Pull the oca unique identifier (e.g. northcarolinayea1910rale) and associate it with:
- unique identifiers (oclc numbers, issns, isbns, lc numbers)
- the contributing institution's alias and unique catalog identifier
- upload date
That's all I was thinking of. Then there's what you might be able to do with it:
- Give me all the oca unique identifiers that have oclc numbers
- Give me all the oca unique identifiers with isbns that were uploaded between x and y date
- Give me the oca unique identifier for this oclc number
Planning to do: keep crawling it and keep it up to date. Things I wasn't planning to do:
- worry about other unique ids (you'd have to go to xISBN or ThingISBN yourself)
- worry about storing anything else from oca
It would be good for being able to add an 856 to matches in your catalog. It would not be good for grabbing all marc records for all of oca. Anyhow, is this duplication of effort? Would you like something like this? What else would you like it to do (keeping in mind this is an unfunded pet project)? 
How would you want to talk to it? I was thinking of a web service, but hadn't thought too much about how to query it or how I'd deliver results. Of course I'm being an idiot and trying out new tools at the same time (python to see what the buzz is all about, sqlite just to learn it (it may not work out)). Thoughts? Vicious criticism? -t -- Date: Thu, 6 Mar 2008 11:05:41 -0500 From: Jodi Schneider <[EMAIL PROTECTED]> Subject: Re: musing on oca api Re: [CODE4LIB] oca api? Great idea, Tim! The open library tech list that Bess mentions is [EMAIL PROTECTED], described at http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech -Jodi Jodi Schneider Science Library Specialist Amherst College 413-542-2076 -- Date: Thu, 6 Mar 2008 08:32:43 -0800 From: Karen Coyle <[EMAIL PROTECTED]> Subject: Re: musing on oca api Re: [CODE4LIB] oca api? We talked about something like this at the Open Library meeting last Friday. The ol list is [EMAIL PROTECTED] (join at http://mail.archive.org/cgi-bin/mailman/listinfo/ol-lib). I thi
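For what it's worth, the storage side of that musing is small; here is a hedged sketch of the schema and two of the lookups in PDO/SQLite (Tim mentions Python + SQLite, but the SQL ports directly, and every name below is hypothetical):

  <?php
  // Hedged sketch: a unique-identifier index for OCA items.
  $db = new PDO('sqlite:oca_ids.db');
  $db->exec('CREATE TABLE IF NOT EXISTS oca_item (
               oca_id TEXT PRIMARY KEY, institution TEXT, catalog_id TEXT,
               uploaded TEXT, oclc TEXT, isbn TEXT, issn TEXT, lccn TEXT)');
  // "Give me all the oca unique identifiers that have oclc numbers"
  foreach ($db->query('SELECT oca_id, oclc FROM oca_item WHERE oclc IS NOT NULL') as $r) {
      echo $r['oca_id'], ' ', $r['oclc'], "\n";
  }
  // "Give me the oca unique identifier for this oclc number"
  $q = $db->prepare('SELECT oca_id FROM oca_item WHERE oclc = ?');
  $q->execute(array('12345678'));  // made-up OCLC number
  echo $q->fetchColumn(), "\n";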
Re: [CODE4LIB] dict protocol
Hi Eric, Given the likely need to map back from an alternate name (string search in the definition?) to the auth name (maybe the most common use for such a service?), I think this route might be on the inefficient side. I've been wondering about names as handles, with a crossref-like middleman piece. But I'm not doing anything about such ideas. -t On Mon, 31 Mar 2008, Eric Lease Morgan wrote: Over the weekend I had fun with the DICT protocol, a DICT server, a DICT client, and the creation of dictionaries for the aforementioned. The DICT protocol seems to be a simple client/server protocol for searching remote content and returning "definitions" of the query. [1] I was initially drawn to the protocol for its content. Specifically, I wanted a dictionary because I thought it would be useful in a "next generation" library catalog application. The server was trivial to install because it is available via yum. Since it is a protocol, there are a number of clients and libraries available. There's also bunches o' data to be had, albeit a bit dated. Some of it includes: the 1913 dictionary, version 2.0 of WordNet, the CIA World Fact Book (2000), Moby's Thesaurus, a gazetteer, and quite a number of English-to-other-language dictionaries. What's interesting is that DICT protocol data is not limited to "dictionaries", as the Fact Book exemplifies. The data really only has two fields: headword (key) and note (definition). After thinking about it, I thought authority lists would be a pretty good candidate for DICT. The headword would be the term, and the definition would be the See From and See Also listings. Off on an adventure, I downloaded subject authorities from FRED. [2] I used a shell script to loop through my data (subjects2dictd, attached) which employed XSLT to parse the MARCXML (subjects2dict.xsl, attached) and then ran various dict* utilities. The end result is a "dictionary" query-able with your favorite DICT client. From a Linux shell, try: dict -h 208.81.177.118 -d subjects -s substring blues While I think this is pretty kewl, I wonder whether or not DICT is the correct approach. Maybe I should use a more robust, full-text indexer for this problem? After all, DICT servers only look at the headword when searching, not the definitions. On the other hand DICT was *pretty* easy to get up and running, and authority lists are a type of dictionary. [1] http://www.dict.org [2] http://www.ibiblio.org/fred2.0/authorities/ -- Eric Lease Morgan University Libraries of Notre Dame
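For the curious, the protocol itself is simple enough that a client fits in a few lines; here is a bare-bones sketch of the MATCH query above as a raw socket conversation (port and status codes per RFC 2229, host and dictionary from Eric's example, error handling omitted):

  <?php
  // Hedged sketch: DICT MATCH over a plain TCP socket.
  $fp = fsockopen('208.81.177.118', 2628, $errno, $errstr, 10);
  fgets($fp);  // 220 banner
  fwrite($fp, "MATCH subjects substring blues\r\n");
  while (($line = fgets($fp)) !== false) {
      echo $line;  // 152 n matches, then "database headword" lines, then "."
      $code = substr($line, 0, 3);
      if ($code === '250' || $code === '552') break;  // ok / no match
  }
  fwrite($fp, "QUIT\r\n");
  fclose($fp);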
code4lib@listserv.nd.edu
So now I have to compile my jokes? -t On Thu, 3 Apr 2008, Ryan Ordway wrote: #include main(t,_,a) char *a; { return!0 ..- .-.. .-.. .. .. -- --. --- .. -. --. - --- ... .- -.-- .- -... --- ..- - - .. ... - .-. . .- -.. .. ... - .- - -. --- -. . --- ..-. -.-- --- ..- ... ..- ..-. ..-. . .-. ..-. .-. --- -- .-. -- .. - . .-- .- -.-- .. -.. --- .-- . -. .. ..- ... . -- -.-- .--. .-. . ..-. . .-. .-. . -.. .. -. .--. ..- - -.. . ...- .. -.-. . .-.-.- .-.-.- .-.-.- -- -- .--- .- ..-. On 4/3/08 6:51 AM, "Walter Lewis" <[EMAIL PROTECTED]> wrote: Sebastian Hammer wrote: A true hacker has no need for these crude tools. He waits for cosmic radiation to pummel the magnetic patterns on his drive into a pleasing and functional sequence of bits. Alas, having been doing this (along with my partners, the four Yorkshiremen) since the Stone Age ... We used to arrange pebbles in the middle of road into the relevant patterns (we *dreamed* of being able to afford the wire for an abacus). Passing carts would then help "crunch" the numbers. Walter for whom graph paper, templates, pencils, 80 column punchcards and IBM Assembler were formative experiences === Jeremy Frumkin Head, Emerging Technologies and Services 121 The Valley Library, Oregon State University Corvallis OR 97331-4501 [EMAIL PROTECTED] 541.602.4905 541.737.3453 (Fax) === " Without ambition one starts nothing. Without work one finishes nothing. " - Emerson -- Ryan Ordway E-mail: [EMAIL PROTECTED] Unix Systems Administrator [EMAIL PROTECTED] OSU Libraries, Corvallis, OR 97331Office: Valley Library #4657