Re: [CODE4LIB] LCSH, Bisac, facets, hierarchy?
The University of Michigan maintains what we call “High Level Browse” — a mapping of LC/Dewey call numbers to a limited hierarchy, based loosely around academic departments (at least at the time it started). It’s still maintained, and may prove generally useful as well. The HLB hierarchy <http://www.lib.umich.edu/browse> gives you an idea of what it is, and you can download an XML dump of the categories and their associated call number ranges <http://www.lib.umich.edu/browse/categories/xml.php> (1.8mb) if that’s your thing. On Wed, Apr 13, 2016 at 10:38 AM, William Denton <w...@pobox.com> wrote: > On 13 April 2016, Mark Watkins wrote: > > I'm a library sciences newbie, but it seems like LCSH doesn't really >> provide a formal hierarchy of genre/topic, just a giant controlled >> vocabulary. Bisac seems to provide the "expected" hierarchy. >> >> Is anyone aware of any approaches (or better yet code!) that translates >> lcsh to something like BISAC categories (either BISAC specifically or some >> other hierarchy/ontology)? General web searching didn't find anything >> obvious. >> > > There's HILCC, the Hierarchical Interface of LC Classification: > > https://www1.columbia.edu/sec/cu/libraries/bts/hilcc/subject_map.html > > Bill > -- > William Denton ↔ Toronto, Canada ↔ https://www.miskatonic.org/ -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Ruby MARC::Record: anyone need ruby 1.8 support anymore?
Ruby 1.8 was EOL'd about 2.5 years ago, so in theory everyone should be long off of it. In practice, well, I thought I'd ask before making any releases that change that. Sidenote: does dropping support for a long-EOL'd version of the software constitute a major version change under SemVer? None of the public interfaces would change (it's a performance-focused release I'm considering). -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Traject 2.0.0 released: index MARC into Solr with ruby
[Apologies, as always, for any cross-post copies] The traject https://github.com/traject-project/traject/ maintainers are happy to announce the release of traject version 2.0.0. Traject is an ETL (extract/transform/load) system designed and optimized for indexing MARC records into Solr. It is similar in functionality to solrmarc https://code.google.com/p/solrmarc/, but with everything written in ruby instead of java. Traject 2.0 brings several notable changes: - Support for MRI (“normal”) ruby, JRuby, and rbx - New Solr JSON writer (for solr versions >= 3.2) accessible from MRI and with about 20% better performance than previous indexing. - New writers for producing tab-delimited/CSV files (Note that while traject runs fine under MRI, you’ll get substantially faster indexing using JRuby due to traject’s use of multiple threads when available). Traject is in production use indexing metadata for the library catalogs of the University of Michigan, the HathiTrust, Johns Hopkins, and Brown University. (Using Traject? Let us know!) - The traject README https://github.com/traject-project/traject/ and doc folder https://github.com/traject-project/traject/tree/master/doc contain reference information, and we also provide a sample real-ish configuration https://github.com/traject-project/traject_sample to help get you started. - Brown University is using traject for a new search interface; the Brown configuration https://github.com/Brown-University-Library/bul-traject/ is a great example of a real-life traject installation. - The University of Michigan and HathiTrust catalogs are also indexed with traject; their shared configuration https://github.com/billdueber/ht_traject provides another (potentially overly) complex real-life set of configuration files. Thanks to everyone who provided feedback for this release! Feel free to contact me with questions directly, or add issues/pull requests to the github project https://github.com/traject-project/traject/ .
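For readers who haven't seen traject before, a configuration file is just ruby; the sketch below shows the general shape (the Solr URL and output field names here are illustrative, not taken from any of the configurations linked above):

```ruby
# Traject config sketch; run with `traject -c config.rb records.mrc`.
# solr.url and the output field names are made up for this example.
settings do
  provide 'solr.url', 'http://localhost:8983/solr/catalog'
  provide 'solr_writer.commit_on_close', 'true'
end

to_field 'id',    extract_marc('001', first: true)
to_field 'title', extract_marc('245ab', trim_punctuation: true)
```

The `extract_marc` macro handles the common case of pulling tag/subfield combinations; anything fancier is just a ruby block.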
-- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Announcement: ruby-marc 0.8.2 re-released as version 1.0.0
The ruby-marc https://github.com/ruby-marc/ruby-marc team is happy to announce that we’ve decided to release the current code as version 1.0.0. There are no non-cosmetic changes to this code compared to the until-now-current version 0.8.2. The jump to version 1.0.0 reflects the *de facto* use of the marc gem in production at dozens of institutions and allows further development to more easily adhere to semantic versioning http://semver.org/. In that vein, please begin the process of updating your gem directives in Gemfiles and .gemspec files to something like gem 'marc', '~> 1' …to be sure you have the latest backwards-compatible version for your projects. Thanks to everyone involved, from committers to folks who file bugs, for the progress ruby-marc has made over the years. Special thanks for the most recent releases go to Jonathan Rochkind, whose work on encodings (including MARC-8!!) has been relentless. -Bill Dueber, for the ruby-marc contributors- -- Bill Dueber Library Systems Programmer University of Michigan Library
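For anyone unfamiliar with RubyGems' pessimistic operator, the `~> 1` constraint recommended above can be checked directly with the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems; a quick sketch:

```ruby
require 'rubygems' # a no-op on modern Rubies; Gem::Requirement ships with RubyGems

# '~> 1' allows any 1.x release but excludes 2.0 and above, which is exactly
# the "latest backwards-compatible version" guarantee semantic versioning makes.
req = Gem::Requirement.new('~> 1')

req.satisfied_by?(Gem::Version.new('1.0.0'))  # => true
req.satisfied_by?(Gem::Version.new('1.4.2'))  # => true
req.satisfied_by?(Gem::Version.new('2.0.0'))  # => false
```

More-specific constraints narrow the window the same way: `'~> 1.4'` means `>= 1.4, < 2.0`.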
Re: [CODE4LIB] Lorem Ipsum metadata? Is there such a thing?
code that handles paging in a UI, and I had to make it all up by hand. This hurts my soul. Someone please tell me such a service exists, and link me to it, so I never have to do this again. Or else, I may just make such a service, to save us all. But I don't want to go coding some new service if it already exists, because that sort of thing is for chumps. -- HARDY POTTINGER pottinge...@umsystem.edu University of Missouri Library Systems http://lso.umsystem.edu/~pottingerhj/ https://MOspace.umsystem.edu/ Making things that are beautiful is real fun. --Lou Reed -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] The lie of the API
On Sun, Dec 1, 2013 at 7:57 PM, Barnes, Hugh hugh.bar...@lincoln.ac.nz wrote: +1 to all of Richard's points here. Making something easier for you to develop is no justification for making it harder to consume or deviating from well supported standards. I just want to point out that as much as we all really, *really* want "easy to consume" and "following the standards" to be the same thing, they're not. Correct content negotiation is one of those things that often follows the phrase "all they have to do...", which is always a red flag, as in "Why give the user different URLs when *all they have to do is*..." Caching, json vs javascript vs jsonp, etc. all make this harder. If *all I have to do* is know that all the consumers of my data are going to do content negotiation right, then I need to get deep into the guts of my caching mechanism and then set up an environment where it's all easy to test...well, it's harder. And don't tell me how lazy I am until you invent a day with a lot more hours. I'm sick of people telling me I'm lazy because I'm not pure. I expose APIs (which have their own share of problems, of course) because I want them to be *useful* and *used*. -Bill, apparently feeling a little bitter this morning- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MARC field lengths
I'm running it against the HathiTrust catalog right now. It'll just take a while, given that I don't have access to Roy's Hadoop cluster :-) On Wed, Oct 16, 2013 at 1:38 PM, Sean Hannan shan...@jhu.edu wrote: That sounds like a request for Roy to fire up the ole OCLC Hadoop. -Sean On 10/16/13 1:06 PM, Karen Coyle li...@kcoyle.net wrote: Anybody have data for the average length of specific MARC fields in some reasonably representative database? I mainly need 100, 245, 6xx. Thanks, kc -- Karen Coyle kco...@kcoyle.net http://kcoyle.net m: 1-510-435-8234 skype: kcoylenet -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MARC field lengths
For the HathiTrust catalog's 6,046,746 bibs and looking at only the lengths of the subfields $a and $b in 245s, I get an average length of 62.0 On Wed, Oct 16, 2013 at 3:24 PM, Kyle Banerjee kyle.baner...@gmail.com wrote: 245 not including $c, indicators, or delimiters, |h (which occurs before |b), |n, |p, with trailing slash preceding |c stripped for about 9 million records for Orbis Cascade collections is 70.1 kyle On Wed, Oct 16, 2013 at 12:00 PM, Karen Coyle li...@kcoyle.net wrote: Thanks, Roy (and others!) It looks like the 245 is including the $c - dang! I should have been more specific. I'm mainly interested in the title, which is $a $b -- I'm looking at the gains and losses of bytes should one implement FRBR. As a hedge, could I ask what you've got for the 240? That may be closer to reality. kc On 10/16/13 10:57 AM, Roy Tennant wrote: I don't even have to fire it up. That's a statistic that we generate quarterly (albeit via Hadoop). Here you go: 100 - 30.3 245 - 103.1 600 - 41 610 - 48.8 611 - 61.4 630 - 40.8 648 - 23.8 650 - 35.1 651 - 39.6 653 - 33.3 654 - 38.1 655 - 22.5 656 - 30.6 657 - 27.4 658 - 30.7 662 - 41.7 Roy On Wed, Oct 16, 2013 at 10:38 AM, Sean Hannan shan...@jhu.edu wrote: That sounds like a request for Roy to fire up the ole OCLC Hadoop. -Sean On 10/16/13 1:06 PM, Karen Coyle li...@kcoyle.net wrote: Anybody have data for the average length of specific MARC fields in some reasonably representative database? I mainly need 100, 245, 6xx. Thanks, kc -- Karen Coyle kco...@kcoyle.net http://kcoyle.net m: 1-510-435-8234 skype: kcoylenet -- Karen Coyle kco...@kcoyle.net http://kcoyle.net m: 1-510-435-8234 skype: kcoylenet -- Bill Dueber Library Systems Programmer University of Michigan Library
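For anyone wanting to reproduce this sort of number against their own records, the per-record arithmetic is trivial; here's a hedged sketch in plain Ruby that works on marc-in-json-shaped hashes (a real run would read a catalog dump with the marc gem instead of the inline sample record):

```ruby
# Combined length of 245 $a and $b for one marc-in-json-shaped record hash;
# returns nil if the record has no 245.
def title_ab_length(record)
  f245 = record['fields'].map { |f| f['245'] }.compact.first
  return nil unless f245
  f245['subfields']
    .map { |sf| sf.values.first if %w[a b].include?(sf.keys.first) }
    .compact
    .join(' ')
    .length
end

records = [
  { 'fields' => [{ '245' => { 'ind1' => '1', 'ind2' => '0',
                              'subfields' => [{ 'a' => 'Moby Dick :' },
                                              { 'b' => 'or, the whale /' },
                                              { 'c' => 'Herman Melville.' }] } }] }
]
lengths = records.map { |r| title_ab_length(r) }.compact
avg = lengths.sum.to_f / lengths.size
```

Note the $c is skipped, which is the distinction Karen is after; whether to strip trailing punctuation first is a judgment call that changes the averages a little.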
Re: [CODE4LIB] ANNOUNCEMENT: Traject MARC-Solr indexer release
'traject' means to transmit (e.g., trajectory) -- or at least it did, when people still used it, which they don't. The traject workflow is incredibly general: *a reader* sends *a record* to *an indexing routine* which stuffs...stuff...into a context object which is then sent to *a writer*. We have a few different MARC readers, a few useful writers (one of which, obviously, is the solr writer), and a bunch of shipped routines (which we're calling macros but are just well-formed ruby lambdas or blocks) for extracting and transforming common MARC data. [see http://robotlibrarian.billdueber.com/announcing-traject-indexing-software/ for more explanation and some examples] But there's no reason why a reader couldn't produce a MODS record which would then be worked on. I'm already imagining readers and writers that target databases (RDBMS or NoSQL), or a queueing system like Hornet, etc. If there are people at Stanford that want to talk about how (easy it is) to extend traject, I'd be happy to have that conversation. On Tue, Oct 15, 2013 at 12:28 PM, Tom Cramer tcra...@stanford.edu wrote: ++ Jonathan and Bill. 1.) Do you have any thoughts on extending traject to index other types of data--say MODS--into solr, in the future? 2.) What's the etymology of 'traject'? - Tom On Oct 14, 2013, at 8:53 AM, Jonathan Rochkind wrote: Jonathan Rochkind (Johns Hopkins) and Bill Dueber (University of Michigan), are happy to announce a robust, feature-complete beta release of traject, a tool for indexing MARC data to Solr. traject, in the vein of solrmarc, allows you to define your indexing rules using simple macro and translation files. However, traject runs under JRuby and is ruby all the way down, so you can easily provide additional logic by simply requiring ruby files. There's a sample configuration file to give you a feel for traject[1]. You can view the code[2] on github, and easily install it as a (jruby) gem using gem install traject. 
traject is in a beta release hoping for feedback from more testers prior to a 1.0.0 release, but it is already being used in production to generate the HathiTrust (metadata-lookup) Catalog (http://www.hathitrust.org/). traject was developed using a test-driven approach and has undergone both continuous integration and an extensive benchmarking/profiling period to keep it fast. It is also well covered by high-quality documentation. Feedback is very welcome on all aspects of traject including documentation, ease of getting started, features, any problems you have, etc. What we think makes traject great: * It's all just well-crafted and documented ruby code; easy to program, easy to read, easy to modify (the whole code base is only 6400 lines of code, more than a third of which is tests) * Fast. Traject by default indexes using multiple threads, so you can use all your cores! * Decoupled from specific readers/writers, so you can use ruby-marc or marc4j to read, and write to solr, a debug file, or anywhere else you'd like with little extra code. * Designed so it's easy to test your own code and distribute it as a gem We're hoping to build up an ecosystem around traject and encourage people to ask questions and contribute code (either directly to the project or via releasing plug-in gems). [1] https://github.com/traject-project/traject/blob/master/test/test_support/demo_config.rb [2] http://github.com/traject-project/traject -- Bill Dueber Library Systems Programmer University of Michigan Library
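To make the "macros are just lambdas" point concrete, here's a gem-free sketch of the indexing-step contract: each step is a callable taking (record, accumulator, context). Traject itself passes a MARC::Record and a real Context object; the plain hash here is a stand-in so the sketch runs without anything installed.

```ruby
# A traject-style indexing step: take a record, push extracted values onto
# the accumulator. A macro is just a method that returns one of these lambdas.
uppercase_title = lambda do |record, accumulator, _context|
  accumulator << record[:title].upcase if record[:title]
end

record = { title: 'the iliad' }
accumulator = []
uppercase_title.call(record, accumulator, nil)
accumulator # => ["THE ILIAD"]
```

Because the contract is this small, testing your own extraction logic is just calling a lambda with a record and an empty array, no indexing run required.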
Re: [CODE4LIB] Good MARC PHP Libraries,
Given that File_MARC has been around since, what, the late 1950's, why don't you just slap a 1.0 on it? It's not like anyone isn't using it because they're waiting for the API to stabilize; we're all using it regardless. On Thu, Sep 26, 2013 at 12:01 AM, Dan Scott deni...@gmail.com wrote: I hear the maintainer of File_MARC is pretty responsive to questions and bug reports. This list might be a good place to raise questions about usage; others may be interested. Was the random undescriptive exit error something like the following? C:\phppear install File_MARC Failed to download pear/File_MARC within preferred state stable, latest release is version 0.7.3, stability beta, use channel://pear.php.net/File_MARC-0.7.3 to install install failed One of these days that package will make it to 1.0 and the -beta will no longer be necessary. Or the pear.php.net install instructions will include that. Or newer versions of PEAR will be smarter about detecting that no stable version is available and automatically offer to install the beta. On Wed, Sep 25, 2013 at 8:18 PM, Riley Childs ri...@tfsgeo.com wrote: Thanks! I will give it a shot tomorrow Riley Childs Junior and Library Tech Manager Charlotte United Christian Academy +1 (704) 497-2086 Sent from my iPhone Please excuse mistakes On Sep 25, 2013, at 8:14 PM, Ross Singer rossfsin...@gmail.com wrote: Try: pear install file_marc-beta -Ross. On Wednesday, September 25, 2013, Riley Childs wrote: I have been having some troubles with the installation (some random undescriptive exit error) Riley Childs Junior and Library Tech Manager Charlotte United Christian Academy +1 (704) 497-2086 Sent from my iPhone Please excuse mistakes On Sep 25, 2013, at 7:28 PM, Eric Phetteplace phett...@gmail.com javascript:; wrote: I think File_MARC is the standard: http://pear.php.net/package/File_MARC/ Are there others? 
Best, Eric On Wed, Sep 25, 2013 at 7:17 PM, Riley Childs ri...@tfsgeo.com javascript:; wrote: Does anyone know of any good MARC PHP Libraries, I am struggling to create MARC records out of our proprietary database. Riley Childs Junior and Library Tech Manager Charlotte United Christian Academy +1 (704) 497-2086 Sent from my iPhone Please excuse mistakes -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] A Proposal to serialize MARC in JSON
I can see where you might think that no progress has been made because the only real document of the format is that old, old blog post. The problem, however, is not a lack of progress but a lack of documentation of that progress. File_MARC (PHP), MARC::Record (perl), ruby-marc (ruby) and marc4j (java) will all deal, to one extent or another, either with the JSON directly or with a hash/map data structure that maps directly to that JSON structure. [BTW, can anyone summarize the state of pymarc wrt marc-in-json?] On Tue, Sep 3, 2013 at 5:09 AM, dasos ili dasos_...@yahoo.gr wrote: It is exactly three years back, and no real progress has been made concerning this proposal to serialize MARC in JSON: http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/ Meanwhile new tools for searching and retrieving records have come in, such as Solr and Elasticsearch. Any ideas on how one could alter (or propose a new format) more suited to the mechanisms of these two search platforms? Any example implementations would also be really appreciated, thank you in advance -- Bill Dueber Library Systems Programmer University of Michigan Library
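For reference, the hash/map structure those libraries converge on is small enough to show inline; this sketch round-trips a hand-built record hash using nothing but the stdlib json library:

```ruby
require 'json'

# A minimal record in the marc-in-json shape: a leader string, control fields
# as plain strings, and variable fields with indicators plus an ordered array
# of single-pair subfield hashes (the array is what preserves subfield order).
record = {
  'leader' => '00000nam a2200000 a 4500',
  'fields' => [
    { '001' => 'ocm12345' },
    { '245' => { 'ind1' => '1', 'ind2' => '0',
                 'subfields' => [{ 'a' => 'A sample title :' },
                                 { 'b' => 'with a subtitle' }] } }
  ]
}

round_tripped = JSON.parse(JSON.generate(record))
round_tripped == record # => true
```

In ruby-marc this same hash shape is what the marc-in-json support reads and writes, so round-tripping through JSON is lossless.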
[CODE4LIB] Ruby AlephSequential file reader
I've written up a quick-and-dirty (well, except for the 'quick' part) ruby-marc reader class to read AlephSequential files as output from the Ex Libris Aleph system. If you don't know what that is, or why you would want it, thank your god and move on. Initial code is at https://github.com/billdueber/marc_alephsequential If there's any interest, I'll gemify it and/or start a discussion about whether or not to fold this into ruby-marc proper. Speed isn't too awful -- about 150% the speed of reading a marc-binary file with ruby-marc on my machine. Pull requests are *always* in fashion (...at the Copa...) -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] New perl module MARC::File::MiJ -- marc-in-json for perl
The marc-in-json format <http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/> is, as you might expect, a JSON serialization for MARC. A JSON serialization for MARC is potentially useful in the same places where MARC-XML would be useful (long records, utility of human-readable records, etc.) without what many perceive to be the relative pain of working with XML vs JSON. It's currently supported across several implementations: - ruby's *marc* gem - php's *File_MARC* - java's *marc4j* - python's *pymarc* There wasn't one for perl, so I wrote one :-) MARC::File::MiJ <http://search.cpan.org/~gmcharlt/MARC-File-MiJ-0.01/lib/MARC/File/MiJ.pm> is a perl module that allows MARC::Record to encode/decode marc-in-json. It also supplies a handler to MARC::File/MARC::Batch that will read marc-in-json records from a newline-delimited-json (ndj) file (where each line is a JSON object without unescaped newlines, ending with a newline). marc-in-json encoding/decoding tends to be pretty fast <http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/>, since json parsers tend to be pretty fast, and uncompressed filesizes occupy a middle-ground between binary marc and marc-xml. A sample file of about 18k marc records looks like this: 31M topics.mrc 56M topics.ndj (newline-delimited JSON) 93M topics.xml 8.9M topics.mrc.gz 7.9M topics.ndj.gz 8.7M topics.xml.gz ...so obviously it compresses pretty well, too. I can take generic questions; bugs should go to https://rt.cpan.org/Public/Bug/Report.html?Queue=MARC-File-MiJ [ Note that there are many other possible JSON serializations for MARC <http://jakoblog.de/2011/04/13/mapping-bibliographic-record-subfields-to-json/>, including the (incompatible) one implemented in the MARC::File::JSON <http://search.cpan.org/~cfouts/MARC-File-JSON-0.002/lib/MARC/File/JSON.pm> module ] -- Bill Dueber Library Systems Programmer University of Michigan Library
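The ndj framing itself takes only a few lines to demonstrate; shown here in Ruby against an in-memory StringIO (since the rest of this archive is Ruby-flavored), with the perl module's MARC::File/MARC::Batch handler doing the file-based equivalent:

```ruby
require 'json'
require 'stringio'

# newline-delimited JSON: one complete JSON object per line, no unescaped
# newlines inside an object. Write two records, then read them back.
records = [
  { 'leader' => '00000nam a2200000 a 4500', 'fields' => [{ '001' => 'rec1' }] },
  { 'leader' => '00000nam a2200000 a 4500', 'fields' => [{ '001' => 'rec2' }] }
]

io = StringIO.new
records.each { |r| io.puts JSON.generate(r) }

io.rewind
read_back = io.each_line.map { |line| JSON.parse(line) }
read_back == records # => true
```

The one-record-per-line framing is what makes ndj stream-friendly: a reader never has to hold more than one record's JSON in memory, and a corrupt line only loses one record.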
[CODE4LIB] ISBN/LCCN normalization for Solr
Thanks to the efforts of Jay Luker, Jonathan Rochkind, and Adam Constabaris, Solr analyzer filters to normalize ISBNs (to ISBN13s) and LCCNs are now cleaned up and ready to work with Solr 4.x. I've extracted the code into a new repo, shined up the README, and provided a .jar for download and instructions on what to do with it. Get it while it's hot at https://github.com/billdueber/solr-libstdnum-normalize -- Bill Dueber Library Systems Programmer University of Michigan Library
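The core arithmetic behind the ISBN side of this normalization is simple enough to sketch. This is a plain-Ruby illustration of the ISBN-10-to-ISBN-13 conversion, not the actual Java filter code from the repo:

```ruby
# ISBN-10 -> ISBN-13: prepend the 978 prefix, drop the old check digit, and
# recompute the check digit with alternating 1/3 weights. Hyphens are stripped;
# a trailing 'X' check digit never matters because it's discarded anyway.
def isbn10_to_isbn13(isbn10)
  body = '978' + isbn10.delete('-')[0, 9]
  sum  = body.chars.each_with_index.sum { |c, i| c.to_i * (i.even? ? 1 : 3) }
  body + ((10 - sum % 10) % 10).to_s
end

isbn10_to_isbn13('0-306-40615-2') # => "9780306406157"
```

Folding both forms to the ISBN-13 at analysis time is what lets a query for either form of the number match either form in the record.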
Re: [CODE4LIB] A Responsibility to Encourage Better Browsers ( ? )
Keep in mind that many old-IE users are there because their corporate/gov entity requires it. Our entire university health/hospital complex, for example, was on IE6 until...last year, maybe?... because they had several critical pieces of software written as active-x components that only ran in IE6. Which, sure, you can say that's dumb (because it is), but at the same time we couldn't have a setup that made it hard for the doctors and researchers to use the library. On Tue, Feb 19, 2013 at 10:22 AM, Michael Schofield mschofi...@nova.edu wrote: Hi everyone, I'm having a change of heart. It is kind of sacrilegious, especially if you-like me-evangelize mobile-first, progressively enhanced web design, to throw alerts when users hit your site using IE7 / IE8 that encourage upgrading or changing browsers. Especially in libraries which are legally and morally mandated to be the pinnacle of accessibility, your website should - er, ideally - be functional in every browser. That's certainly what I say when I give a talk. But you know what? I'm kind of starting to not care. I understand that patrons blah blah might not blah blah have access to anything but IE7 or IE8 - but, you know, if they're on anything other than Windows 95 that isn't true. * Using Old IE makes you REALLY vulnerable to malicious software. * Spriting IEs that don't support gradients, background size, CSS shapes, etc. and spinning-up IE friendly stylesheets (which, admittedly, is REALLY easy to do with Modernizr and SASS) can be a time-sink, which I am starting to think is more of a disservice to the tax- and tuition-payers that pad my wallet. I ensure that web services are 100% functional for deprecated browsers, and there is lingering pressure-especially from the public wing of our institution (which I totally understand and, in the past, sympathized with) to present identical experiences across browsers. But you know what I did today? I sinned. 
From our global script, if modernizr detects that the browser is lt-ie9, it appends just below the navbar a subtle notice: Did you know that your version of Internet Explorer is several years old? Why not give Firefox, Google Chrome, or Safari a try?* In most circles this is considered the most heinous practice. But, you know, I can no longer passively stand by and see IE8 rank above the others when I give the analytics report to our web committee. Nope. The first step in this process was dropping all support for IE7 / Compatibility Mode a few months ago. Now that Google, jQuery, and others will soon drop support for IE8 - it's time to politely join in and make luddite patrons aware. IMHO, anyway. Already, old IE users get the raw end of the bargain because just viewing our website makes several additional server requests to pull additional CSS and JS bloat, not to mention all the images/graphics they don't support. Thankfully, IE8 is cool with icon fonts, otherwise I'd be weeping at my desk. Now, why haven't I extended this behavior to browsers with limited support for, say, css gradients? That's trickier. A user might have the latest HTC phone but opt to surf in Opera Mini. There are too many variables and too many webkits (etc.). With old IE you can infer that a.) the user has a lap- or desktop, and [more importantly] b.) that old IE will never be a phone. Anyway, this is a really small-potatoes rant / action, but in a culture of all accessibility / never pressuring the user / whatever, it feels momentous. I kind of feel stupid getting all high and mighty about it. What do you think? Michael | Front End Librarian | www.ns4lib.com * Why, you may ask, did I not suggest IE9? Well, IE9 isn't exactly the experience we'd prefer them to have, but also according to our analytics the huge majority of old IE users are on Windows XP - where 9 isn't an option anyway. 
Eventually, down the road, we'll encourage IE9ers to upgrade too (once things like flexbox become standard), and at least they should have the option to try IE10. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] library date parsing
Speaking of which...does anyone have robust code for getting the date of publication out of a MARC record, correcting for (or ignoring or otherwise dealing with) stuff in the fixed fields, dates on other calendars, dates that are far enough in the future that they must be a mistake, etc.? -Bill "yes, that *was* published in 5763" Dueber On Thu, Feb 7, 2013 at 11:40 AM, Kevin S. Clarke kscla...@gmail.com wrote: I have an idea stuck in my memory that OCLC wrote a Java-based date parsing library long ago (that parses all the library world's strange date formats). My search-fu seems to be weak, though, because I don't seem to be able to Google/find it. Was it just a crazy dream or does anyone know what I'm talking about (and how to find it)? Thanks, Kevin -- Bill Dueber Library Systems Programmer University of Michigan Library
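As a starting point (and nowhere near the robust code being asked for), the happy path is pulling Date 1 out of bytes 07-10 of the 008 fixed field and sanity-checking it; everything hard lives beyond this sketch:

```ruby
# Date 1 occupies bytes 07-10 of the MARC 008 field. This rejects 'u'
# placeholder digits and implausible years; real code would also consult the
# date-type byte (008/06), fall back to 26x $c, handle other calendars, etc.
# max_year is an arbitrary "too far in the future to be real" cutoff.
def pub_year_from_008(field_008, max_year: 2030)
  raw = field_008[7, 4]
  return nil unless raw =~ /\A\d{4}\z/
  year = raw.to_i
  (1000..max_year).cover?(year) ? year : nil
end

pub_year_from_008('850101s1985    miu           000 0 eng d') # => 1985
pub_year_from_008('850101suuuu    miu           000 0 eng d') # => nil
```

Records dated on other calendars (the 5763 joke above) are exactly the cases where this naive byte-slicing falls over.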
Re: [CODE4LIB] code4lib 2013 location
On Tue, Feb 5, 2013 at 12:01 PM, Francis Kayiwa kay...@uic.edu wrote: Power will be better than the Superbowl post half-time but we expect you to share. :-) Does this mean "We'll be loaded for bear" or "Bring your own plug-strips"? Also, a reminder to people -- put your name on your computer *and your power adapter.* Things can get...confusing. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] conf presenters: a kind request
I'm gonna add to this briefly, and probably a bit less tactfully than Jonathan :-) - My number-one complaint about past presentations: Don't have slides we can't read. "You probably can't read this, but..." isn't a helpful thing to hear during a presentation. Make it legible, or figure out a different way to present the information. A kick-ass poster or UML diagram or flowchart or whatever isn't kick-ass when we can't read it. It's just an uninformative blur. [Note: this doesn't mean you shouldn't include the kick-ass poster when you upload your slides. Please do!] - Make sure your content fits well in the time allotted. You're not there to get through as much as possible. You're there to best use our collective time to make the argument that what you're doing is important/impressive/worth knowing, and to convey *as much of the interesting bits as you can without rushing*. The goal isn't for you to get lots of words out of your mouth; the goal is for us to understand them. If you absolutely can't cut it down to a point where you're not rushing, then you haven't done the hard work of distilling out the interesting bits, and you should get on that right away. - On the flip side, don't present for 8 minutes and leave plenty of time for questions. Odds are you're not saying anything interesting enough to elicit questions in those 8 minutes. If you really only have 8 minutes of content, well, you shouldn't have proposed a talk. But odds are you *do* have interesting things to say, and may want to chat with your colleagues to figure out exactly what that is. - Don't make the 3.38 million messages on creating a non-threatening environment be for naught. Please. As Jonathan said: this is a great, great audience. We're all forgiving, we're all interested, we're all eager to learn new things and figure out how to apply them to our own situations. We love to hear about your successes. 
We *love* to hear about failures that include a way for us to avoid them, and you're going to be well-received no matter what because a bunch of people voted to hear you! On Mon, Feb 4, 2013 at 10:47 AM, Jonathan Rochkind rochk...@jhu.edu wrote: We are all very excited about the conference next week, to speak to our peers and to hear what our peers have to say! I would like to suggest that those presenting be considerate to your audience, and actually prepare your talk in advance! You may think you can get away with making some slides that morning during someone else's talk and winging it; nobody will notice, right? Or they won't care if they do? From past years, I can say that for me at least, yeah, I can often tell who hasn't actually prepared their talk. And I'll consider it disrespectful to the time of the audience, who voted for your talk and then got on airplanes to come see it, and you didn't spend the time to plan it in advance and make it as high quality for them as you could. I don't mean to make people nervous about public speaking. The code4lib audience is a very kind and generous audience, they are a good audience. It'll go great! Just maybe repay their generosity by actually preparing your talk in advance, you know? Do your best, it'll go great! If you aren't sure how to do this, the one thing you can probably do to prepare (maybe this is obvious) is practice your presentation in advance, with a timer, just once. In front of a friend or just by yourself. Did you finish on time, and get at least half of what was important in? Then you're done preparing, that was it! Yes, if you're going to have slides, this means making your slides or notes/outline in advance so you can practice your delivery just once! Just practice it once in advance (even the night before, as a last resort!), and it'll go great! Jonathan -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Why we need multiple discovery services engine?
and hosted metadata results presented separately (although probably preferably in a consistent UI), rather than merged. A bunch more discussion of these issues is included in my blog post at: http://bibwild.wordpress.com/2012/10/02/article-search-improvement-strategy/ From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Wayne Lam [ wing...@gmail.com] Sent: Thursday, January 31, 2013 9:31 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Why we need multiple discovery services engine? Hi all, I saw in numerous of library website, many of them would have their own based discovery services (e.g. blacklight / vufind) and at the same time they will have vendor based discovery services (e.g. EDS / Primo / Summon). Instead of having to maintain 2 separate system, why not put everything into just one? Any special reason or concern? Best Wayne -- Emily Lynema Associate Department Head Information Technology, NCSU Libraries 919-513-8031 emily_lyn...@ncsu.edu -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Code4Lib Conference streaming?
...and a gentle reminder to people actually *at* the conference to *please don't stream the talk you're actually sitting in*. If you can't see, move up; don't kill the wifi ;-) -Bill, remembering the conf at IU where this happened - On Wed, Jan 30, 2013 at 8:49 AM, Sarah Wiebe swi...@georgebrown.ca wrote: +1 Eagerly awaiting streaming news. :) -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric Phetteplace Sent: January-29-13 9:59 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib Conference streaming? yayyy! I can't stress how valuable this is for those of us who can only attend a couple conferences a year. Best, Eric Phetteplace Emerging Technologies Librarian Chesapeake College Wye Mills, MD On Tue, Jan 29, 2013 at 9:41 PM, Margaret Heller mhell...@luc.edu wrote: Yes, thanks to the people at UIC Learning Environments Technology Services the conference will be streamed and archived. We are awaiting details, but certainly will publicize it widely when we have them. Margaret Heller Margaret Heller Digital Services Librarian Loyola University Chicago 773.508.2686 Tom Keays tomke...@gmail.com 01/29/13 20:36 PM I was wondering if talks from the conference would be streamed this year? It was really great to have it the last time I was unable to attend. Tom -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Adding authority control to IR's that don't have it built in
Has anyone created a nice little wrapper around FAST? I'd like to test out including FAST subjects in our catalog, but am hoping someone else went through the work of building the code to do it :-) I know FAST has a web interface, but I've got about 10M records and would rather use something local. On Tue, Jan 29, 2013 at 4:36 PM, Ed Summers e...@pobox.com wrote: Hi Kyle, If you are thinking of doing name or subject authority control you might want to check out OCLC's VIAF AutoSuggest service [1] and FAST AutoSuggest [2]. There are also autosuggest searches for the name and subject authority files, that are lightly documented in their OpenSearch document [3]. In general, I really like this approach, and I think it has a lot of potential for newer cataloging interfaces. I'll describe two scenarios that I'm familiar with, that have worked quite well (so far). Note, these aren't IR per-se, but perhaps they will translate to your situation. As part of the National Digital Newspaper Program LC has a simple app so that librarians can create essays that describe newspapers in detail. Rather than making this part of our public website we created an Essay Editor as a standalone django app that provides a web based editing environment for authoring the essays. Part of this process is linking up the essay with the correct newspaper. Rather than load all the newspapers that could be described into the Essay Editor, and keep them up to date, we exposed an OpenSearch API in the main Chronicling America website (where all the newspaper records are loaded and maintained) [4]. It has been working quite well so far. Another example is the jobs.code4lib.org website that allows people to enter jobs announcements. I wanted to make sure that it was possible to view jobs by organization [5], or skill [6] -- so some form of authority control was needed. 
I ended up using Freebase Suggest [7] that makes it quite easy to build simple forms that present users with subsets of Freebase entities, depending on what they type. A nice side benefit of using Freebase is that you get descriptive text and images for the employers and topics for free. It has been working pretty well so far. There is a bit of an annoying conflict between the Freebase CSS and Twitter Bootstrap, which might be resolved by updating Bootstrap. Also, I've noticed Freebase's service slowing down a bit lately, which hopefully won't degrade further. The big caveat here is that these external services are dependencies. If they go down, a significant portion of your app might go down too. Minimizing this dependency, or allowing things to degrade well, is good to keep in mind. Also, it's worth remembering identifiers (if they are available) for the selected matches, so that they can be used for linking your data with the external resource. A simple string might change. I hope this helps. Thanks for the question, I think this is an area where we can really improve some of our back-office interfaces and applications. //Ed [1] http://www.oclc.org/developer/documentation/virtual-international-authority-file-viaf/request-types#autosuggest [2] http://experimental.worldcat.org/fast/assignfast/ [3] http://id.loc.gov/authorities/opensearch/ [4] http://chroniclingamerica.loc.gov/about/api/#autosuggest [5] http://jobs.code4lib.org/employer/university-of-illinois-at-urbana-champaign/ [6] http://jobs.code4lib.org/jobs/ruby/ [7] http://wiki.freebase.com/wiki/Freebase_Suggest On Tue, Jan 29, 2013 at 11:59 AM, Kyle Banerjee kyle.baner...@gmail.com wrote: How are libraries doing this and how well is it working? Most systems that even claim to have authority control simply allow a controlled keyword list. 
But this does nothing for the see and see also references that are essential for many use cases (people known by many names, entities that change names, merge or whatever over time, etc). The two most obvious solutions to me are to write an app that provides this information interactively as the query is typed (requires access to the search box) or to have a record that serves as a disambiguation page (might not be noticed by the user for a variety of reasons). Are there other options, and what do you recommend? Thanks, kyle -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Adding authority control to IR's that don't have it built in
Right -- I'd like to show the FAST stuff as facets in our catalog search (or, at least try it out and see if anyone salutes). So I'd need to inject the FAST data into the records at index time. On Tue, Jan 29, 2013 at 4:59 PM, Ed Summers e...@pobox.com wrote: I think that Mike Giarlo and Michael Witt used the FAST AutoSuggest as part of their databib project [1]. But are you talking about bringing the data down for a local index? //Ed [1] http://databib.org/ On Tue, Jan 29, 2013 at 4:45 PM, Bill Dueber b...@dueber.com wrote: Has anyone created a nice little wrapper around FAST? I'd like to test out including FAST subjects in our catalog, but am hoping someone else went through the work of building the code to do it :-) I know FAST has a web interface, but I've got about 10M records and would rather use something local. On Tue, Jan 29, 2013 at 4:36 PM, Ed Summers e...@pobox.com wrote: Hi Kyle, If you are thinking of doing name or subject authority control you might want to check out OCLC's VIAF AutoSuggest service [1] and FAST AutoSuggest [2]. There are also autosuggest searches for the name and subject authority files, that are lightly documented in their OpenSearch document [3]. In general, I really like this approach, and I think it has a lot of potential for newer cataloging interfaces. I'll describe two scenarios that I'm familiar with, that have worked quite well (so far). Note, these aren't IR per-se, but perhaps they will translate to your situation. As part of the National Digital Newspaper Program LC has a simple app so that librarians can create essays that describe newspapers in detail. Rather than making this part of our public website we created an Essay Editor as a standalone django app that provides a web based editing environment, for authority the essays. Part of this process is linking up the essay with the correct newspaper. 
Rather than load all the newspapers that could be described into the Essay Editor, and keep them up to date, we exposed an OpenSearch API in the main Chronicling America website (where all the newspaper records are loaded and maintained) [4]. It has been working quite well so far. Another example is the jobs.code4lib.org website that allows people to enter jobs announcements. I wanted to make sure that it was possible to view jobs by organization [5], or skill [6] -- so some form of authority control was needed. I ended up using Freebase Suggest [7] that makes it quite easy to build simple forms that present users with subsets of Freebase entities, depending on what they type. A nice side benefit of using Freebase is that you get descriptive text and images for the employers and topics for free. It has been working pretty well so far. There is a bit of an annoying conflict between the Freebase CSS and Twitter Bootstrap, which might be resolved by updating Bootstrap. Also, I've noticed Freebase's service slowing down a bit lately, which hopefully won't degrade further. The big caveat here is that these external services are dependencies. If they go down, a significant portion of your app might go down to. Minimizing this dependency, or allowing things degrade well is good to keep in mind. Also, it's worth remembering identifiers (if they are available) for the selected matches, so that they can be used for linking your data with the external resource. A simple string might change. I hope this helps. Thanks for the question, I think this is an area where we can really improve some of our back-office interfaces and applications. 
//Ed [1] http://www.oclc.org/developer/documentation/virtual-international-authority-file-viaf/request-types#autosuggest [2] http://experimental.worldcat.org/fast/assignfast/ [3] http://id.loc.gov/authorities/opensearch/ [4] http://chroniclingamerica.loc.gov/about/api/#autosuggest [5] http://jobs.code4lib.org/employer/university-of-illinois-at-urbana-champaign/ [6] http://jobs.code4lib.org/jobs/ruby/ [7] http://wiki.freebase.com/wiki/Freebase_Suggest On Tue, Jan 29, 2013 at 11:59 AM, Kyle Banerjee kyle.baner...@gmail.com wrote: How are libraries doing this and how well is it working? Most systems that even claim to have authority control simply allow a controlled keyword list. But this does nothing for the see and see also references that are essential for many use cases (people known by many names, entities that change names, merge or whatever over time, etc). The two most obvious solutions to me are to write an app that provides this information interactively as the query is typed (requires access to the search box) or to have a record that serves as a disambiguation page (might not be noticed by the user for a variety of reasons). Are there other options, and what do you recommend? Thanks, kyle -- Bill Dueber
Re: [CODE4LIB] Anyone have a SUSHI client?
Yeah -- I found that right away. Most of what's there appears to be abandonware. On Thu, Jan 24, 2013 at 9:10 AM, Tom Keays tomke...@gmail.com wrote: Hey. NISO has a list of SUSHI tools. http://www.niso.org/workrooms/sushi/tools/ Tom -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Anyone have a SUSHI client?
[Background: SUSHI http://www.niso.org/committees/SUSHI/SUSHI_comm.html is a SOAP protocol for getting data on use of electronic resources in the COUNTER format] I'm just starting to look at trying to get COUNTER data via SUSHI into our data warehouse, and I'm discovering that apparently no one has worked on a SUSHI client since late 2009. Unless I'm missing one? Anyone out there using SUSHI and have a client that works and is up-to-date and has some documentation of some sort? I'd prefer ruby or java, but will take anything that'll run under linux (i.e., not C#) at this point. I'm desperately trying not to have to deal with the raw SOAP and parsing the XML and such, so any help would be appreciated. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Anybody using the Open Library APIs?
The HathiTrust BibAPI might help you out -- you can get MARC-XML back with a call, although of course it's only as good as the underlying record and our coverage won't be nearly as good as the OCLC. Format is: http://catalog.hathitrust.org/api/volumes/full/isbn/080582796X.json On Tue, Jan 22, 2013 at 8:38 PM, William Denton w...@pobox.com wrote: On 21 January 2013, David Fiander wrote: All I'm really looking for at this point is a way to convert an ISBN into basic bibliographic data, and to find any related ISBNs, a la OCLC's xISBN service. LibraryThing's thingISBN is nice and might serve your needs: http://www.librarything.com/wiki/index.php/LibraryThing_APIs Bill -- William Denton Toronto, Canada http://www.miskatonic.org/ -- Bill Dueber Library Systems Programmer University of Michigan Library
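For anyone poking at the Bib API from code, a tiny helper that just builds URLs of the shape shown above can be handy. The helper name and keyword argument here are mine, not part of any library; check the HathiTrust API documentation for the full set of id types and detail levels:

```ruby
require 'uri'

# Build a HathiTrust Bib API URL of the form shown in the message above.
# The "full"/"brief" level and the id type ("isbn" here) follow the pattern
# in the example link; this is a sketch, not an official client.
def hathi_bib_url(id_type, id, level: "full")
  type = URI.encode_www_form_component(id_type)
  val  = URI.encode_www_form_component(id)
  "http://catalog.hathitrust.org/api/volumes/#{level}/#{type}/#{val}.json"
end
```

Fetching that URL and parsing the JSON response is then a plain HTTP GET.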
Re: [CODE4LIB] Zoia
On Tue, Jan 22, 2013 at 9:50 PM, Genny Engel gen...@sonoma.lib.ca.us wrote: Guess there's no groundswell of support for firing Zoia and replacing her/it with a GLaDOS irc bot, then? I'm in. We've both said things you're going to regret. [GLaDOS https://en.wikipedia.org/wiki/Glados is the really-quite-mean AI from the games Portal and Portal 2] On Tue, Jan 22, 2013 at 9:50 PM, Genny Engel gen...@sonoma.lib.ca.us wrote: Guess there's no groundswell of support for firing Zoia and replacing her/it with a GLaDOS irc bot, then? *Sigh.* Genny Engel -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Andromeda Yelton Sent: Friday, January 18, 2013 11:30 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Zoia FWIW, I am both an active #libtechwomen participant and someone who is so thoroughly charmed by zoia I am frequently bothered she isn't right there *in my real life*. (Yes, I have tried to issue zoia commands during face-to-face conversations with non-Code4Libbers.) I think a collaboratively maintained bot with a highly open ethos is always going to end up with some things that cross people's lines, and that's an opportunity to talk about those lines and rearticulate our group norms. And to that end, I'm in favor of weeding the collection of plugins, whether because of offensiveness or disuse. (Perhaps this would be a good use of github's issue tracker, too?) I also think some sort of 'what's zoia and how can you contribute' link would be useful in any welcome-newbie plugin; it did take me a while to figure out what was going on there. (Just as it took me a while to acquire the tastes for, say, coffee, bourbon, and blue cheese, tastes which I would now defend ferociously.) But not having zoia would make me sad. And defining zoia to be woman-unfriendly, when zoia-lovers and zoia-haters appear to span the gender spectrum and have a variety of reasons (both gendered and non) for their reactions, would make me sad too. 
@love zoia. Andromeda -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] A gentle proposal: slim down zoia during the conference
I'd like to propose that zoia (the IRC bot that provides help and entertainment in the #code4lib IRC channel) have some of its normal plugins disabled during conf. With three or four times as many people online during conference, things can get out of hand. Lots of zoia plugins can be useful during conference; I'm mostly thinking of stuff whose utility is suspect and whose output covers several lines. Some examples: - @mf - @cast - @tdih - @sing The goal, really, is to try and turn the firehose that the IRC channel becomes into something at least plausibly manageable in realtime. I can also make a case for things that newbies will just find confusing (chef, takify, etc.) or offensive (@forecast, @mf again) but I'll let others potentially make that case. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Groupon: $9 for 3-Day CTA Pass
I guess it depends on when you're leaving, but by my numbers it's more than three weeks until the conference... On Wed, Jan 16, 2013 at 11:22 AM, Wilhelmina Randtke rand...@gmail.com wrote: It says Allow up to 3 weeks for delivery of CTA Pass. This is better if you are going to ALA over the summer, or something else more in the future. -Wilhelmina Randtke On Wed, Jan 16, 2013 at 10:17 AM, Carmen Mitchell carmenmitch...@gmail.com wrote: For the folks going to Chicago this year...This is a great deal. $9 for a 3-Day Pass from the Chicago Transit Authority ($20 Value) http://www.groupon.com/deals/chicago-transit-authority-cta-3?utm_campaign=UserReferral_dp&utm_medium=email&utm_source=uu83298 -Carmen -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] code4lib 2013 location
Because it seems like it might be useful, I've started a publicly-editable google map at http://goo.gl/maps/LWqay Right now, it has two points: the hotel and the conference location. Please add stuff as appropriate if the urge strikes you. On Fri, Jan 11, 2013 at 7:54 PM, Francis Kayiwa kay...@uic.edu wrote: On Fri, Jan 11, 2013 at 06:41:26PM -0500, Cynthia Ng wrote: I'm sorry, but that doesn't actually clear up anything for me. The location on the layrd page just says Chicago. So, is the conference still happening at UIC? Since the conference hotel isn't super close, does that mean there will be transportation provided? The entire conference and pre-conference is at UIC. The Forum is a revenue generating part of UIC. The pre-conference will be at the University Libraries on Monday with the exception of the Drupal one. The hotel is a mile or thereabouts from UIC Forum. Here is the problem with us natives planning. It never crossed our minds that walking a mile while on the *upper limit* of our shuttling to and from work is not the norm for everyone. This was brought to our attention and we will have a shuttle from the Hotel to the Conference venue. While we're on the subject, are the pre-conferences happening at the same location? See above. ./fxk On Fri, Jan 11, 2013 at 2:51 PM, Francis Kayiwa kay...@uic.edu wrote: On Fri, Jan 11, 2013 at 10:41:54AM -0800, Erik Hetzner wrote: Hi all, Apparently code4lib 2013 is going to be held at the UIC Forum http://www.uic.edu/depts/uicforum/ I assumed it would be at the conference hotel. This is just a note so that others do not make the same assumption, since nowhere in the information about the conference is the location made clear. Since the conference hotel is 1 mile from the venue, I assume transportation will be available. That's a good assumption to make. As to the confusion I said to you when you asked me about this a couple of days ago. http://www.uic.edu/~kayiwa/code4lib.html was supposed to be our proposal. 
If you look at the document it also suggests that we were going to have the conference registration staggered by timezones. We have elected not to update that because as that was our proposal. When preparing our proposal we borrowed heavily from Yale's and IU's proposal and if someone would like to steal from us I think it is fair to leave that as is. If you want the conference page use the lanyrd.com link below. I can't even take credit for doing that. All of that goes to @pberry http://lanyrd.com/2013/c4l13/ Cheers, ./fxk best, Erik Hetzner Sent from my free software system http://fsf.org/. -- Speed is subsittute fo accurancy. -- Speed is subsittute fo accurancy. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] refworks export
The best generic format is probably RIS. It's simple and everyone reads it. For export to Refworks, I actually use the Refworks tagged format http://www.refworks.com/rwathens/help/RefWorks_Tagged_Format.htm -- it's at least as expressive as other tagged formats (RIS, Endnote, etc.) and allows more types (conference proceeding, book, etc.). I've attached two files (or, at least, I hope they're attached; not sure what the mailing software will do) that are simple YAML files specifying the mappings that I use, if you want to start there. It's pretty easy to see from the YAML files how to write the code to produce the actual export files. Let me know if you improve them :-) On Thu, Dec 27, 2012 at 4:16 PM, Jonathan Rochkind rochk...@jhu.edu wrote: If I have software I'm writing that I want to provide an export to refworks from... ...refworks supports import in a bazillion different formats, many vendor-specific. What are people's experience with the best, most complete, easiest to work with, 'generic' format for RefWorks import? EndNote? RIS? Other? -- Bill Dueber Library Systems Programmer University of Michigan Library refworksFormatExport.yaml Description: Binary data risexport.yaml Description: Binary data
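For anyone writing their own exporter, the RIS shape itself is tiny: each line is a two-character tag, two spaces, a hyphen, a space, and a value, with TY first and ER last. Here's a minimal Ruby sketch (the method name and the handful of fields are mine; the YAML mappings mentioned above are the place to look for a fuller tag-by-tag mapping):

```ruby
# Emit a single RIS record as a string. TY (reference type) must come
# first and ER (end of record) last; everything else is "TAG  - value"
# lines in between. Only a few illustrative tags are handled here.
def to_ris(type:, title:, authors: [], year: nil)
  lines = ["TY  - #{type}"]
  authors.each { |a| lines << "AU  - #{a}" }   # one AU line per author
  lines << "TI  - #{title}"
  lines << "PY  - #{year}" if year
  lines << "ER  - "                            # end-of-record marker
  lines.join("\n")
end
```

A file of multiple records is just these blocks concatenated, which is part of why RIS is so easy to read and write.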
Re: [CODE4LIB] Code4Lib MidWest
I'm very interested, and imagine there are a few of us here in Ann Arbor that would make the day-trip. I'm personally on vacation the first week, if you're keeping track. On Sat, Apr 28, 2012 at 9:20 AM, Mita Williams mita.willi...@gmail.com wrote: I'm interested. I'd prefer during the week instead of weekends. Thanks! M On Fri, Apr 27, 2012 at 8:32 AM, Ken Irwin kir...@wittenberg.edu wrote: Thanks Ranti! I am definitely interested, and would favor the latter end of the proposed timeframe. Ken -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz Sent: Thursday, April 26, 2012 3:08 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib MidWest Hi Ranti, I work virtually with Educopia Institute and the MetaArchive Cooperative, and am based near Grand Rapids, MI. I would definitely look forward to attending being so close and all, and could do so either early in the week or the weekend. But would prefer the weekend. Best, Matt Schultz Program Manager Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org matt.schu...@metaarchive.org 616-566-3204 On Thu, Apr 26, 2012 at 2:45 PM, Ranti Junus ranti.ju...@gmail.com wrote: Hello All, Michigan State University (Lansing, MI) is hosting the next Code4Lib Midwest. We aim to hold the event in either week of July 16th or 23rd (but most likely not July 27th) either as 1.5 or 2 days event. So, my question for those who might be interested to come: would it be better to have it early in the week or weekend? Let me know and then I'll set up a doodle poll for the date options. thanks, ranti. -- Bulk mail. Postage paid. -- Matt Schultz Program Manager Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org matt.schu...@metaarchive.org 616-566-3204 -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21
On Tue, Apr 17, 2012 at 8:46 PM, Simon Spero sesunc...@gmail.com wrote: Actually Anglo and Francophone centric. And the USMARC style 245 was a poor replacement for the UKMARC approach (someone at the British Library-hosted Linked Data meeting wondered why there were punctuation characters included in the data in the title field. The catalogers wept slightly). Simon Slightly? I cry my eyes out *every single day* about that. Well, every weekday, anyway. -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Modern NACO Normalization (esp. in java?)
I'm about to embark on trying to write code to apply NACO normalization to strings (not for field-to-field comparisons, but for correctly sorting things). I was driven to this by a complaint about how some Arabic manuscript titles are sorting. My end goal is a Solr filter, so I'm most interested in Java code. It doesn't look hard so much as long and error-prone so I'm hoping someone has already done this (or at least has a character map that I can easily convert to java). I've seen the code at the OCLC http://www.oclc.org/research/activities/naco/default.htm, but it's 10 years old and doesn't have a lot of the non-latin stuff in it. Evergreen has a perl implementation http://git.evergreen-ils.org/?p=Evergreen.git;a=blob;f=Open-ILS/src/perlmods/lib/OpenILS/Utils/Normalize.pm; that's probably where I'll start if no one has anything else. Anyone? -- Bill Dueber Library Systems Programmer University of Michigan Library
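For what it's worth, the overall shape of this kind of normalization is simple even if the official character map isn't. A rough Ruby sketch of the general idea (illustrative only -- this is NOT the NACO rules, which specify an exact character-by-character mapping; the function name is mine):

```ruby
# A rough approximation of sort-key normalization: decompose so diacritics
# become separate combining marks, drop the marks, lowercase, turn most
# punctuation into spaces, and collapse whitespace. The real NACO rules
# use a specific per-character table and differ in the details.
def naco_normalize_sketch(str)
  s = str.unicode_normalize(:nfd)      # e.g. "é" -> "e" + combining acute
  s = s.gsub(/\p{Mn}/, '')             # strip combining (diacritic) marks
  s = s.downcase
  s = s.gsub(/[^[:alnum:]\s]/, ' ')    # punctuation to spaces
  s.squeeze(' ').strip
end
```

The long, error-prone part is exactly the per-character table this sketch hand-waves away, which is why an existing map is so valuable.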
Re: [CODE4LIB] Modern NACO Normalization (esp. in java?)
Wow! Thanks, Ralph! This is great! On Wed, Apr 11, 2012 at 12:04 PM, LeVan,Ralph le...@oclc.org wrote: I'm pretty sure attachments don't work on the list, so I'm just pasting my NACO normalizer below. Note that there are 2007 versions of the normalize() method in there. This is used for all the VIAF and Identities indexing. Ralph -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Anyone using marc2solr?
A while ago I released the software I've been using for solr indexing as marc2solr (and related gems). I'm planning on starting over from the ground up, but... well, I really like the name. :-) Is there anyone out there actually *using* marc2solr besides me, in a way that would make repurposing the github/rubygem name a bad idea? I know in general it's a good idea to not do that, but I have a feeling this is essentially an internal project that happens to be exposed on the public web. [Note: I'm pretty sure a flame war about reusing old github/gem names isn't a great use of anyone's time.] -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Unicode font for PDF generation?
I don't know if it's any good, but TITUS[1] is a pan-unicode font free for non-commercial use. I don't know if that included embedding in a PDF or not. 1. http://titus.fkidg1.uni-frankfurt.de/unicode/tituut.asp On Fri, Mar 16, 2012 at 6:13 PM, Mark Redar mark.re...@ucop.edu wrote: Hi All, We're having some fun with unicode characters in PDF generation. We have a process that automatically generates a pdf from XML input. The tool stack doesn't support multiple fonts for displaying different codepoints so we need a good pan-unicode font to bundle with the pdfs. Currently, we use the DejaVu font family for creating the pdfs. This has good coverage for latin cyrillic characters but has no CJK (chinese-japanese-korean) coverage. We've looked into licensing a commercial fonts, but for web server use these require annual licensing fees that are substantial (in the thousands of $). A number of our source documents contain CJK characters and some contributors have noticed the lack of support for these characters. Does anyone know of a good pan-unicode free font that includes CJK codepoints that looks good? Gnu unifont has the coverage, but it is not the best looking font. Barring that, we're thinking of rolling our own pan-unicode font. There are good open source fonts for portions of the unicode character sets. We're hoping to find some way to take a number of open source fonts and combine them into one large pan-unicode font. Does anyone have experience with font authoring and merging different fonts? It looks as though FontForge can merge fonts, but it's not clear how to deal with overlapping codepoints in the merged fonts. Thanks, Mark -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] NON-MARC ILS?
On Wed, Mar 14, 2012 at 2:17 PM, Wilfred Drew dr...@tc3.edu wrote: I did not mean to sound snarky in my earlier message but I do not understand why no one is talking about standards and why we have them. This includes standard ways to present and transmit data between systems. That is one of the big reasons for using MARC. I think at least partially because the standard (MARC21 with AACR2) is incredibly arcane with an enormous learning curve. It's hard, it doesn't make sense in lots and lots of ways, and for many applications the initial cost is just plain too steep, no matter what the eventual benefits. MARC/AACR2 is the standard I spend most of my time with, but that doesn't mean I find it easy to defend. Personally, I don't find it hard to imagine bibliographic applications where MARC cataloging is way over the top. If you only have a few thousand volumes, even something as simplistic as an RIS record for each item that includes a shelf-number will get you an awfully long way. Whether or not it gets your far enough is a different (and more difficult) question that can only be answered by the people on the ground, who know what they have and can guess at what's coming.
Re: [CODE4LIB] Preserving hyperlinks in conversion from Excel/googledocs/anything to PDF (was Any ideas for free pdf to excel conversion?)
What exactly are you trying to do? Take a list of links and turn them into...a list of hot links in a PDF file? On Mon, Mar 5, 2012 at 8:46 AM, Matt Amory matt.am...@gmail.com wrote: Does anyone know of any script library that can convert a set of (~200) hyperlinks into Acrobat's goofy protocol? I do own Acrobat Pro. Thanks On Wed, Dec 14, 2011 at 1:08 PM, Matt Amory matt.am...@gmail.com wrote: Just looking to preserve column structure. -- Matt Amory (917) 771-4157 matt.am...@gmail.com http://www.linkedin.com/pub/matt-amory/8/515/239 -- Matt Amory (917) 771-4157 matt.am...@gmail.com http://www.linkedin.com/pub/matt-amory/8/515/239 -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Metadata war stories...
http://ead.lib.virginia.edu/vivaxtf/view?docId=uva-sc/viu00888.xml;query=;brand=default#adminlink On Fri, Jan 27, 2012 at 6:26 PM, Roy Tennant roytenn...@gmail.com wrote: Oh, I should have also mentioned that some of the worst problems occur when people treat their metadata like it will never leave their institution. When that happens you get all kinds of crazy cruft in a record. For example, just off the top of my head: * Embedded HTML markup (one of my favorites is an img tag) * URLs to remote resources that are hard-coded to go through a particular institution's proxy * Notes that only have meaning for that institution * Text that is meant to display to the end-user but may only do so in certain systems; e.g., Click here in a particular subfield. Sigh... Roy On Fri, Jan 27, 2012 at 4:17 PM, Roy Tennant roytenn...@gmail.com wrote: Thanks a lot for the kind shout-out Leslie. I have been pondering what I might propose to discuss at this event, since there is certainly plenty of fodder. Recently we (OCLC Research) did an investigation of 856 fields in WorldCat (some 40 million of them) and that might prove interesting. By the time ALA rolls around there may something else entirely I could talk about. That's one of the wonderful things about having 250 million MARC records sitting out on a 32-node cluster. There are any number of potentially interesting investigations one could do. Roy On Thu, Jan 26, 2012 at 2:10 PM, Johnston, Leslie lesl...@loc.gov wrote: Roy's fabulous Bitter Harvest paper: http://roytennant.com/bitter_harvest.html -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Walter Lewis Sent: Wednesday, January 25, 2012 1:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Metadata war stories... On 2012-01-25, at 10:06 AM, Becky Yoose wrote: - Dirty data issues when switching discovery layers or using legacy/vendor metadata (ex. 
HathiTrust) I have a sharp recollection of a slide in a presentation Roy Tennant offered up at Access (at Halifax, maybe), where he offered up a range of dates extracted from an array of OAI harvested records. The good, the bad, the incomprehensible, the useless-without-context (01/02/03 anyone?) and on and on. In my years of migrating data, I've seen most of those variants. (except ones *intended* to be BCE). Then there are the fielded data sets without authority control. My favourite example comes from staff who nominally worked for me, so I'm not telling tales out of school. The classic Dynix product had a Newspaper index module that we used before migrating it (PICK migrations; such a joy). One title had twenty variations on Georgetown Independent (I wish I was kidding) and the dates ranged from the early ninth century until nearly the 3rd millennium. (apparently there hasn't been much change in local council over the centuries). I've come to the point where I hand-walk the spatial metadata to links to geonames.org for the linked open data. Never had to do it for a set with more than 40,000 entries though. The good news is that it isn't hard to establish a valid additional entry when one is required. Walter -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)
To these I would add: * Reuse. The call you're making may be providing data that would be useful in other contexts as well. If you're generating application-specific html, that can't happen. But really, separation of concerns is the biggest one. Having to dig through both template and code to make stylistic changes is icky. Now excuse me, I have to go work with PHP. And then take a shower to try to get the smell off me. On Wed, Dec 7, 2011 at 5:19 PM, Robert Sanderson azarot...@gmail.com wrote: Here's some off the top of my head: * Separation of concerns -- You can keep your server side data transfer and change the front end easily by working with the javascript, rather than reworking both. * Lax Security -- It's easier to get into trouble when you're simply inlining HTML received, compared to building the elements. Getting into the same bad habits as SQL injection. It might not be a big deal now, but it will be later on. * Obfuscation -- It's easier to debug one layer of code rather than two at once. It's thus also easier to maintain the two layers of code, and easier to see at which end the system is failing. Rob On Wed, Dec 7, 2011 at 3:12 PM, Jonathan Rochkind rochk...@jhu.edu wrote: A fair number? Anyone but Godmar? On 12/7/2011 5:02 PM, Nate Vack wrote: OK. So we have a fair number of very smart people saying, in essence, it's better to build your HTML in javascript than send it via ajax and insert it. So, I'm wondering: Why? Is it an issue of data transfer size? Is there a security issue lurking? Is it tedious to bind events to the new / updated code? Something else? I've thought about it a lot and can't think of anything hugely compelling... Thanks! -Nate -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] marc in json
I've worked to deprecate marc-hash (what tends to be referred to as Bill Dueber's JSON format) in favor of Ross's marc-in-json. To the best of my knowledge, there is marc-in-json support for ruby (current ruby-marc), PHP (current File_MARC), marc4j (currently in trunk, soon to be released, I think), and perl (MARC::Record in the next release). I think that covers all the major players except the IndexData yaz- stuff. [Galen, any word on that next release of the perl module?] I, at least, already use marc-in-json in production (It's a great way to store MARC in solr). It would be great if folks would have the confidence to use it, at least as a single-record format. I think for wider adoption we'll need to all have either (a) json pull-parsers to read in a file that contains an array of marc-in-json objects, or (b) a decision to use newline-delimited-json (or some other record-delimiter), so folks can put more than one of these in a file and be able to get them out without running out of memory. -Bill- On Thu, Dec 1, 2011 at 9:11 AM, Ross Singer rossfsin...@gmail.com wrote: Ed, I think this would be great. Obviously, there's zero standardization around MARC/JSON (Andrew Houghton has come the closest by writing up the most RFC-y proposal: http://www.oclc.org/developer/content/marc-json-draft-2010-03-11). I generally fall more in the camp of working code wins, though, which, solely on the basis of MARC parser support, would put my proposal in front. In the end, I don't think it matters which style is adopted; it's an interchange format, any one of them works (and they all, including Bill Dueber's) has their pluses and minuses. The more important thing is that we pick -one- and go with it so we can use it with some confidence. 
While we're on the subject, if there are any other serializations of MARC that people are legitimately interested in (TurboMARC, for example: https://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records) and wish ruby-marc supported, let me know. Thanks, -Ross. On Thu, Dec 1, 2011 at 5:57 AM, Ed Summers e...@pobox.com wrote: Martin Czygan recently added JSON support to pymarc [1]. Before this gets rolled into a release I was wondering if it might make sense to bring the implementation in line with Ross Singer's proposed JSON serialization for MARC [2]. After quickly looking around it seems to be what got implemented in ruby-marc [3] and PHP's File_MARC [4]. It also looked like there was a MARC::Record branch [5] for doing something similar, but I'm not sure if that has been released yet. It seems like a no-brainer to bring it in line, but I thought I'd ask since I haven't been following the conversation closely. //Ed [1] https://github.com/edsu/pymarc/commit/245ea6d7bceaec7215abe788d61a0b34a6cd849e [2] http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/ [3] https://github.com/ruby-marc/ruby-marc/blob/master/lib/marc/record.rb#L227 [4] http://pear.php.net/package/File_MARC/docs/latest/File_MARC/File_MARC_Record.html#methodtoJSON [5] http://marcpm.git.sourceforge.net/git/gitweb.cgi?p=marcpm/marcpm;a=shortlog;h=refs/heads/marc-json -- Bill Dueber Library Systems Programmer University of Michigan Library
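To make the shape of marc-in-json concrete, here's a little ruby sketch using only the stdlib JSON library. This is my reading of the proposal, not anything normative, and the record content (title, control number) is invented for illustration:

```ruby
require 'json'

# A hand-built record in the marc-in-json structure: a "leader" string
# plus an ordered "fields" array. A control field maps tag => string;
# a data field maps tag => {ind1, ind2, subfields => [{code => value}]}.
# All data values here are made up.
record = {
  'leader' => '00000nam a2200000 a 4500',
  'fields' => [
    { '001' => 'ocm00000001' },
    { '245' => {
        'ind1' => '0', 'ind2' => '0',
        'subfields' => [{ 'a' => 'An example title' }]
      }
    }
  ]
}

json = JSON.generate(record)        # serialize one record
round_tripped = JSON.parse(json)    # and read it back, losslessly
```

The nice property is that field order survives (the fields live in an array, not a hash), which is the thing most naive MARC-as-JSON attempts get wrong.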
Re: [CODE4LIB] marc in json
I was a strong proponent of NDJ at one point, but I've grown less strident and more weary since then. Brad Baxter has a good overview of some options[1]. I'm assuming it's a given we'd all prefer to work with valid JSON files if the pain-point can be brought down far enough. A couple years have passed since we first talked about this stuff, and the state of JSON pull-parsers is better than it once was: * yajl[2] is a super-fast C library for parsing JSON and supports stream parsing. Bindings for ruby, node, python, and perl are linked off the home page. I found one PHP binding[3] on github which is broken/abandoned, and no other pull-parser for PHP that I can find. Sadly, the ruby wrapper doesn't actually expose the callbacks necessary for pull-parsing, although there is a pull request[4] and at least one other option[5]. * Perl's JSON::XS supports incremental parsing. * The Jackson java library[6] is excellent and has an easy-to-use pull-parser. There are a few simplistic efforts to wrap it for jruby/jython use as well. Pull-parsing is ugly, but no longer astoundingly difficult or slow, with the possible exception of PHP. And output is simple enough. As much as it makes me shudder, I think we're probably better off trying to do pull parsers and have a marc-in-json document be a valid JSON array. We could easily adopt a *convention* of, essentially, one-record-per-line, but wrap it in '[]' to make it valid json. That would allow folks with a pull-parser to write a real streaming reader, and folks without to cheat (ditch the leading and trailing [], and read the rest as one-record-per-line) until such a time as they can start using a more full-featured json parser. 1. http://en.wikipedia.org/wiki/User:Baxter.brad/Drafts/JSON_Document_Streaming_Proposal 2. http://lloyd.github.com/yajl/ 3. https://github.com/sfalvo/php-yajl 4. https://github.com/brianmario/yajl-ruby/pull/50 5. http://dgraham.github.com/json-stream/ 6. 
http://wiki.fasterxml.com/JacksonHome On Thu, Dec 1, 2011 at 12:56 PM, Michael B. Klein mbkl...@gmail.com wrote: +1 to marc-in-json +1 to newline-delimited records +1 to read support +1 to edsu, rsinger, BillDueber, gmcharlt, and the other module maintainers On Thu, Dec 1, 2011 at 9:31 AM, Keith Jenkins k...@cornell.edu wrote: On Thu, Dec 1, 2011 at 11:56 AM, Gabriel Farrell gsf...@gmail.com wrote: I suspect newline-delimited will win this race. Yes. Everyone please cast a vote for newline-delimited JSON. Is there any consensus on the appropriate mime type for ndj? Keith -- Bill Dueber Library Systems Programmer University of Michigan Library
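Here's roughly what that wrap-it-in-brackets convention looks like in ruby, stdlib only. The writer emits one record per line between '[' and ']' so the whole file is also a valid JSON array; the "cheat" reader skips the wrapper lines and strips trailing commas, no pull-parser required. Record content is invented, and this is a sketch of the convention, not a spec:

```ruby
require 'json'
require 'stringio'

# Invented sample records in the marc-in-json shape.
records = [
  { 'leader' => '00000nam', 'fields' => [{ '001' => 'rec1' }] },
  { 'leader' => '00000nam', 'fields' => [{ '001' => 'rec2' }] }
]

# Writer: one JSON record per line, wrapped in [ ] so the file as a
# whole is valid JSON and a real streaming parser can consume it.
def write_wrapped_ndj(io, records)
  io.puts '['
  io.puts records.map { |r| JSON.generate(r) }.join(",\n")
  io.puts ']'
end

# The "cheat" reader: ignore the wrapper lines, drop the trailing
# comma, parse one record per line.
def read_wrapped_ndj(io)
  io.each_line.filter_map do |line|
    line = line.strip.chomp(',')
    next if line.empty? || line == '[' || line == ']'
    JSON.parse(line)
  end
end

buf = StringIO.new
write_wrapped_ndj(buf, records)
buf.rewind
parsed = read_wrapped_ndj(buf)
```

The same buffer also parses in one gulp with a plain JSON.parse, which is the whole point: one file format, two reading strategies.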
Re: [CODE4LIB] Citation Analysis - like projects for print resources
If I'm understanding you correctly, you're describing citation analysis (sometimes referred to as a part of bibliometrics). It is mostly applied to article data (e.g., the web of science / web of knowledge at ISI) but there are zillions of studies looking at co-citation and co-authorship networks, the long tail of cited works and authors, etc. You can hardly shake a stick at JASIST without hitting two or three of these studies. As you're probably already thinking, getting a hold of the citation information in a machine-readable format is the painful part. Things are made harder by your desire to work with books, since many citations are to individual chapters of edited works, and (of course) books just plain aren't generally available digitally. Article searches (in google scholar or your local academic library) for bibliometrics or citation analysis should get you started on past and future work. On Thu, Nov 17, 2011 at 12:47 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: On Nov 17, 2011, at 12:09 PM, Miles Fidelman wrote: Matt Amory wrote: Is anyone involved with, or does anyone know of any project to extract and aggregate bibliography data from individual works to produce some kind of most-cited authors list across a collection? Local/Network/Digital/OCLC or historic? Sorry to be vague, but I'm trying to get my head around whether this is a tired old idea or worth pursuing... Sounds like you're describing citeseer - http://citeseerx.ist.psu.edu/ - it's a combination bibliographic and citation index for computer science literature. It includes a good degree of citation analysis. Incredibly useful tool. 
Another recent project (that I haven't had a chance to play with yet) is Total Impact: http://total-impact.org/about.php It's from some of the folks in altmetrics, who are trying to find better bibliometrics for measuring value: http://altmetrics.org/manifesto/ I don't see a list of what they're scraping -- I think they're using the publisher's indexes, PubMed and other databases rather than parsing the text themselves ... but the software's available, if you wanted to take a look. Or you could just ask Heather or Jason, they're both approachable and always eager to talk, when I've run into them at meetings. I also seem to remember someone at the DataCite meeting this summer who was involved in a project to parse references in papers ... unfortunately, I don't have that notebook here to check ... but I *think* it was John Kunze. (and I don't think it was part of the person's presentation, but something that I had picked up in the Q/A part) -Joe -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] ISBN Regular Expression
So much duplication. If only there were some sort of organization that might serve as a clearinghouse for this sort of code that's useful to libraries... [Yes, I know the only appropriate response is, "Well, Dueber, step up and do something about it."] On Mon, Oct 24, 2011 at 4:59 PM, Jon Gorman jonathan.gor...@gmail.com wrote: Also, I don't know OpenBook to know your source data, but don't forget a lot of publishers have printed ISBNs in different ways over the past few years. The regex would choke on any hyphens. If users are copying from printed material, they could type them in. For example, one of the books near my desk has the ISBN printed like 0-521-61678-6; if this is user input and nothing is stripping characters like that out, it could cause problems. (I think I've also seen spaces used instead of hyphens, but less positive about this). Jon Gorman On Mon, Oct 24, 2011 at 9:44 AM, Jonathan Rochkind rochk...@jhu.edu wrote: John: That's not going to work, an ISBN can end in X as a check digit, which is not [0-9]. You are going to be rejecting valid ISBN's, you have a bug. On 10/24/2011 10:40 AM, John Miedema wrote: Here's a php function I use in OpenBook to test if a user has entered a 10 or 13 digit ISBN. //test if 10 or 13 digits ISBN function openbook_utilities_validISBN($testisbn) { return (ereg("([0-9]{10})", $testisbn, $regs) || ereg("([0-9]{13})", $testisbn, $regs)); } On Fri, Oct 21, 2011 at 1:44 PM, Kozlowski, Brendon bkozlow...@sals.edu wrote: Hi all. I'm somewhat surprised that I've never had to validate an ISBN manually up until now. I suppose that's a testament to all of the software out there. However, I now find that I need to validate both the 10-digit and 13-digit ISBNs. I realize there's also a check digit and a REGEX cannot check this value - one step at a time. Right now I just want to work on the REGEX. Does anyone know the exact specifications of both forms of an ISBN? The ISBN organization's website didn't seem to be overly clear to me. 
Alternatively, if anyone has a full working regular expression for this purpose I would definitely not mind if they'd be willing to share. The only thing I'm doing which is abnormal is that I am not requiring the hyphenation or spaces between numbers since some of this data will be coming from a system, and some will be coming from human input. Brendon Kozlowski Web Administrator Saratoga Springs Public Library 49 Henry Street Saratoga Springs, NY, 12866 [518] 584-7860 x217 -- Bill Dueber Library Systems Programmer University of Michigan Library
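For what it's worth, here's a ruby sketch that handles everything raised in this thread: it strips the hyphens/spaces Jon mentions, accepts the trailing X Jonathan points out, and does the actual check-digit math instead of just counting digits. Method names are mine, not from any library; treat it as an illustration rather than vetted code:

```ruby
# ISBN-10: weight digits 10 down to 1; valid iff the weighted sum is
# divisible by 11. 'X' is only legal as the final (check) character,
# where it stands for 10.
def valid_isbn10?(raw)
  s = raw.to_s.upcase.gsub(/[\s-]/, '')
  return false unless s =~ /\A\d{9}[\dX]\z/
  sum = 0
  s.chars.each_with_index do |ch, i|
    digit = ch == 'X' ? 10 : ch.to_i
    sum += digit * (10 - i)           # weights 10, 9, ..., 1
  end
  (sum % 11).zero?
end

# ISBN-13: alternating 1,3 weights; valid iff the sum is divisible by 10.
def valid_isbn13?(raw)
  s = raw.to_s.gsub(/[\s-]/, '')
  return false unless s =~ /\A\d{13}\z/
  sum = s.chars.each_with_index.sum do |ch, i|
    ch.to_i * (i.even? ? 1 : 3)
  end
  (sum % 10).zero?
end
```

Jon's example from earlier in the thread, 0-521-61678-6, passes with the hyphens left in, and an ISBN ending in X validates instead of being rejected.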
Re: [CODE4LIB] Examples of Web Service APIs in Academic Public Libraries
The HathiTrust BibAPI and DataAPIs are being used by several on this list (and by me behind the scenes on occasion, although I sometimes cheat because the data are local). Based on our logs, the most common use is calling the BibAPI to check HT availability of an item already in someone's local catalog. http://www.hathitrust.org/data On Sat, Oct 8, 2011 at 1:33 PM, Michel, Jason Paul miche...@muohio.edu wrote: Hello all, I'm a lurker on this listserv and am interested in gaining some insight into your experiences of utilizing web service APIs in either an academic library or public library setting. I'm writing a book for ALA Editions on the use of Web Service APIs in libraries. Each chapter covers a specific API by delineating the technicalities of the API, discussing potential uses of the API in library settings, and step-by-step tutorials. I'm already including examples of how my library (Miami University in Oxford, Ohio) is utilizing these APIs but would like to give the reader more examples from a variety of settings. APIs covered in the book: Flickr, Vimeo, Google Charts, Twitter, Open Library, LibraryThing, Goodreads, OCLC. So, what are you folks doing with APIs? Thanks for any insight! Kind regards, Jason -- Jason Paul Michel User Experience Librarian Miami University Libraries Oxford, Ohio 45044 twitter:jpmichel -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Advice on a class
was hiring a digital *librarian*, I'd also expect them to know Javascript, the language at the heart of the EPUB format. But Javascript is kind of tricky; it's a subtle, powerful language with bad syntax and weak libraries. I certainly wouldn't recommend it to start with. Cary Gordon listu...@chillco.com wrote: There are still plenty of opportunities for Cobol coders, but I wouldn't recommend that either. Java is the COBOL of the 21st century, so if you know Java well, there will be a job in that for the next 20-30 years, I'd expect. Until the Singularity happens, anyway. I'd think there will always be lots of enterprise Java jobs around. Bill -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] stemming in author search?
We had stemming on for authors at first (maybe it was the VuFind default way back when?) and turned it off as soon as we noticed. The initial complaint was that searching on Rowles gave records for Rowling. And of course it's not hard to find other examples, esp. with the -ing suffix. On Mon, Jun 13, 2011 at 8:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote: In a Solr-based search, stemming is done at indexing time, into fields with stemmed tokens. It seems typical in library-catalog type applications based on Solr to have the default (or even only) searches be over these stemmed fields, thus 'auto-stemming' to the user. (Search for 'monkey', find 'monkeys' too, and vice versa). I am curious how many people, who have Solr based catalogs (that is, I'm interested in people who have search engines with majority or only content originally from MARC), use such stemmed fields ('auto-stemming') over their _author_ fields as well. There are pros and cons to this. There are certainly some things in an author field that would benefit from stemming (mostly various kinds of corporate authors, some of whose endings end up looking like English language phrases). There are also very many things in an author field that would not benefit from stemming, and thus when stemming is done it sometimes(/often?) results in false matches, pluralizing an author's last name in an inappropriate way for instance. So, wanna say on the list, if you are using a Solr-based catalog, are you using stemmed fields for your author searches? Curious what people end up doing. If there are any other more complicated clever things you've done than just stem-or-not, let us know that too! Jonathan -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Seth Godin on The future of the library
My short answer: It's too damn expensive to check out everything that's available for free to see if it's worth selecting for inclusion, and libraries (at least as I see them) are supposed to be curated, not comprehensive. My long answer: The most obvious issue is that the OPAC is traditionally a listing of holdings, and free ebooks aren't held in any sense that helps disambiguate them from any other random text on the Internet. Certainly the fact that someone bothered to transform it into ebook form isn't indicative of anything. Not everything that's available can be cataloged. I see "stuff we paid for" not as an arbitrary bias, but simply as a very, very useful way to define the borders of the library. Free is a very recent phenomenon, but it just adds more complexity to the existing problem of deciding what publications are within the library's scope. Library collections are curated, and that curation mission is not simply a side effect of limited funds. The filtering process that goes into deciding what a library will hold is itself an incredibly valuable aspect of the collection. Up until very recently, the most important pre-purchase filter was the fact that some publisher thought she could make some money by printing text on paper, and by doing so also allocated resources to edit/typeset/etc. For a traditionally-published work, we know that real person(s), with relatively transparent goals, has already read it and decided it was worth the gamble to sink some fixed costs into the project. It certainly wasn't a perfect filter, but anyone who claims it didn't add enormous information to the system is being disingenuous. Now that (e)publishing and (e)printing costs have nosedived toward $0.00, that filter is breaking. Even print-on-paper costs have been reduced enormously. 
But going through the slush pile, doing market research, filtering, editing, marketing -- these things all cost money, and for the moment the traditional publishing houses still do them better and more efficiently than anyone else. And they expect to be paid for their work, and they should. There's a tendency in the library world, I think, to dismiss the value of non-academic professionals and assume random people or librarians can just do the work (see also: web-site development, usability studies, graphic design, instructional design and development), but successful publishers are incredibly good at what they do, and the value they add shouldn't be dismissed (although their business practices should certainly be under scrutiny). Of course, I'm not differentiating free (no money) and free (CC0). One can imagine models where the functions of the publishing house move to a work-for-hire model and the final content is released CC0, but it's not clear who's going to pay them for their time. -Bill- On Thu, May 19, 2011 at 8:04 AM, Andreas Orphanides andreas_orphani...@ncsu.edu wrote: On 5/19/2011 7:36 AM, Mike Taylor wrote: I dunno. How do you assess the whole realm of proprietary stuff? Wouldn't the same approach work for free stuff? -- Mike. A fair question. I think there's maybe at least two parts: marketing and bundling. Marketing is of course not ideal, and likely counterproductive on a number of measures, but at least when a product is marketed you get sales demos. Even if they are designed to make a product or collection look as good as possible, it still gives you some sense of scale, quality, content, etc. I think bundling is probably more important. It's a challenge in the free-stuff realm, but for open access products where there is bundling (for instance, Directory of Open Access Journals) I think you are likely to see wider adoption. 
Bundling can of course be both good (lower management cost) and bad (potentially diluting collection quality for your target audience). But when there isn't any bundling, which is true for a whole lot of free stuff, you've got to locally gather a million little bits into a collection. I guess what's really happening in the bundling case, at least for free content, is that collection and quality management activities are being outsourced to a third party. This is probably why DOAJ gets decent adoption. But of course, this still requires SOME group to be willing to perform these activities, and for the content/package to remain free, they either have to get some kind of outside funding (e.g., donations) or be willing to volunteer their services. -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Changes coming to the Library::CallNumber::LC perl module
...and that change could be YOU! First thing: I'm abandoning this module; I never use it. If you want to adopt it, lemme know. It's free! Second thing: whoever picks it up might want to consider two major changes. 1. For some reason, I'm only allowing two decimal places in the initial number (e.g., A123.456 is invalid). The comments in the code indicate there might have been a good reason at one point. Heck, I'm sure there was. I just don't remember it. And there are plenty of call numbers with three digits there, esp. in the QAs. And the code I actually use now doesn't enforce that restriction and the sky hasn't fallen, so it should probably go. 2. The output format, which seemed smart at the time, is dumb. A123 expands to "A 123". Which means you have to url-escape the spaces, and muck with your search query so it doesn't look like two words, and that (in solr, at least) you can't do a wildcard query (in solr, "A 123*" isn't valid syntax). What I do in the java code is to use an @ sign instead, e.g. "A@@123". This makes things easier. The second is obviously a backwards-incompatible change which warrants some discussion. But none of this matters until someone steps up and adopts it. Code is at https://library-callnumber-lc.googlecode.com/ (a move to GitHub might make sense, too) -- step right up and take your chances! -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
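To illustrate the padding issue in point 2, here's a toy ruby sketch of the general idea of padded sort keys. This is emphatically not the module's actual code, and it only handles class letters plus an integer and decimal, no cutters or dates: pad the pieces to fixed widths so plain string comparison sorts call numbers correctly. Space padding produces a key like "A  123" -- two "words" to Solr, with spaces that need escaping -- while a character like '@' keeps it one token:

```ruby
# Build a crude sortable key from the start of an LC call number.
# pad_char fills the class letters out to 3 characters; the integer
# part is zero-padded so "QA9" sorts before "QA76.9" as a string.
def lc_key(callnum, pad_char)
  m = callnum.upcase.match(/\A([A-Z]{1,3})\s*(\d+)(?:\.(\d+))?/)
  return nil unless m
  letters, int, dec = m.captures
  key = letters.ljust(3, pad_char) + int.rjust(5, '0')
  key += ".#{dec}" if dec
  key
end

keys = ['QA76.9', 'QA9', 'B11'].map { |c| lc_key(c, '@') }
# keys.sort puts B before QA, and QA9 before QA76.9, as call-number
# order requires -- and each key is a single '@'-joined token.
```

With a space as the pad character the sorting still works, but "QA@00009" becomes "QA 00009", which a query parser happily splits into two terms; that's the whole motivation for the '@' change.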
Re: [CODE4LIB] What do you wish you had time to learn?
I've thought for a while that libraries would be significantly better places if there was always a big brisket near the reference desk that people could just carve a slice off of and a giant pot of curry in the basement. On Thu, Apr 28, 2011 at 8:55 AM, Andreas Orphanides andreas_orphani...@ncsu.edu wrote: Ranti, I think the call is clear: we need to start a group called Food4Lib. Who's with me?! Ranti Junus ranti.ju...@gmail.com 4/27/2011 11:39 PM On Wed, Apr 27, 2011 at 12:57 PM, Bohyun Kim k...@fiu.edu wrote: Seems that we can use a class in cooking in addition to guitar playing at the next conference : ) Hey, there's a Cooking for Geek authored by Jeff Potter. [1] Perhaps we should invite him to do a workshop and raffle the books. ranti. [1] http://www.cookingforgeeks.com -- Bulk mail. Postage paid. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] What do you wish you had time to learn?
play the guitar real statistics (not have t-test, will travel!) cook a really good roast graph theory map/reduce Hebrew some machine learning (esp. wrt parsing) On Tue, Apr 26, 2011 at 4:15 PM, Ross Singer rossfsin...@gmail.com wrote: map/reduce coffeescript, node.js, other server side javascripts XSLT How to not make a not-completely-hideous-looking web app. -Ross. On Tue, Apr 26, 2011 at 8:30 AM, Edward Iglesias edwardigles...@gmail.com wrote: Hello All, I am doing a presentation at RILA (Rhode Island Library Association) on changing skill sets for Systems Librarians. I did a formal survey a while back (if you participated, thank you) but this stuff changes so quickly I thought I would ask this another way. What do you wish you had time to learn? My list includes CouchDB(NoSQL in general) neo4j nodejs prototype API Mashups R Don't be afraid to include Latin or Greek History. I'm just going for a snapshot of System angst at not knowing everything. Thanks, ~ Edward Iglesias Systems Librarian Central Connecticut State University -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 10:10 AM, Ross Singer rossfsin...@gmail.com wrote: But, yeah, it would be worth running your ideas by a few catalogers to see what they think. And if anyone does this...please please *please* write it up! -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 1:50 PM, Shirley Lincicum shirley.linci...@gmail.com wrote: Ross is essentially correct. Education is an authorized subject term that can be subdivided geographically. Finance is a free-floating subdivision that is authorized for use under subject terms that conform to parameters given in the scope notes in its authority record (680 fields), but it cannot be subdivided geographically. England is an authorized geographic subject term that can be added to any heading that can be subdivided geographically. Wait, so is it possible to know if England means the free-floating geographic entity or the country? Or is that just plain unknowable. Suddenly, my mouth is hungering for something gun-flavored. I know OCLC did some work trying to dis-integrate different types of terms with the FAST stuff, but it's not clear to me how I can leverage that (or anything else) to make LCSH at all useful as a search target or (even better) facet. Has anyone done anything with it?
Re: [CODE4LIB] LCSH and Linked Data
2011/4/8 Karen Miller k-mill...@northwestern.edu: "I hope I'm not pointing out the obvious," That made me laugh so hard I almost ruptured something. Thank you so much for such a complete (please, god, tell me it's complete...) explanation. It's a little depressing, but at least now I know why I'm depressed :-) -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
On Wed, Mar 23, 2011 at 10:44 AM, Cary Gordon listu...@chillco.com wrote: You can probably find a curious intern to do it. Oh, for the love of god, please don't go this route. This is why libraries tend to be a huge mishmash of unsupported, one-off crap that some outgoing student did for extra credit six years ago. To ask the obvious question: You're at a real, honest-to-god prestigious college. Why are you trolling code4lib for cheap hosting environments? If IT won't give you a piece of a machine somewhere, or at least set up a Mac running OSX, they're failing to support a critical mission of the college and someone needs to be up in arms about it. If you haven't even asked them, well, maybe you should. -Bill, who spent his first two years in a library dealing with crappy old PHP code from long-gone students -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
On Wed, Mar 23, 2011 at 11:19 AM, Mark A. Matienzo m...@matienzo.org wrote: You're definitely welcome here, and I don't think Bill's response was to suggest that you weren't. Not even a little! :-) I was mostly responding to a perception that, for many in the code4lib community, Central IT is a bogeyman to be avoided/deferred to at all costs. Those of us in libraries tend to self-select as the sort of folks that will find a way to get something done, no matter what. I think the profession would benefit from more of us saying, "Well, OK. Then that's not going to get done. Go explain it to the dean." Kudos to you for doing stuff on your own time (and your own dime, no less). And please don't let my little rant scare you off. Turning good, wholesome librarians into...er...whatever it is that most of us here are...is what we do best :-) -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] stats for the conference video?
Cha: 16 You must've been watching a different crowd than the rest of us :-) On Thu, Feb 17, 2011 at 8:38 PM, Simon Spero s...@unc.edu wrote: Str: 11 Dex: 3 Con: 8 Int: 16: Wis: 18 Cha: 16 -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Bad numbers in my lightning talk (e.g. 45% of sessions have one action: search)
Basically, I failed to exclude a whole swath of activity I should have ignored. An explanation, the new data, and an excellent link to a corroborating paper by our usability group are at: http://robotlibrarian.billdueber.com/corrected-code4lib-slides-are-up/ My sincere apologies to everyone. I'm trying to do due diligence, but if anyone passed a copy of my slides along, please make sure the recipients get the better numbers. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] VuFind Beyond MARC slides
Ditto lightning talks? Should we attach slides to the appropriate page (e.g., http://code4lib.org/conference/2011/lightning)? Maybe pull in content from the Wiki to reflect what actual lightning talks happened? On Fri, Feb 11, 2011 at 2:16 PM, Ryan Wick ryanw...@gmail.com wrote: Thanks for posting these. There are already pages on code4lib.org (not the wiki), linked off the schedule, for individual talks. We'd like to host the slides on code4lib.org if possible, but with links at a minimum. Similar with lightning talks. If slides or other links are already on the wiki, I'll try and get them moved over to the (slightly more 'official') code4lib.org pages. If you are able to upload and add your slides to http://code4lib.org/conference/2011/katz please do so. Let me know if you have any questions. Thanks. Ryan Wick On Fri, Feb 11, 2011 at 8:35 AM, Demian Katz demian.k...@villanova.edu wrote: On a similar note, I've posted the slides for my VuFind talk here: http://vufind.org/docs/beyond_marc.ppt Is there someplace in the Wiki where all of the slides are being collected? I assume it would be better if we were all listing these in a central location rather than posting dozens of messages on the mailing list... but I couldn't find an obvious spot to put the link! thanks, Demian -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Rick Johnson Sent: Friday, February 11, 2011 10:42 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Notre Dame Hydra Digital Exhibit Slides Available Thanks for a great conference everyone! Our slides for our presentation on our Hydra Digital Exhibit Plugin as well as the screencast demo are now posted online: http://code4lib.library.nd.edu It seemed like there was interest in the IRC channel for reuse of our plugin. 
Again the code can be found here: https://github.com/ndliblis/hydra_exhibit I am also extremely interested to hear how many places would be interested in a Blacklight-only version of the plugin. Also, the main Hydra branch of code we extended can be found here as a baseline: https://github.com/projecthydra/hydrangea Matt Zumwalt also mentioned in the Hydra breakout that a good place to look at the moment for active projects using Hydra can be found on Duraspace's JIRA instance listed under Hydra Software: https://jira.duraspace.org/secure/BrowseProjects.jspa#all Finally, a more formal web presence will be available soon in the coming weeks including more detailed instructions on how to download, install, and try out existing Hydra heads. Thanks! Rick -- -- Rick Johnson Unit Manager, Digital Library Applications and Local Programming Unit Library Information Systems University of Notre Dame Michiana Academic Library Consortium Notre Dame, IN USA 46556 http://www.library.nd.edu 574-631-1086 -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Reminder: Newcomer dinner and Ribbons
Yep, that's me. Meet on the Mezzanine level near the comfy chairs at 5:30. My shirt features a robot riding a dinosaur. On Mon, Feb 7, 2011 at 3:41 PM, Jakub Skoczen ja...@indexdata.dk wrote: To the group that signed up for the Anyetsang's Little Tibet: I heard from Dot that he's not leading anymore, is anyone else going to take over his place or should we regroup? On Mon, Feb 7, 2011 at 8:03 PM, Richard, Joel M richar...@si.edu wrote: Roberto, I chose to meet outside of the Walnut conference room in order to not contribute to a large number of people in the Lobby. I know it's a bit out of the way, but that just means we'll be easier to find. I'll have a sign with large words to make it easy to find me. --Joel On Feb 7, 2011, at 2:52 PM, Roberto Hoyle roberto.j.ho...@dartmouth.edu wrote: On Feb 2, 2011, at 11:11 AM, Richard, Joel M wrote: Just a general question, how are team leaders contacting their attendees? I have no one's email addresses, so for Crazy Horse, I've put mine in the Wiki. FYI, I'm one of the ones who signed up for the Crazy Horse. I assume we'll meet in the lobby at 6? r. -- Cheers, Jakub -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Reminder: Newcomer dinner and Ribbons
Ooops. OK. I'll be there at 5:30, but we won't be leaving until everyone shows up. On Mon, Feb 7, 2011 at 4:52 PM, Birkin James Diana birkin_di...@brown.edu wrote: Bill, I recently signed up for this dinner-trek. 5:30 is fine with me, but just an FYI that the guidelines said 6ish, so I'm concerned others might be planning to show up then -- or maybe y'all have been in touch along the way. Regardless, I'll be there at 5:30. -Birkin --- Birkin James Diana Programmer, Digital Technologies Brown University Library birkin_di...@brown.edu On Feb 7, 2011, at 3:51 PM, Bill Dueber wrote: Yep, that's me. Meet on the Mezzanine level near the comfy chairs at 5:30. My shirt features a robot riding a dinosaur. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] A/B Testing Catalogs and Such
I've proposed A/B testing for our OPAC. I managed to avoid the torches, but the pitchforks...youch! On Wed, Jan 26, 2011 at 5:55 PM, Sean Moore thedreadpirates...@gmail.com wrote: There's a lot of resistance in my institution to A/B or multivariate testing any of our live production properties (catalog, website, etc...). I've espoused the virtues of having hard data to back up user activity (if I hear one more "well, in my opinion," I'll just go blind), but the reply is always along the lines of, "But it will confuse users!" I've pointed out the myriad successful and critical businesses that use these methodologies, but was told that businesses and academia are different. So, my question to you is, which of you academic libraries are using A/B testing; on what portion of your web properties (catalog, discovery interface, website, etc...); and I suppose to spark conversation, which testing suite are you using (Google Website Optimizer, Visual Website Optimizer, a home-rolled non-hosted solution)? I was told if I can prove it's a commonly accepted practice, I can move forward. So help a guy out, and save me from having to read another survey of 12 undergrads that is proof positive of changes I need to make. Thanks! *Sean Moore* Web Application Programmer *Phone*: (504) 314-7784 *Email*: cmoo...@tulane.edu Howard-Tilton Memorial Library http://library.tulane.edu, Tulane University -- Bill Dueber Library Systems Programmer University of Michigan Library
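[Editor's note: for the "home-rolled non-hosted solution" option mentioned above, the core of such a thing is tiny. This is a minimal sketch, not any particular product's API; the function name and experiment labels are invented. Hashing a stable visitor identifier means a given visitor always lands in the same variant, with no server-side state to store.]

```ruby
require 'digest'

# Deterministically assign a visitor to a variant for a named experiment.
# Same (visitor, experiment) pair always yields the same variant, and
# different experiments shuffle visitors independently.
def variant_for(visitor_id, experiment, variants = %w[A B])
  n = Digest::MD5.hexdigest("#{experiment}:#{visitor_id}").to_i(16)
  variants[n % variants.size]
end

# e.g., in a Rails-ish controller (hypothetical):
#   show_new_facets = (variant_for(session[:id], 'opac-facet-order') == 'B')
```

Logging the variant alongside each pageview/click is then enough to compare outcomes per bucket.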
Re: [CODE4LIB] code4lib 2011 Update
Right. The key is to make sure the N band has its own SSID. Mac laptops, at least, will always glom onto the strongest signal, so if you're broadcasting on G and N with the same name, most of the time the laptop will grab the G because the signals go through walls better. If we can just choose, e.g., Code4Lib2011 N, that problem goes away. On Tue, Jan 18, 2011 at 12:46 PM, Richard, Joel M richar...@si.edu wrote: I think you missed a critical part of that message, Jonathan. (which I didn't write, BTW) it does not mean that you have to have one... Robert is saying that 802.11n is recommended and you'll have a better experience with it. It is not a requirement. Besides, I believe any router that supports the n standards is also backwards compatible to prior standards. --Joel Joel Richard IT Specialist, Web Services Department Smithsonian Institution Libraries | http://www.sil.si.edu/ (202) 633-1706 | (202) 786-2861 (f) | richar...@si.edu On Jan 18, 2011, at 11:15 AM, Jonathan Rochkind wrote: On 1/18/2011 9:05 AM, Richard, Joel M wrote: Our central wireless group has recommended that if everyone has an 802.11n card (5GHz radio spectrum) in their device that they will likely have a much better experience for connectivity – it does not mean that you have to have one; it will just be better download speeds etc. There is ABSOLUTELY no way to guarantee that 100% of 200 conference attendees will have 802.11n cards in their devices. I suspect the vast majority of us will bring the devices we have, and not upgrade our devices just for the conf. I would suggest you make sure IT is assuming that NOT everyone will have 802.11n -- there's no way that's going to happen. Jonathan -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Which O'Reilly books should we give away at Code4Lib 2011?
While both are document stores, there are some major differences in their data model, most notably that mongoDB uses an update-replaces mechanism, while CouchDB allows you to access any version of a document, which brings with it issues of transaction overlaps (who wins?) and having to periodically compact your database. CouchDB uses a REST interface for all interaction; mongo has programming language-specific drivers (although there are also REST interfaces available), which in many cases can increase performance. Their querying approaches are different. Mongo is more akin to "define an index and use it when possible at query time." CouchDB is more of a "define a view beforehand and use that view." Oops. I just found a better overview than I can provide, at http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB There are lots of other players in this space, too -- see http://nosql-database.org/ - On Tue, Dec 14, 2010 at 9:12 AM, Thomas Dowling tdowl...@ohiolink.edu wrote: On 12/14/2010 07:58 AM, Luciano Ramalho wrote: I believe CouchDB will take the library world by storm, and the sooner the better. A document database is what we need for many of our applications. CouchDB, with its 100% RESTful API is a highly productive web-services platform with a document oriented data model and built-in peer-to-peer replication. In short, it does very well lots of things we need done. Amen. Does anyone have helpful things to say about choosing between CouchDB and MongoDB? Thomas Dowling tdowl...@ohiolink.edu -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] PHP vs. Python [was: Re: Django]
On Fri, Oct 29, 2010 at 6:28 PM, Peter Schlumpf pschlu...@earthlink.net wrote: What's wrong with the library world developing its own domain language? EVERYTHING!!! We're already in a world of pain because we have our own data formats and ways of dealing with them, all of which have basically stood idle while 30 years of advances in computer science and information architecture have whizzed by us with a giant WHOOSHing sound. Having a bunch of non-experts design and implement a language that's destined from the outset to be stuck in a tiny little ghetto of the programming world is a guaranteed way to live with half- or un-supported code, no decent libraries, and yet another legacy of pain we'd have to support. I'm not picking on programming in particular. It's a dumb-ass move EVERY time a library is presented with a problem for which there are experts and decades of research literature, and it chooses to ignore all of that and decide to throw a committee of librarians (or whomever else happens to be in the building at the time) at it based on the vague idea that librarians are just that much smarter (or cheaper) than everyone else (I'm looking at you, usability...) -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MARCXML - What is it for?
I know there are two parts of this discussion (speed on the one hand, applicability/features on the other), but for the former, running a little benchmark just isn't that hard. Aren't we supposed to, you know, prefer to make decisions based on data?

Note: I'm only testing deserialization because there isn't, as of now, a fast serialization option for ruby-marc. It uses REXML, and it's dog-slow. I already looked at marc-in-json vs. marc binary at http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/

Benchmark source: http://gist.github.com/645683

18,883 records as either an XML collection or newline-delimited json. Open the file, read every record, pull out a title. Repeat 5 times for a total of 94,415 records (i.e., just under 100K records total). Under ruby-marc, using the libxml deserializer is the fastest option. If you're using the REXML parser, well, god help us all. ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin9.8.0]. User time reported in seconds:

  xml w/libxml:        227 seconds
  marc-in-json w/yajl: 130 seconds

So...quite a bit faster (more than 40%). For a million records (assuming I can just say 10*these_values) you're talking about a difference of 16 minutes due to just reading speed. Assuming, of course, you're running your code on my desktop. Today. For the 8M records I have to deal with, that'd be roughly 8M * ((227-130) / 94,415) ≈ 8,200 seconds, or about 137 minutes. So...a lot.

Of course, if you're using a slower XML library or a slower JSON library, your numbers will vary quite a bit. REXML is unforgivingly slow, and json/pure (and even 'json') are quite a bit slower than yajl. And don't forget that you need to serialize these things from your source somehow... -Bill-

On Mon, Oct 25, 2010 at 4:23 PM, Stephen Meyer sme...@library.wisc.edu wrote: Kyle Banerjee wrote: On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote: Does processing speed of something matter anymore? 
You'd have to be doing a LOT of processing to care, wouldn't you? Data migrations and data dumps are a common use case. Needing to break or make hundreds of thousands or millions of records is not uncommon. kyle To make this concrete, we process the MARC records from 14 separate ILS's throughout the University of Wisconsin System. We extract, sort on OCLC number, dedup and merge pieces from any campus that has a record for the work. The MARC that we then index and display here http://forward.library.wisconsin.edu/catalog/ocm37443537?school_code=WU is not identical to the version of the MARC record from any of the 4 schools that hold it. We extract 13 million records and dedup down to 8 million every week. Speed is paramount. -sm -- Stephen Meyer Library Application Developer UW-Madison Libraries 436 Memorial Library 728 State St. Madison, WI 53706 sme...@library.wisc.edu 608-265-2844 (ph) Just don't let the human factor fail to be a factor at all. - Andrew Bird, Tables and Chairs -- Bill Dueber Library Systems Programmer University of Michigan Library
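[Editor's note: a runnable miniature of the kind of comparison the benchmark above makes, using only stdlib REXML and JSON rather than libxml/yajl — the real numbers came from 18,883 actual MARC records; the one-record samples here are invented, with the data field shaped as tag / ind1 / ind2 / subfield pairs.]

```ruby
require 'benchmark'
require 'json'
require 'rexml/document'

# Made-up single-record samples in MARC-XML-ish and marc-in-json-ish shapes.
XML_REC  = '<record><datafield tag="245" ind1=" " ind2=" ">' \
           '<subfield code="a">A Title</subfield></datafield></record>'
JSON_REC = '{"fields":[["245"," "," ",[["a","A Title"]]]]}'

# "Read a record, pull out a title" -- the work the benchmark repeats.
def title_from_xml(src)
  doc = REXML::Document.new(src)
  REXML::XPath.first(doc, '//datafield[@tag="245"]/subfield[@code="a"]').text
end

def title_from_json(src)
  field = JSON.parse(src)['fields'].find { |f| f.first == '245' }
  field[3].find { |code, _| code == 'a' }[1]
end

n = 2_000
Benchmark.bm(10) do |bm|
  bm.report('rexml') { n.times { title_from_xml(XML_REC) } }
  bm.report('json')  { n.times { title_from_json(JSON_REC) } }
end
```

On any recent Ruby the JSON path wins by a wide margin, which is the whole point of the thread; the absolute numbers will of course differ from the libxml/yajl figures above.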
Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 9:32 PM, Alexander Johannesen alexander.johanne...@gmail.com wrote: Lots of people around the library world infra-structure will think that since your data is now in XML it has taken some important step towards being inter-operable with the rest of the world, that library data now is part of the real world in *any* meaningful way, but this is simply demonstrably deceivingly not true. Here, I think you're guilty of radically underestimating lots of people around the library world. No one thinks MARC is a good solution to our modern problems, and no one who actually knows what MARC is has trouble understanding MARC-XML as an XML serialization of the same old data -- certainly not anyone capable of meaningful contribution to work on an alternative. You seem to presuppose that there's an enormous pent-up energy poised to sweep in changes to an obviously-better data format, and that the existence of MARC-XML somehow defuses all that energy. The truth is that a high percentage of people that work with MARC data actively think about (or curse) things that are wrong with it and gobs and gobs of ridiculously-smart people work on a variety of alternate solutions (not the least of which is RDA) and get their organizations to spend significant money to do so. The problem we're dealing with is *hard*. Mind-numbingly hard. The library world has several generations of infrastructure built around MARC (by which I mean AACR2), and devising data structures and standards that are a big enough improvement over MARC to warrant replacing all that infrastructure is an engineering and political nightmare. I'm happy to take potshots at the RDA stuff from the sidelines, but I never forget that I'm on the sidelines, and that the people active in the game are among the best and brightest we have to offer, working on a problem that invariably seems more intractable the deeper in you go. 
If you think MARC-XML is some sort of an actual problem, and that people just need to be shouted at to realize that and do something about it, then, well, I think you're just plain wrong. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen alexander.johanne...@gmail.com wrote: Political? For sure. Engineering? Not so much. Ok. Solve it. Let us know when you're done. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MARCXML - What is it for?
Sorry. That was rude, and uncalled for. I disagree that the problem is easily solved, even without the politics. There've been lots of attempts to try to come up with a sufficiently expressive toolset for dealing with biblio data, and we're still working on it. If you do think you've got some insight, I'm sure we're all ears, but try to frame it in terms of the existing work if you can (RDA, some of the dublin core stuff, etc.) so we have a frame of reference. On Mon, Oct 25, 2010 at 10:18 PM, Bill Dueber b...@dueber.com wrote: On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen alexander.johanne...@gmail.com wrote: Political? For sure. Engineering? Not so much. Ok. Solve it. Let us know when you're done. -- Bill Dueber Library Systems Programmer University of Michigan Library -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] membership recommendations
Make sure to include a line: Code4Lib...$0.00 On Thu, Aug 26, 2010 at 12:56 PM, Adam Wead aw...@rockhall.org wrote: Hi all, I'm budgeting for membership dues and am seeking suggestions for professional organizations that are good to have. As a digital/systems librarian working with music and video in an archive, there are lots to choose from! I'm hoping to choose a couple that cover most of the bases. Thanks in advance for the recommendations. ...adam -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MODS and DCTERMS
On Mon, May 3, 2010 at 2:40 PM, MJ Suhonos m...@suhonos.ca wrote: Yes, even to me as a librarian but not a cataloguer, many (most?) of these elements seem like overkill. I have no doubt there is an edge-case for having this fine level of descriptive detail, but I wonder: a) what proportion of records have this level of description b) what kind of (or how much) user access justifies the effort in creating and preserving it On many levels, I agree. Or I wish I could. If you look at a business model like Amazon, for example, it's easy to imagine that their overriding goal is, "Make the easy-to-find stuff ridiculously easy to find." The revenue they get from someone finding an edge-case book is exactly the same as the revenue they get from someone buying Harry Potter. The ROI is easy to think about. But I work in an academic library. In a lot of ways, our *primary audience* is some grad student 12 years from now who needs one trivial piece of crap to make it all come together in her head. I know we have thousands of books that have never been looked at, but computing the ROI on someone being able to see them some day is difficult. Maybe it's zero. Maybe not. We just can't tell. Now, none of this is to say that MARC/AACR2 is necessarily the best (or even a good) way to go about making these works findable. I'm just saying that evaluating the edge cases in terms of user access is a complicated business. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)
On Mon, May 3, 2010 at 7:10 PM, Bryan Baldus bryan.bal...@quality-books.com wrote: I can't speak for other users (particularly the generic patron user type), but as a cataloger/librarian user, ...and THERE IT IS, ladies and gentlemen. I've started trying to keep a list of IP addresses I *know* are staff and separate out the statistics. The OPAC isn't for the librarians; the ILS client is. If the client sucks so badly that librarians need the OPAC to do our job (as I was told several times during our roll out of vufind), then the solution is to fix the client, or (alternately) build up a workaround for staff. NOT to overload the OPAC. If librarians need specialized tools, let's just build them without some sort of pretense that they're anything but the tiniest blip on the bell curve of patrons. And, BTW, just because you (and you know who you are!) do 8 hours of reference desk work a week doesn't mean you have a hell of a lot more insight. The patrons that self-select to actually speak to a librarian sitting *in the library* are a freakshow themselves, statistically speaking. [Not meaning to imply that Bryan doesn't know the difference between himself and a normal patron; his post makes it clear that he does. I just took the opportunity to rant.] I'm not saying that patrons don't use browse much (that's what I'm trying to determine). But, to borrow from the 2009 code4lib conference, every time a librarian's work habits inform the design of a public-facing application, God kills a kitten. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
On Mon, May 3, 2010 at 6:34 PM, Karen Coyle li...@kcoyle.net wrote: Quoting Jakob Voss jakob.v...@gbv.de: I bet there are several reasons why OpenURL failed in some way but I think one reason is that SFX got sold to Ex Libris. Afterwards there was no interest of Ex Libris to get a simple clean standard and most libraries ended up buying a black box with an OpenURL label on it - instead of developing their own systems based on a common standard. I bet you can track most bad library standards to commercial vendors. I don't trust any standard without open specification and a reusable Open Source reference implementation. For what it's worth, that does not coincide with my experience. I'm going to turn this back on Karen and say that much of my pain does come from vendors, but it comes from their shitty data. OpenURL and resolvers would be a much more valuable piece of technology if the vendors would/could get off their collective asses(1) and give us better data. -Bill- (1) By this, of course, I mean if the librarians would grow a pair and demand better data via our contracts -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)
On Mon, May 3, 2010 at 8:39 PM, Jonathan Rochkind rochk...@jhu.edu wrote: So, Bill, you're still not certain yourself exactly what purposes browse is used for by actual non-librarian searchers, if anything? Right. I'm not sure *the extent* to which it's used (data which are necessarily going to be messy and partially driven by how prevalent browse vs search are in the interface), and I certainly don't know what's going through people's heads when they choose to use it (on those occasions when they make a conscious choice to use browse in addition to/instead of search). My attempts to find stuff in the research literature failed me; if anyone has other pointers, I'd love to read them! (If only there was a real librarian around to help poor little me...) -Bill-
Re: [CODE4LIB] ILS short list
On Thu, Apr 8, 2010 at 2:32 PM, Ryan Eby ryan...@gmail.com wrote: Unicorn * Export Built in. MARC21 or flat file formats. Unicode support is available as an extra. ...as an extra??? This is the saddest thing I've read all day. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Zotero, unapi, and formats?
The unAPI support is also...non-ideal...in that you can't present preferences for the best format to use. For example, the Refworks Tagged format just plain has more tags (and hence more or more-finely-grained information) than other formats (e.g., Endnote), but Zotero will prefer Endnote just because it does. My RIS output is better than my endnote output, but there's no way for me to tell Zotero that. For Mirlyn I ended up just having exactly one format listed in my unapi-server file. Which is dumb. But I'm not sure what else to do. On Tue, Apr 6, 2010 at 10:16 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Yeah, we need some actual documentation on Zotero's use of unAPI in general. Maybe if I can figure it out (perhaps by asking the developer(s)) I'll write some for them. Robert Forkel wrote: well, looks like a combination: in case of mods it checks for the namespace URL, in case of rdf, it looks for a format name of rdf_dc, ... and yes, endnote export would have to have a name of endnote (i ran into this problem as well with names like endnote-utf-8, ...). i think unapi would be more usable if there were at least a recommendation of common format names. On Tue, Apr 6, 2010 at 4:07 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Wait, does it actually recognize the format by the format _name_ used, and not by a mime content-type? Like unless my unAPI server calls the endnote format endnote, it won't recognize it? That would be odd, and good to know. I thought the unAPI format names were purely arbitrary, but recognized by their association with a mime content-type like application/x-endnote-refer. But no, at least as far as Zotero is concerned, you have to pick format shortnames that match what Zotero expects? Robert Forkel wrote: from looking at line 14 here https://www.zotero.org/trac/browser/extension/trunk/translators/unAPI.js i'd say: ad 1. RECOGNIZABLE_FORMATS = [mods, marc, endnote, ris, bibtex, rdf] also see function checkFormats ad 2. 
the order listed above ad 4.: from my experience the unapi scraper takes precedence over coins On Tue, Apr 6, 2010 at 3:48 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Anyone know if there's any developer documentation for Zotero on its use of unAPI? Alternately, anyone know where I can find the answers to these questions, or know the answers to these questions themselves? 1. What formats will Zotero use via unAPI. What mime content-types does it use to recognize those formats (sometimes a format has several in use, or no official content-type). 2. What is Zotero's order of preference when multiple formats via unAPI are available? 3. Will Zotero get confused if different documents on the page have different formats available? This can be described with unAPI, but it seems atypical, so not sure if it will confuse Zotero. 4. If both unAPI and COinS are on a given page -- will Zotero use both (resulting in possible double-import for citations exposed both ways). Or only one? Or depends on how you set up the HTML? 5. Somewhere that now I can't find I saw a mention of a Zotero RDF format that Zotero would consume via unAPI. Is there any documentation of this format/vocabulary, how can I find out how to write it? -- Bill Dueber Library Systems Programmer University of Michigan Library
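[Editor's note: the single-format workaround Bill describes above looks something like this as a unAPI formats response. The element names and `name`/`type` attributes come from the unAPI spec; the identifier and the specific format entry here are hypothetical.]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<formats id="mirlyn:012345678">
  <!-- List exactly one format, so Zotero has nothing "worse" to prefer. -->
  <format name="ris"
          type="application/x-research-info-systems"/>
</formats>
```

With only one `<format>` advertised, the client's (undocumented) preference ordering never comes into play.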
Re: [CODE4LIB] planet code4lib code (was: newbie)
I know some systems (I'm thinking of CPAN and Gemcutter in particular) have feeds of new releases -- maybe we could tap into those and note when registered projects have new releases? I don't know if that's fine-grained enough information for what folks want. On Sun, Mar 28, 2010 at 6:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Good point Aaron. Maybe that's possible, but I'm not seeing exactly what the interface would look like. Without worrying about how to implement it, can you say more about what you'd actually want to see as a user? Expand on what you mean by listens for feeds of specific types, I'm not sure what that means. You'd like to see, what? Just initial commits by certain users, and new stable releases on certain projects (or by certain users?). Or you want to have an interface that gives you the ability to choose/search exactly what you want to see from categories like these, across a wide swath of projects chosen as of interest? From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Aaron Rubinstein [arubi...@library.umass.edu] Sent: Sunday, March 28, 2010 6:33 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] planet code4lib code (was: newbie) Quoting Jonathan Rochkind rochk...@jhu.edu: Hmm, an aggregated feed of the commit logs (from repos that offer feeds, as most do), of open source projects of interest to the code4lib community. Would that be at all useful? I think that's a start but I'd imagine that just a feed of the commit logs would contain a lot of noise that would drown out what might actually be interesting, like newly published gists, initial commits of projects, new project releases, etc... I'm most familiar with GitHub, which indicates the type of event being published, but I'm sure other code repos do something similar. Would it be possible to put something together using Views that listens for feeds of specific types published by users in the code4lib community? 
Aaron -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] PHP bashing (was: newbie)
Also...it's pretty good for plugging leaks in ducts. On Thu, Mar 25, 2010 at 11:51 AM, Nate Vack njv...@wisc.edu wrote: On Thu, Mar 25, 2010 at 10:00 AM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: You say that as if duct tape is a bad thing for auto repairs. Not all duct tape repairs are candidates for There, I fixed it![1]. It works just fine for the occasional hose repair. At the risk of taking an off-topic conversation even further into Peanut Heaven, automotive hose repair is actually one of the things duct tape is least well-suited to. The adhesive doesn't bond when wet, it's not strong enough to hold much pressure or vacuum (especially moderate continuous pressure), and it fails very quickly at even moderately high temperatures. And it tends to leave goo all over everything, thus adding headaches to the proper repair you'll still need later. Duct tape is OK for keeping a wire bundle out of your fan or something, but if you try to fix a leak in your radiator hose with it, you'll still be stranded and also have gooey duct tape adhesive all over the place. Extending these points to the ongoing language debate is an exercise that will benefit no one ;-) Cheers (and just get that hose replaced ;-) -Nate -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Q: XML2JSON converter [MARC-JSON]
On the one hand, I'm all for following specs. But on the other...should we really be too concerned about dealing with the full flexibility of the 2709 spec, vs. what's actually used? I mean, I hope to god no one is actually creating new formats based on 2709! If there are real-life examples in the wild of, say, multi-character indicators, or subfield codes of more than one character, that's one thing. BTW, in the stuff I proposed, you know a controlfield vs. a datafield because of the length of the array (2 vs 4); it's well-specified, but by the size of the tuple, not by label. On Mon, Mar 15, 2010 at 11:22 AM, Houghton,Andrew hough...@oclc.org wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Monday, March 15, 2010 11:53 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter [MARC-JSON] I would just ask why you didn't use Bill Dueber's already existing proto-spec, instead of making up your own incompatible one. Because the internal use of our specification predated Bill's blog entry, dated 2010-02-25, by almost a year. Bill's post reminded me that I had not published or publicly discussed our specification. Secondly, Bill's specification loses semantics from ISO 2709, as I previously pointed out. His specification clumps control and data fields into one property named fields. According to ISO 2709, control and data fields have different semantics. You could have a control field tagged as 001 and a data field tagged as 001 which have different semantics. MARC-21 has imposed certain rules for assignment of tags such that this isn't a concern, but other systems based on ISO 2709 may not. Andy. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] questions about 2011 conference proposal
I'm pretty sure the closest real hotel (there are a couple of bed-and-breakfasts) is the new Hilton downtown; it's about 3/4 of a mile straight down Kirkwood Ave and probably a 12-minute walk. On Mon, Mar 15, 2010 at 9:20 PM, Jonathan Rochkind rochk...@jhu.edu wrote: (Code4Lib listserv, Robert McDonald CC'd). I have a question about the Bloomington Code4Lib conference proposal. (I would personally be quite happy for the conference to be in Bloomington). I note that the actual IU conference center has under 200 rooms. Probably not enough for all attendees even if we take every room. Are the other hotels in Bloomington a quick walk to the conference center, and what are their rates like? (I would ask the same thing about the Vancouver proposal, but they say they can secure $109 rates at two named hotels, which I'm assuming would have enough rooms for us all, and I'm assuming are close enough to the proposed meeting venue to work, although I haven't looked it up on google maps.) Jonathan -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Q: XML2JSON converter
On Sat, Mar 6, 2010 at 1:57 PM, Houghton,Andrew hough...@oclc.org wrote: A way to fix this issue is to say that use cases #1 and #2 conform to media type application/json and use case #3 conforms to a new media type say: application/marc+json. This new application/marc+json media type now becomes a library-centric standard and it avoids breaking a widely deployed Web standard. I'm so sorry -- it never dawned on me that anyone would think that I was asserting that a JSON MIME type should return anything but JSON. For the record, I think that's batshit crazy. JSON needs to return json. I'd been hoping to convince folks that we need to have a standard way to pass records around that doesn't require a streaming parser/writer; not ignore standard MIME-types willy-nilly. My use cases exist almost entirely outside the browser environment (because, my god, I don't want to have to try to deal with MARC21, whatever the serialization, in a browser environment); it sounds like Andy is almost purely worried about working with a MARC21 serialization within a browser-based javascript environment. Anyway, hopefully, it won't be a huge surprise that I don't disagree with any of the quote above in general; I would assert, though, that application/json and application/marc+json should both return JSON (in the same way that text/xml, application/xml, and application/marc+xml can all be expected to return XML). Newline-delimited json is starting to crop up in a few places (e.g. couchdb) and should probably have its own mime type and associated extension. So I would say something like:

  application/json      -- return json (obviously)
  application/marc+json -- return json
  application/marc+ndj  -- return newline-delimited json

In all cases, we should agree on a standard record serialization, though, and the pure-json returns should include something that indicates what the heck it is (hopefully a URI that can act as a distinct namespace-type identifier, including a version in it). 
The question for me, I think, is whether within this community, anyone who provides one of these types (application/marc+json and application/marc+ndj) should automatically be expected to provide both. I don't have an answer for that. -Bill-
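[Editor's note: a tiny sketch of why newline-delimited JSON keeps tooling simple, as argued above. The two records are invented; one complete JSON object per line means any parser can stream a record at a time with no special support.]

```ruby
require 'json'

# Two made-up records, one JSON object per line (newline-delimited JSON).
ndj = <<~NDJ
  {"type":"marc-hash","fields":[["001","12345"],["245"," "," ",[["a","First title"]]]]}
  {"type":"marc-hash","fields":[["001","67890"],["245"," "," ",[["a","Second title"]]]]}
NDJ

# "Grab the first record" is just "read one line" -- no streaming JSON
# parser needed, even with the dumbest JSON library.
first = JSON.parse(ndj.each_line.first)

# And a full pass never holds more than one parsed record in memory.
control_numbers = ndj.each_line.map { |line| JSON.parse(line)['fields'][0][1] }
```

Contrast a single giant JSON array, where a naive parser has to read and materialize the whole collection before handing back record one.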
Re: [CODE4LIB] Code4Lib Midwest?
I'm pretty sure I could make it from Ann Arbor! On Fri, Mar 5, 2010 at 10:12 AM, Ken Irwin kir...@wittenberg.edu wrote: I would come from Ohio to wherever we choose. Kalamazoo would suit me just fine; I've not been back there in entirely too long! Ken -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Scott Garrison Sent: Friday, March 05, 2010 8:37 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib Midwest? +1 ELM, I'm happy to help coordinate in whatever way you need. Also, if we can find a drummer, we could do a blues trio (count me in on bass). I could bring our band's drummer (a HUGE ND fan) down for a day or two if needed--he's awesome. --SG WMU in Kalamazoo - Original Message - From: Eric Lease Morgan emor...@nd.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Thursday, March 4, 2010 4:38:53 PM Subject: Re: [CODE4LIB] Code4Lib Midwest? On Mar 4, 2010, at 3:29 PM, Jonathan Brinley wrote: 2. share demonstrations I'd like to see this be something like a blend between lightning talks and the ask anything session at the last conference This certainly works for me, and the length of time of each talk would/could be directly proportional to the number of people who attend. 4. give a presentation to library staff What sort of presentation did you have in mind, Eric? This also raises the issue of weekday vs. weekend. I'm game for either. Anyone else have a preference? What I was thinking here was a possible presentation to library faculty/staff and/or computing faculty/staff from across campus. The presentation could be one or two cool hacks or solutions that solved wider, less geeky problems. Instead of tweaking Solr's term-weighting algorithms to index OAI-harvested content it would be making journal articles easier to find. This would be an opportunity to show off the good work done by institutions outside Notre Dame. A prophet in their own land is not as convincing as the expert from afar. 
I was thinking it would happen on a weekday. There would be more stuff going on here on campus, as well as give everybody a break from their normal work week. More specifically, I would suggest such an event take place on a Friday so the people who stayed overnight would not have to take so many days off of work. 5. have a hack session It would be good to have 2 or 3 projects we can/should work on decided ahead of time (in case no one has any good ideas at the time), and perhaps a couple more inspired by the earlier presentations. True. -- ELM University of Notre Dame -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 12:01 PM, Houghton,Andrew hough...@oclc.org wrote: Too bad I didn't attend code4lib. OCLC Research has created a version of MARC in JSON and will probably release FAST concepts in MARC binary, MARC-XML and our MARC-JSON format among other formats. I'm wondering whether there is some consensus that can be reached and standardized at LC's level, just like OCLC, RLG and LC came to consensus on MARC-XML. Unfortunately, I have not had the time to document the format, although it is fairly straightforward, and yes we have an XSLT to convert from MARC-XML to MARC-JSON. Basically the format I'm using is: The stuff I've been doing: http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/ ... is pretty much the same, except: 1. I don't explicitly split up control and data fields. There's a single field list; an item that has two elements is a control field (tag/data); one with four is a data field (tag / ind1 / ind2 / array_of_subfield) 2. Instead of putting a collection in a big json array, I use newline-delimited-json (basically, just stick one record on each line as a single json hash). This has the advantage that it makes streaming much, much easier, and makes doing some other things (e.g., grab the first record or two) much cheaper for even the dumbest json parser. I'm not sure what the state of JSON streaming parsers is; I know Jackson (for Java) can do it, and perl's JSON::XS can...kind of...but it's not great. 3. I include a type (MARC-JSON, MARC-HASH, whatever) and version: [major, minor] in each record. There's already a ton of JSON floating around the library world; labeling what the heck a structure is is just friendly :-) MARC's structure is dumb enough that we collectively basically can't miss; there's only so much you can do with the stuff, and a round-trip to JSON and back is easy to implement. I'm not super-against explicitly labeling the data elements (tag:, ind1:, etc.)
but I don't see where it's necessary unless you're planning on adding out-of-band data to the records/fields/subfields at some point. Which might be kinda cool (e.g., language hints on a per-subfield basis? Tokenization hints for non-whitespace-delimited languages? URIs for unique concepts and authorities where they exist for easy creation of RDF?) I *am*, however, willing to push and push and push for NDJ instead of having to deal with streaming JSON parsing, which to my limited understanding is hard to get right and to my more qualified understanding is a pain in the ass to work with. And anything we do should explicitly be UTF-8 only; converting from MARC-8 is a problem for the server, not the receiver. Support for what I've been calling marc-hash (I like to decouple it from the eventual JSON format in case the serialization preferences change, or at least so implementations don't get stuck with a single JSON library) is already baked into ruby-marc, and obviously implementations are dead-easy no matter what the underlying language is. Anyone from the LoC want to get in on this? -Bill- [ ... ] which represents a collection of MARC records, or { ... } which represents a single MARC record that takes the form:

{
  "leader": "01192cz a2200301n 4500",
  "controlfield": [
    { "tag": "001", "data": "fst01303409" },
    { "tag": "003", "data": "OCoLC" },
    { "tag": "005", "data": "20100202194747.3" },
    { "tag": "008", "data": "060620nn anznnbabn || ana d" }
  ],
  "datafield": [
    { "tag": "040", "ind1": " ", "ind2": " ",
      "subfield": [
        { "code": "a", "data": "OCoLC" },
        { "code": "b", "data": "eng" },
        { "code": "c", "data": "OCoLC" },
        { "code": "d", "data": "OCoLC-O" },
        { "code": "f", "data": "fast" }
      ] },
    { "tag": "151", "ind1": " ", "ind2": " ",
      "subfield": [
        { "code": "a", "data": "Hawaii" },
        { "code": "z", "data": "Diamond Head" }
      ] }
  ]
}

-- Bill Dueber Library Systems Programmer University of Michigan Library
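For concreteness, here is a minimal Ruby sketch of the marc-hash shape Dueber describes (a two-element field is a control field, a four-element field is a data field, and each record goes on its own line). The key names here are illustrative assumptions, not the canonical marc-hash spec:

```ruby
require 'json'

# Illustrative marc-hash style record (key names are assumptions, not
# the canonical spec): control fields are [tag, data] pairs, data
# fields are [tag, ind1, ind2, subfield_pairs].
record = {
  'type'    => 'marc-hash',
  'version' => [1, 0],
  'leader'  => '01192cz  a2200301n  4500',
  'fields'  => [
    ['001', 'fst01303409'],
    ['151', ' ', ' ', [['a', 'Hawaii'], ['z', 'Diamond Head']]]
  ]
}

# Distinguish control fields from data fields by element count alone.
control, data = record['fields'].partition { |f| f.length == 2 }

# Newline-delimited output: one complete JSON record per line.
puts record.to_json
```

Note that nothing here needs a MARC-aware parser; any JSON library plus the element-count convention is enough to round-trip the structure.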
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote: I decided to stick closer to a MARC-XML type definition since it would be easier to explain how the two specifications are related, rather than take a more radical approach in producing a specification less familiar. Not to say that other approaches are bad, they just have different advantages and disadvantages. I was going for simple and familiar. That makes sense, but please consider adding a format/version (which we get in MARC-XML from the namespace and isn't present here). In fact, please consider adding a format / version / URI, so people know what they've got. I'm also going to again push the newline-delimited-json stuff. The collection-as-array is simple and very clean, but leads to trouble for production (where for most of us we'd have to get the whole freakin' collection in memory first and then call JSON.dump or whatever) or consumption (have to deal with a streaming json parser). The production part is particularly worrisome, since I'd hate for everyone to have to default to writing out a '[', looping through the records, and writing a ']'. Yeah, it's easy enough, but it's an ugly hack that *everyone* would have to do, as opposed to just something like: while (r = nextRecord) { print r.to_json, "\n" } Unless, of course, writing json to a stream and reading json from a stream is a lot easier than I make it out to be across a variety of languages and I just don't know it, which is entirely possible. The streaming writer interfaces for Perl ( http://search.cpan.org/dist/JSON-Streaming-Writer/lib/JSON/Streaming/Writer.pm) and Java's Jackson ( http://wiki.fasterxml.com/JacksonInFiveMinutes#Streaming_API_Example) are a little more daunting than I'd like them to be. Not wanting to argue unnecessarily here; just adding input before things get effectively set in stone. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
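To make the production-side difference concrete, here is a small Ruby sketch contrasting the two approaches. The record hashes are stand-ins, and StringIO stands in for a real output stream:

```ruby
require 'json'
require 'stringio'

# Stand-in records; real code would pull these from a MARC reader.
records = [{ 'tag' => '001' }, { 'tag' => '245' }]

# Collection-as-array: bracket-and-comma bookkeeping on the way out,
# and a naive writer would have to hold the whole collection in
# memory to call to_json on it in one shot.
array_out = StringIO.new
array_out.print '['
records.each_with_index do |rec, i|
  array_out.print ',' unless i.zero?
  array_out.print rec.to_json
end
array_out.print ']'

# Newline-delimited JSON: one record per line, no bookkeeping,
# constant memory no matter how many records stream past.
ndj_out = StringIO.new
records.each { |rec| ndj_out.puts rec.to_json }
```

The array version works, but every producer has to carry the bracket/comma logic; the NDJ version is just the loop.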
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 3:14 PM, Houghton,Andrew hough...@oclc.org wrote: As you point out JSON streaming doesn't work with all clients and I am hesitant to build on anything that all clients cannot accept. I think part of the issue here is proper API design. Sending tens of megabytes back to a client and expecting them to process it seems like a poor API design regardless of whether they can stream it or not. It might make more sense to have a server API send back 10 of our MARC-JSON records in a JSON collection and have the client request an additional batch of records for the result set. In addition, if I remember correctly, JSON streaming or other streaming methods keep the connection to the server open which is not a good thing to do to maintain server throughput. I guess my concern here is that the specification, as you're describing it, is closing off potential uses. It seems fine if, for example, your primary concern is javascript-in-the-browser, and browser-request, pagination-enabled systems might be all you're worried about right now. That's not the whole universe of uses, though. People are going to want to dump these things into a file to read later -- no possibility for pagination in that situation. Others may, in fact, want to stream a few thousand records down the pipe at once, but without a streaming parser that can't happen if it's all one big array. I worry that as specified, the *only* use will be, "Pull these down a thin pipe, and if you want to keep them for later, or want a bunch of them, you have to deal with marc-xml." Part of my incentive is to *not* have to use marc-xml, but in this case I'd just be trading one technology I don't like (marc-xml) for two technologies, one of which I don't like (that'd be marc-xml again). I really do understand the desire to make this parallel to marc-xml, but there's a seam between the two technologies that makes that a problematic approach.
-- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 4:38 PM, Houghton,Andrew hough...@oclc.org wrote: Maybe I have been misled or misunderstood JSON streaming. This is my central point. I'm actually saying that JSON streaming is painful and rare enough that it should be avoided as a requirement for working with any new format. I guess, in sum, I'm making the following assertions: 1. Streaming APIs for JSON, where they exist, are a pain in the ass. And they don't exist everywhere. Without a JSON streaming parser, you have to pull the whole array of documents up into memory, which may be impossible. This is the crux of my argument -- if you disagree with it, then I would assume you disagree with the other points as well. 2. Many people -- and I don't think I'm exaggerating here, honestly -- really don't like using MARC-XML but have to because of the length restrictions on MARC-binary. A useful alternative, based on dead-easy parsing and production, is very appealing. 2.5 Having to deal with a streaming API takes away the dead-easy part. 3. If you accept my assertions about streaming parsers, then dealing with the format you've proposed for large sets is either painful (with a streaming API) or impossible (where such an API doesn't exist) due to memory constraints. 4. Streaming JSON writer APIs are also painful; everything that applies to reading applies to writing. Sans a streaming writer, trying to *write* a large JSON document also results in you having to have the whole thing in memory. 5. People are going to want to deal with this format, because of its benefits over marc21 (record length) and marc-xml (ease of processing), which means we're going to want to deal with big sets of data and/or dump batches of it to a file. Which brings us back to #1, the pain or absence of streaming apis. "Write a better JSON parser/writer" or "use a different language" seem like bad solutions to me, especially when a (potentially) useful alternative exists.
As I pointed out, if streaming JSON is no harder/unavailable to you than non-streaming json, then this is mostly moot. I assert that for many people in this community it is one or the other, which is why I'm leery of it. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 6:25 PM, Houghton,Andrew hough...@oclc.org wrote: OK, I will bite, you stated: 1. That large datasets are a problem. 2. That streaming APIs are a pain to deal with. 3. That tool sets have memory constraints. So how do you propose to process large JSON datasets that: 1. Comply with the JSON specification. 2. Can be read by any JavaScript/JSON processor. 3. Do not require the use of a streaming API. 4. Do not exceed the memory limitations of current JSON processors. What I'm proposing is that we don't process large JSON datasets; I'm proposing that we process smallish JSON documents one at a time by pulling them out of a stream based on an end-of-record character. This is basically what we use for MARC21 binary format -- have a defined structure for a valid record, and separate multiple well-formed record structures with an end-of-record character. This preserves JSON specification adherence at the record level and uses a different scheme to represent collections. Obviously, MARC-XML uses a different mechanism to define a collection of records -- putting well-formed record structures inside a collection tag. So... I'm proposing we define what we mean by a single MARC record serialized to JSON (in whatever format; I'm not very opinionated on this point) that preserves the order, indicators, tags, data, etc. we need to round-trip between marc21binary, marc-xml, and marc-json. And then separate those valid records with an end-of-record character -- \n. Unless I've read all this wrong, you've come to the conclusion that the benefit of having a JSON serialization that is valid JSON at both the record and collection level outweighs the pain of having to deal with a streaming parser and writer. This allows a single collection to be treated as any other JSON document, which has obvious benefits (which I certainly don't mean to minimize) and all the drawbacks we've been talking about *ad nauseam*. I go the other way.
I think the pain of dealing with a streaming API outweighs the benefits of having a single valid JSON structure for a collection, and instead have put forward that we use a combination of JSON records and a well-defined end-of-record character (\n) to represent a collection. I recognize that this involves providing special-purpose code which must call for JSON-deserialization on each line, instead of being able to throw the whole stream/file/whatever at your json parser. I accept that because getting each line of a text file is something I find easy compared to dealing with streaming parsers. And our point of disagreement, I think, is that I believe that defining the collection structure in such a way that we need two steps (get a line; deserialize that line) and can't just call the equivalent of JSON.parse(stream) has benefits in ease of implementation and use that outweigh the loss of having both a single record and a collection of records be valid JSON. And you, I think, don't :-) I'm going to bow out of this now, unless I've got some part of our positions wrong, to let any others that care (which may number zero) chime in. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
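The two-step consumption described above ("get a line; deserialize that line") is short enough to show in full. Any plain JSON library suffices, with no streaming parser involved; the record content here is made up:

```ruby
require 'json'

# Two made-up records in newline-delimited JSON form.
ndj = <<~NDJ
  {"leader":"01192cz","fields":[["001","fst01303409"]]}
  {"leader":"00987nam","fields":[["001","ocm12345"]]}
NDJ

# Step 1: get a line. Step 2: deserialize that line.
# Each line is a complete JSON document, so memory use stays flat
# no matter how many records the stream contains.
leaders = ndj.each_line.map { |line| JSON.parse(line)['leader'] }
```

The same loop works unchanged against a file handle or a socket, which is the whole appeal of the end-of-record-character approach.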
Re: [CODE4LIB] HathiTrust API
I didn't put in links for RIS-type formats because, I think, I don't really understand the semantics of the link tag. The RIS output for a record is a tiny percentage of what's in a record -- is it really another representation or is it a different thing altogether? On Wed, Feb 24, 2010 at 3:45 AM, Ed Summers e...@pobox.com wrote: Nice work Bill! I particularly like your use of the link element to enable auto-discovery of these resources:

<link rel="canonical" href="/Record/005550418">
<link rel="alternate" type="application/marc" href="/Record/005550418.mrc">
<link rel="alternate" type="application/marc+xml" href="/Record/005550418.xml">
<link rel="alternate" href="/Record/005550418.rdf" type="application/rdf+xml" />

Did you shy away from adding the RIS and Refworks formats as links because it wasn't clear what MIME type to use? I'd be interested in helping flesh out the RDF a bit if you are interested. //Ed On Tue, Feb 23, 2010 at 4:07 PM, Bill Dueber b...@dueber.com wrote: Many of you just saw Albert Betram of the University of Michigan Libraries talk at #c4l10 about HathiTrust APIs available to anyone interested. One of these, the BibAPI, was formed mostly by me on the basis of Imaginary User Needs, not actual use cases. Anyone who has use cases that aren't well-covered by the existing BibAPI should drop me a line and let me know. This is also a good time to mention that catalog.hathitrust.org (and mirlyn.lib.umich.edu) support some limited export facilities by adding an extension to a record URL. SO... http://catalog.hathitrust.org/Record/005550418 Link to the Hathitrust page http://catalog.hathitrust.org/Record/005550418.marc MARC21 binary http://catalog.hathitrust.org/Record/005550418.xml MARC-XML http://catalog.hathitrust.org/Record/005550418.ris RIS tagged format http://catalog.hathitrust.org/Record/005550418.refworks Refworks tagged format http://catalog.hathitrust.org/Record/005550418.rdf Perfunctory RDF document I'd love help getting the RDF more fleshed out, btw.
Again -- if you need anything else, or if you, say, wrap a nice jQuery plugin around the BibAPI, please let me know! -Bill- Bill Dueber Library Systems Programmer University of Michigan Library -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] HathiTrust API
OK, I've added links for RIS and Endnote, but it turns out I *don't* know what mime type to use for Refworks. When actually talking to refworks with their callback system, I need to send it as text/plain, and I've been unable to track down what the preferred type is. Anyone know? On Wed, Feb 24, 2010 at 3:45 AM, Ed Summers e...@pobox.com wrote: Nice work Bill! I particularly like your use of the link element to enable auto-discovery of these resources:

<link rel="canonical" href="/Record/005550418">
<link rel="alternate" type="application/marc" href="/Record/005550418.mrc">
<link rel="alternate" type="application/marc+xml" href="/Record/005550418.xml">
<link rel="alternate" href="/Record/005550418.rdf" type="application/rdf+xml" />

Did you shy away from adding the RIS and Refworks formats as links because it wasn't clear what MIME type to use? I'd be interested in helping flesh out the RDF a bit if you are interested. //Ed On Tue, Feb 23, 2010 at 4:07 PM, Bill Dueber b...@dueber.com wrote: Many of you just saw Albert Betram of the University of Michigan Libraries talk at #c4l10 about HathiTrust APIs available to anyone interested. One of these, the BibAPI, was formed mostly by me on the basis of Imaginary User Needs, not actual use cases. Anyone who has use cases that aren't well-covered by the existing BibAPI should drop me a line and let me know. This is also a good time to mention that catalog.hathitrust.org (and mirlyn.lib.umich.edu) support some limited export facilities by adding an extension to a record URL. SO... http://catalog.hathitrust.org/Record/005550418 Link to the Hathitrust page http://catalog.hathitrust.org/Record/005550418.marc MARC21 binary http://catalog.hathitrust.org/Record/005550418.xml MARC-XML http://catalog.hathitrust.org/Record/005550418.ris RIS tagged format http://catalog.hathitrust.org/Record/005550418.refworks Refworks tagged format http://catalog.hathitrust.org/Record/005550418.rdf Perfunctory RDF document I'd love help getting the RDF more fleshed out, btw.
Again -- if you need anything else, or if you, say, wrap a nice jQuery plugin around the BibAPI, please let me know! -Bill- Bill Dueber Library Systems Programmer University of Michigan Library -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] HathiTrust API
OK, slow it down J-Rock. :-) I'm looking for a MIME type for Refworks Tagged Format, which is NOT RIS. It's a different tagged format. The three most common tagged formats are RIS, Refworks, and Endnote-style-Refer. It's the Refworks one I need help with. And I gave up on the marc-lines-pretend-format stuff; I just send Refworks their preferred tagged format now, I just don't know what MIME type to use. On Wed, Feb 24, 2010 at 9:57 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Do you mean what's the mime-type for RIS files? (RIS != Refworks, I forget what RIS stands for, but it's used by many reference managers, and may have originally been invented by EndNote?) There isn't a registered MIME type for RIS. Googling around, it looks like the preferred one is: application/x-Research-Info-Systems (Guess that's what RIS stands for?) Or wait, you mean the Refworks callback? You can actually give Refworks a variety of types of content in the callback. Are you giving it that weird marc-formatted-a-certain-way-in-a-textfile format? I doubt there's any mime type for that other than text/plain. Jonathan From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Bill Dueber [b...@dueber.com] Sent: Wednesday, February 24, 2010 9:47 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] HathiTrust API OK, I've added links for RIS and Endnote, but it turns out I *don't* know what mime type to use for Refworks. When actually talking to refworks with their callback system, I need to send it as text/plain, and I've been unable to track down what the preferred type is. Anyone know? On Wed, Feb 24, 2010 at 3:45 AM, Ed Summers e...@pobox.com wrote: Nice work Bill!
I particularly like your use of the link element to enable auto-discovery of these resources:

<link rel="canonical" href="/Record/005550418">
<link rel="alternate" type="application/marc" href="/Record/005550418.mrc">
<link rel="alternate" type="application/marc+xml" href="/Record/005550418.xml">
<link rel="alternate" href="/Record/005550418.rdf" type="application/rdf+xml" />

Did you shy away from adding the RIS and Refworks formats as links because it wasn't clear what MIME type to use? I'd be interested in helping flesh out the RDF a bit if you are interested. //Ed On Tue, Feb 23, 2010 at 4:07 PM, Bill Dueber b...@dueber.com wrote: Many of you just saw Albert Betram of the University of Michigan Libraries talk at #c4l10 about HathiTrust APIs available to anyone interested. One of these, the BibAPI, was formed mostly by me on the basis of Imaginary User Needs, not actual use cases. Anyone who has use cases that aren't well-covered by the existing BibAPI should drop me a line and let me know. This is also a good time to mention that catalog.hathitrust.org (and mirlyn.lib.umich.edu) support some limited export facilities by adding an extension to a record URL. SO... http://catalog.hathitrust.org/Record/005550418 Link to the Hathitrust page http://catalog.hathitrust.org/Record/005550418.marc MARC21 binary http://catalog.hathitrust.org/Record/005550418.xml MARC-XML http://catalog.hathitrust.org/Record/005550418.ris RIS tagged format http://catalog.hathitrust.org/Record/005550418.refworks Refworks tagged format http://catalog.hathitrust.org/Record/005550418.rdf Perfunctory RDF document I'd love help getting the RDF more fleshed out, btw. Again -- if you need anything else, or if you, say, wrap a nice jQuery plugin around the BibAPI, please let me know! -Bill- Bill Dueber Library Systems Programmer University of Michigan Library -- Bill Dueber Library Systems Programmer University of Michigan Library -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] HathiTrust API
Many of you just saw Albert Betram of the University of Michigan Libraries talk at #c4l10 about HathiTrust APIs available to anyone interested. One of these, the BibAPI, was formed mostly by me on the basis of Imaginary User Needs, not actual use cases. Anyone who has use cases that aren't well-covered by the existing BibAPI should drop me a line and let me know. This is also a good time to mention that catalog.hathitrust.org (and mirlyn.lib.umich.edu) support some limited export facilities by adding an extension to a record URL. SO... http://catalog.hathitrust.org/Record/005550418 Link to the Hathitrust page http://catalog.hathitrust.org/Record/005550418.marc MARC21 binary http://catalog.hathitrust.org/Record/005550418.xml MARC-XML http://catalog.hathitrust.org/Record/005550418.ris RIS tagged format http://catalog.hathitrust.org/Record/005550418.refworks Refworks tagged format http://catalog.hathitrust.org/Record/005550418.rdf Perfunctory RDF document I'd love help getting the RDF more fleshed out, btw. Again -- if you need anything else, or if you, say, wrap a nice jQuery plugin around the BibAPI, please let me know! -Bill- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] urldecode problem and CAS
I'd first make sure you're not url-encoding your return URL twice. I'd like to believe that a CAS server would url-decode before the redirect, but ... On Wed, Jan 27, 2010 at 12:38 PM, Jimmy Ghaphery jghap...@vcu.edu wrote: Yes the original url looks like http://../app.cfm?id=15 and the return url coming back from CAS looks like http://../app.cfm?id%3d15 I am pretty sure this is native to the way CAS returns urls, and probably need to ping some ColdFusion folks on how to deal with the urlencoded return. I'll also message the ColdFusion library group. If anyone out here has CAS experience and can confirm that urlencoded return urls seem normal that would be helpful. Walker, David wrote: So a user arrives at your app. You see that they are not logged in, and so redirect them to the CAS server with a return URL back to your application. Do you have an example of that URL? --Dave == David Walker Library Web Services Manager California State University http://xerxes.calstate.edu From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jimmy Ghaphery [jghap...@vcu.edu] Sent: Wednesday, January 27, 2010 9:18 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] urldecode problem and CAS CODE4LIB, I'm looking for some urldecode help if possible. I have an app that gets a call through a url which looks like this in order to pull up a specific record: http://../app.cfm?id=15 It is password protected and we have recently moved to CAS for authentication. After it gets passed from CAS back to our server it looks like this and tosses an error: http://../app.cfm?id%3d15 The equals sign translated to %3d Any ideas are appreciated. thanks -Jimmy -- Jimmy Ghaphery Head, Library Information Systems VCU Libraries http://www.library.vcu.edu -- -- Jimmy Ghaphery Head, Library Information Systems VCU Libraries http://www.library.vcu.edu -- -- Bill Dueber Library Systems Programmer University of Michigan Library
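A quick Ruby illustration of the double-encoding failure mode suspected above (the URL is hypothetical): if the return URL gets escaped twice, a single decode on the CAS side hands back %3D where the = should be, which matches the symptom in the thread.

```ruby
require 'cgi'

service = 'http://example.edu/app.cfm?id=15' # hypothetical return URL

once  = CGI.escape(service)  # correct: '=' becomes %3D
twice = CGI.escape(once)     # the bug: %3D becomes %253D

# If the CAS server decodes exactly once before redirecting, a
# double-encoded service URL still comes back with %3D embedded in it.
returned = CGI.unescape(twice)
```

So the fix, if this is the cause, is on the application side: encode the service URL exactly once before sending the user off to CAS.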
Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?
On Wed, Jan 6, 2010 at 8:53 AM, Joel Marchesoni jma...@email.wcu.edu wrote: I agree with Dan's last point about avoiding using a special IDE to develop with a language. I'll respectfully, but vehemently, disagree. I would say avoid *forcing* everyone working on the project to depend on a special IDE -- avoid lock-in. Don't avoid use. There's a spectrum of how much an editor/environment can know about a program. At one end is Smalltalk, where the development environment *is* the program. At the other end is something like LISP (and, to an extent, Ruby) where so little can be inferred from the syntax of the code that a smart IDE can't actually know much other than how to match parentheses. For languages where little can be known at compile time, an IDE may not buy you very much other than syntax highlighting and code folding. For Java, C++, etc. an IDE can know damn near everything about your project and radically up your productivity -- variable renaming, refactoring, context-sensitive help, jump-to-definition, method-name completion, etc. It really is a difference that makes a difference. I know folks say they can get the same thing from vim or emacs, but at that level those editors are no less complex (and a good deal more opaque) than something like Eclipse or Netbeans unless you already have a decade of experience with them. If you're starting in a new language, try a couple editors, too. Both Eclipse and Netbeans are free and cross-platform, and have support for a lot of languages. Editors like Notepad++, EditPlus, Textmate, jEdit, and BBEdit can all do very nice things with a variety of languages. -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] University of Michigan Solr filters for ISBN/LCCN and High Level Browse code available at github
I've made available the code we use in the solrmarc/solr installation behind http://mirlyn.lib.umich.edu to normalize LCCNs and ISBNs and add our local High Level Browse LC-callnumber-based categorization scheme. The code itself and a downloadable .jar file for the normalizers are available at http://github.com/billdueber/lib.umich.edu-solr-stuff The README has usage examples as well, so you know what to put in your schema.xml. The source is not pretty in the same way the sea is not above the sky, but it all works as best as I can tell and we all know the dangers of waiting to clean up code before release. Patches are, of course, always welcome. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
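By way of example, the core of ISBN normalization is usually folding ISBN-10s into ISBN-13 form so both variants index to the same term. Here's a Ruby sketch of that standard conversion (the textbook algorithm, not code lifted from the UMich filters, which are Java Solr analyzers):

```ruby
# Standard ISBN-10 -> ISBN-13 conversion: prefix 978 to the first nine
# digits, then recompute the EAN-13 check digit using alternating
# 1,3,1,3,... weights. (A sketch of the usual algorithm; not taken
# from the lib.umich.edu-solr-stuff source.)
def isbn10_to_isbn13(isbn10)
  core = '978' + isbn10.delete('- ')[0, 9]
  sum  = core.chars.each_with_index.sum { |ch, i| ch.to_i * (i.even? ? 1 : 3) }
  core + ((10 - sum % 10) % 10).to_s
end
```

With something like this in the analysis chain, '0-306-40615-2' and '9780306406157' normalize to the same indexed term, which is the point of doing it at index time.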
Re: [CODE4LIB] FW: PURL Server Update 2
Andy, I think there are three issues here: 1. Should the GPO put in place, at least at the moment, some throttling for user agents behaving like dicks? 2. Should III (and others), when acting as a user agent, be such a dick? 3. How do I know if I'm being a dick? The answers folks are offering, I think, are (1) Yes, (2) No, and (3) It's hard to know, but you should always check robots.txt, and you should always throttle yourself to a reasonable level unless you know the target can take the abuse. For the majority of the web, for the majority of the time, basic courtesy and the gentleperson's agreement ensconced in robots.txt works fine -- most folks who write user agents don't want to be dicks. When this informality doesn't work, as you point out, there are solutions you can implement at some edge of your network. Of course, at that point the requests are already flooding through to *somewhere*, so getting things stopped as close to the point of origin is key. On Wed, Sep 2, 2009 at 11:26 AM, Houghton,Andrew hough...@oclc.org wrote: From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Thomas Dowling Sent: Wednesday, September 02, 2009 10:25 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] FW: PURL Server Update 2 The III crawler has been a pain for years and Innovative has shown no interest in cleaning it up. It not only ignores robots.txt, but it hits target servers just as fast and hard as it can. If you have a lot of links that a lot of III catalogs check, its behavior is indistinguishable from a DOS attack. (I know because our journals server often used to crash about 2:00am on the first of the month...) I see that I didn't fully make the connection to the point I was making... which is that there are hardware solutions to these issues rather than using robots.txt or sitemap.xml. 
If a user agent is a problem, then network folks should change the router to ignore the user agent or reduce the number of requests it is allowed to make to the server. In the case you point to with III hitting the server as fast as it can and it looking like a DOS attack to the network which caused the server to crash, then 1) the router hasn't been setup to impose throttling limits on user agents, and 2) the server probably isn't part of a server farm that is being load balanced. In the case of GPO, they mentioned or implied, that they were having contention issues with user agents hitting the server while trying to restore the data. This contention could be mitigated by imposing lower throttling limits in the router on user agents until the data is restored and then raising the limits back to the whatever their prescribed SLA (service level agreement) was. You really don't need to have a document on the server to tell user agents what to do. You can and should impose a network policy on user agents which is far better solution in my opinion. Andy. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Open, public standards v. pay per view standards and usage
On Thu, Jul 16, 2009 at 11:26 AM, Houghton,Andrew hough...@oclc.org wrote: Not saying you're wrong Ross, but it depends. People adopted MARC-XML by looking at the .xsd without an actual specification. Granted it's not a complicated schema however, and there already existed the MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media so it wasn't a big leap to adopt MARC-XML, IMHO. I'm not disagreeing with your overall point, but this is a specious example, I think. Examining a MARC-XML file shows you how to do a mechanical translation from a ridiculously simple non-XML syntax into an XML syntax -- the actual data itself remains completely opaque. The MARC-XML schema + AACR2 gives you what you need. The ISO 20775 schema, for example, includes elements like <xs:element name="physicalLocation"> -- and there's no way you're going to know what the hell goes in there without a lot more help. And if you were to have to pay for that help, many would rely on cheat-sheets or pattern-matching and it all goes to hell. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] WARC file format now ISO standard
So, can we expect a leaked final draft of RDA, do you think :-)

On Tue, Jun 2, 2009 at 5:47 PM, David Fiander da...@fiander.info wrote:

This is a common problem with ISO standards, and the common solution is to do just this: release the final draft before it's approved by ISO as an official standard. That's what the ISO Forth programming language group did as well. - David

On Tue, Jun 2, 2009 at 5:35 PM, st...@archive.org wrote:

point well taken. :) there were no significant changes to the WARC format between the last draft and the published standard. you can use Heritrix's WARCReader or WARC Tools' warcvalidator to verify that you have created a valid WARC in accordance with the spec. /st...@archive.org

On 6/2/09 2:27 PM, Ray Denenberg, Library of Congress wrote:

But you have to pay $200 for the document that lists changes from the last draft to the first official version. (Ok, Ok, it was just a joke. But you do get the point.)

- Original Message - From: st...@archive.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Tuesday, June 02, 2009 5:18 PM Subject: Re: [CODE4LIB] WARC file format now ISO standard

hi Karen, understood. the final draft of the spec is available here: http://www.scribd.com/doc/4303719/WARC-ISO-28500-final-draft-v018-Zentveld-080618 and other (similar) versions here: http://archive-access.sourceforge.net/warc/ /st...@archive.org

On 6/2/09 2:15 PM, Karen Coyle wrote:

Unfortunately, being an ISO standard, it costs 118 CHF (about $110 USD) to obtain. Hard to follow a standard you can't afford to read. Is there an online version somewhere? kc

st...@archive.org wrote:

hi code4lib, if you're archiving web content, please use the WARC format.
thanks, /st...@archive.org

WARC File Format Published as an International Standard
http://netpreserve.org/press/pr20090601.php

ISO 28500:2009 specifies the WARC file format:

* to store both the payload content and control information from mainstream Internet application layer protocols, such as the Hypertext Transfer Protocol (HTTP), Domain Name System (DNS), and File Transfer Protocol (FTP);
* to store arbitrary metadata linked to other stored data (e.g. subject classifier, discovered language, encoding);
* to support data compression and maintain data record integrity;
* to store all control information from the harvesting protocol (e.g. request headers), not just response information;
* to store the results of data transformations linked to other stored data;
* to store a duplicate detection event linked to other stored data (to reduce storage in the presence of identical or substantially similar resources);
* to be extended without disruption to existing functionality;
* to support handling of overly long records by truncation or segmentation, where desired.

more info here: http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml

-- Bill Dueber Library Systems Programmer University of Michigan Library
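At the byte level, a WARC record is plain enough to sketch: a version line, named headers, a blank line, the payload, and two trailing CRLFs. The header field names below follow the spec; the payload, URI, and helper function are made-up illustrations, and a real implementation should validate against the published standard:

```ruby
require 'securerandom'
require 'time'

# Hedged sketch of serializing one minimal WARC record, per the general
# layout described in ISO 28500. Not a substitute for a real WARC library.
def warc_record(type, content, extra_headers = {})
  headers = {
    'WARC-Type'      => type,
    'WARC-Record-ID' => "<urn:uuid:#{SecureRandom.uuid}>",
    'WARC-Date'      => Time.now.utc.iso8601,
    'Content-Length' => content.bytesize.to_s
  }.merge(extra_headers)
  lines = ['WARC/1.0'] + headers.map { |k, v| "#{k}: #{v}" }
  # Headers and body are separated by a blank line; the record ends with
  # two CRLFs.
  lines.join("\r\n") + "\r\n\r\n" + content + "\r\n\r\n"
end

rec = warc_record('resource', 'hello, web archive',
                  'WARC-Target-URI' => 'http://example.org/')
```

For real validation, the warcvalidator tool mentioned above is the thing to use.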
Re: [CODE4LIB] exact title searches with z39.50
Like so many library standards, z39.50 is a syntax and a set of rough guidelines. You have no idea what's actually happening on the other end, because it's not specified; you just have to either find someone you can ask at the target machine or reverse engineer it.

On Mon, Apr 27, 2009 at 5:13 PM, Eric Lease Morgan emor...@nd.edu wrote:

> What are the ways to accomplish exact title searches with z39.50? I'm looping through a list of MARC records trying to determine whether or not we own multiple copies of an item. After reading MARC field 245, subfield a, I am creating the following z39.50 query: @attr 1=4 "foo bar". Unfortunately my local implementation seems to interpret this in a rather regular-expression sort of way -- "* foo bar *". Does anybody out there know how to create a more exact query? I only want to find titles exactly equalling "foo bar". -- Eric Lease Morgan University of Notre Dame

-- Bill Dueber Library Systems Programmer University of Michigan Library
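For what it's worth, the conventional Bib-1 attributes for tightening a title search look like this. Whether any given target actually honors them is exactly the problem described above, so treat this as a sketch of the usual attribute combination, not a guarantee:

```ruby
# Build a PQF query that *requests* an exact title match under the usual
# Bib-1 attribute readings. Servers are free to ignore any of these.
def exact_title_pqf(title)
  attrs = [
    '@attr 1=4',   # Use: title
    '@attr 4=1',   # Structure: phrase
    '@attr 5=100', # Truncation: do not truncate
    '@attr 6=3'    # Completeness: complete field
  ]
  %(#{attrs.join(' ')} "#{title}")
end

exact_title_pqf('foo bar')
# => '@attr 1=4 @attr 4=1 @attr 5=100 @attr 6=3 "foo bar"'
```

The completeness (6=3) and truncation (5=100) attributes are the ones that usually suppress the "* foo bar *" behavior, when the target supports them at all.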
Re: [CODE4LIB] Something completely different
On Thu, Apr 9, 2009 at 10:26 AM, Mike Taylor m...@indexdata.com wrote:

> I'm not sure what to make of this except to say that Yet Another XML Bibliographic Format is NOT the answer!

I recognize that you're being flippant, and yet I think there's an important nugget in here. When you say it that way, it makes it sound as if folks are debating the finer points of OAI-MARC vs. MARC-XML -- that it's simply syntactic sugar (although I'm certainly one to argue for the importance of syntactic sugar) over the top of what we already have.

What's actually being discussed, of course, is the underlying data model. E-R pairs primarily analyzed by set theory, triples forming directed graphs, whether or not links between data elements can themselves have attributes -- these are all possible characteristics of the fundamental underpinning of a data model to describe the data we're concerned with. The fact that they all have common XML representations is noise, and referencing the currently-most-common XML schema for these things is just convenient shorthand in a community that understands the exemplars.

The fact that many in the library community don't understand that syntax is not the same as a data model is how we ended up with RDA. (Mike: I don't know your stuff, but I seriously doubt you're among that group. I'm talkin' in general, here.)

Bibliographic data is astoundingly complex, and I believe wholeheartedly that modeling it sufficiently is a very, very hard task. But no matter the underlying model, we should still insist on starting with the basics that computer science folks have been using for decades now: uids (and, these days, guids) for the important attributes, separation of data and display, definition of sufficient data types and reuse of those types whenever possible, separation of identity and value, full normalization of data, zero ambiguity in the relationship diagram as a fundamental tenet, and a rigorous mathematical model to describe how it all fits together.
This is hard stuff. But it's worth doing right. -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] ANN: University of Michigan Live vuFind beta!
The University of Michigan University Libraries has gone live with a beta installation of vuFind, currently branded as Mirlyn2-Beta to differentiate it from our existing OPAC interface, Mirlyn. You can take a look at http://mirlyn2-beta.lib.umich.edu/

We've added several enhancements:

- Spellcheck when there are no results (possible because we use a recent Solr nightly) -- try searching on 'minnesoa'
- Extraction of search specifications into an external file for easier tweaking
- Integration of UMich's High Level Browse subject headings (seen here as the Academic Discipline facet)
- Inlining of more extensive, real-time availability information in search results
- Reworked Refworks export

Any comments or bug reports can be sent to me or, even better, via the "Tell us what you think" button in the upper-right corner. Thanks to everyone in all the various communities that have offered help and feedback!

-Bill-

-- Bill Dueber Library Systems Programmer University of Michigan Library