[CODE4LIB] Unexpected ruby-marc behavior
So I was taking ruby-marc out for a spin in irb, and encountered a bit of a surprise. Running the following:

    require 'marc'
    reader = MARC::Reader.new('filename.mrc')
    reader.each {|record| puts record['245']}

produces the expected result, but every subsequent call to reader.each {|record| puts record['245']} returns nil. Am I missing something obvious? I don't remember this being the case before.

Thanks!

Cory

[running ruby-marc off the github repo / os x 10.6.5 / ruby 1.9.2 via rvm / rubygems via homebrew]
Re: [CODE4LIB] Unexpected ruby-marc behavior
Oh, gotcha. Thanks.

C

On Jan 27, 2011, at 2:11 PM, Ross Singer wrote:

No, that's expected behavior (and how it's always been). You'd need to do reader.rewind to put your enumerator cursor back to 0 to run back over the records. It's basically an IO object (since that's what it expects as input) and behaves like one.

-Ross
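A minimal sketch of the pattern Ross describes, for anyone who lands here later (the filename is a placeholder):

    require 'marc'

    reader = MARC::Reader.new('records.mrc')

    # First pass reads to end-of-file, printing each record's 245 field
    reader.each {|record| puts record['245']}

    # The reader wraps an IO handle, so the cursor now sits at EOF and a
    # second each would yield nothing. Rewind before iterating again:
    reader.rewind
    reader.each {|record| puts record['245']}

Since the reader behaves like the IO object it wraps, the usual IO idioms all work: rewind it, reopen the file, or (via Enumerable) read everything into an array with reader.to_a and iterate over that as many times as you like.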
Re: [CODE4LIB] MARCXML - What is it for?
I've only just had a chance to catch up on this thread. I'm not offended in the least by Turbomarc (anything round-trippable should serve just as well as an internal representation of MARC, right?), but I am a little puzzled--what are the 'special cases' alluded to in the blog post? When would there ever be a non-alphanumeric attribute value in MARCXML? Is this a non-MARC21 thing?

C

On 10/25/10 3:35 PM, MJ Suhonos wrote:

I'll just leave this here: http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records

That trade-off ought to offend both camps, though I happen to think it's quite clever.

MJ

On 2010-10-25, at 3:22 PM, Eric Hellman wrote:

I think you'd have a very hard time demonstrating any speed advantage for MARC over MARCXML. XML parsers have been speed-optimized out the wazoo; if there exists a MARC parser that has ever been speed-optimized without serious compromise, I'm sure someone on this list will have a good story about it.

On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote:

Dear Nate,

There is a trade-off: if you want very fast processing of data, go for binary data; if you want to share your data globally and easily in many (not necessarily library-related) environments, go for XML/RDF. Open your data and do both :-)

Pat

On 25 Oct 2010, at 20:39, Nate Vack njv...@wisc.edu wrote:

Hi all,

I've just spent the last couple of weeks delving into and decoding a binary file format. This, in turn, got me thinking about MARCXML. In a nutshell, it looks like it's supposed to contain the exact same data as a normal MARC record, except in XML form. As in, it should be round-trippable.

What's the advantage to this? I can see using a human-readable format for poorly-documented file formats--they're relatively easy to read and understand. But MARC is well, well-documented, with more than one free implementation turned up by cursory searching. And once you know a binary file's format, it's no harder to parse than XML, and the data is smaller and the processing faster.

So... why the XML?

Curious,
-Nate
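For what it's worth, round-trippability is easy to check empirically with ruby-marc (which comes up elsewhere in this archive). A small sketch, assuming a well-formed binary MARC file; the filenames are placeholders:

    require 'marc'

    # Read the first record from a binary MARC file
    record = MARC::Reader.new('records.mrc').first

    # Serialize it as MARCXML
    writer = MARC::XMLWriter.new('record.xml')
    writer.write(record)
    writer.close

    # Read the MARCXML back in and compare binary serializations
    roundtripped = MARC::XMLReader.new('record.xml').first
    puts record.to_marc == roundtripped.to_marc  # true if nothing was lost

Comparing the two to_marc strings is a fair test: the binary serializer recalculates the directory and the leader's length/offset bytes on both sides, so any difference that survives reflects actual field, indicator, or subfield content.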
Re: [CODE4LIB] DIY aggregate index
I'm planning on moving ahead with a proof-of-concept in the next year, after which I will certainly consider writing it up. I really hope I can get the go-ahead from the database vendors. It's good to hear that a few institutions have successfully negotiated with them--anyone from Los Alamos, the Scholars Portals, or any other local indexers, feel free to give me pointers on smooth-talking the vendors! :)

I also hope you're wrong in maintaining, in the article you linked to, that using controlled vocabularies for retrieval will never work well across databases that use different vocabularies. The (admittedly arduous and complex) work of crosswalking library-created controlled vocabularies like LCSH to periodical-index thesauri and other formal and less-formal indexing languages out in the wild is *exactly* what I think librarians should be spending their time doing. Catalogers (and I include myself) spend a lot of time making largely irrelevant tweaks to already-existing MARC records before exporting them into our local ILSes, but article-level metadata from vendors is generally served up to the user as-is. I think Roy Tennant, as quoted in your article, is spot-on when he says that our inability to do any preprocessing of the data is a major hindrance. The data sources we subscribe to should be treated as starting points for building a user experience, rather than left to the vendors to decide what the discovery process is going to be like.

Cory

On 7/1/2010 11:39 AM, Jonathan Rochkind wrote:

I am eager to see you try it, Cory. Please consider writing up your results for the Code4Lib Journal. I'd be curious to hear the complete story, from issues of getting the metadata, to issues of the technical infrastructure, any metadata normalization you need to do, issues of continuing to get the metadata on a regular basis, etc. Whether you succeed or fail, but especially if you succeed, your project with just a couple of databases could serve as a useful pilot for people considering doing it with more.

Jonathan
[CODE4LIB] DIY aggregate index
You know, this leads into something I've been wondering about. You'll all have to pardon my ignorance, as I've never worked in a library with functioning management of e-resources.

Do libraries opt for these commercial 'pre-indexed' services simply because they're a good value proposition compared to all the work of indexing multiple resources from multiple vendors into one local index, or is it that companies like III and Ex Libris are the only ones with enough clout to negotiate access to otherwise-unavailable database vendors' content?

Can I assume that if a database vendor has exposed their content to me as a subscriber, whether via Z39.50 or a web service or whatever, I'm free to cache and index all that metadata locally if I so choose? Is this something to be negotiated on a vendor-by-vendor basis, or is it an impossibility?

Cory

On 6/30/2010 12:37 PM, Walker, David wrote:

Hi Cindy,

Both the Ebsco and Proquest APIs are definitely available to customers. We're using the Ebsco one in our Xerxes application, for example. (I'll send you a link off-list, Cindy.)

--Dave

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Cindy Harper [char...@colgate.edu]
Sent: Wednesday, June 30, 2010 9:11 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Innovative's Synergy

Hi All - III is touting their web-services-based Synergy product as having the efficiency of a pre-indexed service and the timeliness of a just-in-time service. Does anyone know if the agreements they have made with database vendors to use these web services preclude an organization developing an open-source client to take advantage of those web services? Just curious. I suppose I should direct my question to EBSCO and Proquest directly.

Cindy Harper, Systems Librarian
Colgate University Libraries
Re: [CODE4LIB] DIY aggregate index
Well, this is the thing: we're a small, highly specialized collection, so I'm not talking about indexing the whole range of content which a university like JHU or even a small liberal arts college would need to--it's really a matter of a few key databases in our field(s). Don't get me wrong, it's still a slightly crazy idea, but I'm dissatisfied enough with existing solutions that I'd like to try it.

I wonder if they would, in fact, demand licensing fees. I mean, we're already paying a subscription, and they're already exposing their content as a target for federated search applications (which probably do some caching for performance)...

On 6/30/2010 4:15 PM, Jonathan Rochkind wrote:

A little bit of both, I think. A library probably _could_ negotiate access to that content... but it would be a heck of a lot of work. Once the staff time for negotiations is figured in, it becomes a good value proposition, regardless of how much the licensing would cost you. And then there's the staff time to actually ingest and normalize and troubleshoot data-flows for all that stuff on a regular basis -- I've heard stories of libraries that tried to do that in the early 90s, and it was nightmarish.

So, actually, I guess I've arrived at convincing myself it's mostly a good value proposition, in that a library probably can't afford to do that on their own, with or without licensing issues.
Re: [CODE4LIB] DIY aggregate index
We're looking at an infrastructure based on MarkLogic running on Amazon EC2, so the scale of the data to be indexed shouldn't actually be that big of an issue. Also, as I said to Jonathan, I only see myself indexing a handful of highly relevant resources, so we're talking millions, rather than hundreds of millions, of records.

On 6/30/2010 4:22 PM, Walker, David wrote:

You might also need to factor an extra server or three (in the cloud or otherwise) into that equation, given that we're talking hundreds of millions of records that will need to be indexed.

Cory Rockliff wrote: companies like III and Ex Libris are the only ones with enough clout to negotiate access

I don't think III is doing any kind of aggregated indexing, hence their decision to try and leverage APIs. I could be wrong.

--Dave

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind [rochk...@jhu.edu]
Sent: Wednesday, June 30, 2010 1:15 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] DIY aggregate index

Cory Rockliff wrote:

Do libraries opt for these commercial 'pre-indexed' services simply because they're a good value proposition compared to all the work of indexing multiple resources from multiple vendors into one local index, or is it that companies like III and Ex Libris are the only ones with enough clout to negotiate access to otherwise-unavailable database vendors' content?

A little bit of both, I think. A library probably _could_ negotiate access to that content... but it would be a heck of a lot of work. Once the staff time for negotiations is figured in, it becomes a good value proposition, regardless of how much the licensing would cost you. And then there's the staff time to actually ingest and normalize and troubleshoot data-flows for all that stuff on a regular basis -- I've heard stories of libraries that tried to do that in the early 90s, and it was nightmarish.

So, actually, I guess I've arrived at convincing myself it's mostly a good value proposition, in that a library probably can't afford to do that on their own, with or without licensing issues. But I'd really love to see you try anyway, maybe I'm wrong. :)

Can I assume that if a database vendor has exposed their content to me as a subscriber, whether via Z39.50 or a web service or whatever, I'm free to cache and index all that metadata locally if I so choose? Is this something to be negotiated on a vendor-by-vendor basis, or is it an impossibility?

I doubt you can assume that. I don't think it's an impossibility.

Jonathan
Re: [CODE4LIB] Fwd: Webinar: Introducing Cultural Objects Name Authority (CONA)
Actually, their licensing terms for non-profits are very reasonable.

On 4/19/2010 11:43 AM, Ethan Gruber wrote:

I wonder how many thousands of dollars they will charge to use this.

On Mon, Apr 19, 2010 at 11:26 AM, Mark A. Matienzo m...@matienzo.org wrote:

-- Forwarded message --
From: Erin Coburn ecob...@getty.edu
Date: Mon, Apr 19, 2010 at 9:54 AM

The Museum Computer Network (MCN), Gallery Systems, and the J. Paul Getty Trust are pleased to offer a free Webinar on a new vocabulary under development, the Cultural Objects Name Authority™ (CONA).

Introducing the Getty's new Cultural Objects Name Authority™ (CONA)
Tuesday, May 4, 2010
11:30 AM - 1:00 PM EDT

The Cultural Objects Name Authority™ (CONA) is a new Getty vocabulary currently under development. It is scheduled for introduction to the contributor community in 2011. CONA will join the other three Getty vocabularies, the Art & Architecture Thesaurus® (AAT), the Getty Thesaurus of Geographic Names® (TGN), and the Union List of Artist Names® (ULAN), as a tool for cataloging and retrieval of art information. CONA will contain titles, current location, and other core information for cultural works. The scope of CONA will include architecture and movable works such as paintings, sculpture, prints, drawings, manuscripts, photographs, ceramics, textiles, furniture, and archaeological artifacts.

Murtha Baca, Head of Digital Art History Access at the Getty Research Institute, and Patricia Harpring, Managing Editor of the Getty Vocabulary Program, will present an introduction to CONA and will be available for questions.

To register, please go to: https://www2.gotomeeting.com/register/307938058
Re: [CODE4LIB] Fwd: Webinar: Introducing Cultural Objects Name Authority (CONA)
I believe that's $1000 for a five-year license, or $200 a year, for unlimited use of the data as an XML download and/or as a web service. That compares pretty favorably to, e.g., the $325-a-year minimum for access to the RDA Toolkit.

The real question here, I think, is not whether the price is right, but whether licensing of this sort is the best course for the Getty to pursue. They already provide free access to their vocabularies in human-readable form--why not expand that into open access to the underlying data? The Getty vocabularies are far richer, semantically, than LCSH; within their domain, they'd be a great deal more useful as linked data than LCSH is in its id.loc.gov incarnation. I see no reason why publishing the Getty vocabularies as open linked data should disrupt their business model as a whole, either--they could continue to license their data to the commercial vendors who use them in, e.g., collection management systems, while providing this service to the community at large.

On 4/19/2010 1:03 PM, Ethan Gruber wrote:

They wanted at least $1000 for the geographic terms. Doesn't sound very reasonable to me, to be honest, especially since I was considering developing an application based on their own CDWA schema.

On Mon, Apr 19, 2010 at 12:08 PM, Cory Rockliff rockl...@bgc.bard.edu wrote:

Actually, their licensing terms for non-profits are very reasonable.
Re: [CODE4LIB] Fwd: Webinar: Introducing Cultural Objects Name Authority (CONA)
On 4/19/2010 3:02 PM, Cowles, Esme wrote:

So of course I'd love them to offer it for free. But realistically, it probably cost them a fortune to develop, and they've got to recoup that somehow.

Yes, but I can't imagine they're recouping much from licensing to non-profits--surely the real revenue is generated by licensing to commercial systems vendors. I would think that open access to the vocabularies = development of useful tools around them by third parties = wider adoption of the Getty vocabularies = greater collective stake in them = greater likelihood that other institutions will step in to ensure they're maintained. Perhaps there are other issues here, though.
Re: [CODE4LIB] yaoss4ll
How about putting the data into Freebase? http://www.freebase.com/ That would combine the write-access of a wiki with the structure of a database. I was getting ready to compile a very similar dataset myself, so I'd be happy to do some of the requisite munging to get the data into Freebase, if the idea appeals to anyone.

On 12/22/2009 1:25 PM, John Fereira wrote:

Jonathan Rochkind wrote: Putting it on a wiki anyone can edit makes it, perhaps, somewhat more likely that it ends up maintained longer, making it easier for other people to get involved in maintaining it without technological barriers or proprietary feelings getting in the way.

I was thinking of something more along the lines of putting it into a CMS (e.g., Drupal) so that voting/ranking/tagging tools could be used to allow the community to rate the viability and discovery of each item. I may actually need to do something like this, but for a different domain, on a project that I'll be working on over the next year.
Re: [CODE4LIB] character-sets for dummies?
If you're looking for a book-length treatment, 'Unicode Explained' is fairly readable, and the first three chapters are about character encodings in general: http://books.google.com/books?id=PcWU2yxc8WkC&printsec=frontcover

On 12/16/2009 12:02 PM, Ken Irwin wrote:

Hi all,

I'm looking for a good source to help me understand character sets and how to use them. I pretty much know nothing about this - the whole world of Unicode, ASCII, octal, UTF-8, etc. is baffling to me.

My immediate issue is that I think I need to integrate data from a variety of character sets into one MySQL table - I expect I need some way to convert from one to another, but I don't really even know how to tell which data are in which format. Our homegrown journal list (akin to SerialsSolutions) includes data ingested from publishers, vendors, the library catalog (III), etc. When I look at the data in emacs, some of it renders like this:

Revista de Oncolog\303\255a [slashes-and-digits instead of diacritics]

And other data looks more like:

Revista de Música Latinoamericana [weird characters instead of diacritics]

My MySQL table is currently set up with the collation set to utf8-bin, and the titles from the second category (weird characters in emacs) render properly when the database data is output to a web browser. The data from the former example (\###) renders as an "I don't know what character this is" placeholder in Firefox and IE.

So, can someone please point me toward any or all of the following?

· A good primer for understanding all of this stuff
· A method for converting all of my data to the same character set so it plays nicely in the database
· The names of the character sets I might be working with here

Many thanks!
Ken
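For the conversion step, here's a caveat-heavy sketch in Ruby 1.9 (the language used elsewhere in this archive). The titles are hard-coded samples and the encoding guesses are assumptions -- there is no fully reliable way to auto-detect an arbitrary string's encoding:

    # 0xFA is "ú" in Latin-1; transcode those bytes to UTF-8
    latin1 = "Revista de M\xFAsica Latinoamericana".force_encoding('ISO-8859-1')
    utf8   = latin1.encode('UTF-8')
    puts utf8.valid_encoding?   # => true

    # Bytes that emacs displays as \303\255 are usually *already* UTF-8:
    # octal 303 255 = 0xC3 0xAD, the two-byte UTF-8 sequence for "í".
    # Such data only needs to be tagged as UTF-8, not transcoded:
    raw = "Revista de Oncolog\303\255a"
    puts raw.force_encoding('UTF-8').valid_encoding?   # => true

The order matters: force_encoding just relabels the bytes, while encode rewrites them, so data that is already UTF-8 but labeled Latin-1 comes out double-encoded if you run it through encode -- which is exactly what mojibake like the second title above looks like. On the MySQL side, the client connection should also be set to UTF-8 (e.g., SET NAMES utf8) so the bytes aren't reinterpreted on the way in.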