[CODE4LIB] Unexpected ruby-marc behavior

2011-01-27 Thread Cory Rockliff
So I was taking ruby-marc out for a spin in irb, and encountered a bit of a 
surprise. Running the following:

require 'marc'
reader = MARC::Reader.new('filename.mrc')
reader.each {|record| puts record['245']}

produces the expected result, but every subsequent call to reader.each 
{|record| puts record['245']} returns nil.

Am I missing something obvious? I don't remember this being the case before.

Thanks!

Cory

[running ruby-marc off the github repo / os x 10.6.5 / ruby 1.9.2 via rvm / 
rubygems via homebrew]


Re: [CODE4LIB] Unexpected ruby-marc behavior

2011-01-27 Thread Cory Rockliff
Oh, gotcha. Thanks.

C

On Jan 27, 2011, at 2:11 PM, Ross Singer wrote:

 No, that's expected behavior (and how it's always been).  You'd need
 to do reader.rewind to put your enumerator cursor back to 0 to run
 back over the records.
 
 It's basically an IO object (since that's what it expects as input)
 and behaves like one.
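
 A minimal sketch of that pattern, reusing the filename from your example. Whether rewind is delegated by MARC::Reader itself or has to be called on the underlying file handle may depend on the ruby-marc version, so treat the rewind call below as an assumption; constructing a fresh reader has the same effect.

 require 'marc'

 reader = MARC::Reader.new('filename.mrc')
 reader.each { |record| puts record['245'] }  # first pass prints the 245s

 # The cursor is now at end-of-file, so a second each yields nothing.
 # Reset it before iterating again:
 reader.rewind
 reader.each { |record| puts record['245'] }  # second pass prints them again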
 
 -Ross.
 
 On Thu, Jan 27, 2011 at 2:03 PM, Cory Rockliff rockl...@bgc.bard.edu wrote:
 So I was taking ruby-marc out for a spin in irb, and encountered a bit of a 
 surprise. Running the following:
 
 require 'marc'
 reader = MARC::Reader.new('filename.mrc')
 reader.each {|record| puts record['245']}
 
 produces the expected result, but every subsequent call to reader.each 
 {|record| puts record['245']} returns nil.
 
 Am I missing something obvious? I don't remember this being the case before.
 
 Thanks!
 
 Cory
 
 [running ruby-marc off the github repo / os x 10.6.5 / ruby 1.9.2 via rvm / 
 rubygems via homebrew]
 
 
 

Cory Rockliff
Technical Services Librarian

Bard Graduate Center:
Decorative Arts, Design History, Material Culture
38 West 86th Street, New York, NY 10024
T 212 501 3037
E rockl...@bgc.bard.edu
W bgc.bard.edu/library

BGC Exhibitions:
In the Main Gallery: January 26, 2011–April 17, 2011
Cloisonné: Chinese Enamels from the Yuan, Ming, and Qing Dynasties

In the Focus Gallery: January 26, 2011–April 17, 2011 
Objects of Exchange: Social and Material Transformation on the 
Late-Nineteenth-Century Northwest Coast


Re: [CODE4LIB] MARCXML - What is it for?

2010-10-28 Thread Cory Rockliff
 I've only just had a chance to catch up on this thread. I'm not 
offended in the least by Turbomarc (anything round-trippable should 
serve just as well as an internal representation of MARC, right?), but I 
am a little puzzled--what are the 'special cases' alluded to in the blog 
post? When would there ever be a non-alphanumeric attribute value in 
MARCXML? Is this a non-MARC21 thing?


C

On 10/25/10 3:35 PM, MJ Suhonos wrote:

I'll just leave this here:

http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records

That trade-off ought to offend both camps, though I happen to think it's quite 
clever.

MJ

On 2010-10-25, at 3:22 PM, Eric Hellman wrote:


I think you'd have a very hard time demonstrating any speed advantage to MARC 
over MARCXML. XML parsers have been speed-optimized out the wazoo; if there 
exists a MARC parser that has ever been speed-optimized without serious 
compromise, I'm sure someone on this list will have a good story about it.

On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote:


Dear Nate,

There is a trade-off: do you want very fast processing of data -> go for binary 
data; do you want to share your data globally and easily in many (not per se 
library-related) environments -> go for XML/RDF.
Open your data and do both :-)

Pat

Sent from my iPhone

On 25 Oct 2010, at 20:39, Nate Vack <njv...@wisc.edu> wrote:


Hi all,

I've just spent the last couple of weeks delving into and decoding a
binary file format. This, in turn, got me thinking about MARCXML.

In a nutshell, it looks like it's supposed to contain the exact same
data as a normal MARC record, except in XML form. As in, it should be
round-trippable.
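
A minimal sketch of that round trip with ruby-marc (the file names here are hypothetical, and I'm assuming the usual MARC::Reader / MARC::Writer and their XML counterparts):

require 'marc'

# Binary MARC -> MARCXML
reader = MARC::Reader.new('records.mrc')
writer = MARC::XMLWriter.new('records.xml')
reader.each { |record| writer.write(record) }
writer.close

# MARCXML -> binary MARC again
xml_reader = MARC::XMLReader.new('records.xml')
bin_writer = MARC::Writer.new('roundtrip.mrc')
xml_reader.each { |record| bin_writer.write(record) }
bin_writer.close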

What's the advantage to this? I can see using a human-readable format
for poorly-documented file formats -- they're relatively easy to read
and understand. But MARC is well, well-documented, with more than one
free implementation turned up by a cursory search. And once you know a
binary file's format, it's no harder to parse than XML, and the data is
smaller and processing is faster.

So... why the XML?

Curious,
-Nate

Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

e...@hellman.net
http://go-to-hellman.blogspot.com/
@gluejar







--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu

BGC Exhibitions:
In the Main Gallery:
January 26, 2011– April 17, 2011
Cloisonné: Chinese Enamels from the Yuan, Ming, and Qing Dynasties
Organized in collaboration with the Musée des arts Décoratifs, Paris.
In the Focus Gallery:
January 26, 2011– April 17, 2011
Objects of Exchange: Social and Material Transformation on the 
Late-Nineteenth-Century Northwest Coast
Organized in collaboration with the American Museum of Natural History



Re: [CODE4LIB] DIY aggregate index

2010-07-01 Thread Cory Rockliff
I'm planning on moving ahead with a proof-of-concept in the next year, 
after which I will certainly consider writing it up.


I really hope I can get the go-ahead from database vendors. It's good to 
hear that a few institutions have successfully negotiated with 
them--anyone from Los Alamos, the Scholars Portals, or any other local 
indexers feel free to give me pointers on smooth-talking the vendors! :)


I also hope you're wrong in maintaining, in the article you linked to, 
that using controlled vocabularies for retrieval will never work well 
across databases that use different vocabularies. The (admittedly 
arduous and complex) work of crosswalking library-created controlled 
vocabularies like LCSH to periodical index thesauri and other formal and 
less-formal indexing languages out in the wild is *exactly* what I think 
librarians should be spending their time doing. Catalogers (and I 
include myself) spend a lot of time making largely irrelevant tweaks to 
already-existing MARC records before exporting them into our local 
ILSes, but article-level metadata from vendors is generally served up to 
the user as-is.


I think Roy Tennant, as quoted in your article, is spot-on when he says 
that our inability to do any preprocessing of the data is a major 
hindrance. The data sources we subscribe to should be treated as starting 
points for building a user experience, rather than leaving it to the 
vendors to decide what the discovery process is going to be like.


Cory

On 7/1/2010 11:39 AM, Jonathan Rochkind wrote:
I am eager to see you try it, Cory. Please consider writing up your 
results for the Code4Lib Journal. I'd be curious to hear the complete 
story, from issues of getting metadata, to issues of the technical 
infrastructure, any metadata normalization you need to do, issues of 
continuing to get the metadata on a regular basis, etc.
Whether you succeed or fail, but especially if you succeed, your 
project with just a couple databases could serve as a useful pilot 
for people considering doing it with more.


Jonathan


--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu



[CODE4LIB] DIY aggregate index

2010-06-30 Thread Cory Rockliff
You know, this leads into something I've been wondering about. You'll 
all have to pardon my ignorance, as I've never worked in a library with 
functioning management of e-resources.


Do libraries opt for these commercial 'pre-indexed' services simply 
because they're a good value proposition compared to all the work of 
indexing multiple resources from multiple vendors into one local index, 
or is it that companies like iii and Ex Libris are the only ones with 
enough clout to negotiate access to otherwise-unavailable database 
vendors' content?


Can I assume that if a database vendor has exposed their content to me 
as a subscriber, whether via z39.50 or a web service or whatever, that 
I'm free to cache and index all that metadata locally if I so choose? Is 
this something to be negotiated on a vendor-by-vendor basis, or is it an 
impossibility?


Cory

On 6/30/2010 12:37 PM, Walker, David wrote:

Hi Cindy,

Both the Ebsco and Proquest APIs are definitely available to customers.  We're 
using the Ebsco one in our Xerxes application, for example.  ( I'll send you a 
link off-list, Cindy.)

--Dave

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Cindy Harper 
[char...@colgate.edu]
Sent: Wednesday, June 30, 2010 9:11 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Innovative's Synergy

Hi All - III is touting their web-services based Synergy product as having
the efficiency of a pre-indexed service and the timeliness of a just-in-time
service.  Does anyone know if the agreements they have made with database
vendors to use these web services preclude an organization developing an
open-source client to take advantage of those web services?  Just curious.
I suppose I should direct my question to EBSCO and Proquest directly.


Cindy Harper, Systems Librarian
Colgate University Libraries
char...@colgate.edu
315-228-7363



   



--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu



Re: [CODE4LIB] DIY aggregate index

2010-06-30 Thread Cory Rockliff
Well, this is the thing: we're a small, highly-specialized collection, 
so I'm not talking about indexing the whole range of content which a 
university like JHU or even a small liberal arts college would need 
to--it's really a matter of a few key databases in our field(s). Don't 
get me wrong, it's still a slightly crazy idea, but I'm dissatisfied 
enough with existing solutions that I'd like to try it.


On 6/30/2010 4:15 PM, Jonathan Rochkind wrote:
A little bit of both, I think. A library probably _could_ negotiate 
access to that content... but it would be a heck of a lot of work. 
Once the staff time for negotiations comes in, it becomes a good value 
proposition, regardless of how much the licensing would cost you.  And 
yeah, then there's the staff time to actually ingest and normalize and 
troubleshoot data-flows for all that stuff on a regular basis -- 
I've heard stories of libraries that tried to do that in the early 90s 
and it was nightmarish.


I wonder if they would, in fact, demand licensing fees. I mean, we're 
already paying a subscription, and they're already exposing their 
content as a target for federated search applications (which probably do 
some caching for performance)...
So, actually, I guess i've arrived at convincing myself it's mostly 
good value proposition, in that a library probably can't afford to 
do that on their own, with or without licensing issues.

--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu



Re: [CODE4LIB] DIY aggregate index

2010-06-30 Thread Cory Rockliff
We're looking at an infrastructure based on Marklogic running on Amazon 
EC2, so the scale of data to be indexed shouldn't actually be that big 
of an issue. Also, as I said to Jonathan, I only see myself indexing a 
handful of highly-relevant resources, so we're talking millions, rather 
than 100s of millions, of records.


On 6/30/2010 4:22 PM, Walker, David wrote:

You might also need to factor in an extra server or three (in the cloud or 
otherwise) into that equation, given that we're talking 100s of millions of 
records that will need to be indexed.

   

companies like iii and Ex Libris are the only ones with
enough clout to negotiate access
 

I don't think III is doing any kind of aggregated indexing, hence their 
decision to try and leverage APIs.  I could be wrong.

--Dave

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan 
Rochkind [rochk...@jhu.edu]
Sent: Wednesday, June 30, 2010 1:15 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] DIY aggregate index

Cory Rockliff wrote:
   

Do libraries opt for these commercial 'pre-indexed' services simply
because they're a good value proposition compared to all the work of
indexing multiple resources from multiple vendors into one local index,
or is it that companies like iii and Ex Libris are the only ones with
enough clout to negotiate access to otherwise-unavailable database
vendors' content?

 

A little bit of both, I think. A library probably _could_ negotiate
access to that content... but it would be a heck of a lot of work. Once
the staff time for negotiations comes in, it becomes a good value
proposition, regardless of how much the licensing would cost you.  And
yeah, then there's the staff time to actually ingest and normalize and
troubleshoot data-flows for all that stuff on a regular basis -- I've
heard stories of libraries that tried to do that in the early 90s and it
was nightmarish.

So, actually, I guess i've arrived at convincing myself it's mostly
good value proposition, in that a library probably can't afford to do
that on their own, with or without licensing issues.

But I'd really love to see you try anyway, maybe I'm wrong. :)

   

Can I assume that if a database vendor has exposed their content to me
as a subscriber, whether via z39.50 or a web service or whatever, that
I'm free to cache and index all that metadata locally if I so choose? Is
this something to be negotiated on a vendor-by-vendor basis, or is it an
impossibility?

 

I doubt you can assume that.  I don't think it's an impossibility.

Jonathan



   



--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu



Re: [CODE4LIB] Fwd: Webinar: Introducing Cultural Objects Name Authority (CONA)

2010-04-19 Thread Cory Rockliff

Actually, their licensing terms for non-profits are very reasonable.

On 4/19/2010 11:43 AM, Ethan Gruber wrote:

I wonder how many thousands of dollars they will charge to use this.

On Mon, Apr 19, 2010 at 11:26 AM, Mark A. Matienzo <m...@matienzo.org> wrote:

   

-- Forwarded message --
From: Erin Coburn <ecob...@getty.edu>
Date: Mon, Apr 19, 2010 at 9:54 AM

The Museum Computer Network (MCN), Gallery Systems, and the J. Paul
Getty Trust are pleased to offer a free Webinar on a new vocabulary
under development, the Cultural Objects Name Authority™ (CONA).

Introducing the Getty’s new Cultural Objects Name Authority™ (CONA)
Tuesday, May 4, 2010 11:30 AM - 1:00 PM EDT

The Cultural Objects Name Authority™ (CONA) is a new Getty vocabulary
currently under development. It is scheduled for introduction to the
contributor community in 2011. CONA will join the other three Getty
vocabularies, the Art & Architecture Thesaurus® (AAT), the Getty
Thesaurus of Geographic Names® (TGN), and the Union List of Artist Names®
(ULAN), as a tool for cataloging and retrieval of art information. CONA
will contain titles, current location, and other core information for
cultural works. The scope of CONA will include architecture and movable
works such as paintings, sculpture, prints, drawings, manuscripts,
photographs, ceramics, textiles, furniture, and archaeological
artifacts. Murtha Baca, Head of Digital Art History Access at the Getty
Research Institute, and Patricia Harpring, Managing Editor of the Getty
Vocabulary Program, will present an introduction to CONA and will be
available for questions.

To register, please go to:
https://www2.gotomeeting.com/register/307938058

 




   



--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu



Re: [CODE4LIB] Fwd: Webinar: Introducing Cultural Objects Name Authority (CONA)

2010-04-19 Thread Cory Rockliff
I believe that's $1000 for a five-year license, or $200 a year, for 
unlimited use of the data as an XML download and/or as a web service. 
That compares pretty favorably to, e.g., $325 / year minimum for access 
to RDA Toolkit.


The real question here, I think, is not whether the price is right, but 
whether licensing of this sort is the best course for the Getty to 
pursue. They do already provide free access to their vocabularies in 
human-readable form--why not expand that into open access to the 
underlying data? The Getty vocabularies are far richer, semantically, 
than LCSH; within their domain, they'd be a great deal more useful as 
linked data than LCSH is in its id.loc.gov incarnation.


I see no reason why publishing the Getty vocabularies as open linked 
data should disrupt their business model as a whole, either--they could 
continue to license their data to the commercial vendors who use them 
in, e.g., collection management systems, while providing this service to 
the community at large.



On 4/19/2010 1:03 PM, Ethan Gruber wrote:

They wanted at least $1000 for the geographic terms.  Doesn't sound very
reasonable to me, to be honest, especially since I was considering
developing an application based on their own CDWA schema.

On Mon, Apr 19, 2010 at 12:08 PM, Cory Rockliff <rockl...@bgc.bard.edu> wrote:

   

Actually, their licensing terms for non-profits are very reasonable.


On 4/19/2010 11:43 AM, Ethan Gruber wrote:

 

I wonder how many thousands of dollars they will charge to use this.

On Mon, Apr 19, 2010 at 11:26 AM, Mark A. Matienzo <m...@matienzo.org> wrote:
 



   

-- Forwarded message --
From: Erin Coburn <ecob...@getty.edu>
Date: Mon, Apr 19, 2010 at 9:54 AM

The Museum Computer Network (MCN), Gallery Systems, and the J. Paul
Getty Trust are pleased to offer a free Webinar on a new vocabulary
under development, the Cultural Objects Name Authority™ (CONA).

Introducing the Getty’s new Cultural Objects Name Authority™ (CONA)
Tuesday, May 4, 2010 11:30 AM - 1:00 PM EDT

The Cultural Objects Name Authority™ (CONA) is a new Getty vocabulary
currently under development. It is scheduled for introduction to the
contributor community in 2011. CONA will join the other three Getty
vocabularies, the Art & Architecture Thesaurus® (AAT), the Getty
Thesaurus of Geographic Names® (TGN), and the Union List of Artist Names®
(ULAN), as a tool for cataloging and retrieval of art information. CONA
will contain titles, current location, and other core information for
cultural works. The scope of CONA will include architecture and movable
works such as paintings, sculpture, prints, drawings, manuscripts,
photographs, ceramics, textiles, furniture, and archaeological
artifacts. Murtha Baca, Head of Digital Art History Access at the Getty
Research Institute, and Patricia Harpring, Managing Editor of the Getty
Vocabulary Program, will present an introduction to CONA and will be
available for questions.

To register, please go to:
https://www2.gotomeeting.com/register/307938058



 






   


--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu


 




   



--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu



Re: [CODE4LIB] Fwd: Webinar: Introducing Cultural Objects Name Authority (CONA)

2010-04-19 Thread Cory Rockliff

On 4/19/2010 3:02 PM, Cowles, Esme wrote:

So of course I'd love them to offer it for free.  But realistically, it 
probably cost them a fortune to develop, and they've got to recoup that somehow.

Yes, but I can't imagine they're recouping much from licensing to 
non-profits--surely the real revenue is generated by licensing to 
commercial systems vendors.


I would think that open access to the vocabularies = development of 
useful tools around them by third parties = wider adoption of Getty 
vocabularies = greater collective stake in them = greater likelihood 
that other institutions will step in to ensure they're maintained.


Perhaps there are other issues here, though.

--
Cory Rockliff
Technical Services Librarian
Bard Graduate Center: Decorative Arts, Design History, Material Culture
18 West 86th Street
New York, NY 10024
T: (212) 501-3037
rockl...@bgc.bard.edu



Re: [CODE4LIB] yaoss4ll

2009-12-22 Thread Cory Rockliff

How about putting the data into freebase? http://www.freebase.com/

That would combine the write-access of a wiki with the structure of a 
database.


I was getting ready to compile a very similar dataset myself, so I'd be 
happy to do some of the requisite munging to get the data into freebase, 
if the idea appeals to anyone.


On 12/22/2009 1:25 PM, John Fereira wrote:

Jonathan Rochkind wrote:
Putting it on a wiki anyone can edit makes it, perhaps, somewhat more 
likely that it ends up maintained longer, making it easier for other 
people to get involved in maintaining it without technological 
barriers or proprietary feelings getting in the way.
I was thinking of something more along the lines of putting it into a 
CMS (e.g. Drupal) so that voting/ranking/tagging tools could be used 
to allow the community to rate the viability and discovery of each item.


I may actually need to  do something like this but for a different 
domain on a project that I'll be working on over the next year.



Re: [CODE4LIB] character-sets for dummies?

2009-12-16 Thread Cory Rockliff
If you're looking for a book-length treatment, 'Unicode Explained' is 
fairly readable, and the first three chapters are about character 
encodings in general:


http://books.google.com/books?id=PcWU2yxc8WkC&printsec=frontcover
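
For the specific data you describe below, here's a minimal Ruby sketch of two possible conversions. Both rest on guesses about what the bytes actually are (Latin-1 in one source, literal backslash-octal UTF-8 escapes in the other), so check the raw bytes before trusting either:

# Guess 1: the "weird characters" titles are Latin-1 bytes; re-encode as UTF-8.
latin1 = "Revista de M\xFAsica Latinoamericana".force_encoding('ISO-8859-1')
utf8   = latin1.encode('UTF-8')           # => "Revista de Música Latinoamericana"

# Guess 2: the other titles contain literal \NNN octal escapes for UTF-8 bytes;
# turn each escape back into its byte, then tag the result as UTF-8.
escaped = 'Revista de Oncolog\303\255a'.b
decoded = escaped.gsub(/\\([0-7]{3})/) { $1.to_i(8).chr }.force_encoding('UTF-8')
                                          # => "Revista de Oncología"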

On 12/16/2009 12:02 PM, Ken Irwin wrote:

Hi all,

I'm looking for a good source to help me understand character sets and how to 
use them. I pretty much know nothing about this - the whole world of Unicode, 
ASCII, octal, UTF-8, etc. is baffling to me.

My immediate issue is that I think I need to integrate data from a variety of 
character sets into one MySQL table - I expect I need some way to convert from 
one to another, but I don't really even know how to tell which data are in 
which format.

Our homegrown journal list (akin to SerialsSolutions) includes data ingested 
from publishers, vendors, the library catalog (III), etc. When I look at the 
data in emacs, some of it renders like this:
  Revista de Oncolog\303\255a  [slashes-and-digits instead of 
diacritics]
And other data looks more like:
  Revista de Música Latinoamericana[weird characters instead of diacritics]

My MySQL table is currently set up with the collation set to utf8-bin, and the titles 
from the second category (weird characters displayed in emacs) render properly when the 
database data is output to a web browser. The data from the former example (\###) 
renders as an "I don't know what character this is" placeholder in Firefox and 
IE.

So, can someone please point me toward any or all of the following?

· A good primer for understanding all of this stuff

· A method for converting all of my data to the same character set so 
it plays nicely in the database

· The names of which character-sets I might be working with here

Many thanks!

Ken



   

