Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Bob Duncan
At 06:52 PM 05/08/2008, Tim wrote: So, I took a long slow look at ten of the examples from Godmar's file. Nothing I saw disabused me of my opinion: "No preview" pages on Google Book Search are very weak tea. Are they worthless? Not always. But they usually are. And, unfortunately, you generally

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Gabriel Sean Farrell
On Fri, May 09, 2008 at 11:58:03AM -0700, Casey Durfee wrote:
> On Fri, May 9, 2008 at 11:14 AM, Bess Sadler <[EMAIL PROTECTED]> wrote:
> > Casey, you say you're getting indexing times of 1000 records / second? That's amazing! I really have to take a closer look at MarcThing. Could py

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Casey Durfee
On Fri, May 9, 2008 at 11:14 AM, Bess Sadler <[EMAIL PROTECTED]> wrote:
> Casey, you say you're getting indexing times of 1000 records / second? That's amazing! I really have to take a closer look at MarcThing. Could pymarc really be that much faster than marc4j? Or are we comparing apples

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Jason Ronallo
On Fri, May 9, 2008 at 2:23 PM, Joe Hourcle <[EMAIL PROTECTED]> wrote:
> OpenLibrary has other datasets that you might be able to use / combine / whatever to meet your requirements:
>
> http://openlibrary.org/dev/docs/data
This'll get you the other MARC dumps that have been made available

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Joe Hourcle
On Fri, 9 May 2008, Bess Sadler wrote: Those of us involved in the Blacklight and VuFind projects have recently been spending lots of time thinking about MARC record indexing. We're about to start running some performance tests, and we want to create unit tests for our MARC-to-Solr indexer, and also

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Bess Sadler
On May 9, 2008, at 1:42 PM, Jonathan Rochkind wrote: The Blacklight code is not currently using XML or XSLT. It's indexing binary MARC files. I don't know its speed, but I hear it's pretty fast. Right, I'm talking about the Java indexer we're working on, which we're hoping to turn into a plug

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Jonathan Rochkind
The Blacklight code is not currently using XML or XSLT. It's indexing binary MARC files. I don't know its speed, but I hear it's pretty fast. But for the kind of test set I want, even waiting half an hour is too long. I want a test set where I can make a change to my indexing configuration and t

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Jonathan Rochkind
I think you start with a smaller set, but then when you find idiosyncratic records that were NOT represented in your smaller set, you add representative samples to the sample set. The sample set organically grows. Certainly at some point you've got to test on a larger set too. But I think there's

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Kyle Banerjee
> Sounds like you have some experience of this, Kyle! Do you have a list of "the screwball stuff"? Even an offhand one would be interesting...
I don't have the list with me, but just to rattle a few things off, some extra short records rank high because so much of a search term matches the who

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Dan Scott
I agree with Kyle that a big, wide set of records is better for testing purposes. In processing records for Evergreen imports, I've found that there are often just a handful that throw marc4j for a loop. I suppose I should cull those and attach them to bug reports... instead I've taken the path

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Casey Durfee
I strongly agree that we need something like this. The LoC records that Casey donated are a great resource but far from ideal for this purpose. They're pretty homogeneous. I do think it needs to be bigger than 10,000 though. 100,000 would be a better target. And I would like to see a UNIMARC/D

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Jodi Schneider
> This is much harder to do than might appear on the surface. 10K is a really small set, and the issue is that unless people know how to create a set that really targets the problem areas, you will inevitably miss important stuff. At the end of the day, it's the screwball stuff you didn't th

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Walter Lewis
Bess Sadler wrote: 3. Are there features missing from the above list that would make this more useful? One of the things that Bill Moen showed at Access a couple of years ago (Edmonton?) was what he and others were calling a "radioactive" MARC record. One that had no "normal" payload but, IIRC

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Kyle Banerjee
> According to the combined brainstorming of Jonathan Rochkind and myself, the ideal record set should:
>
> 1. contain about 10k records, enough to really see the features, but small enough that you could index it in a few minutes on a typical desktop...
> 5. contain a distribution of typica

[CODE4LIB] iteration. interface ideas? unintended consequences.

2008-05-09 Thread Jodi Schneider
> What I dislike here is your abnegation of the responsibility to care about the choices students make. If you're not considering the value of all resources---including the book---you're not playing the library game, the educator game or the Google game. You're just throwing stuff on screens because

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Tim Spalding
> I agree that showing the user evaluative resources that are not any good is not a service to the user. When there are no good evaluative resources available, we should not show bad ones to the user.
I think we actually agree on what should happen. We disagree on the theory behind that :)
>

[CODE4LIB] marc records sample set

2008-05-09 Thread Bess Sadler
Those of us involved in the Blacklight and VuFind projects have recently been spending lots of time thinking about MARC record indexing. We're about to start running some performance tests, and we want to create unit tests for our MARC-to-Solr indexer, and also people wanting to download and play with

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Custer, Mark
Isn't the whole point of this to get the user to the book? Knowledge about the book should/will come from research and reading, not bad metadata... or even hastily automated extraneous info... and in fact, I'd say that most MARC metadata is only there to get a user to the book, not to describe i

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Larry Campbell
What I dislike here is your assumption that you know better than your users what's "good" for them/what they want/what they OUGHT to want/what they need/etc. Providing them with available information in a reasonably accessible way, and then trusting them -- whether they're undergrads, grad student

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Jonathan Rochkind
I agree that showing the user evaluative resources that are not any good is not a service to the user. When there are no good evaluative resources available, we should not show bad ones to the user. In either case though, with or without evaluative resources, we tell the user where the book is lo

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Tim Spalding
> Most of our users will start out in an electronic environment whether we like it or not (most of us on THIS list like it)---and will decide, based on what they find there, on their own, without us making the decision for them---whether to obtain (or attempt to obtain) a copy of the phys

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Jonathan Rochkind
And indeed, that is exactly how I plan to make use of the Google metadata link, and have been suggesting is the best way to make use of it since I entered this conversation: As part of a set of links to 'additional information' about a resource, with no special prominence given to the Google link

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Jonathan Rochkind
Ah, but in our actual world we _don't_ have two choices, to send the user to the physical book or to an electronic metadata surrogate. We _don't_ get to force the user to look at the book. Most of our users will start out in an electronic environment whether we like it or not (most of us on THIS l

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Tim Spalding
If the Google link were part of a much larger set of unstressed links, I'd be more inclined to favor it. Lots of linking is a good thing. But a single no-info Google link from a low-information OPAC page seems to compound the deficiencies of one paradigm with those of another. On the subject of "la

[CODE4LIB] Sakai and Emory Reserves Direct

2008-05-09 Thread Marianne Giltrud
A while ago someone posted feedback on reserves applications to this list. Emory Reserves Direct was one of the open source recommendations. We are migrating from Blackboard to Sakai open source course management and would like to add some copyright components. Is there anyone that has exper

Re: [CODE4LIB] coverage of google book viewability API

2008-05-09 Thread Custer, Mark
For the most part, I completely agree. That said, it's a very tangled web out there, and on occasion those "no preview" views can still lead a user to a "full view" that's offered elsewhere. Here's just one example: http://books.google.com/books?id=kdiYGQAACAAJ (from there, a user can click on t

Re: [CODE4LIB] Latest OpenLibrary.org release

2008-05-09 Thread Rob Sanderson
On Thu, 2008-05-08 at 11:41 -0400, Godmar Back wrote:
> On Thu, May 8, 2008 at 11:25 AM, Dr R. Sanderson <[EMAIL PROTECTED]> wrote:
> > Like what? The current API seems to be concerned with search. Search is what SRU does well. If it was concerned with harvest, I (and I'm sure m