At 06:52 PM 05/08/2008, Tim wrote:
So, I took a long slow look at ten of the examples from Godmar's file.
Nothing I saw disabused me of my opinion: "No preview" pages on Google
Book Search are very weak tea.
Are they worthless? Not always. But they usually are. And,
unfortunately, you generally
On Fri, May 09, 2008 at 11:58:03AM -0700, Casey Durfee wrote:
> On Fri, May 9, 2008 at 11:14 AM, Bess Sadler <[EMAIL PROTECTED]> wrote:
>
> >
> > Casey, you say you're getting indexing times of 1000 records /
> > second? That's amazing! I really have to take a closer look at
> > MarcThing. Could py
On Fri, May 9, 2008 at 11:14 AM, Bess Sadler <[EMAIL PROTECTED]> wrote:
>
> Casey, you say you're getting indexing times of 1000 records /
> second? That's amazing! I really have to take a closer look at
> MarcThing. Could pymarc really be that much faster than marc4j? Or
> are we comparing apples
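To compare pymarc and marc4j apples-to-apples, both parsers need to read the same byte blob and do the same per-record work. A minimal, parser-agnostic timing harness (all names here are hypothetical, and the stand-in parser below only splits on the MARC record terminator rather than really parsing):

```python
import time

def benchmark(parse_records, blob, runs=3):
    """Time an iterator-producing parser over the same byte blob.

    parse_records: callable taking bytes and yielding records.
    Returns the best records-per-second rate across runs.
    """
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        count = sum(1 for _ in parse_records(blob))
        elapsed = time.perf_counter() - start
        best = max(best, count / elapsed if elapsed else float("inf"))
    return best

def dummy_parser(blob):
    """Stand-in parser: splits on the MARC record terminator (0x1D)."""
    for raw in blob.split(b"\x1d"):
        if raw:
            yield raw

if __name__ == "__main__":
    data = b"record one\x1drecord two\x1drecord three\x1d" * 10000
    print(f"{benchmark(dummy_parser, data):.0f} records/sec")
```

Swapping `dummy_parser` for a real pymarc `MARCReader` loop (or a JVM-side equivalent for marc4j) over the same file would give comparable numbers.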
On Fri, May 9, 2008 at 2:23 PM, Joe Hourcle
<[EMAIL PROTECTED]> wrote:
> OpenLibrary has other datasets that you might be able to use / combine /
> whatever to meet your requirements:
>
> http://openlibrary.org/dev/docs/data
This'll get you the other MARC dumps that have been made available
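Reading one of those dumps is mostly line splitting. A sketch, assuming a tab-separated layout whose last column is a JSON document (the layout of later OpenLibrary dumps; the docs page above describes the actual format, so treat the column positions here as an assumption):

```python
import json

def parse_dump_line(line):
    """Split one OpenLibrary dump line into (type, key, json_doc).

    Assumes a tab-separated layout whose last column is a JSON
    document -- verify against the format on the docs page above.
    """
    cols = line.rstrip("\n").split("\t")
    record_type, key, doc = cols[0], cols[1], json.loads(cols[-1])
    return record_type, key, doc

# Illustrative line, not real dump data.
sample = '/type/edition\t/books/OL1M\t{"title": "Example", "key": "/books/OL1M"}\n'
rtype, key, doc = parse_dump_line(sample)
print(rtype, key, doc["title"])
```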
On Fri, 9 May 2008, Bess Sadler wrote:
Those of us involved in the Blacklight and VuFind projects are
spending lots of time recently thinking about marc records indexing.
We're about to start running some performance tests, and we want to
create unit tests for our marc to solr indexer, and also
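A unit test for a MARC-to-Solr indexer can start very small: assert that known input fields land in the expected Solr fields. A toy sketch, with a hypothetical dict-based record and mapping (a real indexer would handle subfields, indicators, and repeated fields):

```python
def marc_to_solr(record):
    """Map a simplified MARC record (tag -> value dict) to Solr fields.

    Hypothetical mapping for illustration only; real code reads
    subfields, indicators, and repeated fields.
    """
    return {
        "title": record.get("245", ""),
        "author": record.get("100", ""),
        "isbn": record.get("020", ""),
    }

def test_title_mapping():
    record = {"245": "Moby Dick", "100": "Melville, Herman"}
    doc = marc_to_solr(record)
    assert doc["title"] == "Moby Dick"
    assert doc["author"] == "Melville, Herman"
    assert doc["isbn"] == ""

test_title_mapping()
print("ok")
```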
On May 9, 2008, at 1:42 PM, Jonathan Rochkind wrote:
The Blacklight code is not currently using XML or XSLT. It's indexing
binary MARC files. I don't know its speed, but I hear it's pretty
fast.
Right, I'm talking about the java indexer we're working on, which
we're hoping to turn into a plug
The Blacklight code is not currently using XML or XSLT. It's indexing
binary MARC files. I don't know its speed, but I hear it's pretty fast.
But for the kind of test set I want, even waiting half an hour is too
long. I want a test set where I can make a change to my indexing
configuration and t
I think you start with a smaller set, but then when you find
idiosyncratic records that were NOT represented in your smaller set, you
add representative samples to the sample set. The sample set organically
grows.
Certainly at some point you've got to test on a larger set too. But I
think there's
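The organically growing sample set described above is easy to automate: any record the parser chokes on gets appended to the regression corpus. A sketch with a stand-in parser (the 24-byte check mimics a MARC leader-length test; a real setup would catch whatever exceptions marc4j or pymarc raises):

```python
def parse(raw):
    """Stand-in parser: rejects records shorter than a MARC leader (24 bytes)."""
    if len(raw) < 24:
        raise ValueError("truncated leader")
    return raw

def grow_sample_set(records, sample_set):
    """Try each record; failures join the regression sample set."""
    failures = 0
    for raw in records:
        try:
            parse(raw)
        except ValueError:
            sample_set.append(raw)
            failures += 1
    return failures

corpus = [b"x" * 30, b"bad", b"y" * 24]
samples = []
print(grow_sample_set(corpus, samples))  # 1: only the short record fails
```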
> Sounds like you have some experience of this, Kyle!
> Do you have a list of "the screwball stuff"? Even an offhand one would
> be interesting...
I don't have the list with me, but just to rattle a few things off,
some extra short records rank high because so much of a search term
matches the who
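The extra-short-record effect is easy to see with a toy length-normalized score: dividing term frequency by record length, a stub record containing the term once outranks a rich record containing it several times. (This toy score is an illustration of the bias, not how Solr/Lucene actually normalizes.)

```python
def score(term, words):
    """Toy length-normalized relevance: term frequency / record length."""
    return words.count(term) / len(words)

short_record = ["hamlet"]                       # stub record: just a title
long_record = ["hamlet"] * 3 + ["filler"] * 97  # rich record, term appears 3x

print(score("hamlet", short_record) > score("hamlet", long_record))  # True
```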
I agree with Kyle that a big, wide set of records is better for testing
purposes. In processing records for Evergreen imports, I've found that there
are often just a handful that throw marc4j for a loop. I suppose I should cull
those and attach them to bug reports... instead I've taken the path
I strongly agree that we need something like this. The LoC records that
Casey donated are a great resource but far from ideal for this purpose.
They're pretty homogeneous. I do think it needs to be bigger than 10,000
though. 100,000 would be a better target. And I would like to see a
UNIMARC/D
>This is much harder to do than might appear on the surface. 10K is a
>really small set, and the issue is that unless people know how to
>create a set that really targets the problem areas, you will
>inevitably miss important stuff. At the end of the day, it's the
>screwball stuff you didn't th
Bess Sadler wrote:
3. Are there features missing from the above list that would make
this more useful?
One of the things that Bill Moen showed at Access a couple of years ago
(Edmonton?) was what he and others were calling a "radioactive" MARC
record. One that had no "normal" payload but, IIRC
> According to the combined brainstorming of Jonathan Rochkind and
> myself, the ideal record set should:
>
> 1. contain about 10k records, enough to really see the features, but
> small enough that you could index it in a few minutes on a typical
> desktop...
> 5. contain a distribution of typica
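One way to draw a ~10k sample in which every record in a large dump has an equal chance of inclusion is reservoir sampling; the idiosyncratic records found later can still be appended by hand. A sketch:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Uniformly sample k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1_000_000), 10_000, random.Random(42))
print(len(sample))  # 10000
```

This only needs one pass and constant memory beyond the reservoir, which suits streaming a multi-gigabyte MARC dump.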
>What I dislike here is your abnegation of the responsibility to care
>about the choices students make. If you're not considering the value
>of all resources-including the book-you're not playing the library
>game, the educator game or the Google game. You're just throwing stuff
>on screens because
> I agree that showing the user evaluative resources that are not any good
> is not a service to the user. When there are no good evaluative
> resources available, we should not show bad ones to the user.
I think we actually agree on what should happen. We disagree on the
theory behind that :)
>
Those of us involved in the Blacklight and VuFind projects are
spending lots of time recently thinking about marc records indexing.
We're about to start running some performance tests, and we want to
create unit tests for our marc to solr indexer, and also people
wanting to download and play with
Isn't the whole point of this to get the user to the book? Knowledge
about the book should/will come from research and reading, not bad
metadata... or even hastily automated extraneous info... and in fact,
I'd say that most MARC metadata is only there to get a user to the book,
not to describe i
What I dislike here is your assumption that you know better than your
users what's "good" for them/what they want/what they OUGHT to want/what
they need/etc. Providing them with available information in a reasonably
accessible way, and then trusting them -- whether they're undergrads,
grad student
I agree that showing the user evaluative resources that are not any good
is not a service to the user. When there are no good evaluative
resources available, we should not show bad ones to the user.
In either case though, with or without evaluative resources, we tell the
user where the book is lo
> Most of our users will start out in an electronic environment whether we like
> it or not
> (most of us on THIS list like it)---and will decide, based on what they
> find there, on their own, without us making the decision for
> them---whether to obtain (or attempt to obtain) a copy of the phys
And indeed, that is exactly how I plan to make use of the Google
metadata link, and have been suggesting is the best way to make use of
it since I entered this conversation: As part of a set of links to
'additional information' about a resource, with no special prominence
given to the Google link
Ah, but in our actual world we _don't_ have two choices, to send the
user to the physical book or to an electronic metadata surrogate. We
_don't_ get to force the user to look at the book. Most of our users
will start out in an electronic environment whether we like it or not
(most of us on THIS l
If the Google link were part of a much larger set of unstressed links,
I'd be more inclined to favor it. Lots of linking is a good thing. But
a single no-info Google link from a low-information OPAC page seems to
compound the deficiencies of one paradigm with those of another.
On the subject of "la
A while ago someone posted to this list feedback on reserves
applications. Emory Reserves Direct was one of the open source
recommendations. We are migrating from Blackboard to Sakai open
source course management and would like to add some Copyright
components. Is there anyone that has exper
For the most part, I completely agree. That said, it's a very tangled
web out there, and on occasion those "no preview" views can still lead a
user to a "full view" that's offered elsewhere. Here's just one
example:
http://books.google.com/books?id=kdiYGQAACAAJ
(from there, a user can click on t
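At the time of this thread, Google's Book Search Dynamic Links API (`jscmd=viewapi`) reported a per-bib-key "preview" status such as "noview", "partial", or "full". Assuming that response shape (and using an illustrative payload, not real API data), an OPAC could suppress the link for no-preview titles:

```python
import json

def linkable_keys(viewapi_payload):
    """Return bib keys whose preview status suggests the link has content.

    Assumes the Dynamic Links API response shape: a JSON object mapping
    bib keys to dicts with a "preview" field ("noview"/"partial"/"full").
    """
    data = json.loads(viewapi_payload)
    return [key for key, info in data.items()
            if info.get("preview") in ("partial", "full")]

# Illustrative payload, not a real API response.
sample = json.dumps({
    "ISBN:0596000278": {"preview": "partial"},
    "OCLC:1234567":    {"preview": "noview"},
})
print(linkable_keys(sample))  # ['ISBN:0596000278']
```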
On Thu, 2008-05-08 at 11:41 -0400, Godmar Back wrote:
> On Thu, May 8, 2008 at 11:25 AM, Dr R. Sanderson
> <[EMAIL PROTECTED]> wrote:
> >
> > Like what? The current API seems to be concerned with search. Search
> > is what SRU does well. If it was concerned with harvest, I (and I'm
> > sure m
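SRU searchRetrieve requests are plain URLs built from standard parameters, which is part of why the protocol suits simple clients. A minimal builder using SRU 1.1 parameter names (the endpoint below is hypothetical):

```python
from urllib.parse import urlencode

def sru_search_url(base, cql_query, maximum_records=10, record_schema="marcxml"):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    return base + "?" + urlencode(params)

# Hypothetical endpoint, for illustration only.
url = sru_search_url("http://example.org/sru", 'dc.title = "hamlet"')
print(url)
```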