Re: [RDA-L] Access to the knowledge of cataloging

James Weinheimer Sat, 07 Dec 2013 05:08:12 -0800

On 12/6/2013 11:12 PM, Kevin M Randall wrote:
<snip>

James Weinheimer wrote:

To be fair, the original version of FRBR came out before (or at least
not long afterward) the huge abandonment by the public of our OPACs.
Google had barely even begun to exist when FRBR appeared. Still, there
could have been a chapter on the newest developments back then. But
even today, nowhere in it is there the slightest mention of "keyword" or
"relevance ranking" much less anything about Web2.0 or the semantic web
or linked data or full-text or Lucene indexing (like what we see in the
Worldcat displays). It's as if those things never happened.

There's no mention of that stuff because it is *irrelevant* to what FRBR is
about.  It has absolutely nothing to do with what technologies or techniques
are being used to access the data.  It's about the *data itself* that are
objects of those keyword searches, or relevance ranking, or Lucene indexing, or
whatever other as-yet-undeveloped means of discovery there may be.  How many
times does this have to be said?

</snip>

There is one point where we can agree: it is irrelevant. And that is precisely why FRBR is also irrelevant to how the vast majority of the public searches every single day. It is also irrelevant to implementing the user tasks, since those can be done today. FRBR is irrelevant for linked data. Also (apparently) irrelevant is how much it will cost to change to FRBR structures.

But with the claim that FRBR is about the data itself, I must disagree. We have gobs of data now, and it is already deeply structured. FRBR does not change any of that. There will still be the same data and it will still be as deeply structured. FRBR instead offers an alternative data *model* that is designed for *relational databases*. We currently have another model where all the bibliographic information is put into a single "manifestation" record and holdings information goes into another record. FRBR proposes to take out data that is now in the "manifestation" record and put certain parts of it into a "work" instance, while other data will go into an "expression" instance.

So why did they want to do that? Designers of relational databases want to make their databases as efficient as they can, and one way to do that is by eliminating as much duplication as possible. This is what FRBR proposes. It is clearest to show this with an example: Currently if we have a non-fiction book with multiple manifestations and this book has three subject headings, the subjects will be repeated in each manifestation record. With FRBR, the subjects will all go into the *work* instance, and as a result, each manifestation does not need separate subjects because the manifestation will reference the work instance and get the subjects in that way.
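The subject-heading example can be sketched in a few lines of Python. This is only an illustration: the record layouts, field names, titles, and the `subjects_for` helper are all hypothetical, and they are not MARC, RDA, or any actual FRBR implementation.

```python
# Hypothetical record layouts, for illustration only.

SUBJECTS = ["Printing", "Books", "Typography"]

# Current model: every manifestation record carries its own copy of the subjects.
flat_records = [
    {"id": "m1", "title": "A History of Printing", "format": "hardcover",
     "subjects": list(SUBJECTS)},
    {"id": "m2", "title": "A History of Printing", "format": "paperback",
     "subjects": list(SUBJECTS)},
    {"id": "m3", "title": "A History of Printing", "format": "ebook",
     "subjects": list(SUBJECTS)},
]

# FRBR-style model: the subjects are stored once, on the work,
# and each manifestation only holds a reference to that work.
works = {
    "w1": {"title": "A History of Printing", "subjects": list(SUBJECTS)},
}
manifestations = {
    "m1": {"work": "w1", "format": "hardcover"},
    "m2": {"work": "w1", "format": "paperback"},
    "m3": {"work": "w1", "format": "ebook"},
}


def subjects_for(manifestation_id):
    """Follow the manifestation's reference to its work and return the work's subjects."""
    work_id = manifestations[manifestation_id]["work"]
    return works[work_id]["subjects"]


# The data a searcher sees is the same under both models.
for record in flat_records:
    assert subjects_for(record["id"]) == record["subjects"]

# Only the number of stored copies differs.
flat_count = sum(len(r["subjects"]) for r in flat_records)
frbr_count = sum(len(w["subjects"]) for w in works.values())
print(flat_count, frbr_count)
```

The point of the sketch is that nothing is gained or lost in the data itself; what changes is where the subjects are stored and how many times.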

What is the advantage? A few. First, the size of the database is reduced (very important with relational databases!), plus if you want to change something, such as add a new subject, you would add that subject only once into the work instance and that extra subject would automatically be referenced in all the manifestations. The same goes for deleting subjects or adding or deleting creators. Nevertheless, the *data itself* remains unchanged and there is not even any additional access with the FRBR data model. It simply posits an alternative data *model* and one that I agree would be *far more* efficient in a relational database. But as I have been at pains to point out, something that may at first seem rather benign such as introducing a new data model, has many serious consequences that should be considered before adopting such a model. Something that makes the database designers happy may be a monster for everyone who uses it: both the people who input into the database and the people who search it. But the designers remain happy. This is what I say we are looking at now with FRBR.
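A rough Python sketch of the "change it once" advantage, under the same caveat that every name and record shape here is invented for illustration:

```python
def add_subject_flat(records, subject):
    """Flat model: the new subject has to be written into every record."""
    edits = 0
    for record in records:
        record["subjects"].append(subject)
        edits += 1
    return edits


def add_subject_to_work(work, subject):
    """Work-based model: the new subject is written once, on the work."""
    work["subjects"].append(subject)
    return 1


# Three manifestation records, each with its own copy of the subjects.
flat_records = [
    {"id": f"m{i}", "subjects": ["Printing", "Books", "Typography"]}
    for i in (1, 2, 3)
]

# One work holding the subjects, referenced by three manifestations.
work = {"subjects": ["Printing", "Books", "Typography"]}
manifestations = [{"id": f"m{i}", "work": work} for i in (1, 2, 3)]

flat_edits = add_subject_flat(flat_records, "Publishing")
frbr_edits = add_subject_to_work(work, "Publishing")

# Every manifestation sees the new subject through its reference to the work.
assert all("Publishing" in m["work"]["subjects"] for m in manifestations)
print(flat_edits, frbr_edits)
```

Both approaches end with the same subjects attached to the same manifestations; the work-based one simply needs fewer writes.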

Strangely enough, we have different technology today, with Lucene-type indexing such as we see in Google and Worldcat with the facets, where everything is flattened out into different indexes, since this is how the indexing works. (The best explanation I have found so far is at http://www.slideshare.net/mcjenkins/the-search-engine-index-presentation but it also becomes pretty dense pretty quickly.) Essentially what Lucene does is make an index (much like the index at the back of a book) out of the documents it finds. It indexes text by word, by phrase, and other ways as well. It also adds links to each document where the index term has been used and ranks each term using various methods.

The advantage is: when you do a search, it does not have to scan through the entire database (like a relational database does), it just looks up your terms in its index, collates them together and presents the searcher with the result, and it does this blazingly fast as anybody can see when they search Google. The Google index is over 100,000,000 gigabytes! http://www.google.it/intl/en/insidesearch/howsearchworks/crawling-indexing.html
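The back-of-the-book idea can be shown with a toy inverted index in Python. This is a minimal sketch and not Lucene's actual implementation: the three sample documents and the function names are made up, and it leaves out phrase indexing and ranking entirely.

```python
import re
from collections import defaultdict

# Made-up sample documents, keyed by a document number.
documents = {
    1: "Cataloging rules for rare books",
    2: "Relational databases and cataloging",
    3: "Search engine indexing for books",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document numbers that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Look up each query term in the index and keep the documents containing all of them."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


index = build_index(documents)
print(sorted(search(index, "cataloging books")))
print(sorted(search(index, "books")))
```

The search never rereads the documents themselves. It only looks up each term in the prebuilt index and combines the resulting lists, which is the step described above as collating.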

If we want to discuss the usefulness of our catalog records as data, that is indeed a very interesting topic and I have discussed that in my podcast: "Cataloging Matters No. 17: Catalog Records as Data" http://blog.jweinheimer.net/2013/01/cataloging-matters-no-17-catalog.html


--
James Weinheimer weinheimer.ji...@gmail.com
First Thus http://catalogingmatters.blogspot.com/
First Thus Facebook Page https://www.facebook.com/FirstThus

Cooperative Cataloging Rules http://sites.google.com/site/opencatalogingrules/
Cataloging Matters Podcasts http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html
