On 12/6/2013 11:12 PM, Kevin M Randall wrote:
James Weinheimer wrote:
To be fair, the original version of FRBR came out before (or at least
not long afterward) the huge abandonment by the public of our OPACs.
Google had barely even begun to exist when FRBR appeared. Still, there
could have been a chapter on the newest developments back then. But
today, nowhere in it is there the slightest mention of "keyword" or
"relevance ranking," much less anything about Web 2.0, the Semantic Web, linked
data, full text, or Lucene indexing (like what we see in the
Worldcat displays). It's as if those things never happened.
There's no mention of that stuff because it is *irrelevant* to what FRBR is 
about.  It has absolutely nothing to do with what technologies or techniques 
are being used to access the data.  It's about the *data itself* that are 
objects of those keyword searches, or relevance ranking, or Lucene indexing, or 
whatever other as-yet-undeveloped means of discovery there may be.  How many 
times does this have to be said?

There is one point where we can agree: it is irrelevant. And that is precisely why FRBR is also irrelevant to how the vast majority of the public searches every single day. It is also irrelevant to implementing the user tasks, since those can be done today. FRBR is irrelevant for linked data. Also (apparently) irrelevant is how much it will cost to change to FRBR structures.

But with the claim that FRBR is about the data itself, I must disagree. We have gobs of data now, and it is already deeply structured. FRBR does not change any of that. There will still be the same data and it will still be just as deeply structured. FRBR instead offers an alternative data *model*, one that is designed for *relational databases*. We currently have another model, where all the bibliographic information is put into a single "manifestation" record and holdings information goes into another record. FRBR proposes to take data that is now in the "manifestation" record and put certain parts of it into a "work" instance, while other data will go into an "expression" instance.

So why did they want to do that? Designers of relational databases want to make their databases as efficient as they can, and one way to do that is by eliminating as much duplication as possible. This is what FRBR proposes. It is clearest to show this with an example: Currently if we have a non-fiction book with multiple manifestations and this book has three subject headings, the subjects will be repeated in each manifestation record. With FRBR, the subjects will all go into the *work* instance, and as a result, each manifestation does not need separate subjects because the manifestation will reference the work instance and get the subjects in that way.
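To make the subject example concrete, here is a minimal sketch of the FRBR-style split in an actual relational database. The schema, titles, and headings are all hypothetical and greatly simplified; the point is only that the subjects are stored once, on the work, and every manifestation reaches them through a join.

```python
import sqlite3

# Hypothetical, much-simplified FRBR-style schema: subject headings
# attach to the work; each manifestation references its work.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE work (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE work_subject (work_id INTEGER REFERENCES work(id), heading TEXT);
CREATE TABLE manifestation (id INTEGER PRIMARY KEY,
                            work_id INTEGER REFERENCES work(id),
                            publisher TEXT, year INTEGER);
""")

cur.execute("INSERT INTO work VALUES (1, 'On the Origin of Species')")
# Three subject headings, each stored exactly once, on the work.
cur.executemany("INSERT INTO work_subject VALUES (1, ?)",
                [("Evolution (Biology)",), ("Natural selection",), ("Species",)])
# Three manifestations of that one work; none carries its own subjects.
cur.executemany("INSERT INTO manifestation VALUES (?, 1, ?, ?)",
                [(1, "Murray", 1859), (2, "Appleton", 1860), (3, "Penguin", 1985)])

# Each manifestation picks up the work's subjects through the join.
rows = cur.execute("""
SELECT m.id, m.publisher, s.heading
FROM manifestation m JOIN work_subject s ON s.work_id = m.work_id
ORDER BY m.id
""").fetchall()
print(len(rows))  # 9: 3 manifestations x 3 subjects, from only 3 stored headings
```

Adding a fourth heading to `work_subject` would instantly "appear" on all three manifestations, which is exactly the update-once advantage the model is after.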

What are the advantages? A few. First, the size of the database is reduced (very important with relational databases!). Second, if you want to change something, such as adding a new subject, you would add that subject only once, to the work instance, and the extra subject would automatically be referenced in all the manifestations. The same goes for deleting subjects, or for adding or deleting creators. Nevertheless, the *data itself* remains unchanged, and there is not even any additional access with the FRBR data model. It simply posits an alternative data *model*, and one that I agree would be *far more* efficient in a relational database. But as I have been at pains to point out, something that may at first seem rather benign, such as introducing a new data model, has many serious consequences that should be considered before adopting it. Something that makes the database designers happy may be a monster for everyone who uses it: both the people who input into the database and the people who search it. But the designers remain happy. This is what I say we are looking at now with FRBR.

Strangely enough, we have different technology today: Lucene-type indexing, such as we see in Google and in Worldcat with its facets, where everything is flattened out into different indexes, since this is how the indexing works. (The best explanation I have found so far is at http://www.slideshare.net/mcjenkins/the-search-engine-index-presentation but it becomes pretty dense pretty quickly.) Essentially what Lucene does is make an index (much like the index at the back of a book) out of the documents it finds. It indexes the text by word, by phrase, and in other ways as well. For each index term it also records links to each document where the term has been used, and it ranks each term using various methods.
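The back-of-the-book analogy can be sketched in a few lines. This is a toy inverted index, not anything resembling Lucene's actual internals: for each word, it records which (hypothetical) documents contain it and at what position.

```python
# Toy inverted index: word -> list of (document id, word position) postings.
docs = {
    1: "the origin of species",
    2: "species and their origin",
    3: "cataloging rules",
}

index = {}
for doc_id, text in docs.items():
    for pos, word in enumerate(text.split()):
        index.setdefault(word, []).append((doc_id, pos))

print(index["origin"])  # [(1, 1), (2, 3)] -- the "links back to each document"
```

A real engine would also tokenize more carefully, store phrase and field indexes, and attach ranking weights to each posting, but the flattened word-to-documents structure is the core idea.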

The advantage is that when you do a search, the system does not have to scan through the entire database (as a relational database does); it just looks up your terms in its index, collates the results, and presents them to the searcher, and it does this blazingly fast, as anybody can see when they search Google. The Google index is over 100,000,000 gigabytes! http://www.google.it/intl/en/insidesearch/howsearchworks/crawling-indexing.html
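The "look up and collate" step amounts to intersecting posting lists, which can be sketched like this (the index contents here are hypothetical, and real engines use far more sophisticated merging and ranking):

```python
# Toy query evaluation over an inverted index: fetch each term's posting
# list and intersect them -- no scan of the full collection is needed.
index = {
    "origin":  {1, 2},
    "species": {1, 2, 4},
    "rules":   {3},
}

def search(index, query):
    """AND-query: return the documents containing every query term."""
    postings = [index.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()

print(sorted(search(index, "origin species")))  # [1, 2]
```

Because each lookup touches only the (typically short) posting lists for the query terms, the cost scales with the result, not with the size of the collection, which is why such searches stay fast even at Google's scale.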

If we want to discuss the usefulness of our catalog records as data, that is indeed a very interesting topic and I have discussed that in my podcast: "Cataloging Matters No. 17: Catalog Records as Data" http://blog.jweinheimer.net/2013/01/cataloging-matters-no-17-catalog.html

James Weinheimer weinheimer.ji...@gmail.com
First Thus http://catalogingmatters.blogspot.com/
First Thus Facebook Page https://www.facebook.com/FirstThus
Cooperative Cataloging Rules http://sites.google.com/site/opencatalogingrules/
Cataloging Matters Podcasts http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html
