Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

2012-05-16 Thread James Weinheimer
On 15/05/2012 17:53, Jonathan Rochkind wrote:
<snip>
 Frankly, I no longer have much confidence that the library cataloging
 community is capable of any necessary changes in any kind of timeline
 fast enough to save us.

 Those that believe no significant changes to library cataloging or
 metadata practices are necessary will have a chance to see if they are
 right.

 I believe that inaction -- inability to make significant changes in
 the way our data is currently recorded and maintained to accommodate
 contemporary needs -- will instead result in the end of the library
 cataloging/metadata tradition, and the end of library involvement in
 metadata control, if not the end of libraries.  I find it deeply
 depressing. But I no longer find much hope that any other outcome is
 possible, and begin to think any time I spend trying to help arrive at
 change is just wasted time.
</snip>

I think many share your fears. I certainly do, but it is important not
to give up hope. The problem as I see it is that while everyone agrees
that we should move forward, we don't even know which direction
forward is. Some believe it is east, others west, others north, others
up, others down. Nobody knows. Is the basic problem in libraries the
way our data is currently recorded and maintained? For those who
believe this, it follows that if libraries changed their format and
cataloging practices, things would be better.

But this will be expensive and disruptive. That is a simple fact. And
undertaking something like that during such severe economic times makes
it even more difficult. So, it seems entirely logical that people ask
whether this *really will* help or whether those resources would be
better used to do something else. In fact, this is such a natural
question, not asking it makes people raise their eyebrows and wonder if
there really is an answer. This is why I keep raising the point of the
business case. It is a fundamental, basic task.

And another fact is, if we want to make our records more widely
available in types of formats that others could use, it can be done
right now. Harvard is doing it with their API:
http://blogs.law.harvard.edu/dplatechdev/2012/04/24/going-live-with-harvards-catalog/
They say their records are now available in JSON using schema.org, in DC
or in MARC, although all I have seen so far is MARC. Still, kudos to
them! It is a wonderful beginning!

So it is a fact that the library community does not have to wait for
RDA, FRBR or even the changes to MARC to repurpose their data. Would it
be perfect? Of course not! When has that ever had anything to do with
anything? Everyone expects things to change constantly, especially
today. A few years of open development using tools such as this would
make the way forward much clearer than it is now. Then we could start
to see what the public wants and needs and begin to design for *them*
instead of for *us*. If we find that there is absolutely no interest in
open development of library tools, that would say a lot too.

To maintain that RDA and FRBR are going to make any difference to the
public, or that they are necessary to get into the barely-nascent and
highly controversial Linked Data, is simply too much to accept.
Each represents change, that's for sure, but theoretical change that
happens almost entirely behind the scenes and whose value has yet to
be proven. All this in spite of the incredible developments going on
right under our noses! Therefore, it seems only natural to question
whether RDA, FRBR and Linked Data truly represent the direction
forward or are actually going in some other direction.

On a more positive note, I think there are incredible opportunities for
libraries and librarians today.

-- 
*James Weinheimer* weinheimer.ji...@gmail.com
*First Thus* http://catalogingmatters.blogspot.com/
*Cooperative Cataloging Rules*
http://sites.google.com/site/opencatalogingrules/
*Cataloging Matters Podcasts*
http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html


Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

2012-05-16 Thread J. McRee Elrod
On 15/05/2012 17:53, Jonathan Rochkind wrote:

 Frankly, I no longer have much confidence that the library cataloging
 community is capable of any necessary changes in any kind of timeline
 fast enough to save us.

There is no question that change is needed.  The question is, are RDA
records coded in MARC21 the needed change?


   __   __   J. McRee (Mac) Elrod (m...@slc.bc.ca)
  {__  |   / Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
  ___} |__ \__


Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

2012-05-15 Thread James Weinheimer
On 15/05/2012 02:52, Karen Coyle wrote:
<snip>
 let's say you have a record with 3 subject headings:

 Working class -- France
 Working class -- Dwellings -- France
 Housing -- France

 In a card catalog, these would result in 3 separate cards and
 therefore should you look all through the subject card catalog you
 would see the book in question 3 times.

 In a keyword search limited to subject headings, most systems would
 retrieve this record once and display it once. That has to do with how
 the DBMS resolves from indexes to records. So even though a keyword
 may appear more than once in a record, the record is only retrieved once.
</snip>

I don't believe that is correct. That kind of search result should be a
programming decision: whether to dedupe or not. It seems to me that a
record with France three times in the record could easily display
three times in a search result if you want it to. With relevance
ranking, or ranking by date, etc. it makes little sense to display the
same record three different times, although I am sure you could. Having
a record display more often makes sense only with some kind of browse
heading display but I have never seen that with a keyword result.
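
To make the point concrete, here is a minimal sketch in Python with SQLite. The two-table layout (bib, subject), the placeholder title and the search() helper are all assumptions for illustration, not how any particular ILS stores its data. The same keyword match returns the record once or three times depending only on how the query is written:

```python
import sqlite3

# Hypothetical schema: one bib record, three subject headings attached to it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bib (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE subject (bib_id INTEGER REFERENCES bib(id), heading TEXT);
    INSERT INTO bib VALUES (1, 'Placeholder title');
    INSERT INTO subject VALUES
        (1, 'Working class -- France'),
        (1, 'Working class -- Dwellings -- France'),
        (1, 'Housing -- France');
""")

def search(keyword, dedupe):
    """Keyword search limited to subject headings.

    dedupe=True  -> one row per matching record.
    dedupe=False -> one row per matching heading, as in a card catalog.
    """
    pattern = f"%{keyword}%"
    if dedupe:
        sql = """SELECT DISTINCT b.id, b.title
                 FROM bib b JOIN subject s ON s.bib_id = b.id
                 WHERE s.heading LIKE ?
                 ORDER BY b.id"""
    else:
        sql = """SELECT b.id, b.title, s.heading
                 FROM bib b JOIN subject s ON s.bib_id = b.id
                 WHERE s.heading LIKE ?
                 ORDER BY s.heading"""
    return conn.execute(sql, (pattern,)).fetchall()

deduped = search("France", dedupe=True)
per_heading = search("France", dedupe=False)
print(len(deduped), len(per_heading))
```

With dedupe=True the record comes back once; with dedupe=False it comes back once per matching heading, which is the card-catalog behaviour.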

This is a great example of how our current subject heading strings just
don't function today, and they haven't ever since keyword searching was
introduced. Computerized records work much better with descriptors than
with traditional headings. For instance, your example would be something
like:
Topical Subjects: Working class, Dwellings, Housing
Geographic Subject Area: France.

Here, there is no question since France appears only once in the
subjects.
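
A rough sketch of that conversion, in Python. The GEOGRAPHIC lookup set and the to_descriptors() helper are stand-ins assumed for illustration; a real conversion would rely on the subfield coding in the record (geographic subdivisions are coded $z in MARC 6XX fields) rather than on a word list:

```python
# Hypothetical lookup of terms to treat as geographic. In real data this
# information would come from subfield coding, not from a hard-coded set.
GEOGRAPHIC = {"France", "Europe"}

def to_descriptors(headings):
    """Flatten precoordinated heading strings into deduplicated descriptors."""
    topical, geographic = [], []
    for heading in headings:
        for part in (p.strip() for p in heading.split("--")):
            if not part:
                continue
            target = geographic if part in GEOGRAPHIC else topical
            if part not in target:  # each descriptor is kept only once
                target.append(part)
    return {"topical": topical, "geographic": geographic}

headings = [
    "Working class -- France",
    "Working class -- Dwellings -- France",
    "Housing -- France",
]
descriptors = to_descriptors(headings)
print(descriptors)
```

Note that the flattening throws away which subdivision belonged to which heading, so the original strings cannot be rebuilt from the descriptors alone.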

Seen in this light, our subject headings are obsolete. Nevertheless, I
believe our subject headings with subdivisions provide important
options found nowhere else, as I tried to show in the posting I
mentioned in my previous message. But how the subject headings function
must be reconsidered from the foundations; otherwise they really are
obsolete.

The dictionary catalog really is dead, at least as concerns the public.



Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

2012-05-15 Thread Brenndorfer, Thomas




From: Resource Description and Access 
[mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Karen Coyle
Sent: May 14, 2012 8:53 PM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] 
[BIBFRAME] RDA, DBMS and RDF



All that to say that if we are not going to display our records in 
alphabetical order by their headings, then I'm not sure if creating headings 
during cataloging makes all that much sense. Or at least, not the kinds of 
headings that we do create, which are designed to be viewed in alphabetical 
order. You are supposed to see Hamlet before you see



Hamlet. French.

Hamlet. German.

Hamlet. German. 1919



Maybe you don't see Hamlet first, but the logic of adding on to the right 
hand side of the heading implies that the order conveys something to the user 
that facilitates finding what he is looking for.



Thus, I question the creation of headings that are designed to be encountered 
in alphabetical order unless we adopt an ordered display around those 
headings. And if we think it is important to adopt such a display, we need to 
understand the implications for system design.





There are numerous effects of the alphabetical browse display of headings in 
online systems that force catalogers and systems designers to make all sorts of 
unexpected decisions and difficult choices and workarounds. And even at that, 
the conventions that bring us those headings are often found out of context. 
For example, some of those headings with extra bits at the end exist to 
differentiate entities, and otherwise appear arbitrary without much relation to 
the headings around them which omit the extra bits.



End-users have their complaints browsing a catalog index. They complain when 
they expect to find different records attached to each unique heading, but 
instead find that the record happened to have several headings that all began 
with the same words.



Multiple indexes in online catalogs fracture and distort the intended effect of 
browsing headings. For the four ILS's I've worked with and customized I've had 
to make choices about MARC index mapping that would mitigate these issues:



1.  Author Browse may or may not contain name-title headings for works and 
expressions. These headings could be pulled from related or analytical or 
series added entries. Should subject name-title headings be included? What 
about title SEE references to these headings? One system I used actually 
reconstructed the 1XX+240 heading on-the-fly. And what about persons and 
corporate bodies as subjects? Shouldn't the user benefit from seeing all 
related works together?

This is why FRBR is so important. So much of the indexing is built around a 
cacophony of different implicit relationships, with little that is explicit to 
the end-users in terms of building expectations of what should be found with 
what. Being clear about the relationships matters, because that information 
needs to survive as catalog records and indexes are torn apart and rebuilt in 
any number of different ways - we can't assume the implicit logic that exists 
when all card catalog and heading data are found together in context.


2.  Title Browse often doesn't include authority information such as SEE and 
SEE ALSO references, so much of the information available in authority records 
is effectively lost. Should Title Browse draw in all titles, such as series 
titles or subject titles? I always mapped these together because I felt it 
wrong for an end-user to decide upon a title AND a relationship when searching 
(i.e., the end-user knows the title, but may not know it's a series title - why 
expect the end-user to be forced to choose between Title Browse and Series 
Browse?)


3.  Subject Browse - similar to the issue above about end-users being forced to 
choose indexes, an end-user needs to differentiate William Shakespeare as 
author from William Shakespeare as subject ahead of time to find all the 
records attached to that name. The records are not found together with a single 
search in many cases. In an early system I had with minimal authority control, 
there were actually two system generated authority records for William 
Shakespeare - one as an author and one as a subject. There is a benefit to 
maintenance when one record per entity is updated, but the end-user may not 
encounter all the benefits because of the bewildering choices of indexes and 
the truncated and chopped up displays of bibliographic and authority data in 
online catalogs.



Once web-based catalogs appeared, there were also choices to be made about 
what happens when a heading is clicked.



In the case of a related name-title work heading, I had three choices in one 
system:



A.  Click the heading and bring up only those records attached to the heading.

B.  Click the heading and have a keyword search initiated using all the words 
in the heading (not good with long and unique

Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

2012-05-15 Thread Jonathan Rochkind

On 5/14/2012 8:52 PM, Karen Coyle wrote:


No, that is not what I meant. Of course you can retrieve records in a
given order, and we do all the time. It's about using the headings in
the MARC records to establish that order. So here's the question I put
to Mac:


Sure you can use the headings in the MARC records to establish record 
retrieval order in an rdbms.  All of our ILS/OPACs that return MARC 
records in headings order and are based on rdbms DO it.


If the literal headings aren't structured right so that the rdbms' 
natural order will be right, the standard software solution is to 
automatically construct a 'sort key' from the headings. This is a pretty 
standard solution used all the time in many scenarios, it's not a 
significant burden or problem.
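

As an illustration only, here is a minimal Python sketch of such a sort key. The normalization rules (fold case, strip accents and punctuation, zero-pad numbers) are simplifying assumptions, not any official filing rule, and the function name sort_key is hypothetical:

```python
import re
import unicodedata

def sort_key(heading):
    """Build a plain string whose natural order approximates filing order."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", heading)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace punctuation with spaces so it does not affect the order.
    text = re.sub(r"[^\w\s]", " ", text)
    # Zero-pad digit runs so that numbers compare by value, not by digit.
    text = re.sub(r"\d+", lambda m: m.group().zfill(10), text)
    # Collapse runs of whitespace.
    return " ".join(text.split())

headings = [
    "Hamlet. German. 1919",
    "Hamlet. German.",
    "Hamlet",
    "Hamlet. French.",
]
ordered = sorted(headings, key=sort_key)
print(ordered)
```

The key would normally be computed once when the record is loaded and stored in an indexed column, so the database's ordinary ORDER BY does the rest.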


I am a bit mystified by your arguments here about what rdbms can or 
can't do, and am not sure what you are trying to do with them.  They 
don't match what software engineers using rdbms actually do.  Also, you 
keep saying dbms (database management system), when I think you mean 
to be specifically talking about rdbms (RELATIONAL dbms); dbms is a more 
general term that can apply to just about anything that stores data 
persistently, but your arguments (which I don't agree with) seem to 
specifically be about databases that use SQL and are based on relational 
algebra -- that's rdbms specifically, not 'dbms'.


I certainly agree that the way our data is currently recorded and 
maintained in MARC is not suitable for contemporary desired uses, as 
I've suggested many times before on this list and others and tried to 
explain why; it's got little to do with rdbms though.


Jonathan


Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

2012-05-15 Thread James Weinheimer
On 15/05/2012 16:50, Jonathan Rochkind wrote:
<snip>
 I certainly agree that the way our data is currently recorded and
 maintained in MARC is not suitable for contemporary desired uses, as
 I've suggested many times before on this list and others and tried to
 explain why; it's got little to do with rdbms though.
</snip>

Although MARC needs to change, and has needed it for a very long time, I
don't see how changing the format would improve the subject headings.
The semantics are there already, so searching would remain the same. It
is the display of multiple search results that has disintegrated. I
think there are lots of ways the displays could be improved for the
public, primarily by making them more flexible, and these could be
experimented with now. But even then, there will need to be a major push
from public services to get the public to use and understand what the
subject searches are. All of it has been effectively forgotten by the
public.

For a whole lot of reasons, library subject searches will always be
substantively different from what people retrieve from a full-text
search result, and while librarians can understand this, it is a lot
harder for the public.



Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

2012-05-15 Thread Jonathan Rochkind

On 5/15/2012 11:34 AM, James Weinheimer wrote:

Although MARC needs to change, and has needed it for a very long time, I
don't see how changing the format would improve the subject headings.


I did not mean to say that changing from MARC to something else, by 
itself, would do anything at all to subject headings.


I chose my phrase carefully: "the way our data is currently recorded and 
maintained in MARC".  Several things about the way our data is currently 
recorded and maintained (which we currently do in MARC) ought to be 
changed. Subject headings aren't even one of the main ones, although the 
way they are done could certainly be improved to be more powerful in 
software environments.


It is a large and complicated topic. One we've spent collectively years 
arguing about on this list.


Frankly, I no longer have much confidence that the library cataloging 
community is capable of any necessary changes in any kind of timeline 
fast enough to save us.


Those that believe no significant changes to library cataloging or 
metadata practices are necessary will have a chance to see if they are 
right.


I believe that inaction -- inability to make significant changes in 
the way our data is currently recorded and maintained to accommodate 
contemporary needs -- will instead result in the end of the library 
cataloging/metadata tradition, and the end of library involvement in 
metadata control, if not the end of libraries.  I find it deeply 
depressing. But I no longer find much hope that any other outcome is 
possible, and begin to think any time I spend trying to help arrive at 
change is just wasted time.


Jonathan


Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

2012-05-14 Thread Karen Coyle
Note to the majority of readers on RDA-L: you should feel no guilt in 
skipping the rest of this thread. It has veered off into a technical 
discussion that you may simply have no time (or use) for - kc


On 5/14/12 12:50 PM, Simon Spero wrote:


On Mon, May 14, 2012 at 10:45 AM, Karen Coyle li...@kcoyle.net wrote:
 What happened with the MARC format is that when we moved it into
 actual databases it turned out that certain things that people
 expected or wanted didn't really work well. For example, many
 librarians expected that you could *[a]* /replicate a card catalog
 display/ with *[b]* /records displaying in order by the heading that
 was searched/. That is really hard to do (*[c]* /and not possible to
 do efficiently/) using *[d]* /DBMS/ functionality, which is based on
 *[e]* /retrieved sets/ not /linear ordering/, and *[f]* /especially
 using keyword searching/.  [emphasis and labels added]


BLUF: Not all DBMS are Relational; it is possible to efficiently 
retrieve records in order from many different types of DBMS, including 
Relational databases.


[c] and [d] make the claim that it is impossible to retrieve records 
efficiently in some desired order using DBMS functionality.  This is 
justified by [e] which claims that the source of this necessary 
inefficiency is that DBMS functionality is based on retrieved sets 
not linear ordering.


No, that is not what I meant. Of course you can retrieve records in a 
given order, and we do all the time. It's about using the headings in 
the MARC records to establish that order. So here's the question I put 
to Mac:


***

let's say you have a record with 3 subject headings:

Working class -- France
Working class -- Dwellings -- France
Housing -- France

In a card catalog, these would result in 3 separate cards and therefore 
should you look all through the subject card catalog you would see the 
book in question 3 times.


In a keyword search limited to subject headings, most systems would 
retrieve this record once and display it once. That has to do with how 
the DBMS resolves from indexes to records. So even though a keyword may 
appear more than once in a record, the record is only retrieved once.


In your catalog, which displays the subject headings on a line with the 
author and title

1) will each of these subject headings appear in the display?
2) does that mean that the bibliographic record (represented by the 
author and title) will display 3 times in the list of retrievals?


***

I could add to that: if the record had four subject headings:

Working class -- France
Working class -- Dwellings -- France
Housing -- France
Housing -- Europe

Then under what circumstances in your system design would the user see 
all four subject entries (heading plus bib data) in a single display?


That's part of the question. The card catalog had a separate physical 
entry for each entry point or heading associated with the 
bibliographic description. Do we have a reasonably efficient way to 
imitate this behavior using keyword (or keyword in heading, or 
left-anchored string searching) in an online library catalog? (followed 
by: is there any reason to do that?)


But I think another part is the difference between retrieval, in the 
database sense of the term (give me all of the records with the word 
*france* in a subject heading) vs. the kind of alphabetical linear 
access that the card catalog provided, which allows you to begin at:


France -- United States -- Commerce

and soon arrive at

Frances E. Willard Union (Yakima, Wash.)

I don't think you can get from one to the other in most online catalogs 
because the set of records that you can see is determined by the search 
that retrieves only those records with *france* in it.


I've designed a browse in DBMSs using a left-anchored search that 
retrieves one heading (the first one hit) in a heading index, followed by 
a long series of "get next" commands. Naturally, "next" has to also be 
next in alphabetical order, so the index you are traversing has to be in 
alphabetical order. I should say: alphabetical order that is retained 
even as records are added, modified or deleted. I think this may be more 
feasible in some DBMSs than others.
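

A minimal sketch of this kind of browse, using Python and SQLite. The heading table, the lower-cased sort_key column, the two extra sample headings and the browse() helper are assumptions for illustration; the point is only that a seek on an ordered index followed by reading forward is not limited to headings that contain the search term:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key gives an ordered index that the database keeps up to date
# as headings are added, modified or deleted.
conn.execute("CREATE TABLE heading (sort_key TEXT PRIMARY KEY, display TEXT)")

headings = [
    "Housing -- France",
    "Frances E. Willard Union (Yakima, Wash.)",
    "France -- United States -- Commerce",
    "Finance -- France",
]
conn.executemany(
    "INSERT INTO heading VALUES (?, ?)",
    [(h.lower(), h) for h in headings],  # simplistic sort key: lower case only
)

def browse(start_key, page_size, inclusive=True):
    """Seek to start_key in the ordered index, then read forward."""
    op = ">=" if inclusive else ">"
    sql = (
        "SELECT sort_key, display FROM heading "
        f"WHERE sort_key {op} ? ORDER BY sort_key LIMIT ?"
    )
    return conn.execute(sql, (start_key.lower(), page_size)).fetchall()

# Left-anchored entry point, then "get next" from the last key seen.
first_page = browse("france", 2)
next_page = browse(first_page[-1][0], 2, inclusive=False)
print([display for _, display in first_page])
print([display for _, display in next_page])
```

Here the first page runs from the France heading straight on to Frances E. Willard Union, because the query walks the index in order instead of selecting on a keyword.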


However, what is obviously missing here is a display of the bib record 
that goes with the heading (all of that ISBD stuff). It's possible 
that DBMSs can do this fine today, but in my olden days, when I 
suggested to the DBA that we'd need to "get next", display that heading, 
then retrieve and display the bibliographic record that went with it, 20 
times over in order to create a page of display, I practically had to 
revive the DBA with a bucket of cold water.
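

On a relational system, the heading-plus-bib page can at least be requested as one joined query rather than 20 separate round trips. A hedged sketch in Python with SQLite follows; the table names, the placeholder author and title, and the browse_page() helper are assumptions for illustration, and nothing here says how it would perform at catalog scale:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bib (id INTEGER PRIMARY KEY, author TEXT, title TEXT);
    CREATE TABLE heading_entry (
        sort_key TEXT,
        display  TEXT,
        bib_id   INTEGER REFERENCES bib(id)
    );
    CREATE INDEX heading_order ON heading_entry (sort_key);
    INSERT INTO bib VALUES (1, 'Placeholder, Author', 'Placeholder title');
""")

# One entry per heading on the record, like one card per entry point.
headings = [
    "Working class -- France",
    "Working class -- Dwellings -- France",
    "Housing -- France",
    "Housing -- Europe",
]
conn.executemany(
    "INSERT INTO heading_entry VALUES (?, ?, 1)",
    [(h.lower(), h) for h in headings],
)

def browse_page(start, page_size=20):
    """Return one display page of (heading, author, title) in heading order."""
    sql = """SELECT h.display, b.author, b.title
             FROM heading_entry h JOIN bib b ON b.id = h.bib_id
             WHERE h.sort_key >= ?
             ORDER BY h.sort_key, b.id
             LIMIT ?"""
    return conn.execute(sql, (start.lower(), page_size)).fetchall()

page = browse_page("housing")
for display, author, title in page:
    print(f"{display}\n    {author}. {title}")
```

In this sketch the same bibliographic record appears four times on the page, once under each of its headings, which is the card-catalog behaviour asked about above.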


Mac's system also cannot take the display from France--US--etc to 
Frances E. Willard because the headings it has to work with have been 
retrieved on a keyword search, thus only headings with the term *france* 
in them are displayed. It also does