Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF
On 15/05/2012 17:53, Jonathan Rochkind wrote:

snip
Frankly, I no longer have much confidence that the library cataloging community is capable of any necessary changes in any kind of timeline fast enough to save us. Those that believe no significant changes to library cataloging or metadata practices are necessary will have a chance to see if they are right. I believe that inaction -- inability to make significant changes in the way our data is currently recorded and maintained to accommodate contemporary needs -- will instead result in the end of the library cataloging/metadata tradition, and the end of library involvement in metadata control, if not the end of libraries. I find it deeply depressing. But I no longer find much hope that any other outcome is possible, and begin to think any time I spend trying to help arrive at change is just wasted time.
/snip

I think many share your fears. I certainly do, but it is important not to give up hope. The problem as I see it is that while everyone agrees that we should move forward, we don't even know which direction forward is. Some believe it is east, others west, others north, others up, others down. Nobody knows.

Is the basic problem in libraries the way our data is currently recorded and maintained? For those who believe this, it would mean that if libraries changed their format and cataloging practices, things would be better. But this will be expensive and disruptive. That is a simple fact. And undertaking something like that during such severe economic times makes it even more difficult. So, it seems entirely logical that people ask whether this *really will* help or whether those resources would be better used to do something else. In fact, this is such a natural question that not asking it makes people raise their eyebrows and wonder if there really is an answer. This is why I keep raising the point of the business case. It is a fundamental, basic task.
And another fact is, if we want to make our records more widely available in types of formats that others could use, it can be done right now. Harvard is doing it with their API: http://blogs.law.harvard.edu/dplatechdev/2012/04/24/going-live-with-harvards-catalog/ They say their records are now available in JSON using schema.org, in DC or in MARC, although all I have seen is MARC so far. Still, kudos to them! It is a wonderful beginning!

So it is a fact that the library community does not have to wait for RDA, FRBR or even the changes to MARC to repurpose their data. Would it be perfect? Of course not! When has that ever had anything to do with anything? Everyone expects things to change constantly, especially today. A few years of open development using tools such as this would make the way forward much clearer than it is now. Then we could start to see what the public wants and needs and begin to design for *them* instead of for *us*. If we find that there is absolutely no interest in open development of library tools, that would say a lot too.

To maintain that RDA and FRBR are going to make any difference to the public, or that they are necessary to get into the barely-nascent and highly controversial Linked Data, is too much to simply accept. Each represents changes, that's for sure, but theoretical ones that happen almost entirely behind the scenes, and all of whose value has yet to be proven. All this in spite of the incredible developments going on right under our noses! Therefore, it seems only natural to question whether RDA, FRBR and Linked Data truly represent the direction forward or whether they are actually going in some other direction.

On a more positive note, I think there are incredible opportunities for libraries and librarians today.
-- *James Weinheimer* weinheimer.ji...@gmail.com *First Thus* http://catalogingmatters.blogspot.com/ *Cooperative Cataloging Rules* http://sites.google.com/site/opencatalogingrules/ *Cataloging Matters Podcasts* http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html
Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF
On 15/05/2012 17:53, Jonathan Rochkind wrote:

Frankly, I no longer have much confidence that the library cataloging community is capable of any necessary changes in any kind of timeline fast enough to save us.

There is no question that change is needed. The question is, are RDA records coded in MARC21 the needed change?

J. McRee (Mac) Elrod (m...@slc.bc.ca)
Special Libraries Cataloguing HTTP://www.slc.bc.ca/
Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF
On 15/05/2012 02:52, Karen Coyle wrote:

snip
let's say you have a record with 3 subject headings:

Working class -- France
Working class -- Dwellings -- France
Housing -- France

In a card catalog, these would result in 3 separate cards and therefore should you look all through the subject card catalog you would see the book in question 3 times. In a keyword search limited to subject headings, most systems would retrieve this record once and display it once. That has to do with how the DBMS resolves from indexes to records. So even though a keyword may appear more than once in a record, the record is only retrieved once.
/snip

I don't believe that is correct. That kind of search result should be a programming decision: whether to dedupe or not. It seems to me that a record with France three times in the record could easily display three times in a search result if you want it to. With relevance ranking, or ranking by date, etc., it makes little sense to display the same record three different times, although I am sure you could. Having a record display more than once makes sense only with some kind of browse heading display, but I have never seen that with a keyword result.

This is a great example of how our current subject heading strings just don't function today, and they haven't ever since keyword searching was introduced. Computerized records work much better with descriptors than with traditional headings. For instance, your example would be something like:

Topical Subjects: Working class, Dwellings, Housing
Geographic Subject Area: France

Here, there is no question, since France appears only once in the subjects. Seen in this light, our subject headings are obsolete. Nevertheless, I believe our subject headings with subdivisions provide important options found nowhere else, as I tried to show in the posting I mentioned in my previous message. But really, how the subject headings function must be reconsidered from their foundations, otherwise they really are obsolete.
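The point that deduplication is a programming decision can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular ILS works: the `Record` type, the sample catalog, and the function names `search_records` and `search_headings` are all hypothetical, and keyword matching is reduced to whole-word comparison within each heading.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical, much simplified bibliographic record."""
    record_id: int
    title: str
    subjects: tuple


# Sample data using the three headings from the example in this thread.
CATALOG = [
    Record(1, "Hypothetical title A", (
        "Working class -- France",
        "Working class -- Dwellings -- France",
        "Housing -- France",
    )),
    Record(2, "Hypothetical title B", ("Housing -- Europe",)),
]


def heading_hits(records, keyword):
    """Yield one (heading, record) pair per subject heading containing the keyword."""
    needle = keyword.lower()
    for record in records:
        for heading in record.subjects:
            words = heading.lower().replace("--", " ").split()
            if needle in words:
                yield heading, record


def search_records(records, keyword):
    """Deduplicated result: each matching record is listed once."""
    seen = set()
    titles = []
    for _heading, record in heading_hits(records, keyword):
        if record.record_id not in seen:
            seen.add(record.record_id)
            titles.append(record.title)
    return titles


def search_headings(records, keyword):
    """Card-catalog style result: one entry per matching heading, in heading order."""
    hits = [(heading, record.title) for heading, record in heading_hits(records, keyword)]
    return sorted(hits, key=lambda hit: hit[0].lower())


if __name__ == "__main__":
    print(search_records(CATALOG, "France"))
    for heading, title in search_headings(CATALOG, "France"):
        print(heading, "/", title)
```

Both functions run over the same set of heading matches; the only difference is whether the result is collapsed by record or left as one entry per heading.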
The dictionary catalog really is dead, at least as concerns the public.

-- James Weinheimer
Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF
From: Resource Description and Access / Resource Description and Access [mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Karen Coyle
Sent: May 14, 2012 8:53 PM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

All that to say that if we are not going to display our records in alphabetical order by their headings, then I'm not sure if creating headings during cataloging makes all that much sense. Or at least, not the kinds of headings that we do create, which are designed to be viewed in alphabetical order. You are supposed to see

Hamlet

before you see

Hamlet. French.
Hamlet. German.
Hamlet. German. 1919

Maybe you don't see Hamlet first, but the logic of adding on to the right hand side of the heading implies that the order conveys something to the user that facilitates finding what he is looking for. Thus, I question the creation of headings that are designed to be encountered in alphabetical order unless we adopt an ordered display around those headings. And if we think it is important to adopt such a display, we need to understand the implications for system design.

There are numerous effects of the alphabetical browse display of headings in online systems that force catalogers and systems designers to make all sorts of unexpected decisions and difficult choices and workarounds. And even at that, the conventions that bring us those headings are often found out of context. For example, some of those headings with extra bits at the end exist to differentiate entities, and otherwise appear arbitrary without much relation to the headings around them which omit the extra bits.

End-users have their complaints browsing a catalog index. They complain when they expect to find different records attached to each unique heading, but instead find that the record happened to have several headings that all began with the same words.
Multiple indexes in online catalogs fracture and distort the intended effect of browsing headings. For the four ILS's I've worked with and customized I've had to make choices about MARC index mapping that would mitigate these issues:

1. Author Browse may or may not contain name-title headings for works and expressions. These headings could be pulled from related or analytical or series added entries. Should subject name-title headings be included? What about title SEE references to these headings? One system I used actually reconstructed the 1XX+240 heading on-the-fly. And what about persons and corporate bodies as subjects? Shouldn't the user benefit from seeing all related works together? This is why FRBR is so important. So much of the indexing is built around a cacophony of different implicit relationships, with little that is explicit to the end-users in terms of building expectations of what should be found with what. Being clear about the relationships matters, because that information needs to survive as catalog records and indexes are torn apart and rebuilt in any number of different ways - we can't assume the implicit logic that exists when all card catalog and heading data are found together in context.

2. Title Browse often doesn't include authority information such as SEE and SEE ALSO references, so much of the information available in authority records is effectively lost. Should Title Browse draw in all titles, such as series titles or subject titles? I always mapped these together because I felt it wrong for an end-user to decide upon a title AND a relationship when searching (i.e., the end-user knows the title, but may not know it's a series title - why expect the end-user to be forced to choose between Title Browse and Series Browse?)

3. Subject Browse - similar to the issue above about end-users being forced to choose indexes, an end-user needs to differentiate William Shakespeare as author from William Shakespeare as subject ahead of time to find all the records attached to that name. The records are not found together with a single search in many cases. In an early system I had with minimal authority control, there were actually two system generated authority records for William Shakespeare - one as an author and one as a subject. There is a benefit to maintenance when one record per entity is updated, but the end-user may not encounter all the benefits because of the bewildering choices of indexes and the truncated and chopped up displays of bibliographic and authority data in online catalogs.

Once web-based catalogs appeared, there were choices that could be made as well when a heading is clicked. In the case of a related name-title work heading, I had three choices in one system:

A. Click the heading and bring up only those records attached to the heading.

B. Click the heading and have a keyword search initiated using all the words in the heading (not good with long and unique
Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF
On 5/14/2012 8:52 PM, Karen Coyle wrote:

No, that is not what I meant. Of course you can retrieve records in a given order, and we do all the time. It's about using the headings in the MARC records to establish that order. So here's the question I put to Mac:

Sure, you can use the headings in the MARC records to establish record retrieval order in an rdbms. All of our ILS/OPACs that return MARC records in headings order and are based on rdbms DO it. If the literal headings aren't structured right so that the rdbms' natural order will be right, the standard software solution is to automatically construct a 'sort key' from the headings. This is a pretty standard solution used all the time in many scenarios; it's not a significant burden or problem.

I am a bit mystified by your arguments here about what rdbms can or can't do, and am not sure what you are trying to do with them. They don't match what software engineers using rdbms actually do.

Also, you keep saying dbms (database management system), when I think you mean to be specifically talking about rdbms (RELATIONAL dbms); dbms is a more general term that can apply to just about anything that stores data persistently, but your arguments (which I don't agree with) seem to specifically be about databases that use SQL and are based on relational algebra -- that's rdbms specifically, not 'dbms'.

I certainly agree that the way our data is currently recorded and maintained in MARC is not suitable for contemporary desired uses, as I've suggested many times before on this list and others and tried to explain why; it's got little to do with rdbms though.

Jonathan
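The 'sort key' approach mentioned above can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the `sort_key` rules (fold case, drop diacritics, treat "--" as a subdivision break) are a toy normalisation and not library filing rules, and the table and column names are hypothetical. SQLite stands in for the rdbms only because it ships with Python.

```python
import re
import sqlite3
import unicodedata


def sort_key(heading: str) -> str:
    """Build a normalised key so that plain string order gives the wanted display order."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", heading)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Normalise each subdivision: lower case, punctuation collapsed to single spaces.
    parts = [
        re.sub(r"[^a-z0-9]+", " ", part.lower()).strip()
        for part in stripped.split("--")
    ]
    # Join with a control character that sorts below a space, so a heading
    # files ahead of its own subdivisions and of longer main headings.
    return "\x1f".join(parts)


def headings_in_order(headings):
    """Store headings with their keys and let the database return them in key order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE heading (display TEXT, sort_key TEXT)")
    conn.executemany(
        "INSERT INTO heading (display, sort_key) VALUES (?, ?)",
        [(h, sort_key(h)) for h in headings],
    )
    # An index on the key lets ordered retrieval avoid a full sort on larger tables.
    conn.execute("CREATE INDEX heading_sort_key ON heading (sort_key)")
    rows = conn.execute("SELECT display FROM heading ORDER BY sort_key").fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    sample = ["Zola, Émile", "Économie -- France", "Economics -- France"]
    for heading in headings_in_order(sample):
        print(heading)
```

The display form of the heading is stored untouched; only the derived key is used for ordering, so the filing rules can change without editing the headings themselves.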
Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF
On 15/05/2012 16:50, Jonathan Rochkind wrote:

snip
I certainly agree that the way our data is currently recorded and maintained in MARC is not suitable for contemporary desired uses, as I've suggested many times before on this list and others and tried to explain why; it's got little to do with rdbms though.
/snip

Although MARC needs to change, and has needed it for a very long time, I don't see how changing the format would improve the subject headings. The semantics are there already, so searching would remain the same. It is the display of the multiple search result which has disintegrated. I think there are lots of ways that the displays could be improved for the public, primarily by making them more flexible, and these could be experimented with now. But even then, there will need to be a major push from public services to get the public to use and understand what the subject searches are. All of it has been effectively forgotten by the public. For a whole lot of reasons, library subject searches will always be substantively different from what people retrieve from a full-text search result, and while librarians can understand this, it is a lot harder for the public.

-- James Weinheimer
Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF
On 5/15/2012 11:34 AM, James Weinheimer wrote:

Although MARC needs to change, and has needed it for a very long time, I don't see how changing the format would improve the subject headings.

I did not mean to say that changing from MARC to something else, by itself, would do anything at all to subject headings. I chose my phrase carefully: the way our data is currently recorded and maintained in MARC.

Several things about the way our data is currently recorded and maintained (which we currently do in MARC) ought to be changed. Subject headings aren't even one of the main ones, although the way they are done could certainly be improved to be more powerful in software environments.

It is a large and complicated topic. One we've spent collectively years arguing about on this list.

Frankly, I no longer have much confidence that the library cataloging community is capable of any necessary changes in any kind of timeline fast enough to save us. Those that believe no significant changes to library cataloging or metadata practices are necessary will have a chance to see if they are right. I believe that inaction -- inability to make significant changes in the way our data is currently recorded and maintained to accommodate contemporary needs -- will instead result in the end of the library cataloging/metadata tradition, and the end of library involvement in metadata control, if not the end of libraries. I find it deeply depressing. But I no longer find much hope that any other outcome is possible, and begin to think any time I spend trying to help arrive at change is just wasted time.

Jonathan
Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF
Note to the majority of readers on RDA-L: you should feel no guilt in skipping the rest of this thread. It has veered off into a technical discussion that you may simply have no time (or use) for - kc

On 5/14/12 12:50 PM, Simon Spero wrote:

On Mon, May 14, 2012 at 10:45 AM, Karen Coyle li...@kcoyle.net wrote:

What happened with the MARC format is that when we moved it into actual databases it turned out that certain things that people expected or wanted didn't really work well. For example, many librarians expected that you could *[a]* /replicate a card catalog display/ with *[b]* /records displaying in order by the heading that was searched/. That is really hard to do (*[c]* /and not possible to do efficiently/) using *[d]* /DBMS/ functionality, which is based on *[e]* /retrieved sets/ not /linear ordering/, and *[f]* /especially using keyword searching/. [emphasis and labels added]

BLUF: Not all DBMS are Relational; it is possible to efficiently retrieve records in order from many different types of DBMS, including Relational databases.

[c] and [d] make the claim that it is impossible to retrieve records efficiently in some desired order using DBMS functionality. This is justified by [e] which claims that the source of this necessary inefficiency is that DBMS functionality is based on retrieved sets not linear ordering.

No, that is not what I meant. Of course you can retrieve records in a given order, and we do all the time. It's about using the headings in the MARC records to establish that order. So here's the question I put to Mac:

***
let's say you have a record with 3 subject headings:

Working class -- France
Working class -- Dwellings -- France
Housing -- France

In a card catalog, these would result in 3 separate cards and therefore should you look all through the subject card catalog you would see the book in question 3 times.
In a keyword search limited to subject headings, most systems would retrieve this record once and display it once. That has to do with how the DBMS resolves from indexes to records. So even though a keyword may appear more than once in a record, the record is only retrieved once.

In your catalog, which displays the subject headings on a line with the author and title
1) will each of these subject headings appear in the display?
2) does that mean that the bibliographic record (represented by the author and title) will display 3 times in the list of retrievals?
***

I could add to that: if the record had four subject headings:

Working class -- France
Working class -- Dwellings -- France
Housing -- France
Housing -- Europe

Then under what circumstances in your system design would the user see all four subject entries (heading plus bib data) in a single display?

That's part of the question. The card catalog had a separate physical entry for each entry point or heading associated with the bibliographic description. Do we have a reasonably efficient way to imitate this behavior using keyword (or keyword in heading, or left-anchored string searching) in an online library catalog? (followed by: is there any reason to do that?)

But I think another part is the difference between retrieval, in the database sense of the term (give me all of the records with the word *france* in a subject heading) vs. the kind of alphabetical linear access that the card catalog provided, which allows you to begin at:

France -- United States -- Commerce

and soon arrive at

Frances E. Willard Union (Yakima, Wash.)

I don't think you can get from one to the other in most online catalogs because the set of records that you can see is determined by the search that retrieves only those records with *france* in it.

I've designed a browse in DBMSs using a left-anchored search that retrieves one heading (the first one hit) in a heading index followed by a long series of get next commands.
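A left-anchored browse followed by "get next" steps can be sketched roughly as follows. This is an in-memory toy, not the system described above: the `HeadingIndex` class, its method names, and the sample titles are hypothetical, the ordering is simple case-folded string order, and a sorted Python list stands in for whatever ordered index a real DBMS would maintain.

```python
import bisect


class HeadingIndex:
    """Hypothetical ordered index with one entry per (heading, record) pair."""

    def __init__(self):
        # Sorted list of (folded key, display heading, title).
        self._entries = []

    def add(self, heading, title):
        # insort keeps the list in alphabetical order as entries are added.
        bisect.insort(self._entries, (heading.lower(), heading, title))

    def browse(self, start, page_size=20):
        """Left-anchored browse: position at the first key >= start, then read forward."""
        pos = bisect.bisect_left(self._entries, (start.lower(),))
        page = self._entries[pos:pos + page_size]
        return [(heading, title) for _key, heading, title in page]


if __name__ == "__main__":
    index = HeadingIndex()
    index.add("Working class -- France", "Hypothetical title C")
    index.add("Housing -- France", "Hypothetical title C")
    index.add("Frances E. Willard Union (Yakima, Wash.)", "Hypothetical title B")
    index.add("France -- United States -- Commerce", "Hypothetical title A")
    for heading, title in index.browse("France -- United", page_size=2):
        print(heading, "/", title)
```

Because the page is read from the ordered index rather than from a keyword result set, the browse moves from the France heading straight on to Frances E. Willard Union, and a record with several headings appears once under each of them. Storing a brief title alongside each heading is one way to avoid a separate record fetch per line of display; whether that suits a given system is a design choice this sketch does not settle.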
Naturally, next has to also be next in alphabetical order, so the index you are traversing has to be in alphabetical order. I should say: alphabetical order that is retained even as records are added, modified or deleted. I think this may be more feasible in some DBMSs than others. However, what is obviously missing here is a display of the bib record that goes with the heading (all of that ISBD stuff). It's possible that DBMS's can do this fine today, but in my olden days when I suggested to the DBA that we'd need to get next, display that heading, then retrieve and display the bibliographic record that went with it, 20 times in order to create a page of display, I practically had to revive the DBA with a bucket of cold water. Mac's system also cannot take the display from France--US--etc to Frances E. Willard because the headings it has to work with have been retrieved on a keyword search, thus only headings with the term *france* in them are displayed. It also does