Re: [RDA-L] Browse and search BNB open data
05.08.2011 00:36, Karen Coyle:

John Attig: Access points are treated rather strangely in RDA. The access point is not itself an element, but is a construct made up of other elements, which contains instructions about what and when to include various elements in an access point.

That actually makes sense from a data design point of view. It means that compound things can be built up of simple things, and that means that you have flexibility in what you can build. (read: tinker-toys, or, for the younger set, Legos)

Very important indeed, but elementary for any data technician. Not quite so for those who have been raised on AACR+MARC. They find it strange, as John seems to indicate. Why is that so? Starting out from the mental image of MARC, one may find it natural that everything that can be accessed in a search must be recorded in some data field, and exactly in the way it is needed for the access. This notion needs to be shattered. It has led to such extremes that, for instance, in authority records you have 53 variant names, each and every one of them carrying the same dates for that person. The access points for the variant names can, however, easily be constructed out of a name field plus a date field - the latter always the same.

MARC derives from the requirements of card printing. There, each heading (access point in the card catalog) had to be complete and correctly formed as part of the record. This is no longer true, and has never been true in data processing systems:

1. Headings can be constructed out of arbitrary elements; they need not be stored as monolithic strings inside the record.
2. New access points can be constructed that had never been possible in card catalogs. All kinds of combinations and reformattings of field contents can be programmed; there is no need to have every access point prepared in advance and stored in its own field.
For example, extract the publisher's name out of the 260 and remove certain particles from it, and then get the date out of the fixed fields to make a useful index entry (access point) like name:date.

This is easy to understand, but as a consequence, the rules, and thus the data model, will become more abstract and more difficult to understand. But maybe only for someone who has been brought up on the notions of the card catalog and later those of MARC. For someone with a background in abstract data structures, John Attig's clarifications are no surprise at all.

One more reason, one might think, to get rid of MARC ASAP. Not really, though. Firstly, because it is utterly unrealistic, and secondly, because MARC is flexible enough to be used in new software applications that do new tricks with the old stuff AND are able to deal with some new data elements in novel ways. It is not the worst of ideas to look at the additions Germans and Austrians have thought up for their MARC dialect. It will allow us to continue with our scenario 2 applications as they are long since in operation, and the further step to scenario 1, if at all necessary and useful, would not be very difficult either.

We are not using MARC internally, and are not going to, but our internal formats are no less complex. They are only not rooted in the mental image of the card.

B.Eversberg
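A minimal sketch of the two constructions described in this message, written in Python. Everything named here is an assumption made for illustration: the simplified record layout, the keys `name`, `variants` and `dates`, the short particle list, and both function names. Real MARC parsing is left out entirely.

```python
# Hypothetical, simplified authority record: one date element, many name forms.
authority = {
    "name": "Twain, Mark",
    "variants": ["Clemens, Samuel Langhorne", "Snodgrass, Quintus Curtius"],
    "dates": "1835-1910",
}


def name_access_points(record):
    """Build one access point per name form from the name and date elements."""
    names = [record["name"]] + record["variants"]
    return [f"{name}, {record['dates']}" for name in names]


# Illustrative particle list only; a production list would be much longer.
PARTICLES = {"the", "and", "&", "co", "inc", "ltd", "verlag"}


def publisher_date_key(publisher, date):
    """Combine a cleaned publisher name (as in 260 $b) with a fixed-field date."""
    words = [w.strip(".,:;") for w in publisher.split()]
    kept = [w for w in words if w and w.lower() not in PARTICLES]
    return f"{' '.join(kept)}:{date}"
```

Calling `name_access_points(authority)` yields one heading per name form while the dates are stored once, and `publisher_date_key("Harper & Row,", "1984")` builds a name:date index entry that never existed as a stored field.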
Re: [RDA-L] Browse and search BNB open data
On 04/08/2011 21:33, Karen Coyle wrote:

snip

But the rule is that mostly, you use the publication date of the first manifestation of the expression. (I can't find the rule for this right now, since I don't have access to a lot) The only example I can find right now is King Kong: http://lccn.loc.gov/90715189, where if you look at the related titles, you will see 1933, while the date of publication of this item is 1984. King Kong (Motion picture : 1933)

Aha! Thanks. Although... isn't this an even more arcane bit of data than the first date of the work? And many (including you) were doubtful that catalogers could supply that.

/snip

Not really, because focusing on the manifestation assumes that there has been something published somewhere. Most of the time this is fairly simple, because often, your (later) item discusses the earlier version and saves you a lot of time. If your item does not supply this information, too bad, but by following the rule of Seek and ye shall find, which sometimes might take quite a bit of work, by using Worldcat, the NUC, and all kinds of other catalogs out there, plus a bit of ingenuity, you can normally find a record or citation to that first item published. Besides, most normal catalogers do such an amount of research very rarely. It wouldn't surprise me if the lack of real consistency in these fields reflects the cataloger's lack of time, plus the general feeling that few patrons understand, use, or want uniform titles so it is not worthwhile spending the time. (I don't necessarily agree, as I discuss below, but the feeling is out there) Comparing this to hunting out a first date of something as vague as the work, which would have to be done much more often and would probably always require research, is quite a different matter.

snip

In general I am having a hard time understanding how we will treat these kinds of composite headings in any future data carrier.
They seem to be somewhat idiosyncratic, in that what data gets added is up to the cataloger, depends on the context, and probably cannot be generated algorithmically. This whole part about headings (access points in RDA, I believe) has me rather stumped from a design point of view. At the same time, if all of the individual elements are available, and one links manifestations of a single expression, then some system feature may be able to display this distinction to the user without the use of individual cataloger-formed headings. This would also mean that the records can be created without being dependent on a particular context, which should make sharing of data even more accurate.

/snip

In defense of catalogers, the entire system was originally designed for a card/print world where everyone had no choice except to browse, and the method worked fairly well back then. This is shown in Princeton's scanned catalog for Cicero's Pro Milone (http://imagecat1.princeton.edu/cgi-bin/ECC/cards.pl/disk3/0892/B4159?d=fp=Cicero,+Marcus+Tullius--Individual+works--Pro+Archia+%3Eg=52977.50n=47r=1.00thisname=.0047.tiff) and browsing forward from there, you can see how the uniform titles worked, and kept things more or less in order. (At Princeton, most of the uniform titles were handwritten in pencil in the top right hand corner and unfortunately pencil came out very poorly in the scans. Still, I think you can make out the titles and dates.) You will see that the language translations are mostly mixed together, although one includes the qualifier Greek. In spite of this, the final product worked fairly well though, because it was pretty easy--once you got to Cicero. Pro Archia--to browse through the cards.

Still, I think that instead of trying to shoehorn our data, which was created for another time, to function more or less crudely in the new environment, it would be far more progressive to reconsider how to use the power of the current systems we have at our disposal.
Uniform titles are a great case in point. As we saw in the Princeton catalog, even when they weren't done perfectly, uniform titles worked pretty well in a physical environment where browsing was the only way of finding things, but they fell apart in a computerized/keyword environment, just as much of the rest of the catalog. (For those interested in more on this, see my posting on Autocat http://catalogingmatters.blogspot.com/2010_10_01_archive.html) Today using Worldcat, I can search for au: homer and ti: odyssey http://www.worldcat.org/search?q=ti%3Aodyssey+au%3Ahomer and get a very handy, useful list that I can do a lot with: limit to books, by language, by dates, by translators, novel sorting etc. Today, Zebra-type indexing extracts the headings and other information and makes them available for further refinements, so we get something that so far as I am concerned, is far better than how the clunky, old card/printed catalog ever worked. (Compare the Cicero example
Re: [RDA-L] Browse and search BNB open data
Quoting Bernhard Eversberg e...@biblio.tu-bs.de: One more reason, one might think, to get rid of MARC ASAP. Not really, though. Firstly, because it is utterly unrealistic, and second. because MARC is flexible enough to be used in new software applications that do new tricks with the old stuff AND are able to deal with some new data elements in novel ways. The MARC format, aka ISO 2709, may have that flexibility, but I'm not convinced that the way that we have used MARC lives up to that. The atomized data that we do have, which is found in the fixed fields and some of the 0XX fields, is often not filled in when it should be. The same is true for structured headings, like the current uniform title. It is easy to find records for translations that do not have a uniform title for the original. Music catalogers are diligent about the music uniform title, but considerably less diligent in filling in the 047 which is a structured form of musical composition, or the 048 for number of instruments or voices. The fact is that the computable area of our record has been treated as secondary. And don't anyone come back and tell me that it's because systems don't do anything with it. It's a chicken and egg problem: systems can't do anything with it unless the data has been provided consistently, and the data isn't provided consistently because systems don't do anything with it. The foundation of this problem is that catalogers are being asked to create two parallel sets of data: one that is visible to the users, and one that should satisfy machine needs. We should be doing everything we can with a single set of data because it is just human nature that doing things twice will mean that something -- especially the less visible thing -- doesn't get done. kc -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
[RDA-L] Fwd: [RDA-L] Browse and search BNB open data
-- Forwarded message -- From: Gene Fieg gf...@cst.edu Date: Fri, Aug 5, 2011 at 12:42 PM Subject: Re: [RDA-L] Browse and search BNB open data To: kco...@kcoyle.net Sometimes that MARC data isn't there because of local policies as well. As for systems not being able to use the data, the system people here finally changed a 7XX 02 from alternate title to Contains... On Fri, Aug 5, 2011 at 12:11 PM, Karen Coyle li...@kcoyle.net wrote: Quoting Bernhard Eversberg e...@biblio.tu-bs.de: One more reason, one might think, to get rid of MARC ASAP. Not really, though. Firstly, because it is utterly unrealistic, and second. because MARC is flexible enough to be used in new software applications that do new tricks with the old stuff AND are able to deal with some new data elements in novel ways. The MARC format, aka ISO 2709, may have that flexibility, but I'm not convinced that the way that we have used MARC lives up to that. The atomized data that we do have, which is found in the fixed fields and some of the 0XX fields, is often not filled in when it should be. The same is true for structured headings, like the current uniform title. It is easy to find records for translations that do not have a uniform title for the original. Music catalogers are diligent about the music uniform title, but considerably less diligent in filling in the 047 which is a structured form of musical composition, or the 048 for number of instruments or voices. The fact is that the computable area of our record has been treated as secondary. And don't anyone come back and tell me that it's because systems don't do anything with it. It's a chicken and egg problem: systems can't do anything with it unless the data has been provided consistently, and the data isn't provided consistently because systems don't do anything with it. 
The foundation of this problem is that catalogers are being asked to create two parallel sets of data: one that is visible to the users, and one that should satisfy machine needs. We should be doing everything we can with a single set of data because it is just human nature that doing things twice will mean that something -- especially the less visible thing -- doesn't get done. kc -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet -- Gene Fieg Cataloger/Serials Librarian Claremont School of Theology gf...@cst.edu
Re: [RDA-L] Browse and search BNB open data
03.08.2011 17:42, McRee Elrod:

How anyone comparing the XML and MARC versions could prefer the XML is beyond me. We find it simple to crosswalk from MARC to XML for anyone who wants it, but not back again.

The latter is what we had to do in order to construct our database. Sure you can't get full MARC21 out of the stuff, but as BL has said, the current version is only a beginning. (Notwithstanding, I think you *can* find a thing or two in the database as it is.)

The broader issue, XML or not, will indeed have to be looked at. XML has been around for quite a while, and it has been showered with much enthusiasm. Not only that, but many an ambitious attempt has been made at doing metadata in a big way in XML, by more than a few good fellows eager to prove something. Well, we are all set to applaud the first compelling success. Why not take our solidly non-XML BNB database as a benchmark to surpass in a big way with an XML implementation? Doing new tricks not otherwise doable.

But seriously, XML is certainly inadequate as a medium for data input and editing. A software interface will have to shield the raw XML entirely from the view of catalogers. And that's rather curious because XML is praised for being able to use human-readable tagging. But as not only Mac has found, how readable actually is an XML record when compared with a MARC record? The verbal tags only make the clueless think they understand what they read, but tag numbers, besides being language independent, can convey much more meaning and, as we all know, become a shorthand language that is more precise and faster for actual communication than cumbersome verbal tags as we see them in any attempt at XML metadata. XML may be many things, but it is not economical, in more than one way.

These may be old-school views. Just prove me wrong. Only in practice, not in theory.

Okay then, what now? What's going to be the medium and paradigm for the MARC successor?
This question needs an answer, and soon, if RDA is to have a future and if this future is to begin in early 2013. B.Eversberg
Re: [RDA-L] Browse and search BNB open data
Karen Coyle wrote, ... recent Code4Lib journal: http://journal.code4lib.org/articles/5468 One of the difficulties of deciding what we do and do not want to keep in MARC, or what we want to move over to the RDA environment, is that we have no dictionary of everything that MARC covers. For example, what standard identifiers are available in MARC? They are scattered all over the format,...

Yours is a worthwhile endeavor, no doubt. You may try a database which, although as good as current, has been in existence for a long time and under a somewhat old-fashioned interface. And it covers not just MARC but several other formats as well, even Unimarc and the old BNBMARC and a few more obscure ones. You get into the alphabetical list of field and subfield names directly like this (add your keyword to the end of it): http://www.biblio.tu-bs.de/db/formate/page.php?urG=KWDurA=24urS= There's also a MARC tag index: http://www.biblio.tu-bs.de/db/formate/page.php?urG=MRCurA=24urS=... The alphabetical listing contains all sorts of words, even German ones, but all the MARC terms are marked M21 plus the actual MARC tag.

May it help, B.Eversberg
Re: [RDA-L] Browse and search BNB open data
James Weinheimer, speculating on the effects of moving MARC data to RDF XML, said at one point:

Compare this [loss of subfielding in 6XX fields] to losing the subfields in the 1xx/7xx, where the consequences would appear to be much fewer.

I'm not expert in XML, but I would surmise that losing the ability to distinguish the title subelements from the name of a person in what is now a MARC 700 12 field (i.e. an analytical added entry) would have detrimental effects for retrieval of music materials. There are times when it's desirable to provide title phrase or keyword access to the title subelements in such fields. Simply saying that those subelements will occupy an XML title field (somewhat equivalent to a MARC 740 02 added entry) runs the risk, I fear, of losing the link between the name of the person and the title. That link is currently broken in our library's public catalog in terms of search redirection, and the lack of the link causes all sorts of mischief.

Or I could be misunderstanding the entire thing and exposing my Luddite self. It's happened before.

Mark Scharff, Music Cataloger Gaylord Music Library Washington University in St. Louis mscha...@wustl.edu
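A small sketch of the concern raised in this message, in Python: title keywords from an analytical added entry stay useful only while each title remains tied to its name. The `Analytic` type, the sample entries, and `build_title_index` are hypothetical and merely stand in for whatever a real system does with the name and title subfields of a 700 12 field.

```python
from collections import defaultdict
from typing import NamedTuple


class Analytic(NamedTuple):
    """Name and title of one analytical added entry, kept together."""
    name: str
    title: str


def build_title_index(entries):
    """Map each lowercased title word to the entries whose title contains it."""
    index = defaultdict(list)
    for entry in entries:
        for word in entry.title.lower().split():
            index[word].append(entry)
    return index


entries = [
    Analytic("Schubert, Franz", "Winterreise"),
    Analytic("Brahms, Johannes", "Ein deutsches Requiem"),
]
index = build_title_index(entries)
```

Looking up `index["requiem"]` returns the whole pair, so the composer comes back with the title. A flat title field would return only the string, which is the broken link described above.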
Re: [RDA-L] Browse and search BNB open data
Karen, Thanks for sharing the article. It is really fascinating, although depressing. It is obviously a huge, very difficult and tedious undertaking, and from your experience, it seems that it will require the work of many people over many years. When I think about the fixed fields, I remember when I was at Princeton and how I reworked the online MARC format from LC, which was very difficult to work with at that time. I started work with the variable fields, and it was a lot of work, but I did it. Then I started on the fixed fields, thinking that the hard part was over, but I remember how my arm hurt at the end (working with the mouse) while it did not hurt with the variable fields. I was shocked by how incredibly complex the fixed fields are. My own two cents: the fixed fields are a lot of work for little payback. They can be cut way back. Anyway, it's too bad all of this wasn't started long ago but you have to play the cards you are dealt! My real concern is that we haven't got years to do this and we need to create something that works now, saves money now, and can be demonstrated as soon as possible. The BNB mapping is interesting, even though so much is lost--still, it is a start, and I think it's great. I'll continue to think about your work, which is definitely important, and what to do. I do have one point, which I am not sure is completely clear from your documents. In http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, you mention that 1923 in the 240, Odyssey. English. 1923, repeats the date of publication. This is correct but also incorrect(! I know that kind of statement is awful!) and therefore, is not really repeated information. What the date in the 240 is supposed to represent, although it is highly inconsistent in practice, is to break a conflict with another uniform title (i.e. 1xx/240 combination). They do this mostly with a publication date (unfortunately), and I would prefer something more meaningful, e.g. 
the name of the translator, and if necessary, edition statement, or something more meaningful than a publication date. But the rule is that mostly, you use the publication date of the first manifestation of the expression. (I can't find the rule for this right now, since I don't have access to a lot) The only example I can find right now is King Kong: http://lccn.loc.gov/90715189, where if you look at the related titles, you will see 1933, while the date of publication of this item is 1984. King Kong (Motion picture : 1933) It has to be qualified somehow and I guess this is better than King Kong (Motion picture : Fay Wray screaming) although this would have much more meaning to people. My next podcast will deal with some of these distinctions in a funny way (I hope!). It should come out very soon, so watch for it! Ciao, Jim On 03/08/2011 19:07, Karen Coyle wrote: Quoting James Weinheimer weinheimer.ji...@gmail.com: While there is an undoubted loss in semantics, with the future evolution of MARC format, we can ask: do such losses have any practical consequences? Although I think many subfields (although not the information) could disappear without any essential loss, some will have important consequences to different communities. Jim, this is much of the motivation for the work that I have been doing to try to identify the actual elements of MARC21 -- elements in the semantic sense, trying to ignore the MARC21 structure (which results in much repetition, etc.) A report on my study is available in the recent Code4Lib journal: http://journal.code4lib.org/articles/5468 One of the difficulties of deciding what we do and do not want to keep in MARC, or what we want to move over to the RDA environment, is that we have no dictionary of everything that MARC covers. For example, what standard identifiers are available in MARC? They are scattered all over the format, so it's hard to know. What about things like language and date? 
Those appear in different fields with somewhat different meanings. My assumption is that a complete inventory of MARC elements is essential for any move away from MARC. Unfortunately, I have gotten now to the 1xx-8xx fields (the study so far is 00x and 0xx, that's already pretty complex!) and may not have the energy to complete the study on my own. However, what I have done so far at least sets down some possible principles to follow. I'm doing it all on the futurelib wiki so my process is as transparent as I can make it: http://futurelib.pbworks.com/w/page/29114548/MARC%20elements kc -- James Weinheimer weinheimer.ji...@gmail.com First Thus: http://catalogingmatters.blogspot.com/ Cooperative Cataloging Rules: http://sites.google.com/site/opencatalogingrules/
Re: [RDA-L] Browse and search BNB open data
Quoting James Weinheimer weinheimer.ji...@gmail.com: Karen, Thanks for sharing the article. It is really fascinating, although depressing. It is obviously a huge, very difficult and tedious undertaking, and from your experience, it seems that it will require the work of many people over many years. I'd like there to be more folks involved, but it doesn't take years -- if you are willing to make some decisions that work even though they aren't perfect. I've got it all in a database and, while tedious, it's not Herculean. I was able to do the fixed fields entirely as extraction from the database. When I think about the fixed fields, I remember when I was at Princeton and how I reworked the online MARC format from LC, which was very difficult to work with at that time. I started work with the variable fields, and it was a lot of work, but I did it. Then I started on the fixed fields, thinking that the hard part was over, but I remember how my arm hurt at the end (working with the mouse) while it did not hurt with the variable fields. I was shocked by how incredibly complex the fixed fields are. My own two cents: the fixed fields are a lot of work for little payback. They can be cut way back. What I did with the fixed fields is very simple: each fixed field element is a data element with a list of valid values. I didn't try to decide if those values overlap with values in the variable fields, or to deduplicate between elements. I did ignore the 006 since it is used only to make certain 008 elements repeatable, and therefore adds no new information (as a field... in records it does add more, but so does any element that is repeatable). What is turning out to be interesting with the 1xx-8xx fields is how they align with RDA (which I should have expected -- maybe it's different discovering it for yourself). Also interesting is where they differ. That part I would love to be able to discuss with folks with a cataloging background, but we should take it off list. 
I can add people to the futurelib wiki as editors and we can create pages that discuss certain issues. Of course, this may interfere with things like having free time, sleeping, eating or maintaining human relationships. :-) kc Anyway, it's too bad all of this wasn't started long ago but you have to play the cards you are dealt! My real concern is that we haven't got years to do this and we need to create something that works now, saves money now, and can be demonstrated as soon as possible. The BNB mapping is interesting, even though so much is lost--still, it is a start, and I think it's great. I'll continue to think about your work, which is definitely important, and what to do. I do have one point, which I am not sure is completely clear from your documents. In http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, you mention that 1923 in the 240, Odyssey. English. 1923, repeats the date of publication. This is correct but also incorrect(! I know that kind of statement is awful!) and therefore, is not really repeated information. What the date in the 240 is supposed to represent, although it is highly inconsistent in practice, is to break a conflict with another uniform title (i.e. 1xx/240 combination). They do this mostly with a publication date (unfortunately), and I would prefer something more meaningful, e.g. the name of the translator, and if necessary, edition statement, or something more meaningful than a publication date. But the rule is that mostly, you use the publication date of the first manifestation of the expression. (I can't find the rule for this right now, since I don't have access to a lot) The only example I can find right now is King Kong: http://lccn.loc.gov/90715189, where if you look at the related titles, you will see 1933, while the date of publication of this item is 1984. 
King Kong (Motion picture : 1933) It has to be qualified somehow and I guess this is better than King Kong (Motion picture : Fay Wray screaming) although this would have much more meaning to people. My next podcast will deal with some of these distinctions in a funny way (I hope!). It should come out very soon, so watch for it! Ciao, Jim On 03/08/2011 19:07, Karen Coyle wrote: Quoting James Weinheimer weinheimer.ji...@gmail.com: While there is an undoubted loss in semantics, with the future evolution of MARC format, we can ask: do such losses have any practical consequences? Although I think many subfields (although not the information) could disappear without any essential loss, some will have important consequences to different communities. Jim, this is much of the motivation for the work that I have been doing to try to identify the actual elements of MARC21 -- elements in the semantic sense, trying to ignore the MARC21 structure (which results in much repetition, etc.) A report on my
Re: [RDA-L] Browse and search BNB open data
On a different note and more details: Quoting James Weinheimer weinheimer.ji...@gmail.com: What the date in the 240 is supposed to represent, although it is highly inconsistent in practice, is to break a conflict with another uniform title (i.e. 1xx/240 combination). They do this mostly with a publication date (unfortunately), Same as with authority-controlled names, right? and I would prefer something more meaningful, e.g. the name of the translator, and if necessary, edition statement, or something more meaningful than a publication date. ditto something more meaningful than date of birth. http://kcoyle.blogspot.com/2007/09/name-authority-control-aka-name.html But the rule is that mostly, you use the publication date of the first manifestation of the expression. (I can't find the rule for this right now, since I don't have access to a lot) The only example I can find right now is King Kong: http://lccn.loc.gov/90715189, where if you look at the related titles, you will see 1933, while the date of publication of this item is 1984. King Kong (Motion picture : 1933) Aha! Thanks. Although... isn't this an even more arcane bit of data than the first date of the work? And many (including you) were doubtful that catalogers could supply that. In general I am having a hard time understanding how we will treat these kinds of composite headings in any future data carrier. They seem to be somewhat idiosyncratic, in that what data gets added is up to the cataloger, depends on the context, and probably cannot be generated algorithmically. This whole part about headings (access points in RDA, I believe) has me rather stumped from a design point of view. At the same time, if all of the individual elements are available, and one links manifestations of a single expression, then some system feature may be able to display this distinction to the user without the use of individual cataloger-formed headings. 
This would also mean that the records can be created without being dependent on a particular context, which should make sharing of data even more accurate. kc It has to be qualified somehow and I guess this is better than King Kong (Motion picture : Fay Wray screaming) although this would have much more meaning to people. My next podcast will deal with some of these distinctions in a funny way (I hope!). It should come out very soon, so watch for it! Ciao, Jim On 03/08/2011 19:07, Karen Coyle wrote: Quoting James Weinheimer weinheimer.ji...@gmail.com: While there is an undoubted loss in semantics, with the future evolution of MARC format, we can ask: do such losses have any practical consequences? Although I think many subfields (although not the information) could disappear without any essential loss, some will have important consequences to different communities. Jim, this is much of the motivation for the work that I have been doing to try to identify the actual elements of MARC21 -- elements in the semantic sense, trying to ignore the MARC21 structure (which results in much repetition, etc.) A report on my study is available in the recent Code4Lib journal: http://journal.code4lib.org/articles/5468 One of the difficulties of deciding what we do and do not want to keep in MARC, or what we want to move over to the RDA environment, is that we have no dictionary of everything that MARC covers. For example, what standard identifiers are available in MARC? They are scattered all over the format, so it's hard to know. What about things like language and date? Those appear in different fields with somewhat different meanings. My assumption is that a complete inventory of MARC elements is essential for any move away from MARC. Unfortunately, I have gotten now to the 1xx-8xx fields (the study so far is 00x and 0xx, that's already pretty complex!) and may not have the energy to complete the study on my own. 
However, what I have done so far at least sets down some possible principles to follow. I'm doing it all on the futurelib wiki so my process is as transparent as I can make it: http://futurelib.pbworks.com/w/page/29114548/MARC%20elements kc -- James Weinheimer weinheimer.ji...@gmail.com First Thus: http://catalogingmatters.blogspot.com/ Cooperative Cataloging Rules: http://sites.google.com/site/opencatalogingrules/ -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Re: [RDA-L] Browse and search BNB open data
On 8/4/2011 3:33 PM, Karen Coyle wrote: In general I am having a hard time understanding how we will treat these kinds of composite headings in any future data carrier. They seem to be somewhat idiosyncratic, in that what data gets added is up to the cataloger, depends on the context, and probably cannot be generated algorithmically.

I'm thinking these sorts of headings should essentially be treated as opaque identifiers -- they were meant to basically serve the purpose of identifiers, thus adding more-or-less arbitrary (some date you choose with meaning, your choice!) characters on the end to disambiguate, same as you'd add a more-or-less arbitrary path component on to the end of a URI to make sure it's unique, but expecting the finished URI string to be treated basically as an opaque identifier.

So if you're stumped, I'd suggest seeing if there's a way to punt and treat these kinds of headings as single un-subfielded opaque identifiers (they're not URIs, they're 'local' identifiers, but they're a kind of identifier. Well, 'local' in the sense of local to a particular authority file, particular community, or sometimes actually particular local system).

Of course, that may cause its own problems: if you just combine ALL uniform title subfields into one big opaque 'identifier' string, you might be losing useful semantic information that is in some of the other subfields. It's tricky, our legacy data is very legacy. (I don't know what that means, but I'm sticking to it.) So maybe just the subfields you have to punt on, don't worry about, just call em disambiguating suffix or disambiguating date suffix or something. Since that's all they are.

Either way, I think it's probably important and useful to conceptualize our legacy headings as legacy semi-opaque 'identifiers'.
For instance, it's absolutely vital, to make use of this data with legacy systems, that once you've deconstructed these things down to semantic elements, the system is still able to reconstruct them into the exact literal combined string 'identifier'. So either your encoding has to somehow preserve order (perhaps there is an implicit order to each element, if the marc fields for these 'headings' work that way, I'm not sure) -- or perhaps there needs to be another 'heading' data element that will include the complete assembled heading string 'identifier', even though that is duplication of information.
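The round-trip requirement described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `$`-delimited input form, the `Subfield` class and the function names are hypothetical, not part of MARC or of any library, and the sketch assumes that no subfield value itself contains the delimiter. The point is simply that if the elements are kept as an ordered list, both the encoded field and the literal heading string can be rebuilt without storing the heading a second time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subfield:
    code: str   # subfield code, e.g. "a" or "d"
    value: str  # text exactly as recorded, punctuation included


def parse_heading(field: str, delimiter: str = "$") -> list[Subfield]:
    """Split '$aRobinson, Hilary,$d1962-' into an ordered list of subfields."""
    parts = field.split(delimiter)[1:]  # anything before the first delimiter is ignored
    return [Subfield(p[0], p[1:]) for p in parts if p]


def serialize(subfields: list[Subfield], delimiter: str = "$") -> str:
    """Rebuild the encoded field from the ordered elements."""
    return "".join(f"{delimiter}{sf.code}{sf.value}" for sf in subfields)


def heading_string(subfields: list[Subfield], separator: str = " ") -> str:
    """Rebuild the literal heading 'identifier' by joining values in recorded order."""
    return separator.join(sf.value for sf in subfields)


stored = "$aRobinson, Hilary,$d1962-"
elements = parse_heading(stored)
print(heading_string(elements))
print(serialize(elements) == stored)
```

If the encoding cannot guarantee order (a bag of properties, say), the alternative mentioned above applies: carry the assembled heading string as one more element, accepting the duplication.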
Re: [RDA-L] Browse and search BNB open data
On 8/4/2011 3:33 PM, Karen Coyle wrote: In general I am having a hard time understanding how we will treat these kinds of composite headings in any future data carrier. They seem to be somewhat idiosyncratic, in that what data gets added is up to the cataloger, depends on the context, and probably cannot be generated algorithmically. This whole part about headings (access points in RDA, I believe) has me rather stumped from a design point of view. At the same time, if all of the individual elements are available, and one links manifestations of a single expression, then some system feature may be able to display this distinction to the user without the use of individual cataloger-formed headings. This would also mean that the records can be created without being dependent on a particular context, which should make sharing of data even more accurate. I'm glad that Karen brought this up again. I missed the discussion in which she asked about access points in RDA; by the time I caught up, the discussion had moved on. Access points are treated rather strangely in RDA. The access point is not itself an element, but is a construct made up of other elements, which contains instructions about what and when to include various elements in an access point. [Note: In this, RDA follows the FRBR model, which lacks elements for access points. On the other hand, FRAD treats the access point as an entity in its own right, separate from the person, family, corporate body, work, expression, manifestation, or item that it represents. At some point, RDA may decide to adopt this FRAD structure (assuming that it survives the reconciliation of the FR models).] In our discussions of the question of how to treat access points, the JSC was advised that there were certain structural complexities that we should not attempt to build into the RDA element set, but should rely on the encoding to bring together the various elements into the access point construct. 
In MARC, we are accustomed to using subfields to encode the specific data elements and fields to wrap them up into an ordered construct. Similarly, in XML, one would expect to use some sort of wrapper to enclose all the elements that make up the access point. In order to do this, I suspect that one needs to treat the access point construct as if it were an element, even if the RDA element set does not treat it as such.

Beyond these technical issues, this discussion raises questions about the way in which access points are constructed and used.

a) The instructions on what to include in an access point represent our collective experience of what is important for uniquely identifying a given entity. There seems to be some value in gathering all these elements together for indexing and display as an aggregation of identifying information.

b) While it is true that the individual elements are sufficient for finding relevant resources and don't need to be aggregated in a precoordinated way in order to work, I would argue that finding, identifying, and selecting relevant resources is sometimes best supported by browsing an alphabetical list of access points that are constructed in a way that reveals the structure of the things being browsed. Examples might be an alphabetical display of hierarchical entities such as corporate bodies, or an organized sequence of headings representing works and expressions. We may not NEED access points, but they can sure be helpful on occasion.

c) In order to work, some thought needs to be given to the structure of the data, so that the sequence of access points reveals that structure. Traditionally we have done this by hand-crafting precoordinated access points according to instructions that aim to provide the best result that can be anticipated and applied globally. This may not be the best way of doing things.
d) While many of us are skeptical of the ability of algorithms to create such structured access points automatically, it is certainly worth the attempt. If there could be a clear set of objectives for the exercise, algorithms might in fact be possible, bringing together relevant elements and arranging them in a significant order to form the access points. Even better, it might be possible to (i) offer different options for sequencing the elements -- sorting first by language or by format, for example -- and/or (ii) work in real time to formulate the best way of sequencing a given result set. Catalogers tend to resist giving up their hand-crafted headings, but that tends to be because they are not offered attractive alternatives. What I suggested above seems to be such an attractive alternative. John Attig Authority Control Librarian Penn State University jx...@psu.edu
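Point (d) above, building access points by algorithm and offering more than one sequencing of the elements, might look roughly like the following sketch. Everything in it is an assumption made for illustration: the element keys (`creator`, `title`, `language`, `date`) are not RDA element names, the three records are invented sample data, and the full-stop separator is arbitrary.

```python
from typing import Iterable

# Hypothetical element names and invented sample data, for illustration only.
Record = dict[str, str]


def access_point(record: Record, order: Iterable[str], sep: str = ". ") -> str:
    """Join whichever elements are present, in the requested order."""
    return sep.join(record[key] for key in order if record.get(key))


def browse_list(records: list[Record], order: list[str]) -> list[str]:
    """Build one access point per record and sort them for a browse display."""
    return sorted(access_point(r, order) for r in records)


records = [
    {"creator": "Homer", "title": "Iliad", "language": "English", "date": "1990"},
    {"creator": "Homer", "title": "Iliad", "language": "German", "date": "1979"},
    {"creator": "Homer", "title": "Odyssey", "language": "English", "date": "1996"},
]

# Same stored elements, two different "views" of the browse sequence.
by_work = browse_list(records, ["creator", "title", "language", "date"])
by_language = browse_list(records, ["language", "creator", "title", "date"])

for line in by_work + by_language:
    print(line)
```

The stored data never changes between the two lists; only the `order` argument does, which is the sense in which the sequencing could be chosen per user or per result set.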
Re: [RDA-L] Browse and search BNB open data
-Original Message- From: Resource Description and Access / Resource Description and Access [mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of John Attig Sent: August 4, 2011 4:09 PM To: RDA-L@LISTSERV.LAC-BAC.GC.CA Subject: Re: [RDA-L] Browse and search BNB open data ... Access points are treated rather strangely in RDA. The access point is not itself an element, but is a construct made up of other elements, which contains instructions about what and when to include various elements in an access point. [Note: In this, RDA follows the FRBR model, which lacks elements for access points. On the other hand, FRAD treats the access point as an entity in its own right, separate from the person, family, corporate body, work, expression, manifestation, or item that it represents. At some point, RDA may decide to adopt this FRAD structure (assuming that it survives the reconciliation of the FR models).] There are a few areas where the distinction between the access point as an element and as an entity can be confusing. The equivalent of the undifferentiated name indicator in FRAD is an attribute of the entity controlled access point. In RDA 8.11, it's an attribute for a Person entity-- it's used when the core elements for a Person are not sufficient for differentiation. [I'm aware of the error in RDA that is being corrected-- the placement of this element in 8.11 suggests it applies also to corporate bodies and families when it does not]. However, in RDA 9.19.1.1, there is an instruction to use the undifferentiated name indicator when the access point cannot make use of any suitable addition to differentiate persons with the same name. Those instructions suggest there's a relationship between the core elements and the elements that go into forming an authorized access point. 
The relationship between the two processes (recording core elements and constructing access points) is not spelled out sufficiently to leave the impression that there really is not a conflict between the two instructions. There are some other areas where the connections between the elements and the access point suggest implications that are not readily apparent from the basic instructions. For example, when settling upon a preferred name or preferred title, RDA consistently instructs to do so in light of their use as the basis for the authorized access point (example at RDA 9.2.2.1). That suggests that in environments that don't use authorized access points, decisions still need to be made that support the ongoing existence of authorized access points. In the RDA Element Set View (under the Tools tab in the RDA Toolkit), one FRAD entity is listed, and that is Name. It has attributes such as Date of usage and Scope of usage-- attributes that don't really make sense applied to the Person entity. Rather, they belong to the Name entity (and in FRAD, the Name entity is also separate from the Person entity and the Access point entity-- there are relationships between these entities that are spelled out in FRAD). While the main text of RDA subsumes entities like access point and name into the instructions for the main entity, such as Person, there are points at which it seems that the FRAD approach might be useful. As an example, access points in FRAD have relationships to Rules and Agencies, as well as a set of attributes such as language and script. These additional bits of information would make sense clustered together as attributes and relationships around the respective entities. Thomas Brenndorfer Guelph Public Library
Re: [RDA-L] Browse and search BNB open data
Quoting John Attig jx...@psu.edu: John, thank you so much -- this is very helpful. Wonderful, even. Access points are treated rather strangely in RDA. The access point is not itself an element, but is a construct made up of other elements, which contains instructions about what and when to include various elements in an access point. That actually makes sense from a data design point of view. It means that compound things can be built up of simple things, and that means that you have flexibility in what you can build. (read: tinker-toys, or, for the younger set, Legos) In our discussions of the question of how to treat access points, the JSC was advised that there were certain structural complexities that we should not attempt to build into the RDA element set, but should rely on the encoding to bring together the various elements into the access point construct. Here I want to point out that there can be a useful difference between your data elements, data model and your instance data. Your data elements can be atomistic, your data model can allow building of various molecules from the atoms, and your instance data can make use of the whole in many different ways. In MARC, we are accustomed to using subfields to encode the specific data elements and fields to wrap them up into an ordered construct. Similarly, in XML, one would expect to use some sort of wrapper to enclose all the elements that make up the access point. In order to do this, I suspect that one needs to treat the access point construct as if it were an element, even if the RDA element set does not treat it as such. I could also imagine that happening in an application layer. Without any change in the underlying data there could be different interpretations -- I usually call them views -- of the data. 
[Your last para states this very well; see below] But the key thing is that by not having the individual elements bound into the RDA complex elements you have freed the sub-elements from that structure, and they can be used in various ways if desired. In a situation where the only way to express date of expression is in an access point, you have restricted that data element (which may be of interest for other reasons) to that one situation. The more I look at RDA as elements the more I admire the separation of data content from record structure. This gives us many more possibilities for system developers. Beyond these technical issues, this discussion raises questions about the way in which access points are constructed and used. a) The instructions on what to include in an access point represents our collective experience of what is important for uniquely identifying a given entity. There seems to be some value in gathering all these elements together for indexing and display as an aggregation of identifying information. Yes, and that should be possible. b) While it is true that the individual elements are sufficient for finding relevant resources and don't need to be aggregated in a precoordinated way in order to work, I would argue that finding, identifying, and selecting relevant resources is sometimes best supported by browsing an alphabetical list of access points that are constructed in a way that reveals the structure of the things being browsed. Examples might be an alphabetical display of hierarchical entities such as corporate bodies, or an organized sequence of headings representing works and expressions. We may not NEED access points, but they can sure be helpful on occasion. I think this becomes a system efficiency question rather than a meaning question. At what point do systems need to manage these strings for the most efficient use? Is it easier to create them automatically in case they are needed, storing the data in multiple places? 
Or will there be a reason to bring this data together on the fly? I don't think we need to answer that at this point, but I would like to suggest that it would be ideal to begin to develop use cases. Use cases state a situation (user is looking for xyz), and what you would like the outcome to be (user gets/sees/is asked for...). There may be more than one way to do this. You have included a use case here in suggesting an alphabetical display. There are undoubtedly search and find use cases related to this information (user wants bookX but only in Spanish), etc. Since a system should attempt to satisfy a variety of use cases, this would help me (and maybe other systems developers/thinkers) to understand the range of services we want to get out of this data. In essence, the data as input should not be considered to be the same as the strings that the user will see. It was so, often, in MARC, but that was kind of a throw-back to the card days. Today one designs user interfaces and services BEFORE defining the data structure. The
Re: [RDA-L] Browse and search BNB open data
Brenndorfer, Thomas tbrenndor...@library.guelph.on.ca wrote: In RDA 8.11, it's an attribute for a Person entity-- it's used when the core elements for a Person are not sufficient for differentiation. [I'm aware of the error in RDA that is being corrected-- the placement of this element in 8.11 suggests it applies also to corporate bodies and families when it does not]. Tangent: the last line of 8.6 should be cleared up in the same manner. And I think the last line under 10.10.1.1 will get the boot as well. On the other hand, I wouldn't mind future-proofing corporate and, especially, family names by allowing undifferentiated status markers for these. -- Mark K. Ehlert Minitex Coordinator University of Minnesota Bibliographic Technical 15 Andersen Library Services (BATS) Unit 222 21st Avenue South Phone: 612-624-0805 Minneapolis, MN 55455-0439 http://www.minitex.umn.edu/
Re: [RDA-L] Browse and search BNB open data
02.08.2011 18:34, J. McRee Elrod: http://www.allegro-c.de/db/a30/bl.htm Am I correct that there is no MARC display available?

OK, for what it's worth and for good measure, I've added that in; no big deal since we've got what it takes. Now, MARC appears directly underneath the regular display. But only as complete and as correct as the stuff that was released. The format made available by BL is an XML schema of their own design, documented here: http://www.bl.uk/bibliographic/datafree.html (under Data model draft schema)

A sample XML record:

<rdf:Description>
  <dcterms:title>The elves and the emperor</dcterms:title>
  <dcterms:creator>
    <rdf:Description>
      <rdfs:label>Robinson, Hilary, 1962-</rdfs:label>
    </rdf:Description>
  </dcterms:creator>
  <dcterms:contributor>
    <rdf:Description>
      <rdfs:label>Sanfilippo, Simona.</rdfs:label>
    </rdf:Description>
  </dcterms:contributor>
  <dcterms:type>
    <rdf:Description>
      <rdfs:label>text</rdfs:label>
    </rdf:Description>
  </dcterms:type>
  <dcterms:type>
    <rdf:Description>
      <rdfs:label>monographic</rdfs:label>
    </rdf:Description>
  </dcterms:type>
  <isbd:P1016>
    <rdf:Description>
      <rdfs:label>London</rdfs:label>
    </rdf:Description>
  </isbd:P1016>
  <dcterms:publisher>
    <rdf:Description>
      <rdfs:label>Wayland</rdfs:label>
    </rdf:Description>
  </dcterms:publisher>
  <dcterms:issued>2009</dcterms:issued>
  <dcterms:language>
    <rdf:Description>
      <rdf:value rdf:datatype="http://purl.org/dc/terms/ISO639-2">eng</rdf:value>
    </rdf:Description>
  </dcterms:language>
  <dcterms:extent>
    <rdf:Description>
      <rdfs:label>31 p</rdfs:label>
    </rdf:Description>
  </dcterms:extent>
  <dcterms:description>Originally published: 2008.</dcterms:description>
  <dcterms:subject>
    <skos:Concept>
      <skos:notation rdf:datatype="ddc:Notation">428.6</skos:notation>
      <skos:inScheme rdf:resource="http://dewey.info/scheme/e22" />
    </skos:Concept>
  </dcterms:subject>
  <dcterms:isPartOf>
    <rdf:Description>
      <rdfs:label>Fairytale jumbles</rdfs:label>
    </rdf:Description>
  </dcterms:isPartOf>
  <dcterms:isPartOf>
    <rdf:Description>
      <rdfs:label>Start reading. Purple band 8</rdfs:label>
    </rdf:Description>
  </dcterms:isPartOf>
  <dcterms:identifier>(Uk)015346892</dcterms:identifier>
  <dcterms:identifier>GBA979108</dcterms:identifier>
  <bibo:isbn>9780750255233</bibo:isbn>
  <bibo:isbn>0750255234</bibo:isbn>
  <dcterms:identifier>URN:ISBN:9780750255233</dcterms:identifier>
  <dcterms:identifier>URN:ISBN:0750255234</dcterms:identifier>
</rdf:Description>

which translates like this:

=LDR 01234cam a22002771i 45e0
=001 015346892
=007 ta
=008 \\991231s2009n\\\eng\d
=020 \\$a9780750255233
=040 $ea
=082 00$a428.6
=100 1\$aRobinson, Hilary (1962-)
=245 04$aThe elves and the emperor /$cHilary Robinson
=260 \\$aLondon :$bWayland,$c2009
=300 \\$a31 p
=440 \0$aFairytale jumbles
=440 \0$aStart reading. Purple band 8
=500 \\$aOriginally published: 2008.
=700 12$aSanfilippo, Simona

B.E.
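For readers who want to try a conversion of this kind themselves, here is a minimal sketch using only the Python standard library. It is not the allegro-c routine that produced the MARC shown above (that code is not given in the message); the function names, the choice of only three fields, the indicator values and the article list for the non-filing count are all assumptions, and the `rdf:RDF` wrapper with namespace declarations is added so that the fragment parses.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
}

# A cut-down version of the sample record, wrapped so that it is well-formed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description>
    <dcterms:title>The elves and the emperor</dcterms:title>
    <dcterms:creator>
      <rdf:Description><rdfs:label>Robinson, Hilary, 1962-</rdfs:label></rdf:Description>
    </dcterms:creator>
    <dcterms:publisher>
      <rdf:Description><rdfs:label>Wayland</rdfs:label></rdf:Description>
    </dcterms:publisher>
    <dcterms:issued>2009</dcterms:issued>
  </rdf:Description>
</rdf:RDF>"""


def label(record, prop):
    """Return the rdfs:label nested under a property, or None if absent."""
    node = record.find(f"{prop}/rdf:Description/rdfs:label", NS)
    return node.text if node is not None else None


def nonfiling_count(title):
    """Assumed rule: skip a leading English article when filing."""
    for article in ("The ", "An ", "A "):
        if title.startswith(article):
            return len(article)
    return 0


def to_marc_lines(xml_text):
    """Emit a few mnemonic MARC lines (100, 245, 260) from one record."""
    record = ET.fromstring(xml_text).find("rdf:Description", NS)
    lines = []
    creator = label(record, "dcterms:creator")
    if creator:
        lines.append(f"=100 1\\$a{creator}")
    title = record.findtext("dcterms:title", default=None, namespaces=NS)
    if title:
        lines.append(f"=245 0{nonfiling_count(title)}$a{title}")
    publisher = label(record, "dcterms:publisher")
    issued = record.findtext("dcterms:issued", default=None, namespaces=NS)
    if publisher and issued:
        lines.append(f"=260 \\\\$b{publisher},$c{issued}")
    return lines


marc_lines = to_marc_lines(SAMPLE)
print("\n".join(marc_lines))
```

On the cut-down sample this yields a 100 with the label copied unchanged, a 245 with second indicator 4, and a 260 with $b and $c only. Place of publication, the 245 $c and the reshaping of the dates seen in the 100 above are deliberately left out, since the rules behind them are not stated in the message.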
Re: [RDA-L] Browse and search BNB open data
On 03/08/2011 08:34, Bernhard Eversberg wrote:

<snip>
02.08.2011 18:34, J. McRee Elrod: http://www.allegro-c.de/db/a30/bl.htm Am I correct that there is no MARC display available?

OK, for what it's worth and for good measure, I've added that in; no big deal since we've got what it takes. Now, MARC appears directly underneath the regular display. But only as complete and as correct as the stuff that was released. The format made available by BL is an XML schema of their own design, documented here: http://www.bl.uk/bibliographic/datafree.html (under Data model draft schema)
</snip>

This is interesting. From the table http://www.bl.uk/bibliographic/pdfs/marctordfxmlmappingsv0-3-2.pdf, we see how some of the semantics of the MARC format are lost in the conversion. As we evolve away from the MARC format, I am sure the direction will be toward simplification, so it seems valuable to discuss what could be eliminated from MARC with the fewest consequences. From a very quick review of that table, I see the 534 being translated to dcterms:description, losing some handy subfields, and all of the subfields in the 100/700 fields mapping to dcterms:creator. Also, all of the subfields in the 6xx fields are being placed into dcterms:subject, and there is a loss of the subfield description avxyz. I need to emphasize that this is discussing losing the specific subfield *coding*, NOT losing the information, e.g.

100 0_ |a Benedict |b XVI, |c Pope, |d 1927-

as opposed to

<dcterms:creator>Benedict XVI, Pope, 1927-</dcterms:creator>

In practical terms for all the various metadata communities, where precisely is the loss here? While there is an undoubted loss in semantics, with the future evolution of MARC format, we can ask: do such losses have any practical consequences? Although I think many subfields (although not the information) could disappear without any essential loss, some will have important consequences to different communities.
For instance, we see in the mapping the complete elimination of 245$c, which would obviously have important consequences for *librarians* (i.e. necessary for determination of a copy), although the loss of 245$c would be much less dire for the users. Loss of subfields with some of the most consequences would seem to be the subfields in the 6xx fields, since those semantics *could* lead to novel computer manipulation, sorting by chronology, geographic, and all kinds of other ways. Also, the distinctions of:

650$aHistory$xBibliography
650$aHistory$vBibliography
650$aBibliography$xHistory

would be lost. Compare this to losing the subfields in the 1xx/7xx, where the consequences would appear to be much fewer. Yet, compare this to what others want: even more semantics, for example, to encode 300$a even further to specify pages or leaves or whatever, e.g.

<300>
  <a>
    <pages>245</pages>
    <leaves>56</leaves>
  </a>
</300>

etc. There are definite advantages with this level of coding but on the negative side, it is more work, prone to many more errors, and is more difficult to train new people, especially as there will be the push to simplify. I think these questions will begin to be asked (finally!), and answered too. This project from the British Library may be a great catalyst for the discussion.

-- James Weinheimer weinheimer.ji...@gmail.com First Thus: http://catalogingmatters.blogspot.com/ Cooperative Cataloging Rules: http://sites.google.com/site/opencatalogingrules/
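The 650 example above can be made concrete with a small sketch. The `flatten` function and the "--" separator are assumptions standing in for whatever a mapping to a single dcterms:subject string would actually do; the three headings are the ones given in the message.

```python
def flatten(subfields, sep="--"):
    """Drop the subfield codes and keep only the values, in order."""
    return sep.join(value for _code, value in subfields)


headings = [
    [("a", "History"), ("x", "Bibliography")],   # general subdivision
    [("a", "History"), ("v", "Bibliography")],   # form subdivision
    [("a", "Bibliography"), ("x", "History")],
]

flat = [flatten(h) for h in headings]
distinct_coded = len({tuple(h) for h in headings})
distinct_flat = len(set(flat))

print(flat)
print(distinct_coded, distinct_flat)
```

With the codes, the three headings are all different; once flattened, the first two become the same string, so the $x versus $v distinction cannot be recovered from the output. The order of terms still separates the third heading from the others, which is the part of the information that survives.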
Re: [RDA-L] Browse and search BNB open data
Am 03.08.2011 10:55, schrieb James Weinheimer: There are definite advantages with this level of coding but on the negative side, it is more work, prone to many more errors, and is more difficult to train new people, especially as there will be the push to simplify. I think these questions will begin to be asked (finally!), and answered too. This project from the British Library may be a great catalyst for the discussion. The BL has teamed up with Talis to develop and improve their open data activities. Here's more about that, together with a nice diagram any cataloger might love to mount on their office wall: http://consulting.talis.com/2011/07/british-library-data-model-overview/ I understand that the current release is only a first step, and together with Talis they will produce an improved version in the near future. B.Eversberg
Re: [RDA-L] Browse and search BNB open data
Bernhard and all,

In order to clarify the current situation, The British Library would like to take this opportunity to outline the range of free/open BNB options and encourage anyone seeking details to check http://www.bl.uk/bibliographic/datafree.html for further information. We would like to emphasise the experimental nature of this work and the likelihood that datasets we make available will be subject to change over time. As a result, we would recommend that those wishing to use the most up to date version of the BNB dataset obtain it directly from the BL. Older versions available from other sites have now been superseded and we will be contacting organisations we identify mounting these to offer updated versions.

The current BNB options are:

1) BNB as linked data (the latest free data release, in association with Talis) - Available under a CC0 license using: SPARQL, Describe and Search endpoints. This dataset has been updated from an initial preview version of around 400,000 records to cover over 2.6 million monographs (80,249,538 triples); we hope to also offer a dump of the file via FTP shortly using the new data model (available at http://www.bl.uk/bibliographic/pdfs/datamodelv1_01.pdf) and schema (available at http://www.bl.uk/bibliographic/pdfs/britishlibrarytermsv1-01.pdf).

2) BNB in basic RDF/XML via FTP (the dataset currently under discussion) - Available under a CC0 license to individual researchers or organisations not requiring MARC21 data but wishing to data mine, mash up or otherwise interrogate the data set in bulk. An updated version is currently being produced which will be available via FTP directly from the BL - please contact metad...@bl.uk for access details.
3) BNB Z39.50 MARC21 Access - A free registration based service for non-commercial use under terms outlined on the British Library free data web page at: http://www.bl.uk/bibliographic/datafree.html If you have any queries about any of the BNB data offerings, please contact us at metad...@bl.uk Thank you Best regards Corine Corine Deliot on behalf of Metadata Services, The British Library. email: metad...@bl.uk From: Resource Description and Access / Resource Description and Access [mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Bernhard Eversberg Sent: 03 August 2011 10:14 To: RDA-L@LISTSERV.LAC-BAC.GC.CA Subject: Re: [RDA-L] Browse and search BNB open data Am 03.08.2011 10:55, schrieb James Weinheimer: There are definite advantages with this level of coding but on the negative side, it is more work, prone to many more errors, and is more difficult to train new people, especially as there will be the push to simplify. I think these questions will begin to be asked (finally!), and answered too. This project from the British Library may be a great catalyst for the discussion. The BL has teamed up with Talis to develop and improve their open data activities. Here's more about that, together with a nice diagram any cataloger might love to mount on their office wall: http://consulting.talis.com/2011/07/british-library-data-model-overview/ I understand that the current release is only a first step, and together with Talis they will produce an improved version in the near future. B.Eversberg
Re: [RDA-L] Browse and search BNB open data
In article 4e38ebe3.5090...@biblio.tu-bs.de, you wrote: OK, for what it's worth and for good measure, I've added that in; no big deal since we've got what it takes. Bless your sweet heart. Did you notice the not for commercial purposes in the BL posting? We are not even going to ask. No matter how much we give back, as outsourcer we are made to feel dirty. How anyone comparing the XML and MARC versions could prefer the XML is beyond me. We find it simple to crosswalk from MARC to XML for anyone who wants it, but not back again. Mac __ __ J. McRee (Mac) Elrod (m...@slc.bc.ca) {__ | / Special Libraries Cataloguing HTTP://www.slc.bc.ca/ ___} |__ \__
Re: [RDA-L] Browse and search BNB open data
Quoting James Weinheimer weinheimer.ji...@gmail.com: While there is an undoubted loss in semantics, with the future evolution of MARC format, we can ask: do such losses have any practical consequences? Although I think many subfields (although not the information) could disappear without any essential loss, some will have important consequences to different communities. Jim, this is much of the motivation for the work that I have been doing to try to identify the actual elements of MARC21 -- elements in the semantic sense, trying to ignore the MARC21 structure (which results in much repetition, etc.) A report on my study is available in the recent Code4Lib journal: http://journal.code4lib.org/articles/5468 One of the difficulties of deciding what we do and do not want to keep in MARC, or what we want to move over to the RDA environment, is that we have no dictionary of everything that MARC covers. For example, what standard identifiers are available in MARC? They are scattered all over the format, so it's hard to know. What about things like language and date? Those appear in different fields with somewhat different meanings. My assumption is that a complete inventory of MARC elements is essential for any move away from MARC. Unfortunately, I have gotten now to the 1xx-8xx fields (the study so far is 00x and 0xx, that's already pretty complex!) and may not have the energy to complete the study on my own. However, what I have done so far at least sets down some possible principles to follow. I'm doing it all on the futurelib wiki so my process is as transparent as I can make it: http://futurelib.pbworks.com/w/page/29114548/MARC%20elements kc -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Re: [RDA-L] Browse and search BNB open data
Bernhard Eversberg posted to RDA-L: http://www.allegro-c.de/db/a30/bl.htm Thank you. Your skill in making resources available is remarkable. Am I correct that there is no MARC display available? I'm copying to Autocat, so that the resource will be more widely known. __ __ J. McRee (Mac) Elrod (m...@slc.bc.ca) {__ | / Special Libraries Cataloguing HTTP://www.slc.bc.ca/ ___} |__ \__