Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)
On 5/19/2012 10:52 AM, Karen Coyle wrote:

> This is what worries me about FRBR and the assumptions that every bibliographic record will be made up of at least four and probably more like 6-8 table joins. If every record to be displayed requires a join of a Manifestation, an Expression, and a Work (because you can't get to the Work unless you go through the Expression even if you aren't using anything in it for display), plus an author, I think we'll see some response time problems.

FRBR is an ontology; I don't think it makes any demands on how a system stores the data. We frequently 'denormalize' data in indexes and runtime systems for performance. This is an ordinary thing to do. How the records are stored 'canonically' for cooperative cataloging and transfer does not need to be how they are stored in the system at 'runtime' for performance -- the latter is an implementation detail.

Compare to _current_ practice -- some data is part of an 'authority' record, some is part of a 'bibliographic' record. Does that mean that every system needs to do a 'join' on every display? Not necessarily; it depends on how the system prepares the data internally to support its use cases in a performant way. Usually there is some 'denormalization'.

Compare to the RDF data model -- if RDF data were really stored internally _only_ in a triple store format, this could in fact be even _more_ of a performance problem, effectively requiring many _more_ joins to display anything at all. RDF triples are kind of ultra-normalized; any performance problems due to joins in an rdbms are _even more so_ with RDF.

But this isn't a fatal flaw: actual systems will de-normalize and cache data as required for adequate performance with their real use cases. This is just how software is constructed. It's not a reason to, say, copy all of the information from an authority record into every bib record in their shared/exchanged/canonical representations, causing a maintenance nightmare where every time a line is changed in an authority record there are a thousand bib records that all need to be updated in the central database.

Jonathan
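The distinction drawn in the reply above, between canonical normalized storage and a denormalized runtime copy, can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the table names, column names, and sample row are all hypothetical and are not taken from any system mentioned in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Canonical, normalized storage: one table per entity type (hypothetical schema).
    CREATE TABLE work          (work_id  INTEGER PRIMARY KEY, title TEXT, author TEXT);
    CREATE TABLE expression    (expr_id  INTEGER PRIMARY KEY, work_id INTEGER, language TEXT);
    CREATE TABLE manifestation (manif_id INTEGER PRIMARY KEY, expr_id INTEGER,
                                publisher TEXT, year INTEGER);

    -- Made-up sample data.
    INSERT INTO work          VALUES (1, 'An Example Work', 'Author, Ann');
    INSERT INTO expression    VALUES (10, 1, 'eng');
    INSERT INTO manifestation VALUES (100, 10, 'Example Press', 2012);
""")

# Normalized read path: each display walks Manifestation -> Expression -> Work.
JOINED = """
    SELECT m.manif_id AS manif_id, w.title AS title, w.author AS author,
           e.language AS language, m.publisher AS publisher, m.year AS year
    FROM manifestation AS m
    JOIN expression AS e ON e.expr_id = m.expr_id
    JOIN work       AS w ON w.work_id = e.work_id
"""
joined_rows = conn.execute(JOINED + " ORDER BY manif_id").fetchall()

# Denormalized read path: run the joins once and keep the flat result
# as a runtime display table, so later displays need no joins at all.
conn.execute("CREATE TABLE display_record AS " + JOINED)
flat_rows = conn.execute("SELECT * FROM display_record ORDER BY manif_id").fetchall()

print(joined_rows == flat_rows)
```

The trade-off is the usual one: the flat table has to be rebuilt or patched whenever the canonical tables change, which is why it belongs in the runtime system and not in the shared, exchanged representation.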
Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)
On 5/21/12 7:28 AM, Jonathan Rochkind wrote:

> On 5/19/2012 10:52 AM, Karen Coyle wrote:
>> This is what worries me about FRBR and the assumptions that every bibliographic record will be made up of at least four and probably more like 6-8 table joins. If every record to be displayed requires a join of a Manifestation, an Expression, and a Work
>
> FRBR is an ontology, I don't think it makes any demands on how a system stores the data.

First, FRBR in its IFLA document form is a mental model. FRBRer, as encoded in the Open Metadata Registry, is an RDF ontology that has *very* strict requirements on how the elements can be used in a linked data environment. In other applications, like XC, the field is presumably wide open as to how you instantiate FRBR in your data store.

But read what I said: I worry about the assumptions that are being made. There are actually folks creating systems in which each bib description has a separate record for WEMI because that's how they interpret FRBR, and the IFLA RDF definition of FRBR actually encodes the relationships between the entities in a way that, if followed, many of us think leads down the wrong path. I have a page that captures some of the discussion on this at: http://futurelib.pbworks.com/w/page/48221836/FRBR%20Models%20Discussion

Obviously, you can do what you want with FRBR inside your own system, but we're talking about massive sharing of data. It's the sharing part that matters. The danger is that the library community will form standards that are widely followed but that are not a good idea. Or that deteriorate over time, like MARC, but we're so stuck to our standards that we can't imagine changing. If you actually look at that page and read the arguments there, rather than just shoot back an email telling me that I don't know what I'm talking about, you might see why some folks are concerned.

I think a good working meeting about FRBR and what it means for implementations is long overdue.
We can prattle on about it, but I think it's time to get concrete. For example, I would like to see an implementation of the Murray/Tillett model, and compare that perhaps to an implementation of Rob Stiles' model (if he's still thinking that way). Jakob Voss also has some great ideas. It does make a big difference whether we are assuming RDF or some other way of expressing the bibliographic data. The Dublin Core community is starting to re-address standards for Application Profiles and will (hopefully) eventually get to the point of addressing FRBR as it has been modeled in various ways in RDF. (A list of those is on the futurelib page.) At the moment the AP discussion is taking on some easier issues. http://wiki.dublincore.org/index.php/DCAM_Revision_Design_Patterns and in particular http://wiki.dublincore.org/index.php/DCAM_Revision_High_Level_Example_Publication_Statement My assumption is that there will be silo'd database implementations that export some of the data as RDF. I also suspect that there will be something like WorldCat that is used for cataloging, and that the result of that will either stay in the library cloud (much like Ex Libris' Alma) or will be pulled into local databases for local uses. These are different applications, but they will need to play well together if we are to link our data to the web. I think we need to model all kinds of possibilities -- perhaps as part of the study for the new bibliographic framework. kc -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)
On 21/05/2012 18:06, Karen Coyle wrote: snip Obviously, you can do what you want with FRBR inside your own system, but we're talking about massive sharing of data. It's the sharing part that matters. The danger is that the library community will form standards that are widely followed but that are not a good idea. Or that deteriorate over time, like MARC, but we're so stuck to our standards that we can't imagine changing. If you actually look at that page and read the arguments there, rather than just shoot back an email telling me that I don't know what I'm talking about, you might see why some folks are concerned. /snip Yes, sharing data, and sharing it in the ways as seen in the Linked Data world, is entering unknown territory. The non-libraries who are already there, and those who are trying to get there, are not waiting for libraries to show them the right ways to do it. I don't think they really care if library metadata is added or not. Therefore, it is up to libraries to enter *their* world in the best ways possible and not expect everyone to follow us. I personally cannot believe the FRBR structures/ontology will be widely followed, but to expect the (weird) WEMI structure to magically become compatible with other structures that are only W or E or M or I or strange amalgamations that change constantly, or are generated dynamically--such as XSL Transformations and the on-the-fly transformations such as Google Translate, or when browser plugins are used--is taking a lot for granted. What I personally believe is that WEMI is more of a remnant of the print/physical world and has little to do with most digital information. Not that most members of the public wanted WEMI anyway. -- *James Weinheimer* weinheimer.ji...@gmail.com *First Thus* http://catalogingmatters.blogspot.com/ *Cooperative Cataloging Rules* http://sites.google.com/site/opencatalogingrules/ *Cataloging Matters Podcasts* http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html
Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)
The theory of database design and the practice don't always coincide, especially for large datasets. When I was working on large databases in Oracle the catchword was that joins are costly and the more of them that it took to respond to your query (between search and display) the worse your response time. Today's computers are bigger and faster so the constraints are probably lessened, but I suspect some constraints still exist.

This is what worries me about FRBR and the assumptions that every bibliographic record will be made up of at least four and probably more like 6-8 table joins. If every record to be displayed requires a join of a Manifestation, an Expression, and a Work (because you can't get to the Work unless you go through the Expression even if you aren't using anything in it for display), plus an author, I think we'll see some response time problems.

I know that XC is using a FRBR-ish design. VTLS also has one. Can anyone comment on the relative efficiency, or how one can mitigate the design to improve response time? Also, is a triple store more efficient?

kc

-- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Re: [RDA-L] RDA, DBMS and RDF
I'm coming into this discussion somewhat late so apologies if the following has been covered. As someone who works both with MARC directly as a cataloger and MS Access (both in the same ILS--Voyager) I was very interested in this discussion and wanted to weigh in.

My question: Is it RDA that is incompatible with relational database principles, or does the underlying nature of information that a library must convey with respect to its holdings prevent that information from integrating fully into a relational database environment?

The building blocks of a relational database are, of course, tables containing attributes of a particular entity. A classic, real-world example would be attributes such as name, address, age, SSN, etc. of employees (entities) of a company. The strength of a relational database (“RDB” hereafter) is the ability to link those tables and pull together different attributes of user-sought combinations of entities, while eliminating data redundancy. The purpose in typical scenarios is to gain information about the entities themselves. In a library catalog, however, the entities are really means to an end, an end which is largely pre-defined and often inherently redundant.

Ostensibly, it's tempting to look at the relationship between a bibliographic and authority record and, due to the parallels with fully realized relational databases, see the potential to move library catalogs more in that direction. The problem is the libraries' need to store and convey what are, in effect, fixed, unique relationships involving redundant entities. While an author's name may be redundant in and of itself, its relationship to title, publisher, year, etc. in each of the author’s works in the catalog is unique, and each of those unique relationships needs to be captured and conveyed. I don’t think they can be generated on the fly by linking tables.
In some ways, we've incorporated RDB principles already, such as the use of an authority record to store earlier forms of an author's name, which eliminates the need to place these in the relevant bib records. While there is certainly room for improvement (static linking of the bib and authority records of controlled fields as one possible example), I think the scope is limited. In essence, we've already identified the minimum information necessary to convey, again, those unique relationships between entities (author, title, publisher, year, etc.) that constitute a work in the library's collection. If we were interested in attributes of the individual entities in MARC records (give me books published between 1990 and 2005 whose publisher is a publicly traded company) we could also make more use of RDB principles (linking via publisher name to a publisher table that states whether publishers are public or private), but again, in a library catalog, the entities themselves are of relatively little interest beyond their role in the creation or description of the information being sought and, for that, the pieces of information in each (currently MARC) record seem to me both sufficient and necessary. Thus, while I share with my colleagues many of the stated concerns with RDA, I think there are some limiting factors to consider when using relational database principles as a standard of measure. Joe Tomich UW-Milwaukee Libraries
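The hypothetical query mentioned in the message above (books published between 1990 and 2005 whose publisher is publicly traded) is the kind of thing a link from bibliographic records to a publisher table would support. A minimal sketch, using Python's sqlite3 module; the schema, the `publicly_traded` flag, and all the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical publisher table carrying an attribute of the publisher itself.
    CREATE TABLE publisher (name TEXT PRIMARY KEY, publicly_traded INTEGER);
    -- Hypothetical, heavily simplified bibliographic table linked by publisher name.
    CREATE TABLE bib (bib_id INTEGER PRIMARY KEY, title TEXT,
                      publisher_name TEXT, year INTEGER);

    INSERT INTO publisher VALUES ('Acme Publishing', 1);
    INSERT INTO publisher VALUES ('Small Press', 0);

    INSERT INTO bib VALUES (1, 'Book A', 'Acme Publishing', 1995);
    INSERT INTO bib VALUES (2, 'Book B', 'Acme Publishing', 2010);
    INSERT INTO bib VALUES (3, 'Book C', 'Small Press', 2000);
    INSERT INTO bib VALUES (4, 'Book D', 'Acme Publishing', 2005);
""")

# Join bib records to their publisher and filter on both the
# bibliographic attribute (year) and the publisher attribute.
rows = conn.execute("""
    SELECT b.title, b.year
    FROM bib AS b
    JOIN publisher AS p ON p.name = b.publisher_name
    WHERE b.year BETWEEN 1990 AND 2005
      AND p.publicly_traded = 1
    ORDER BY b.year
""").fetchall()

print(rows)
```

Linking on the publisher's name, as the message suggests, only works while the name is spelled identically in both tables; an assigned identifier would be the more robust key.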
Re: [RDA-L] RDA, DBMS and RDF
Joe, Thanks for this very pertinent comment. This is exactly what I have been wondering myself.

Kathleen F. Lamantia, MLIS Technical Services Librarian Stark County District Library 715 Market Avenue North Canton, OH 44702 330-458-2723 klaman...@starklibrary.org Inspiring Ideas ∙ Enriching Lives ∙ Creating Community
Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)
Simon, In your model, does the stored information for an individual author or publisher constitute a record within a table (as would likely be the case in a typical relational database), or is each author, publisher, etc. effectively its own table?

Joe Tomich
UW-Milwaukee Libraries

- Original Message -
From: Simon Spero sesunc...@gmail.com
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Sent: Friday, May 18, 2012 2:33:25 PM
Subject: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

[I am top posting to preserve context; I'm re-adding BIBFRAME to the address list because much of this is relevant to Bibliographic Framework issues. Since issues of RDA content are directly involved and the question was posed on RDA-L, this message is sent to both lists]

I am taking the central question raised in the original message to be the one listed above:

[1] Are RDA, MARC, and Bibliographic concepts compatible with relational database principles or systems, and should this be the standard of measure for evaluating RDA qua RDA?

There are other implied questions raised, including the nature of entities, and whether identifying and separating out some entities is useful for library purposes. [2] Publishers are used as an example.

[3] The question is also raised as to whether and how, given that "an author's name may be redundant in and of itself, its relationship to title, publisher, year, etc. in each of the author’s works in the catalog is unique, and each of those unique relationships needs to be captured and conveyed", a full record can be generated by linking (or joining) tables.

We can gain some insight into the fundamental questions by looking at the questions in reverse order.

[3] It is indeed possible to generate a full record by joining together things from different tables.
We'll look at a couple of ways that this can be done using a simple example involving just two tables - one table containing authors, the other containing everything else apart from the author name.

The first thing to keep in mind is that for something to be an entity, it must have an identity ("No Entity without Identity!" was Quine's slogan). The criteria used to identify something can be intrinsic. For example, the entries in a table of authorized personal names can be uniquely identified by the name, dates, relationship to a specified work et al. (the 100 field in the MARC authorities). Alternatively, we can identify the name entity using an assigned identifier - for example, the LCCN for the authority record (010); a locally assigned identifier (e.g. the 001); or the URL for the name entity assigned at viaf.org.

If we want to connect the name entity from a bibliographic record, we need to establish some sort of connection between them. If each bibliographic record can have at most one main entry for personal name, we can include an identifier for the author entity directly in the table for the bibliographic record.

If we use the first style of identifying the name record (i.e. the 100 field), we end up with a bibliographic record that doesn't look very different from the first record. However, any changes to the authorized name must also be applied to all of the bibliographic records that refer to that name. This problem does not occur if we use the other kind of identifier (for example, the LCCN).

When we want to fetch the whole record, we link the two tables together using the identifier as a key to look up the right entry in the names table. For example, if the names table has fields for (name-lccn, name-heading), and the original bibliographic record table has fields for (bib-lccn, name-heading, title, publisher, date), we can create a new table for the bibliographic record that has fields (bib-lccn, name-lccn, title, publisher, date).
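The two-table arrangement just described can be sketched as follows, using Python's sqlite3 module. The column names follow the field lists in the message (with underscores in place of hyphens); the identifiers, publisher, and dates are made up, and the titles and heading are borrowed from the sample data used elsewhere in this thread. The point of the sketch is that a change to the authorized heading is made in one place only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Names table: (name-lccn, name-heading)
    CREATE TABLE names (name_lccn TEXT PRIMARY KEY, name_heading TEXT);
    -- Bibliographic table: (bib-lccn, name-lccn, title, publisher, date)
    CREATE TABLE bib (bib_lccn TEXT PRIMARY KEY,
                      name_lccn TEXT REFERENCES names(name_lccn),
                      title TEXT, publisher TEXT, date TEXT);

    -- Hypothetical identifiers and made-up publication details.
    INSERT INTO names VALUES ('n00000001', 'Weasel, Ima');
    INSERT INTO bib VALUES ('b0001', 'n00000001', 'We''ll never surrender',
                            'Example Press', '2012');
    INSERT INTO bib VALUES ('b0002', 'n00000001', 'See France on twenty weasels a day',
                            'Example Press', '2011');
""")

def full_records(connection):
    """Rebuild (bib-lccn, name-heading, title, publisher, date) by joining on name_lccn."""
    return connection.execute("""
        SELECT b.bib_lccn, n.name_heading, b.title, b.publisher, b.date
        FROM bib AS b
        JOIN names AS n ON n.name_lccn = b.name_lccn
        ORDER BY b.bib_lccn
    """).fetchall()

before = full_records(conn)

# Change the authorized heading once, in the names table only.
conn.execute("UPDATE names SET name_heading = 'Weasel, Ima, 1970-' "
             "WHERE name_lccn = 'n00000001'")

after = full_records(conn)
print(before)
print(after)
```

No row in the bibliographic table is touched by the update, yet both regenerated records show the new heading.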
We regenerate the original record by fetching entries from both tables, fetching the name entry whose name-lccn is equal to the name-lccn in the bibliographic record entry.

We don't have to store the name-lccn in the bibliographic record directly. We can instead create a third table to store the main entry. This third table must carry identifiers for both the name record and the bibliographic record - e.g. (bib-lccn, name-lccn). This approach may seem like extra work if the table is only used to hold the main entry, but is needed in standard relational databases for fields that are repeatable - for example, added entries.

We should note here that we could create a separate table for every field in the original bibliographic record, with one field naming the bibliographic record that the value is a property of, and the second holding either a simple value directly, or an identifier for a more complex value. To recreate the original record, we fetch all of the values from all of the tables whose
Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)
On Fri, May 18, 2012 at 8:03 PM, Joe M Tomich jtom...@uwm.edu wrote:

Simon, In your model, does the stored information for an individual author or publisher constitute a record within a table (as would likely be the case in a typical relational database), or is each author, publisher, etc. effectively its own table?

Typically you would have a table for each type of entity; you wouldn't have a table for each instance (that would be a lot of tables :-)

In the examples I gave I actually presented four different models, representing different ways of using a relational model.

In the first model we had a table where the reference to the right entry in the names table was included as a column in the table for bibliographic records. In this case we have 2 tables.

In the second model, we created a separate join table, which had a reference to an entry in the bibliographic records table, and a reference to an entry in the names table (this approach can be used with fields that could have multiple values for the same record, e.g. added entries). In this case we have 3 tables.

In the third model, we had a separate table for every property, each with two columns. One column identified the thing that this was a property of (for example bib record number 9); the other gave a value of that property - in a performer table this might be a value of n91064231, or possibly http://lccn.loc.gov/n91064231. In this case we have a separate table for every property, not for every record. The subject, table name, and value correspond to the three parts of an RDF triple.

In the fourth model, we store the subject, property name, and value in a single table. This corresponds to a naive implementation of a triple store. In this case we only have a single table.

Does this make things clearer?

Simon
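The third and fourth models above can be sketched side by side. This is a rough illustration with Python's sqlite3 module, not a description of how any real triple store works; the `bib:9` subject identifier, the `title` property, and its value are hypothetical, while the performer value is the one used as an example in the message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Third model: one two-column table per property.
    CREATE TABLE performer (subject TEXT, value TEXT);
    CREATE TABLE title     (subject TEXT, value TEXT);

    INSERT INTO performer VALUES ('bib:9', 'http://lccn.loc.gov/n91064231');
    INSERT INTO title     VALUES ('bib:9', 'An example title');

    -- Fourth model: a single table holding subject, property name, and value.
    CREATE TABLE triple (subject TEXT, property TEXT, value TEXT);
""")

# Moving from the third model to the fourth: the table name becomes
# the middle column of each row. The names come from a fixed tuple,
# so interpolating them into the SQL text is safe here.
for prop in ("performer", "title"):
    conn.execute(
        f"INSERT INTO triple SELECT subject, ?, value FROM {prop}", (prop,)
    )

triples = conn.execute(
    "SELECT subject, property, value FROM triple ORDER BY property"
).fetchall()

# Recreating the record for one subject means collecting all of its rows.
record = {prop: value for subj, prop, value in triples if subj == "bib:9"}
print(triples)
print(record)
```

In the third model, recreating a record means one lookup per property table; in the fourth it is one lookup on a much larger single table. That is the sense in which the thread describes triples as "ultra normalized".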
Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)
Jonathan, there is nothing wrong with testing out ways to retrieve a record with multiple subject headings that share some keywords. It's probably the most common case we have. I don't know why you see experimentation as wrong. If the RDBMS doesn't give the desired result, then we should move on to other technologies. The question at hand is: do headings give us the desired result using the common technology of our library systems? If not, should something change about how we do headings, or do we need a different technology? The underlying question, though, is what do we want headings to accomplish in our systems? I happen to think that implementation details and cataloging practice must inform each other. kc On 5/16/12 10:17 PM, Jonathan Rochkind wrote: Certainly you can come up with an infinite number of wrong ways to do it that won't get the results you want. With any given technology. I do not understand why you are trying to come up with wrong ways to do this arbitrary goal, you seem to be working on refining your software approaches with the goal of finding something that won't work. Why would anyone want to do that? In addition to a nearly infinite number of wrong ways to accomplish this particular goal, there are also a few right ways to do it. There are several other designs using a rdbms, in addition to the one Simon prototypes, that could also give you the results I think you're describing. Results that it's not entirely clear to me any user actually wants, but if they did, we could do it. With an rdbms, with something else. The technology used for your database or text index or search engine is an implementation detail. Good metadata with the semantics needed to answer the questions you might want to put it to (without having to make the computer guess probabilistically) matters -- if it's there, systems can be created to do what you want. Sure, with a rdbms. Or with specialized inverted indexing tools. Or a combination. Or something else. 
The best tools will depend on exactly what you're wanting to do, as well as the scale (in various dimensions), the current availability/cost of various options, etc. These are questions for programmers and software engineers. If the right semantics are captured in the data, the tool can be built -- that is the question for metadata engineers and catalogers. (To be sure, some understanding of algorithms and other aspects of how computers work is important to be able to understand what software can get out of any given data modelled/represented in any given way).

I don't understand what you're driving at, what the point of this conversation is.

From: Resource Description and Access / Resource Description and Access [RDA-L@LISTSERV.LAC-BAC.GC.CA] on behalf of Karen Coyle [li...@kcoyle.net]
Sent: Wednesday, May 16, 2012 8:46 PM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)

Thanks Simon, It's much better to have an actual mock-up than just a description. If I understand this correctly, to do this you do three separate queries. If you had been able to use a single query (e.g. if you had an overall keyword index), with UNION ALL would you have been able to retain instances where the same keyword appears more than once in the record? In other words, I'm wondering if one entry for the weasels came from the title and one from the subject heading. If one book had two subject headings, could you get this result just from a subject heading search? (I'm thinking that using a search on different indexes that match the search key rather than a single index is an added factor.)

kc

On 5/16/12 4:38 PM, Simon Spero wrote:

On Wed, May 16, 2012 at 5:50 AM, Karen Coyle li...@kcoyle.net wrote:

This confirms what I was saying about retrieval. There are some on this list that claim that there ARE systems that could do what I asked (the bibliographic record will display 3 times in the list of retrievals).
I can explain (with a bunch of drawings) why each record appears only once. Those who disagree with me should point to an example, and then we can analyze the functionality. But I want to see something real.

You seem to be saying that you can use drawings that will show that it is not possible to have records show up more than once in a search using DBMS. Despite my name, I prefer to do coding.

So, rather than draw this out, I'll ask a DBMS - in this case I'll go with PostgreSQL, a mature, open source relational database system. I'll create a simplified database table, with columns for author, title, and the primary subject heading. I'll also add an id column, so we can see which row is which. This simplification is for exposition purposes. The database is real; only the data has been made up to annoy the French.

Let's look at the content.

# select id,title, author,subject1 from book;
id
Re: [RDA-L] RDA, DBMS and RDF
For reference, here is a recent authority record with 374 (occupation) using an LCSH term:

LDR cz 22 n 4500
001 541951
005 20120514104731.0
008 800520n| acannaabn |a aaa
010 ‡a n 79100565
035 ‡a (OCoLC)oca00332681
035 ‡a (DLC)n 79100565
035 ‡a (DLCn)703231
035 ‡a 11654658
035 ‡a 2898
040 ‡a DLC ‡c DLC ‡d DLC ‡d MoSpS-AV ‡d DLC
046 ‡f 19020204 ‡g 19740826
100 1 ‡a Lindbergh, Charles A. ‡q (Charles Augustus), ‡d 1902-1974
370 ‡a Detroit, Mich. ‡b Hawaii
374 ‡a Air pilots ‡2 lcsh
400 1 ‡w nna ‡a Lindbergh, Charles Augustus, ‡d 1902-1974
670 ‡a Van Every, D. Charles Lindbergh, his life, 1927.
670 ‡a The entrepreneurs, an American adventure. Part 3, Expanding America [VR] 1991, c1986: ‡b container (Charles Lindbergh; flew across the Atlantic)
670 ‡a Funk and Wagnalls WWW Home page, Dec. 11, 2000: ‡b Encyclopedia (Charles Augustus Lindbergh; b. Feb. 4, 1902, Detroit; d. Aug. 26, 1974, Maui, Hawaii; American aviator, engineer, and Pulitzer Prize winner for autobiography, The Spirit of St. Louis; first to make nonstop solo flight across Atlantic; baby son kidnapped and murdered, 1932)

Thomas Brenndorfer
Guelph Public Library

From: Resource Description and Access / Resource Description and Access [mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Sean Chen
Sent: May 16, 2012 10:05 AM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] RDA, DBMS and RDF

I agree values for field of activity and occupation elements should come from a controlled vocabulary, if anything to make the job of the person cataloging easier. I think I'd follow what Richard Moore says later on in the thread: he emphasizes that a Linked Data approach would require this. Also I think the need to move away from the precoordinated Authorized Access Points and think about the rest of the elements that make up an authority record is really important. Or at least to think of them as separate beasts (which RDA does do, depending on your opinion).
With field of activity, it seems to me to be less troublesome since a plural doesn't seem to cause too much dissonance in a heading (Economics vs. Economic; Statistics/Statistic) and in other situations LCSH has used a singular form, based on other guidance (Constitutional law vs. Constitutional laws). Occupations are a bit more difficult, with LCSH using the plural a lot more, especially with headings in the category of classes of people, which is where I think occupations would draw from. On top of that, the actual term might often not line up with representation (Chemistry teacher vs. Professor of chemistry).

Are there better vocabularies for occupations than LCSH?

-- Sean Chen slc.c...@gmail.com

On May 13, 2012, at 11:07 PM, Adam L. Schiff wrote:

The elements that constitute authorized access points have been separated out in MARC because of RDA (such as fuller form of name -- 378; form of work -- 380; dates -- 046, and these are encoded in externally referenced standards -- ISO 8601 and EDTF). Other elements, such as Field of Activity or Occupation, can be linked to controlled vocabulary terms, such as LCSH headings.

Except that LCSH occupation/profession headings are in the plural, while RDA terms would be in the singular. I'm not at all sure that you could singularize an LCSH heading and still code the subfield $2 of the 374 field for LCSH. What do others think about this?

Some ideas for improving RDA that follow from the points raised:

- Separate out Authorized Access Points entirely from the numbered instructions. Treat them as a sidebar, and have side-by-side links to the instructions for each individual element so one can see all the relevant instructions as one is constructing an authorized access point. This will further solidify the idea that Authorized Access Points are creatures belonging to some catalog implementations, but may not be needed in others.
I'm also beginning to believe that we may need indicators in the MARC fields for the elements that would be included in an authorized access point, so that a machine could generate them on the fly. If you have recorded, for example, multiple professions/occupations, you might want to designate which one should go into the authorized access point. Or you might record one or more professions that would never go into the access point, and you might want to tell the system that too. The same is true for many other elements (e.g. associated place) that are sometimes needed in an access point but which might be recorded even when not needed to differentiate an entity/access point from another.

Adam L. Schiff
Principal Cataloger
University of Washington Libraries
Box 352900
Seattle, WA 98195-2900
(206
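The suggestion above, flagging which recorded elements belong in an authorized access point so that a machine can generate the access point on the fly, could be sketched roughly as follows. This is only an illustration: the Element class, the in_access_point flag, the element names, and the punctuation used to assemble the string are all assumptions made for the example, not MARC or RDA definitions. The values are taken from the Lindbergh record quoted earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str              # hypothetical element label, e.g. "occupation"
    value: str             # the recorded value
    in_access_point: bool  # hypothetical flag: include in the access point?


# Values from the Lindbergh authority record; the flags are invented here.
LINDBERGH = [
    Element("preferred_name", "Lindbergh, Charles A.", True),
    Element("fuller_form", "Charles Augustus", True),
    Element("dates", "1902-1974", True),
    Element("occupation", "Air pilots", False),  # recorded, but not flagged
]


def build_access_point(elements):
    """Assemble a display string from the flagged elements only."""
    flagged = {}
    for element in elements:
        if element.in_access_point:
            # Keep the first flagged value for each element name.
            flagged.setdefault(element.name, element.value)

    heading = flagged["preferred_name"]
    if "fuller_form" in flagged:
        heading += f" ({flagged['fuller_form']})"
    if "dates" in flagged:
        heading += f", {flagged['dates']}"
    if "occupation" in flagged:
        heading += f" ({flagged['occupation']})"
    return heading


print(build_access_point(LINDBERGH))
```

Because the occupation is recorded but not flagged, it stays in the record without appearing in the generated string; flipping the flag would add it without touching the other elements.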
Re: [RDA-L] RDA, DBMS and RDF
The question of plurals has come up in the discussions of vocabularies within JSC, since the vocabularies are coded in the Open Metadata Registry (at http://rdvocab.info).

The first thing to remember is that the words used are merely display forms; the actual data is an identifier (at least for any controlled list). In many cases you need singular in some situations and plural in others (1 map, 3 maps). The identifier for your vocabulary term in this case does not change; if you have given "map" the identifier http://something.org/23435 in your vocabulary list, it is the same in both situations.

How to indicate a plural v. singular isn't clear yet, but it's an obvious need that many communities will have. The thing that we have to remember is that different natural languages handle this differently, so there needs to be a solution that works for as many language groups as possible.

The key thing to remember, though, is that we are talking about *display* forms, not their underlying meaning, when we contemplate singular v. plural. In most cases (at least the ones I have so far run into) we wouldn't want separate lists for singular and plural, only the option to use different displays based on the context.

kc
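The point that the identifier stays fixed while only the display form varies (1 map, 3 maps) can be made concrete with a small sketch. Everything here is illustrative: the Term class and the display_extent function are made-up names, the URI is the placeholder used in the message, and the "singular when the count is 1" rule is an English-only assumption that would not carry over to other languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    uri: str       # the actual data: one identifier per concept
    singular: str  # display form, English only in this sketch
    plural: str    # display form, English only in this sketch


# Placeholder identifier borrowed from the message above.
MAP_TERM = Term(uri="http://something.org/23435", singular="map", plural="maps")


def display_extent(count: int, term: Term) -> str:
    """Pick a display label by context; the stored identifier never changes."""
    label = term.singular if count == 1 else term.plural
    return f"{count} {label}"


for count in (1, 3):
    print(display_extent(count, MAP_TERM), "->", MAP_TERM.uri)
```

Both displays resolve to the same URI, which is the sense in which a separate list of plural terms would not be needed.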
Re: [RDA-L] RDA, DBMS and RDF
Just curious how these pieces can be implemented, within the current framework and any future ones …

There are separately mapped singular and plural vocabulary values in the Open Metadata Registry:

map
http://rdvocab.info/termList/extentCarto/1004

maps
http://rdvocab.info/termList/extentCarto/1013

And there is some interesting overlap, where extent terms for notated music are in singular and plural:

Extent of notated music
http://metadataregistry.org/concept/list/vocabulary_id/59.html

and parallel terms are in singular only:

Format of notated music
http://metadataregistry.org/concept/list/vocabulary_id/109.html

And it would be worthwhile knowing how these issues can be handled with the Linked Data link to the controlled vocabulary in the example authority record:

Air pilots
http://id.loc.gov/authorities/subjects/sh85002673

Thomas Brenndorfer
Guelph Public Library
Re: [RDA-L] RDA, DBMS and RDF
The plural forms for RDA terms in the Open Metadata Registry represent an earlier state of our deliberations. I believe that all of them still have the status New-Proposed. The Joint Steering Committee is considering whether we need the plural forms of terms to be given explicitly (with distinct URIs) in the Registry. So far the discussion seems to be in the direction that Karen describes. Before the vocabularies in question are published, this issue will be resolved, and (if that is the decision) the plural forms will be deleted and only the singular forms will be published.

John Attig
ALA Representative to the Joint Steering Committee
jx...@psu.edu
Re: [RDA-L] RDA, DBMS and RDF
On 5/16/12 10:21 AM, Brenndorfer, Thomas wrote:

And it would be worthwhile knowing how these issues can be handled with the Linked Data link to the controlled vocabulary in the example authority record:

Air pilots
http://id.loc.gov/authorities/subjects/sh85002673

I think the answer is that we don't know yet, but that this is an issue that libraries and the semantic web community need to work on together. We may be the first community that has extensive examples in this area. Remember that the semantic web standards that exist today are kind of the ground floor standards. There is a lot of work going on to create the upper storeys. I'll check and see if this has been brought up to the W3C yet, and if not explore how to get it on their radar.

kc
Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)
On Wed, May 16, 2012 at 5:50 AM, Karen Coyle li...@kcoyle.net wrote:

This confirms what I was saying about retrieval. There are some on this list that claim that there ARE systems that could do what I asked (the bibliographic record will display 3 times in the list of retrievals). I can explain (with a bunch of drawings) why each record appears only once. Those who disagree with me should point to an example, and then we can analyze the functionality. But I want to see something real.

You seem to be saying that you can use drawings that will show that it is not possible to have records show up more than once in a search using DBMS. Despite my name, I prefer to do coding. So, rather than draw this out, I'll ask a DBMS - in this case I'll go with PostgreSQL, a mature, open source relational database system.

I'll create a simplified database table, with columns for author, title, and the primary subject heading. I'll also add an id column, so we can see which row is which. This simplification is for exposition purposes. The database is real; only the data has been made up to annoy the French.

Let's look at the content.

# select id,title, author,subject1 from book;

 id | title                              | author                          | subject1
----+------------------------------------+---------------------------------+------------------------------
  5 | I hate rich people                 | Hollande, François              | Politics--Gaffes and gaffers
  2 | A brief history of white flags     | Monkey, Cheese Eating-Surrender | France-History
  4 | See France on twenty weasels a day | Weasel, Ima                     | France--Guidebooks
  3 | We'll never surrender              | Weasel, Ima                     | France--Fiction

If we look at the data, we see four entries. Three of them have the word France in the subject field; one also has the word in the title.

Although PostgreSQL has built-in full text indexing, I'm not going to use it for this example; instead I'll just use standard SQL approximate matching - the LIKE operator. When we compare things using LIKE, the % character serves as a wild card. OPAC users may prefer to pronounce it '#'.
For example,

    'I hate rich people' LIKE '%France%' is false
    'See France on twenty weasels a day' LIKE '%France%' is true

Now we're going to try doing a search for 'France' anywhere in any of these fields. We'll also sort the results in alphabetical order, based on the field in which the word occurs. We'll do this by creating a query that has three parts - one for each field we'll be searching on. For each part of the query, we'll include the value of the matched field in a column in the result set that we'll call sort_key.

Let's create the three parts of the query.

First title:

    select id,title,author,subject1,title as sort_key from book where title like '%France%'

Then subject:

    select id,title,author,subject1,subject1 as sort_key from book where subject1 like '%France%'

Finally author:

    select id,title,author,subject1,author as sort_key from book where author like '%France%'

(Notice that in each of these queries, we choose a different field to be the value of sort_key.)

Right now, we have three different queries - we need some way to combine them into a single set of results. Fortunately, we can do this using another standard SQL operator - UNION ALL. This command takes the results of two queries that return the same columns and turns them into a single list of results. Using UNION ALL instead of UNION tells the database *not* to get rid of any duplicate rows.

    select id,title,author,subject1,title as sort_key from book where title like '%France%'
    UNION ALL
    select id,title,author,subject1,subject1 as sort_key from book where subject1 like '%France%'
    UNION ALL
    select id,title,author,subject1,author as sort_key from book where author like '%France%'

Finally, we'll sort the results using the sort_key column we created. It seems appropriate. To do this, we'll add an ORDER BY sort_key clause to the end of the query.

Let's put it all together and see what happens when we execute the query.
*# select id,title,author,subject1,title as sort_key from book where title like '%France%' * * UNION ALL* *select id,title,author,subject1,subject1 as sort_key from book where subject1 like '%France%' * * UNION ALL * *select id,title,author,subject1,author as sort_key from book where author like '%France%' * * ORDER BY sort_key;* id | title| author | subject1 | sort_key ++-++ 3 | We'll never surrender | Weasel, Ima | France--Fiction| France--Fiction 4 | See France on twenty weasels a day | Weasel, Ima | France--Guidebooks | France--Guidebooks 2 | A brief history of white flags | Monkey, Cheese Eating-Surrender |
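For anyone who wants to rerun the walkthrough without a PostgreSQL server, here is a minimal sketch that feeds the same four rows and the same UNION ALL query to SQLite through Python's standard sqlite3 module. SQLite is a stand-in and not what was used above; the `search` function and the `:pat` parameter name are made up for illustration. SQLite's default byte-wise sort may order punctuation differently from a PostgreSQL locale collation, so only the row count and the repeated id should be read as the point.

```python
import sqlite3

# The four rows from the message above, copied as given.
ROWS = [
    (5, "I hate rich people", "Hollande, François", "Politics--Gaffes and gaffers"),
    (2, "A brief history of white flags", "Monkey, Cheese Eating-Surrender", "France-History"),
    (4, "See France on twenty weasels a day", "Weasel, Ima", "France--Guidebooks"),
    (3, "We'll never surrender", "Weasel, Ima", "France--Fiction"),
]

# One SELECT per searched field; each exposes the matched field as sort_key.
# UNION ALL keeps a book once per field that matched instead of merging rows.
QUERY = """
SELECT id, title, author, subject1, title    AS sort_key FROM book WHERE title    LIKE :pat
UNION ALL
SELECT id, title, author, subject1, subject1 AS sort_key FROM book WHERE subject1 LIKE :pat
UNION ALL
SELECT id, title, author, subject1, author   AS sort_key FROM book WHERE author   LIKE :pat
ORDER BY sort_key
"""


def search(word):
    """Return one row per (book, matching field), sorted by the matched value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT, subject1 TEXT)"
    )
    conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, {"pat": f"%{word}%"}).fetchall()
    conn.close()
    return rows


hits = search("France")
for book_id, title, _author, _subject, sort_key in hits:
    print(book_id, "|", title, "|", sort_key)
```

Book 4 comes back twice, once filed under its subject heading and once under its title, which is the behaviour the query was built to show.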
Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)
Thanks Simon,

It's much better to have an actual mock-up than just a description.

If I understand this correctly, to do this you do three separate queries. If you had been able to use a single query (e.g. if you had an overall keyword index), with UNION ALL would you have been able to retain instances where the same keyword appears more than once in the record? In other words, I'm wondering if one entry for the weasels came from the title and one from the subject heading. If one book had two subject headings, could you get this result just from a subject heading search? (I'm thinking that using a search on different indexes that match the search key rather than a single index is an added factor.)

kc

On 5/16/12 4:38 PM, Simon Spero wrote:
[snip]
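The "overall keyword index" question can be sketched in code as well. What follows is one possible design and not something anyone in the thread built: a hypothetical `entry` table holds one row per indexed value and remembers which field it came from, so a single query with no UNION at all returns one hit per matching entry. The table and column names are assumptions, and SQLite (via Python's sqlite3) stands in for whatever DBMS a real catalogue would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book  (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
    -- Hypothetical unified index: one row per indexed value, tagged with its field.
    CREATE TABLE entry (book_id INTEGER REFERENCES book(id), field TEXT, value TEXT);
""")

books = [
    (4, "See France on twenty weasels a day", "Weasel, Ima"),
    (3, "We'll never surrender", "Weasel, Ima"),
]
subjects = {4: "France--Guidebooks", 3: "France--Fiction"}

conn.executemany("INSERT INTO book VALUES (?, ?, ?)", books)
for book_id, title, author in books:
    conn.executemany(
        "INSERT INTO entry VALUES (?, ?, ?)",
        [
            (book_id, "title", title),
            (book_id, "author", author),
            (book_id, "subject", subjects[book_id]),
        ],
    )


def keyword_hits(word):
    """One query over the single index; one result row per matching entry."""
    sql = """
        SELECT b.id, b.title, e.field, e.value
        FROM entry AS e
        JOIN book AS b ON b.id = e.book_id
        WHERE e.value LIKE ?
        ORDER BY e.value
    """
    return conn.execute(sql, (f"%{word}%",)).fetchall()


hits = keyword_hits("France")
for book_id, title, field, value in hits:
    print(book_id, "|", field, "|", value)
```

Under these assumptions the weasels guidebook is returned twice, once from its subject entry and once from its title entry. A book with two matching subject headings would simply have two `subject` rows in `entry` and would show up once for each.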
Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)
Certainly you can come up with an infinite number of wrong ways to do it that won't get the results you want, with any given technology. I do not understand why you are trying to come up with wrong ways to accomplish this arbitrary goal; you seem to be refining your software approaches with the goal of finding something that won't work. Why would anyone want to do that?

In addition to a nearly infinite number of wrong ways to accomplish this particular goal, there are also a few right ways to do it. There are several other designs using an rdbms, in addition to the one Simon prototypes, that could also give you the results I think you're describing. It's not entirely clear to me that any user actually wants those results, but if they did, we could do it, with an rdbms or with something else.

The technology used for your database or text index or search engine is an implementation detail. Good metadata with the semantics needed to answer the questions you might want to put to it (without having to make the computer guess probabilistically) matters -- if it's there, systems can be created to do what you want. Sure, with an rdbms. Or with specialized inverted indexing tools. Or a combination. Or something else. The best tools will depend on exactly what you're wanting to do, as well as the scale (in various dimensions), the current availability/cost of various options, etc. These are questions for programmers and software engineers.

If the right semantics are captured in the data, the tool can be built -- that is the question for metadata engineers and catalogers. (To be sure, some understanding of algorithms and other aspects of how computers work is important to be able to understand what software can get out of any given data modelled/represented in any given way.)

I don't understand what you're driving at, or what the point of this conversation is.
From: Resource Description and Access / Resource Description and Access [RDA-L@LISTSERV.LAC-BAC.GC.CA] on behalf of Karen Coyle [li...@kcoyle.net]
Sent: Wednesday, May 16, 2012 8:46 PM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)
[snip]
Re: [RDA-L] RDA, DBMS and RDF
Adam

> Except that LCSH occupation/profession headings are in the plural, while RDA terms would be in the singular. I'm not at all sure that you could singularize an LCSH heading and still code the subfield $2 of the 374 field for LCSH. What do others think about this?

I think that if we are to use LCSH terms for occupations in 374, we should use them as they appear in LCSH: that is, in the plural. It's the only approach that makes sense to me if we are thinking in terms of linked data.

This is the advice I've given to our group of cataloguers who are creating RDA authorities:

LCSH terms for classes of persons are given in the plural. Use LCSH terms concisely and only include subdivisions when necessary. Subdivisions should be indicated with a double dash.

_
Richard Moore
Authority Control Team Manager
The British Library
Tel.: +44 (0)1937 546806
E-mail: richard.mo...@bl.uk
Re: [RDA-L] RDA, DBMS and RDF
On 13/05/2012 19:49, Karen Coyle wrote:

snip

All, After struggling for a long time with my frustration with the difficulties of dealing with MARC, FRBR and RDA concepts in the context of data management, I have done a blog post that explains some of my thinking on the topic: http://kcoyle.blogspot.com/2012/05/rda-dbms-rdf.html

The short summary is that RDA is not really suitable for storage and use in a relational database system, and therefore is even further from being suitable for RDF. I use headings (access points in RDA, I believe) as my example, but there are numerous other aspects of RDA that belie its intention to support scenario one.

I have intended to write something much more in depth on this topic but as that has been in progress now for a considerable time, I felt that a short, albeit incomplete, explanation was needed. I welcome all discussion on this topic.

/snip

This is really good.

I question whether libraries primarily need a new relational database model for our catalogs, especially one based on FRBR. I still have never seen a practical advantage over what can be done now. The power of the Lucene-type full-text engines and the searches they allow and their speed are simply stunning, and nothing can compare to them right now. There are versions such as the Zebra indexing system in Koha, which was created for bibliographic records and is very similar to Lucene: http://www.indexdata.com/zebra (guide: http://www.indexdata.com/zebra/doc/zebra.pdf).

A relational database would be far too slow if used in conjunction with a huge database such as Google. So, some catalogs use the DBMS only for record maintenance, then everything is indexed in Lucene for searching, while the displays are made from the XML versions of the records. The DBMS is there only for storage and maintenance. This is how Koha works and could be more or less how WorldCat works as well, but these are not the only catalogs that work like this.
Still, I will say that much of this lies beyond the responsibility of cataloging per se, and goes into that of systems. But on the other hand, your point that library headings are not relational and are actually based on browsing textual strings really is a responsibility of cataloging. It is also absolutely true and should be a matter of general debate.

The text strings haven't worked in years because what worked rather clearly in a card catalog did not work online. I've written about this before, but there was a discussion on Autocat not too long ago. Here is one of my posts where I discussed the issue and offered an alternative to the current display of the headings found under Edgar Allan Poe: http://blog.jweinheimer.net/2012/04/re-acat-death-of-dictionary-catalog-was.html

I still maintain that we do not really know what the public wants yet. Everything is in a state of change right now, so it will take a lot of research, along with trial and error, to find out. I do think that people would want the traditional power of the catalog, but they will not use left-anchored text strings. The way it works now is far too clunky and new methods for the web must be found.

Paths such as you point out would lead to genuine change and possible improvements in how our catalogs function for the public, which is the major road we need to take.

--
James Weinheimer weinheimer.ji...@gmail.com
First Thus http://catalogingmatters.blogspot.com/
Cooperative Cataloging Rules http://sites.google.com/site/opencatalogingrules/
Cataloging Matters Podcasts http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html
Re: [RDA-L] RDA, DBMS and RDF
The authorized access point part of RDA is one of the carryovers from AACR2, which we hope eventually will become unnecessary in a Scenario 1 environment, other than as a default display form.

There are several areas of RDA that had to be carried over from AACR2 simply because discussions with the relevant communities had not been completed (e.g., with the Music community, law, religion, etc. - and those discussions are underway). We also will be renewing conversations with the publishing community to revisit the RDA/ONIX framework. RDA will continue to evolve and improve with the help of our international collaborations.

- Barbara Tillett, Chair, Joint Steering Committee for Development of RDA

-Original Message-
From: Resource Description and Access / Resource Description and Access [mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Karen Coyle
Sent: Sunday, May 13, 2012 1:49 PM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: [RDA-L] RDA, DBMS and RDF
[snip]
Re: [RDA-L] RDA, DBMS and RDF
Three possible scenarios are described in Tom Delsey's paper RDA Database Implementation Scenarios available on the JSC web site (http://www.rda-jsc.org/docs/5editor2rev.pdf).

Judy Kuhagen, Secretary
Joint Steering Committee for Development of RDA

-Original Message-
From: Resource Description and Access / Resource Description and Access [mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Tillett, Barbara
Sent: Monday, May 14, 2012 6:44 AM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] RDA, DBMS and RDF
[snip]
Re: [RDA-L] RDA, DBMS and RDF
On 5/14/12 3:43 AM, Tillett, Barbara wrote:

> The authorized access point part of RDA is one of the carryovers from AACR2, which we hope eventually will become unnecessary in a Scenario 1 environment, other than as a default display form.

Barbara, can you say more about this? Do you have examples? (Or could you make some up?) What type of retrieval would be made on RDA records compared to how we retrieve on records today? Has anyone mocked up data displays? (that aren't in MARC) It might be that I just haven't found the right site or documentation that answers my questions.

kc

--
Karen Coyle kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Re: [RDA-L] RDA, DBMS and RDF
Mac,

I did a search on the subject term France and on the 3d page of hits (sorted by title) there were two titles that seemed to be for the same item. Instead, they do turn out to be two records because there are two volumes.

Here's the case that I'm trying to get to -- let's say you have a record with 3 subject headings:

Working class -- France
Working class -- Dwellings -- France
Housing -- France

In a card catalog, these would result in 3 separate cards, and therefore, should you look all through the subject card catalog, you would see the book in question 3 times.

In a keyword search limited to subject headings, most systems would retrieve this record once and display it once. That has to do with how the DBMS resolves from indexes to records. So even though a keyword may appear more than once in a record, the record is only retrieved once.

In your catalog, which displays the subject headings on a line with the author and title:

1) will each of these subject headings appear in the display?
2) does that mean that the bibliographic record (represented by the author and title) will display 3 times in the list of retrievals?

kc

On 5/14/12 3:02 PM, J. McRee Elrod wrote:

> Karen,
>
> Because ebrary (through whom CEL and some other clients distribute MARC records) can only accommodate one 856$u per record, those clients must have a monograph record for each volume of a multivolume set, and each issue of a serial (e.g., yearbooks) having its own PDF URL. I suspect that is why you saw what appeared to be the same record more than once.
>
> When an individual volume has a distinctive title, that title goes in 245$a, and the set or serial title in 490/8XX. But if not, we must use 245$n, with the set or serial title in 245$a.
>
> As I keep saying over and over and over, our problems arise from systems limitations, not ISBD/AACR2/MARC21 limitations. The building should have received our attention before the building blocks.
>
> If what you saw was because of a 245 and a 246 being very similar, or for some other reason, please cite an example and Matt can tell you how his OPAC handles that.
>
>    __       __   J. McRee (Mac) Elrod (m...@slc.bc.ca)
>   {__  |   /     Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
>   ___} |__ \__
>
> Forwarded message
> Date: Mon, 14 May 2012 10:26:19 -0700
> From: Matt Elrod m...@elrod.ca
> To: J. McRee Elrod m...@slc.bc.ca
> Subject: Re: [RDA-L] RDA, DBMS and RDF
>
> Mac,
>
> I would need to know which title seems to appear twice in a hit list to answer this question. Distinct records might *appear* to be duplicates for multi-volume sets for example. Recall that SLC sometimes creates redundant monograph records to handle sets and serials.
>
> Matt
>
> On 14/05/2012 9:58 AM, J. McRee Elrod wrote:
>
> Karen asked:
>
> Mac, I'd love to see your file design. I did find an example of a record that appears more than once in a single list, and I am wondering if you had to replicate the record in the database to accomplish that, or if you have another way to retrieve a record more than once on a single keyword retrieval.
>
> I'm copying your question to the designer, matt@elrod, who should be able to answer your question.
>
> http://www.canadianelectroniclibrary.ca/cel-arc.html
>
>    __       __   J. McRee (Mac) Elrod (m...@slc.bc.ca)
>   {__  |   /     Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
>   ___} |__ \__

--
Karen Coyle kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
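The three-heading case in the message above can be tried out directly. This is only a sketch under stated assumptions: the `bib` and `subject` tables are a made-up schema, the title is a placeholder (the message names no title), and SQLite via Python's sqlite3 stands in for the catalogue's real DBMS. It runs the same keyword search two ways: once returning a row per matching heading, as a subject card file would, and once collapsing to distinct records, which is the behaviour described for most systems.

```python
import sqlite3

# The three headings given in the message above.
HEADINGS = [
    "Working class -- France",
    "Working class -- Dwellings -- France",
    "Housing -- France",
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bib     (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE subject (bib_id INTEGER REFERENCES bib(id), heading TEXT);
""")
# Placeholder title: the thread does not name the book.
conn.execute("INSERT INTO bib VALUES (1, 'Hypothetical title')")
conn.executemany("INSERT INTO subject VALUES (1, ?)", [(h,) for h in HEADINGS])

# Card-catalogue style: the join yields one row per matching heading.
card_style = conn.execute("""
    SELECT b.id, b.title, s.heading
    FROM subject AS s
    JOIN bib AS b ON b.id = s.bib_id
    WHERE s.heading LIKE '%France%'
    ORDER BY s.heading
""").fetchall()

# Record style: DISTINCT collapses the three matches into one record.
record_style = conn.execute("""
    SELECT DISTINCT b.id, b.title
    FROM subject AS s
    JOIN bib AS b ON b.id = s.bib_id
    WHERE s.heading LIKE '%France%'
""").fetchall()

for _bib_id, title, heading in card_style:
    print(heading, "|", title)
print(len(card_style), "heading-level hits;", len(record_style), "distinct record")
```

In this sketch the only difference between the two displays is whether the query keeps the heading column or deduplicates on the record, so the once-per-record behaviour looks like a choice made in the query rather than something the relational model forces.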
[RDA-L] RDA, DBMS and RDF
All, After struggling for a long time with my frustration with the difficulties of dealing with MARC, FRBR and RDA concepts in the context of data management, I have done a blog post that explains some of my thinking on the topic: http://kcoyle.blogspot.com/2012/05/rda-dbms-rdf.html The short summary is that RDA is not really suitable for storage and use in a relational database system, and therefore is even further from being suitable for RDF. I use headings (access points in RDA, I believe) as my example, but there are numerous other aspects of RDA that belie its intention to support scenario one. I have intended to write something much more in depth on this topic but as that has been in progress now for a considerable time, I felt that a short, albeit incomplete, explanation was needed. I welcome all discussion on this topic. kc -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
[RDA-L] RDA, DBMS and RDF
From: Resource Description and Access / Resource Description and Access [RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Karen Coyle [li...@kcoyle.net]
Sent: May-13-12 1:49 PM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: [RDA-L] RDA, DBMS and RDF

> All, After struggling for a long time with my frustration with the difficulties of dealing with MARC, FRBR and RDA concepts in the context of data management, I have done a blog post that explains some of my thinking on the topic: http://kcoyle.blogspot.com/2012/05/rda-dbms-rdf.html

The reason that RDA continues to instruct catalogers to create pre-coordinated headings is that RDA supports backwards compatibility with existing catalog implementations (Scenarios 2 and 3).

Authorized access points are otherwise treated as sidebars in RDA -- they're separated in RDA from instructions for the constituent elements, and recognized throughout as only one method of identifying a specific entity. There should be only one unique authorized access point per entity, so it's not a valid criticism to say they each point to only one or a very small number of records. There are some weaknesses, though, such as undifferentiated persons, and expression headings for translations that do not go far enough in differentiating expression entities.

The elements that constitute authorized access points have been separated out in MARC because of RDA (such as fuller form of name -- 378; form of work -- 380; dates -- 046, with the dates encoded according to externally referenced standards -- ISO 8601 and EDTF). Other elements, such as Field of Activity or Occupation, can be linked to controlled vocabulary terms, such as LCSH headings.

An element like Undifferentiated Name Indicator (RDA 8.11) refers only to the core elements as being insufficient to differentiate between two or more persons with the same name. It does not refer to the concatenated authorized access point as being insufficient to differentiate the entity. Generally, there is a separation between consideration of the elements and the construction of an authorized access point, although there is overlap.

That being said, there is some bleeding of the concept of the authorized access point into decisions about individual elements. Example for Preferred Title of the Work (RDA 6.2.2.1) -- "The preferred title of the work is the title or form of title chosen as the basis for the authorized access point representing the work."

There is room for improvement for supporting connections between data. But it's not always necessary -- one doesn't always need a separate lookup table for an element's value (even pre-set drop-down values can be built into a single table without referencing an external table, similar to using macros or keyboard shortcuts to create quasi-controlled free-text strings).

Some ideas for improving RDA that follow from the points raised:

- Separate out Authorized Access Points entirely from the numbered instructions. Treat them as a sidebar, and have side-by-side links to the instructions for each individual element so one can see all the relevant instructions as one is constructing an authorized access point. This will further solidify the idea that Authorized Access Points are creatures belonging to some catalog implementations, but may not be needed in others.

- Make better use of the FRAD distinction between the Name and the Actual Entity (RDA treats the Name of a Person as an attribute of a Person, whereas it would be better to see the Name as being related to the Person entity). The reason why this might be useful is that the Names of entities can be linked to the sources found (i.e., to specific manifestations, so as to track the frequency of usage of a name). This is better than the "justify the added entry" concept, which is the old standby for determining frequency of usage -- the Preferred Form of Names of entities do change, and this would continue to be the case in all implementations, even in those that don't use Authorized Access Points!!

- Continue to identify elements that can be linked to lookup tables or controlled vocabulary. But do not sacrifice the principle of representation -- there is a need to identify how an entity represents itself and transcribe as found versus normalizing the data for better machine-processing. Both objectives can co-exist (as they do to some extent now, such as with MARC fixed fields representing controlled terms to go with variable transcribed text fields).

- Allow RDA content instructions to easily merge with specific encoding rules. The RDA Element Set has started this with combining RDA instructions with related MARC instructions, but there is a need for a streamlined set of instructions that can leap from content instructions directly into encoding rules for specific applications -- and ideally right down to an ILS's specific conventions.

Thomas Brenndorfer
Guelph Public Library
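The suggestion above about treating a Name as something related to a Person, and linking each name form to the manifestations where it was found, can be sketched as a small data structure. Everything in this example is hypothetical: the `NameUsage` type, the identifiers, and the sample name forms are made up for illustration. Picking the most frequently attested form is a deliberate simplification and not the RDA instruction for choosing a preferred name.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NameUsage:
    """A name form as found on one manifestation, linked to a person entity."""
    person_id: str
    name_form: str
    manifestation_id: str


# Made-up usages: person "p1" appears under two forms across four manifestations.
USAGES = [
    NameUsage("p1", "Weasel, Ima", "m1"),
    NameUsage("p1", "Weasel, I.", "m2"),
    NameUsage("p1", "Weasel, Ima", "m3"),
    NameUsage("p1", "Weasel, Ima", "m4"),
]


def usage_counts(usages: Iterable[NameUsage], person_id: str) -> Counter:
    """Count how many manifestations attest each name form for one person."""
    return Counter(u.name_form for u in usages if u.person_id == person_id)


def most_used_form(usages: Iterable[NameUsage], person_id: str) -> Optional[str]:
    """Return the most frequently attested form, or None if nothing is recorded."""
    counts = usage_counts(usages, person_id)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(usage_counts(USAGES, "p1"))
print(most_used_form(USAGES, "p1"))
```

Because each usage points at a manifestation, the counts can be recomputed whenever new records arrive, so a change in the prevailing form shows up in the data without anyone editing an access point string.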