Re: [CODE4LIB] LCSH and Linked Data
The short version of this lengthy post is that there's really no value in worrying about how to handle precoordinated strings except for purposes of busting them up.

The Rube Goldberg style precoordination rules that cause so many headaches were developed to address challenges brought about by paper card catalogs. The physicality of paper required a mechanism to ensure a limited number of cards would file together. Unless you still use a paper catalog, they're as relevant as spurs are to race car drivers.

The order you see in the MARC record mimics the paper rules exactly (because MARC was used mostly for card printing for decades) and has also led to literally tens of millions of unique subject strings, as there are so many permutations. As a practical matter, even highly trained librarians cannot guess how these were put together without going through a substantial research process.

I hate to dig up stuff written in the 1920s that's rammed down the throats of first semester library school students. However, in the case at hand, logic from these works has direct application for purposes of making MARC data usable. To summarize, the concept is that subjects can be broken down into aspects (i.e. facets), with the primary ones being time, place, action, material, and personality -- you can think of this last category as natural groupings of the type that standardized subdivisions can be applied to, such as materials, animals, corporate entities, diseases, body parts, etc.

It's much better to think of the facets (time, place, etc.) as attributes rather than as occurring in any particular order, as this allows interactive and relatively precise drilling through huge amounts of data. You'll notice that good search engines effectively do just that.

kyle

One of the challenges for pre-coordinated strings at least as currently implemented (that facets evade) is that no order will suit everyone. Which of the following is better?
Dwellings $z Australia $x History $y 20th century
Dwellings $z Indonesia $x Economic aspects
Dwellings $z Indonesia $x Psychological aspects
Dwellings $z Indonesia $x Social aspects
Dwellings $z Ireland $x Economic aspects
Dwellings $z Ireland $x Psychological aspects
Dwellings $z Ireland $x Social aspects
Dwellings $z Japan $x Economic aspects
Dwellings $z Japan $x Psychological aspects
Dwellings $z Japan $x Social aspects

OR (mostly current practice)

*Dwellings $z Australia $x History $y 20th century **Current practice
Dwellings $x Economic aspects $z Indonesia
Dwellings $x Economic aspects $z Ireland
Dwellings $x Economic aspects $z Japan
*Dwellings $x History $z Australia $y 20th century **Airlie recommendation
Dwellings $x Psychological aspects $z Indonesia
Dwellings $x Psychological aspects $z Ireland
Dwellings $x Psychological aspects $z Japan
Dwellings $x Social aspects $z Indonesia
Dwellings $x Social aspects $z Ireland
Dwellings $x Social aspects $z Japan
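To make the "facets as attributes" point concrete, here is a minimal sketch of busting a precoordinated string into order-independent facets. It is only an illustration: the names `FACET_BY_CODE` and `to_facets` are made up for this example, and folding both $a and $x into a single "topic" facet is a simplifying assumption (a subdivision such as $x History arguably deserves different handling).

```python
import re

# Assumed mapping from MARC subfield code to a facet name (illustrative only).
FACET_BY_CODE = {"a": "topic", "x": "topic", "z": "place", "y": "time", "v": "form"}


def to_facets(heading):
    """Split a '$'-delimited heading string into a dict of facet -> set of values."""
    text = heading.strip()
    # The examples in this thread omit the leading $a, so supply it when missing.
    if not text.startswith("$"):
        text = "$a " + text
    # re.split with a capturing group yields ['', code, value, code, value, ...].
    pieces = re.split(r"\s*\$(\w)\s*", text)
    facets = {}
    for code, value in zip(pieces[1::2], pieces[2::2]):
        facet = FACET_BY_CODE.get(code, "other")
        facets.setdefault(facet, set()).add(value.strip())
    return facets


older_order = to_facets("Dwellings $z Indonesia $x Economic aspects")
current_order = to_facets("Dwellings $x Economic aspects $z Indonesia")
print(older_order == current_order)
```

Both orderings reduce to the same set of attributes, which is why the filing order stops mattering once the string is broken up.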
Re: [CODE4LIB] LCSH and Linked Data [cataloging]
On Apr 17, 2011, at 10:58 AM, Bill Dueber wrote:

OK, so I've been trying to follow all of this, and have to say, I'm finding it all very interesting. I want to give a special shout-out to the catalogers who have joined in; I (and, I think, much of code4lib) need this kind of input on a much more regular basis than we've been getting it.

I concur. It is nice to have the balance of traditional cataloging mixed with 21st Century hacking. At the same time, it behooves some of us Code4Libbers to be a part of hardcore mailing lists like AUTOCAT. Think Hillmann's talk at the most recent conference.

--
Eric Needs To Practice What He Preaches Morgan
University of Notre Dame
Re: [CODE4LIB] LCSH and Linked Data
On 4/17/2011 10:58 AM, Bill Dueber wrote:

At the same time, I'm finding it hard to determine if we're converging on "when trying to turn LCSH into reasonable facets, here's what you need to do" or "when trying to turn LCSH into reasonable facets, you haven't got a freakin' prayer". Can someone help me here?

FAST has done it somehow -- turned LCSH into reasonable facets. But I'm not sure if there's a good overview available of how.

"Reasonable" is certainly a matter of degree too. I don't think you can do 'reasonably' with a pretty rough and ready approach. I need to blog about the just _few_ things I've done to normalize LCSH for faceting in my Blacklight-based catalog, which I think gets it pretty close to 'reasonable'. Heck, some people think just taking the subdivisions in MARC subfields and splitting 'em into facets is 'reasonable', although it results in many oddities.

But I think the ultimate answer to your question, with full precision and based on full knowledge of LCSH, is not yet determined -- unless the FAST people have figured it out and can share what they've figured out in a useful way.
Re: [CODE4LIB] LCSH and Linked Data
For FAST, see Chan and O'Neill (2010). There are large parts of FAST where the editors wisely opted to punt on the more intractable problems.

Simon

Chan, Lois Mai and O'Neill, Ed (2010). FAST, Faceted Application of Subject Terminology: Principles and Application. Libraries Unlimited. ISBN: 9781591587224

On Mon, Apr 18, 2011 at 11:07 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

On 4/17/2011 10:58 AM, Bill Dueber wrote:

At the same time, I'm finding it hard to determine if we're converging on "when trying to turn LCSH into reasonable facets, here's what you need to do" or "when trying to turn LCSH into reasonable facets, you haven't got a freakin' prayer". Can someone help me here?

FAST has done it somehow -- turned LCSH into reasonable facets. But I'm not sure if there's a good overview available of how.

"Reasonable" is certainly a matter of degree too. I don't think you can do 'reasonably' with a pretty rough and ready approach. I need to blog about the just _few_ things I've done to normalize LCSH for faceting in my Blacklight-based catalog, which I think gets it pretty close to 'reasonable'. Heck, some people think just taking the subdivisions in MARC subfields and splitting 'em into facets is 'reasonable', although it results in many oddities.

But I think the ultimate answer to your question, with full precision and based on full knowledge of LCSH, is not yet determined -- unless the FAST people have figured it out and can share what they've figured out in a useful way.
Re: [CODE4LIB] LCSH and Linked Data [cataloging]
Oh jeez, I'm not sure I'd suggest AutoCat. Even I can't bear that. But the RDA-L list has a fair amount of discussion that still dusts off the traditional issues and tries to figure out what still matters.

Diane Hillmann

On Apr 17, 2011, at 10:58 AM, Bill Dueber wrote:

OK, so I've been trying to follow all of this, and have to say, I'm finding it all very interesting. I want to give a special shout-out to the catalogers who have joined in; I (and, I think, much of code4lib) need this kind of input on a much more regular basis than we've been getting it.

I concur. It is nice to have the balance of traditional cataloging mixed with 21st Century hacking. At the same time, it behooves some of us Code4Libbers to be a part of hardcore mailing lists like AUTOCAT. Think Hillmann's talk at the most recent conference.
Re: [CODE4LIB] LCSH and Linked Data
On Sun, Apr 17, 2011 at 7:40 AM, Simon Spero s...@unc.edu wrote:

The main study on this subject was the Michigan study performed/led by Karen Markey (some reports were written as Karen M. Drabenstott). The final report of the project is available at http://deepblue.lib.umich.edu/handle/2027.42/57992 . The work took place in the mid to late 90s, after Airlie.

...

The most perplexing results were those that showed that measured understanding was lower when headings were displayed in the context of a bibliographic record rather than on their own. This indicates either a problem in the measurement process, or an even more fundamental problem with subdivided headings that may so negate the significant theoretical advantages of pre-coordination that the value of the whole practice is thrown into doubt.

That is fascinating. And disturbing. I don't think I ever read the original study, but now I'll have to.

Touching on another topic, I believe that the movement of geographical subdivisions to follow the rightmost geographically subdividable subdivision can sometimes be interrupted by the interposition of a $x topical subdivision, but I haven't determined whether this is a legacy exception (the ones that came to mind were related to subtopics of the US Civil War, which seems inevitable given that the first elements are United States--History--Civil War, 1861-1865--).

I think the key here is partly "In 1992, it was decided to adopt that order where it could be applied", so LC didn't promise to do them all. $x History is probably the biggest one that hasn't been made geographically subdividable, but it's hard to say if that's on principle or because of practical concerns about the huge amount of disruption that would cause in individual systems. It's interesting that some of the biggies like economic aspects are more recent.

One of the challenges for pre-coordinated strings at least as currently implemented (that facets evade) is that no order will suit everyone.
Which of the following is better?

Dwellings $z Australia $x History $y 20th century
Dwellings $z Indonesia $x Economic aspects
Dwellings $z Indonesia $x Psychological aspects
Dwellings $z Indonesia $x Social aspects
Dwellings $z Ireland $x Economic aspects
Dwellings $z Ireland $x Psychological aspects
Dwellings $z Ireland $x Social aspects
Dwellings $z Japan $x Economic aspects
Dwellings $z Japan $x Psychological aspects
Dwellings $z Japan $x Social aspects

OR (mostly current practice)

*Dwellings $z Australia $x History $y 20th century **Current practice
Dwellings $x Economic aspects $z Indonesia
Dwellings $x Economic aspects $z Ireland
Dwellings $x Economic aspects $z Japan
*Dwellings $x History $z Australia $y 20th century **Airlie recommendation
Dwellings $x Psychological aspects $z Indonesia
Dwellings $x Psychological aspects $z Ireland
Dwellings $x Psychological aspects $z Japan
Dwellings $x Social aspects $z Indonesia
Dwellings $x Social aspects $z Ireland
Dwellings $x Social aspects $z Japan

Probably not helpful to have history be an outlier, though.

Kelley
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 15, 2011 at 7:21 PM, Kelley McGrath kell...@uoregon.edu wrote:

It used to be that geographical subdivision was much more flexible and was supposed to convey different meanings depending on where it occurred in the string. Then there was some research showing that not only did users not know how to interpret this, but catalogers did not understand these rules and were constructing inconsistent headings.

The main study on this subject was the Michigan study performed/led by Karen Markey (some reports were written as Karen M. Drabenstott). The final report of the project is available at http://deepblue.lib.umich.edu/handle/2027.42/57992 . The work took place in the mid to late 90s, after Airlie.

This study had serious methodological problems; these became apparent during the course of the study, and were partly due to the results being so unexpected. Unfortunately, there have not been any follow-up studies at scale that would correct for these methodological issues. Some of the scoring approaches used by the Gleitmans for Phrase and Paraphrase might be revealing.

The most perplexing results were those that showed that measured understanding was lower when headings were displayed in the context of a bibliographic record rather than on their own. This indicates either a problem in the measurement process, or an even more fundamental problem with subdivided headings that may so negate the significant theoretical advantages of pre-coordination that the value of the whole practice is thrown into doubt. (Incidentally, this year is the diamond anniversary of the pre- v. post- debate.)

Touching on another topic, I believe that the movement of geographical subdivisions to follow the rightmost geographically subdividable subdivision can sometimes be interrupted by the interposition of a $x topical subdivision, but I haven't determined whether this is a legacy exception (the ones that came to mind were related to subtopics of the US Civil War, which seems inevitable given that the first elements are United States--History--Civil War, 1861-1865--).

Simon
Re: [CODE4LIB] LCSH and Linked Data
A few belated ramblings from a cataloger:

1) GEOGRAPHICAL SUBDIVISION

It used to be that geographical subdivision was much more flexible and was supposed to convey different meanings depending on where it occurred in the string. Then there was some research showing that not only did users not know how to interpret this, but catalogers did not understand these rules and were constructing inconsistent headings. This led to a movement for simplification.

From LC's Subject Heading Manual: "The Subject Subdivisions Conference that took place at Airlie, Virginia, in 1991 recommended that the standard order of subdivisions be [topic]–[place]–[chronology]–[form]. In 1992, it was decided to adopt that order where it could be applied."

This leaves a standard order of $a, $b [rare], $x, $z, $y, $v with some exceptions.

As was pointed out earlier, the current rule is to put the geographic subdivision ($z) as near the end as is legal. This can be mechanically determined based on a fixed field in the authority record. Although fixed fields in bib records are often unreliable, those in authority records are probably as accurate as they can reasonably be made to be, allowing for human error. This is both because LC coordinates training and reviews records and because the fixed fields are used as decision points, so there are short-term consequences for later catalogers if they're not done right.

The fixed field 008/06 in LCSH authority records tells you whether a geographic subdivision can come after the heading (http://www.loc.gov/marc/authority/ad008.html). Id.loc.gov doesn't seem to give you that info, but it might be nice if it did.
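The "as near the end as is legal" rule can be sketched mechanically. What follows is a rough illustration under stated assumptions, not LC's actual procedure: the function name `place_geographic` and the tuple layout are invented for the example, each element is assumed to carry the 008/06 value from its own authority record, and the code "d" (subdivided geographically-direct) comes from the MARC 21 authority format rather than from this thread, which only mentions "i" and "#". It also ignores the exceptions and legacy forms discussed elsewhere in the thread.

```python
# 008/06 values that permit a geographic subdivision to follow the element.
# "i" appears in this thread; "d" is assumed from the MARC 21 authority format.
GEO_SUBDIVIDABLE = {"i", "d"}


def place_geographic(elements, places):
    """Insert $z subfields after the last element that may be subdivided geographically.

    elements: ordered list of (subfield_code, text, geo_flag) for the $a and $x parts,
              where geo_flag is the 008/06 value from that element's authority record.
    places:   ordered list of $z values, e.g. ["England"] or ["England", "London"].
    Returns a flat list of (subfield_code, text) pairs.
    """
    last_allowed = None
    for index, (_code, _text, flag) in enumerate(elements):
        if flag in GEO_SUBDIVIDABLE:
            last_allowed = index
    if last_allowed is None:
        raise ValueError("no element of this heading may be subdivided geographically")

    result = []
    for index, (code, text, _flag) in enumerate(elements):
        result.append((code, text))
        if index == last_allowed:
            result.extend(("z", place) for place in places)
    return result


def as_string(subfields):
    """Render (code, text) pairs in the '$a Text $z Text' style used in this thread."""
    return " ".join(f"${code} {text}" for code, text in subfields)


finance = place_geographic([("a", "Education", "i"), ("x", "Finance", "#")], ["England"])
economic = place_geographic([("a", "Education", "i"), ("x", "Economic aspects", "i")], ["England"])
print(as_string(finance))
print(as_string(economic))
```

With the flags quoted in this message for Education, Finance, and Economic aspects, the sketch places England before $x Finance but after $x Economic aspects.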
650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided geographically-indirect] $z England [n 82068148] $x Finance [sh2002007885, Geo Subd = # = Not subdivided geographically]

650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided geographically-indirect] $x Economic aspects [sh 99005484, Geo Subd = i = Subdivided geographically-indirect] $z England [n 82068148].

One reason not to rely on found order is that LC has been moving in the direction of the Airlie House recommendation, so in addition to the usual mistakes, you'll probably come across a lot of older forms if you take data from the wild. For example, until somewhat recently, the economic aspects record above looked like the finance one, so you'll probably still see records like 650 _0 $a Education $z England $x Economic aspects.

A) Indirect Subdivision

In general, when a heading string starts with a geographic name, it is in direct order:

651 _0 $a London (England) [n 79005665] $x Economic conditions [sh 99005736].

If a geographic name is modifying a topical heading, it is given in indirect order:

650 _0 $a Education [sh 85040989] $z England $z London [n 79005665; covers both $z subfields].

Thanks to a project that OCLC did for FAST (which uses only the indirect style), in most cases both of these can be extracted from the authority record, which will have a 781 with the indirect form added:

n 79005665
151 $a London (England)
451 $a Londinium (England)
...
781 0 $z England $z London

Some records (usually for geographic areas within cities) cannot be used to modify topical headings, but can be used in 651 $a as the main term in a heading string. These are identified by a note and the lack of a 781.

n 85192245
151 $a Hackney (London, England)
667 $a SUBJECT USAGE: This heading is not valid for use as a geographic subdivision.

B) Geographic Entities and Name vs. Subject Headings

Notice that in the above example, the control number/identifier for Education starts with sh while the one for London starts with n.
This is an important distinction. Heading identifiers that start with sh are LCSH terms found in the subject authority file and are available from id.loc.gov. I think these all fall into FRBR's group 3 bib entities. Heading identifiers that start with n are stored in the LC NAF (Name Authority File) and are not available as linked data. These are the FRBR group 1 and 2 entities and maybe some from group 3. Most of these can also be used as subjects in LCSH. So you can't actually get at all the building blocks of LCSH strings nor use linked data for all subjects.

Named geographic features (e.g., mountains, lakes, continents) are established in the subject authority file using the rules in the Subject Cataloging Manual for LCSH. The headings are tagged 151 and can be found at id.loc.gov.

sh 85082617
151 $a McKinley, Mount (Alaska)

sh 85044620
151 $a Erie, Lake

sh 85008606
151 $a Asia

Geographic features appear in bib records only as 651 or 650+ $z subject terms.

Jurisdiction names (e.g., cities, states, countries) are established in the name authority file using descriptive cataloging rules (e.g., AACR2 ch 23 and the NACO Participants' Manual). They
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 15, 2011 at 7:21 PM, Kelley McGrath kell...@uoregon.edu wrote: I’m sure this is way too much info for most (or all) on this list, but in case it is helpful, I thought I’d throw it out there. I disagree. I think this was fantastic and most enlightening. Most of us deal with this stuff all the time, yet we (obviously) have zero idea how it actually works, so it's nice to be schooled (and have this mini-lesson in LCSH contextually in the mailing list archives). Thanks for putting this out there, Kelley. -Ross.
Re: [CODE4LIB] LCSH and Linked Data
There is a lot of redundant data in MARC that is an encoded form of something that elsewhere is expressed as text -- somewhat controlled text, but text.

Much of this redundant input (think of the time!) could be eliminated if we quit keying text strings but allowed the display to derive from the coded data.

..because it does not get input consistently, it's hard to base any functionality on it since that functionality would apply only to a somewhat random subset of the records in the database.

The reality with fixed fields is that few are used by *any* system. That provides a disincentive to spend loads of time (i.e. money) mucking about with them, particularly since they lack expressivity and practical use cases are not that compelling.

In any case, even if everything suddenly started getting entered consistently today, you'd still have to deal with all the legacy data. Cataloging practices change. For example, the form subdivisions mentioned early in this thread have only been stored in |v for a few years now. Thoroughness of records is highly variable. This means that systems need to be built around the assumption that data are only somewhat consistent. As a result, parsing and normalizing text is a far more realistic approach than messing with fixed fields.

kyle
Re: [CODE4LIB] LCSH and Linked Data
Quoting Ross Singer rossfsin...@gmail.com:

Yeah, this could get ugly pretty fast. It's a bit unclear to me what the distinction is between identical terms in both the geographic areas and the country codes (http://id.loc.gov/vocabulary/geographicAreas/e-uk-en and http://id.loc.gov/vocabulary/countries/enk). Well, in LC's current representation, there *is* no distinction, they're both just skos:Concepts that (by virtue of skos:exactMatch) are effectively interchangeable.

The distinction is MARC-based. There is a lot of redundant data in MARC that is an encoded form of something that elsewhere is expressed as text -- somewhat controlled text, but text. The geographic area code is input in the coded data area of MARC (0XX) to make up for the fact that figuring out a geographic area from LC subject headings is difficult. This is not unlike having publication dates as text in the 260 $c and again in a fixed format in the 008 field. Much of this redundant input (think of the time!) could be eliminated if we quit keying text strings but allowed the display to derive from the coded data.

The existence of all of the coded data fields in MARC is proof that there is some consciousness that text is not sufficient for some of the functionality that we would like to have in our systems. Unfortunately, because the coded data is not human-friendly AND is redundant, it does not get input consistently. And because it does not get input consistently, it's hard to base any functionality on it since that functionality would apply only to a somewhat random subset of the records in the database.

So... here we are.

kc

See also http://id.loc.gov/vocabulary/geographicAreas/fa and http://id.loc.gov/authorities/sh85009230#concept. You have a single institution minting multiple URIs for what is effectively the same thing (albeit in different vocabularies), although, ironically, nothing points at any actual real world objects.
VIAF doesn't do much better in this particular case (there are lots of examples where it does, mind you): http://viaf.org/viaf/142995804 (see: http://viaf.org/viaf/142995804/rdf.xml). We have all of these triangulations around the concept of England or Atlas mountains, but we can't actually refer to England or the Atlas mountains. Also, I am not somehow above this problem, either. With the linked MARC codes lists (http://purl.org/NET/marccodes/), I had to make a similar decision, I just chose to go the opposite route: define them as things, rather than concepts (http://purl.org/NET/marccodes/gacs/fa#location, http://purl.org/NET/marccodes/gacs/e-uk-en#location, http://purl.org/NET/marccodes/countries/enk#location, etc.), which presents its own set of problems (http://purl.org/NET/marccodes/gacs/h#location is not a SpatialThing no matter how liberal your definition). At some point, it's worth addressing what these things actually *are* and if, indeed, they are effectively the same thing, if it's worth preserving these redundancies, because I think they'll cause grief in the future. -Ross. -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Re: [CODE4LIB] LCSH and Linked Data
Karen Miller works at Northwestern University, where an authorities librarian has been maintaining, to the dot, the authority-related records (headings, subdivisions, encoding, etc.) for over 20 years. If a cataloger there makes a mistake, it will be fixed by the refined set of procedures run consistently on their bibliographic vis-a-vis authorities files. There is no other such institution catalog in the US. LC have often invited that authorities librarian to come fix their collection as well.

Bill, which new evidence have you found for almost rupture or depression regarding reflection on geographic names? The fact that AUTOCAT librarians have started to assist our discussions is in fact grounds for rapture (pun intended) as we improve analysis,

Ya'aqov

On Fri, Apr 8, 2011 at 5:07 PM, Bill Dueber b...@dueber.com wrote:

2011/4/8 Karen Miller k-mill...@northwestern.edu

I hope I'm not pointing out the obvious,

That made me laugh so hard I almost ruptured something. Thank you so much for such a complete (please, god, tell me it's complete...) explanation. It's a little depressing, but at least now I know why I'm depressed :-)

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Re: [CODE4LIB] LCSH and Linked Data
On 07.04.2011 at 17:44, Ford, Kevin wrote:

Actually, it appears to depend on whose Authority record you're looking at. The Canadians, Australians, and Israelis have it as a CorporateName (110), as do the French (210 - unimarc); LC and the Germans say it's a Geographic Name.

No, the original England record linked to VIAF in the German GND says it is a Gebietskörperschaft, which is a corporate body in English. See http://d-nb.info/gnd/15138-5/about/html and the RDF representation at http://d-nb.info/gnd/15138-5/about/rdf

Perhaps something went wrong in the mapping of the German authority record to MARC21, so England got into the 151 (or there might be good reasons to do it that way, ask metadata experts...). The original record is not maintained in MARC21; we don't do MARC21 (or any MARC at all) here, we are just starting to switch to it as a future(!) exchange format... :-).

Sorry for being pedantic, early morning and not enough coffee yet...

Till

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kinst...@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de
Re: [CODE4LIB] LCSH and Linked Data
Thanks for all the information and discussion. I don't think I'm familiar enough with Authority file formats to completely comprehend - but I certainly understand the issues around the question of 'place' vs 'histo-geo-political entity'. Some of this makes me worry about the immediate applicability of the LC Authority files in the Linked Data space - someone said to me recently 'SKOS is just a way of avoiding dealing with the real semantics' :)

Anyway - putting that to one side, the simplest approach for me at the moment seems to be to look only at authorised LCSH as represented on id.loc.gov. Picking up on Andy's first response:

On Thu, Apr 7, 2011 at 3:46 PM, Houghton,Andrew hough...@oclc.org wrote:

After having done numerous matching and mapping projects, there are some issues that you will face with your strategy, assuming I understand it correctly. Trying to match a heading starting at the leftmost subfield and working forward will not necessarily produce correct results when matching against the LCSH authority file. Using your example:

650 _0 $a Education $z England $x Finance

is a good example of why processing the heading starting at the left will not necessarily produce the correct results. Assuming I understand your proposal, you would first search for:

150 __ $a Education

and find the heading with LCCN sh85040989. Next you would look for:

181 __ $z England

and you would NOT find this heading in LCSH.

OK - ignoring the question of where the best place to look for this is - I can live with not matching it for now. Later (perhaps when I understand it better, or when these headings are added to id.loc.gov) we can revisit this.

The second issue using your example is that you want to find the “longest” matching heading. While the component parts are there, so is the enumerated authority heading:

150 __ $a Education $z England

as LCCN sh2008102746.
So your heading is actually composed of the enumerated headings:

sh2008102746 150 __ $a Education $z England
sh2002007885 180 __ $x Finance

and not the separate headings:

sh85040989 150 __ $a Education
n82068148 150 __ $a England
sh2002007885 180 __ $x Finance

Although one could argue that either analysis is correct depending upon what you are trying to accomplish.

What I'm interested in is representing the data as RDF/Linked Data in a way that opens up the best opportunities for both understanding and querying the data. Unfortunately at the moment there isn't a good way of representing LCSH directly in RDF (the MADS work may help I guess, but to be honest at the moment I see that as overly complex - but that's another discussion).

What I can do is make statements that an item is 'about' a subject (probably using dc:subject) and then point at an id.loc.gov URI. However, if I only express individual headings:

Education
England (natch)
Finance

then obviously I lose the context of the full heading - so I also want to look for

Education--England--Finance (which I won't find on id.loc.gov as not authorised)

At this point I could stop, but my feeling is that it is useful to also look for other combinations of the terms:

Education--England (not authorised)
Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008)

My theory is that as long as I stick to combinations that start with a topical term I'm not going to make startlingly inaccurate statements?

The matching algorithm I have used in the past contains two routines. The first, f(a), will accept a heading as a parameter, scrub the heading (e.g., remove unnecessary subfields like $0, $3, $6, $8, etc.) and do any other pre-processing necessary on the heading, then call the second function, f(b). The f(b) function accepts a heading as a parameter and recursively calls itself until it builds up the list of LCCNs that comprise the heading.
It first looks for the given heading; when it doesn’t find it, it removes the *last* subfield and recursively calls itself; otherwise it appends the found LCCN to the returned list and exits. This strategy will find the longest match.

Unless I've misunderstood this, this strategy would not find 'Education--Finance'? Instead I need to remove each *subdivision* in turn (no matter where it appears in the heading order) and try all possible combinations, checking each for a match on id.loc.gov. Again, I can do this without worrying about possible invalid headings, as these wouldn't have been authorised anyway...

I can check the number of variations around this, but I guess that in my limited set of records (only 30k) there will be a relatively small number of possible patterns to check.

Does that make sense?
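The difference between the two strategies is easier to see side by side. The sketch below is one reading of both, under loud assumptions: the `AUTHORISED` dictionary is a stand-in for a real lookup against id.loc.gov and contains only identifiers quoted in this thread, the function names are invented, and "continue with whatever is left after a match" is a guess at how the drop-the-last-subfield routine carries on after its first hit.

```python
from itertools import combinations

# Hypothetical stand-in for an id.loc.gov lookup; identifiers are those quoted in this thread.
AUTHORISED = {
    ("Education",): "sh85040989",
    ("Education", "Finance"): "sh85041008",
    ("Finance",): "sh2002007885",
}


def longest_prefix_matches(parts, table=AUTHORISED):
    """Drop trailing elements until something matches, then carry on with the remainder."""
    parts = tuple(parts)
    found = []
    start = 0
    while start < len(parts):
        for end in range(len(parts), start, -1):
            key = parts[start:end]
            if key in table:
                found.append(table[key])
                start = end
                break
        else:
            # Not even the single element matched (e.g. a name-file heading); skip it.
            start += 1
    return found


def all_combination_matches(parts, table=AUTHORISED):
    """Keep the main heading and try every order-preserving subset of the subdivisions."""
    main, subdivisions = parts[0], tuple(parts[1:])
    matches = {}
    for size in range(len(subdivisions) + 1):
        for combo in combinations(subdivisions, size):
            key = (main, *combo)
            if key in table:
                matches["--".join(key)] = table[key]
    return matches


heading = ["Education", "England", "Finance"]
print(longest_prefix_matches(heading))
print(all_combination_matches(heading))
```

Against this toy table the first routine returns the identifiers for Education and for the Finance subdivision but never tries Education--Finance, while the second does find it. For a heading with n subdivisions the second approach makes 2^n lookups, which stays small for typical headings.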
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 5:02 AM, Owen Stephens o...@ostephens.com wrote: Then obviously I lose the context of the full heading - so I also want to look for Education--England--Finance (which I won't find on id.loc.gov as not authorised) At this point I could stop, but my feeling is that it is useful to also look for other combinations of the terms: Education--England (not authorised) Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008) My theory is that as long as I stick to combinations that start with a topical term I'm not going to make startlingly inaccurate statements? I would definitely ask this question somewhere other than Code4lib (autocat, maybe?), since I think the answer is more complicated than this (although they could validate/invalidate your assumption about whether or not this approach would get you close enough). My understanding is that Education--England--Finance *is* authorized, because Education--Finance is and England is a free-floating geographic subdivision. Because it's also an authorized heading, Education--England--Finance is, in fact, an authority. The problem is that free-floating subdivisions cause an almost infinite number of permutations, so there aren't LCCNs issued for them. This is where things get super-wonky. It's also the reason I initially created lcsubjects.org, specifically to give these (and, ideally, locally controlled subject headings) a publishing platform/centralized repository, but it quickly grew to be more than just a side project. There were issues of how the data would be constructed (esp. since, at the time, I had no access to the NAF), how to reconcile changes, provenance, etc. Add to the fact that 2 years ago, there wasn't much linked library data going on, it was really hard to justify the effort. But, yeah, it would be worth running your ideas by a few catalogers to see what they think. -Ross.
[CODE4LIB] LCSH and Linked Data / Ross
Hi and thank you Ross, Jonathan, and Andy,

I do wish someone from LC would answer Jonathan's questions for all codes and geographic subdivision or subject implications. There's so much self-inflicted pain I can go through trying to revive my cataloging days. Here are some clarifications though:

List of Geographic Areas is the macro list, whereas List of Countries includes only countries, as a subset of the macro list.

MARC Code List for Countries [choice of a MARC code is generally related to information in field 260 (Publication, Distribution, etc. (Imprint)). The code recorded in 008/15-17 is used in conjunction with field 044 (Country of Producer Code) when more than one code is appropriate to an item.]

MARC Geographic Area Codes are codes entered (according to geographic names in the 6xx fields) in field 043.

The Country Codes and Geographic Area Codes are entered bureaucratically, bypassing Jonathan's refined distinctions. These tasks are outsourced to agencies separate from the catalogers assigning LCSH.

Now it starts getting uglier, since upkeep for these lists differs in time and agency. Possibly new territory names are done now by NATO ... You would expect to see the same name in a code list and in a geographic name (151). Sometimes you won't. Sometimes you'll see redundancies which confuse even more.

So since:

1. LCSH has mistakes, inconsistencies
2. LC doesn't talk to CODE4LIB to answer our questions
3. OCLC will not talk to LC on our behalf

we can create the geographic name list(s) we need. Since we know that 6xx forms for geographic names appear in 151 and 781 fields, we can create an index for those names for matching to 6xx in LCSH. Andrew, please complete/comment on this list.

Ya'aqov
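A name index of the kind proposed here could start out as small as the following sketch. Everything about the data layout is an assumption made for illustration: authority records are represented as plain dictionaries rather than parsed MARC, the function names are invented, and the two sample records are the London and Hackney examples quoted earlier in this thread. Matching all $z subfields of a heading as one sequence is also a simplification.

```python
# Toy authority records using the London and Hackney examples from this thread.
# "151" holds the direct form ($a); "781" holds the indirect form as a list of $z values.
RECORDS = [
    {"id": "n 79005665", "151": "London (England)", "781": ["England", "London"]},
    {"id": "n 85192245", "151": "Hackney (London, England)", "781": None},
]


def build_geo_indexes(records):
    """Return (direct, indirect) lookup tables keyed by 151 $a and by the 781 $z sequence."""
    direct = {}
    indirect = {}
    for record in records:
        direct[record["151"]] = record["id"]
        subdivision = record.get("781")
        if subdivision:  # records without a 781 cannot be used as a geographic subdivision
            indirect[tuple(subdivision)] = record["id"]
    return direct, indirect


def match_subject(tag, subfields, direct, indirect):
    """Look up the geographic part of a 6xx field given as a list of (code, value) pairs."""
    if tag == "651":
        name = next((value for code, value in subfields if code == "a"), None)
        return direct.get(name)
    places = tuple(value for code, value in subfields if code == "z")
    return indirect.get(places)


direct, indirect = build_geo_indexes(RECORDS)
print(match_subject("651", [("a", "London (England)"), ("x", "Economic conditions")], direct, indirect))
print(match_subject("650", [("a", "Education"), ("z", "England"), ("z", "London")], direct, indirect))
```

Both lookups resolve to the same London record, one through the direct form and one through the indirect form, while Hackney is reachable only through the direct index.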
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 10:10 AM, Ross Singer rossfsin...@gmail.com wrote: But, yeah, it would be worth running your ideas by a few catalogers to see what they think. And if anyone does this...please please *please* write it up! -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] LCSH and Linked Data
Thanks Ross - I have been pushing some cataloguing folk to comment on some of this as well (and have some feedback) - but I take the point that wider consultation via autocat could be a good idea. (for some reason this makes me slightly nervous!)

In terms of whether Education--England--Finance is authorised or not - I think I took from Andy's response that it wasn't, but also looking at it on authorities.loc.gov it isn't marked as 'authorised'. Anyway - the relevant thing for me at this stage is that I won't find a match via id.loc.gov - so I can't get a URI for it anyway.

There are clearly quite a few issues with interacting with LCSH as Linked Data at the moment - I'm not that keen on how this currently works, and my reaction to the MADS/RDF ontology is similar to that of Bruce D'Arcus (see http://metadata.posterous.com/lcs-madsrdf-ontology-and-the-future-of-the-se), but on the other hand I want to embrace the opportunity to start joining some stuff up and seeing what happens :)

Owen

-- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
I'm a cataloger who has been following this discussion with interest, but not necessarily understanding all of it. I'll try to add what I can regarding the rules for constructing LCSH headings. My understanding is that Education--England--Finance *is* authorized, because Education--Finance is and England is a free-floating geographic subdivision. Because it's also an authorized heading, Education--England--Finance is, in fact, an authority. The problem is that free-floating subdivisions cause an almost infinite number of permutations, so there aren't LCCNs issued for them. Ross is essentially correct. Education is an authorized subject term that can be subdivided geographically. Finance is a free-floating subdivision that is authorized for use under subject terms that conform to parameters given in the scope notes in its authority record (680 fields), but it cannot be subdivided geographically. England is an authorized geographic subject term that can be added to any heading that can be subdivided geographically. Thus, Education -- England -- Finance is a valid LCSH heading, whereas Education -- Finance -- England would not be. This is wonky, and it's stuff like this that makes LCSH so unwieldy and difficult to validate, even for humans who actually have the capacity to learn and adjust to all of the various inconsistencies. I don't know how relevant it is to this particular discussion, but going forward I'm not sure how important it is to validate LCSH headings. I really appreciate developers who seek to preserve the semantic relationships present in the headings as much as possible; I believe many of them have value. But aren't there ways to preserve/extract that value without getting too bogged down in the inconsistent left-to-right structure of the existing headings? I hope this helps, at least a little bit. I'd be happy to answer additional questions. Shirley Shirley Lincicum Frustrated Cataloger
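The placement rule described in the message above (Education -- England -- Finance is valid, Education -- Finance -- England is not) can be sketched in code. This is only an illustration under stated assumptions: each element of a heading is tagged by hand with two flags, and the `Element` type, its field names, and `geographic_placement_ok` are hypothetical, not part of MARC or LCSH. It also assumes that one geographic element may directly follow another. The real rules sit in authority records and scope notes and have many more exceptions.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    """One element of a heading, flagged by hand (hypothetical model)."""
    label: str
    is_geographic: bool = False
    may_subdivide_geographically: bool = False


def geographic_placement_ok(heading: List[Element]) -> bool:
    """Return True if every geographic element directly follows an element
    that may be subdivided geographically, or another geographic element
    (the second case is an assumption of this sketch)."""
    for previous, current in zip(heading, heading[1:]):
        if current.is_geographic and not (
            previous.may_subdivide_geographically or previous.is_geographic
        ):
            return False
    return True


# Flags below are copied by hand from the description in the message.
education = Element("Education", may_subdivide_geographically=True)
england = Element("England", is_geographic=True)
finance = Element("Finance")  # free-floating, cannot be subdivided geographically

print(geographic_placement_ok([education, england, finance]))
print(geographic_placement_ok([education, finance, england]))
```

A check like this says nothing about whether a subdivision is authorized under a given term; it only covers where the geographic element may sit.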
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 1:50 PM, Shirley Lincicum shirley.linci...@gmail.com wrote: Ross is essentially correct. Education is an authorized subject term that can be subdivided geographically. Finance is a free-floating subdivision that is authorized for use under subject terms that conform to parameters given in the scope notes in its authority record (680 fields), but it cannot be subdivided geographically. England is an authorized geographic subject term that can be added to any heading that can be subdivided geographically. Wait, so is it possible to know if England means the free-floating geographic entity or the country? Or is that just plain unknowable. Suddenly, my mouth is hungering for something gun-flavored. I know OCLC did some work trying to dis-integrate different types of terms with the FAST stuff, but it's not clear to me how I can leverage that (or anything else) to make LCSH at all useful as a search target or (even better) facet. Has anyone done anything with it?
Re: [CODE4LIB] LCSH and Linked Data
OK, as a cataloger who has been confused by the jurisdictional/place name distinction, I'm going to jump in here. Whether England means the free-floating geographic entity or the country is not quite unknowable -- it depends on the MARC codes that accompany it. The brief answer is this: a field used in a 651$a or a $z should match a 151 in the LC authorities. If the MARC field is 151 or 651 (let's just say x51), then the $a should match a 151 in the authority file. MARC subfield z ($z) is always a geographic subdivision and should match a 151. Here's where it gets tricky: If the MARC field is a x10 (110, 610, 710 – corporate bodies), then the $a should match a 110 or a 151 in the authority file. If the first indicator of such a MARC field is a 1, then it will probably match a 151 – first indicator 1 means that a heading is jurisdictional and may match a 151. For example: 110 1_ United States. ‡b Dept. of Agriculture There is a 151 United States in the LC authorities, but no 110 United States yet it can be used as a corporate body name in a bib. record with a 110 field. This is further confused by the VIAF, in which some national libraries have established the United States as a corporate body (110). At the risk of confusing things, I'd suggest looking at countries like the United States, Kenya or Canada as examples. England is not a great example because it's not a current jurisdiction name - there is a note in the LC authority record that reads Heading for England valid as a jurisdiction before 1536 only. Use (England) as qualifier for places (23.4D) and for nongovernment bodies (24.4C2). It is established as a 110 because it *used to be* a jurisdiction name and would be valid for works issued by the government prior to 1536. Obviously this note is of no use to a machine, but it explains why we aren't seeing it used as a jurisdiction (a corporate body) with subordinate bodies. 
I hope I'm not pointing out the obvious, but the use of names that appear in 151 fields in the authority file as 110 fields in bibliographic records confused me for a very long time; our authorities librarian explained it to me at least twice before the proverbial light bulb went on for me.

Karen

Karen D. Miller Monographic/Digital Projects Cataloger Bibliographic Services Dept. Northwestern University Library Evanston, IL k-mill...@northwestern.edu 847-467-3462
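The matching rules in this message can be written down as a small lookup, which may help when deciding which authority records to search for a given name. This is a minimal sketch, not a complete rule set: the function name `candidate_authority_tags` and its return convention (authority tags in the order worth trying) are assumptions made for illustration, and it covers only the cases the message mentions.

```python
from typing import List

CORPORATE_TAGS = {"110", "610", "710"}   # the "x10" fields from the message
GEOGRAPHIC_TAGS = {"151", "651"}         # the "x51" fields from the message


def candidate_authority_tags(
    field_tag: str, subfield_code: str, first_indicator: str = " "
) -> List[str]:
    """Return authority-file tags to try, most likely first.

    An empty list means the message gives no rule for this case.
    """
    if subfield_code == "z":
        # $z is always a geographic subdivision and should match a 151.
        return ["151"]
    if subfield_code != "a":
        return []
    if field_tag in GEOGRAPHIC_TAGS:
        return ["151"]
    if field_tag in CORPORATE_TAGS:
        # First indicator 1 marks a jurisdictional heading, which will
        # probably match a 151 rather than a 110.
        if first_indicator == "1":
            return ["151", "110"]
        return ["110", "151"]
    return []


# 110 1_ United States. $b Dept. of Agriculture: try the 151 first.
print(candidate_authority_tags("110", "a", "1"))
```

Returning an ordered list rather than a single tag keeps the "probably" in the rule visible: the caller can fall back to the second tag when the first lookup fails.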
Re: [CODE4LIB] LCSH and Linked Data
2011/4/8 Karen Miller k-mill...@northwestern.edu

I hope I'm not pointing out the obvious,

That made me laugh so hard I almost ruptured something. Thank you so much for such a complete (please, god, tell me it's complete...) explanation. It's a little depressing, but at least now I know why I'm depressed :-)

-- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] LCSH and Linked Data
We are working on converting some MARC library records to RDF, and looking at how we handle links to LCSH (id.loc.gov) - and I'm looking for feedback on how we are proposing to do this... I'm not 100% confident about the approach, and to some extent I'm trying to work around the nature of how LCSH interacts with RDF at the moment I guess... but here goes - I would very much appreciate feedback/criticism/being told why what I'm proposing is wrong:

I guess what I want to do is preserve aspects of the faceted nature of LCSH in a useful way, give useful links back to id.loc.gov where possible, and give access to a wide range of facets on which the data set could be queried. Because of this I'm proposing not just expressing the whole of the 650 field as a LCSH and checking for its existence on id.loc.gov, but also checking for various combinations of topical term and subdivisions from the 650 field.

So for any 650 field I'm proposing we should check on id.loc.gov for labels matching:

check(650$$a) -- topical term
check(650$$b) -- topical term
check(650$$v) -- Form subdivision
check(650$$x) -- General subdivision
check(650$$y) -- Chronological subdivision
check(650$$z) -- Geographic subdivision

Then using whichever elements exist (all as topical terms):

Check(650$$a--650$$b)
Check(650$$a--650$$v)
Check(650$$a--650$$x)
Check(650$$a--650$$y)
Check(650$$a--650$$z)
Check(650$$a--650$$b--650$$v)
Check(650$$a--650$$b--650$$x)
Check(650$$a--650$$b--650$$y)
Check(650$$a--650$$b--650$$z)
Check(650$$a--650$$b--650$$x--650$$v)
Check(650$$a--650$$b--650$$x--650$$y)
Check(650$$a--650$$b--650$$x--650$$z)
Check(650$$a--650$$b--650$$x--650$$z--650$$v)
Check(650$$a--650$$b--650$$x--650$$z--650$$y)
Check(650$$a--650$$b--650$$x--650$$z--650$$y--650$$v)

As an example given:

650 00 $$aPopular music$$xHistory$$y20th century

We would be checking id.loc.gov for

'Popular music' as a topical term (http://id.loc.gov/authorities/sh85088865)
'History' as a general subdivision (http://id.loc.gov/authorities/sh99005024)
'20th century' as a chronological subdivision (http://id.loc.gov/authorities/sh2002012476)
'Popular music--History and criticism' as a topical term (http://id.loc.gov/authorities/sh2008109787)
'Popular music--20th century' as a topical term (not authorised)
'Popular music--History and criticism--20th century' as a topical term (not authorised)

And expressing all matches in our RDF.

My understanding of LCSH isn't what it might be - but the ordering of terms in the combined string checking is based on what I understand to be the usual order - is this correct, and should we be checking for alternative orderings?

Thanks
Owen

-- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
Thanks Tom - very helpful

Perhaps this suggests that rather than using a fixed order we should check combinations while preserving the order of the original 650 field (I assume this should in theory be correct always - or at least done to the best of the cataloguer's knowledge)? So for:

650 _0 $$a Education $$z England $$x Finance.

check:

Education
England (subdiv)
Finance (subdiv)
Education--England
Education--Finance
Education--England--Finance

While for

650 _0 $$a Education $$x Economic aspects $$z England

we check

Education
Economic aspects (subdiv)
England (subdiv)
Education--Economic aspects
Education--England
Education--Economic aspects--England

- It is possible for other orders in special circumstances, e.g. with language dictionaries which can go something like: 650 _0 $$a English language $$v Dictionaries $$x Albanian.

This possibility would also be covered by preserving the order - check:

English language
Dictionaries (subdiv)
Albanian (subdiv)
English language--Dictionaries
English language--Albanian
English language--Dictionaries--Albanian

Creating possibly invalid headings isn't necessarily a problem - as we won't get a match on id.loc.gov anyway. (Instinctively English language--Albanian doesn't feel right)

- Some of these are repeatable, so you can have two $$vs following each other (e.g. Biography--Dictionaries); two $$zs (very common), as in Education--England--London; two $$xs (e.g. Biography--History and criticism).

OK - that's fine, we can use each individually and in combination for any repeated headings I think

- I'm not sure I've ever come across a lot of $$bs in 650s. Do you have a lot of them in the database?

Hadn't checked until you asked! We have 1 in the dataset in question (c.30k records) :)

I'm not sure how possible it would be to come up with a definitive list of (reasonable) possible combinations.

You are probably right - but I'm not too bothered about aiming at 'definitive' at this stage anyway - but I do want to get something relatively functional/useful

Tom

Thomas Meehan Head of Current Cataloguing University College London Library Services
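The order-preserving check lists in this message follow a simple pattern: every subfield on its own, then the $$a term joined to every order-preserving selection of the subdivisions. A minimal sketch of that pattern follows. The function name `candidate_labels` and the representation of a 650 field as a list of (code, value) pairs are assumptions for illustration, and the sketch assumes the first subfield is the $$a topical term.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Subfield = Tuple[str, str]  # (subfield code, value), e.g. ("z", "England")


def candidate_labels(subfields: Sequence[Subfield]) -> List[str]:
    """Build the labels to look up, keeping the cataloguer's original order.

    Assumes subfields[0] is the $$a topical term.
    """
    if not subfields:
        return []
    main = subfields[0][1]
    subdivisions = [value for _, value in subfields[1:]]

    labels = [main]
    labels.extend(subdivisions)  # each subdivision on its own
    # itertools.combinations keeps elements in their input order, so the
    # combined strings never reorder the original field.
    for size in range(1, len(subdivisions) + 1):
        for combo in combinations(subdivisions, size):
            labels.append("--".join((main,) + combo))
    # Drop duplicates (possible with repeated subfields), keeping first seen.
    return list(dict.fromkeys(labels))


field = [("a", "Education"), ("z", "England"), ("x", "Finance")]
for label in candidate_labels(field):
    print(label)
```

With n subdivisions this yields up to 1 + n + (2^n - 1) labels, which stays small for typical 650 fields; each label would then be looked up separately, and combinations that are not valid headings simply find no match.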
Re: [CODE4LIB] LCSH and Linked Data
... Creating possibly invalid headings isn't necessarily a problem - as we won't get a match on id.loc.gov anyway ...

LCSH headings reflect materials cataloged by LC. You may have materials at your library in the UK (or Albania, Tunisia, etc.) which were not cataloged yet at LC, thus nothing yet to match on.

Ya'aqov
Re: [CODE4LIB] LCSH and Linked Data
After having done numerous matching and mapping projects, there are some issues that you will face with your strategy, assuming I understand it correctly. Trying to match a heading starting at the left most subfield and working forward will not necessarily produce correct results when matching against the LCSH authority file. Using your example:

650 _0 $a Education $z England $x Finance

is a good example of why processing the heading starting at the left will not necessarily produce the correct results. Assuming I understand your proposal you would first search for:

150 __ $a Education

and find the heading with LCCN sh85040989. Next you would look for:

181 __ $z England

and you would NOT find this heading in LCSH. This is issue one. Unfortunately, LC does not create 181 in LCSH (actually I think there are some, but not if it's a name), instead they create a 781 in the name authority record. So to find the corresponding $z England we need to go to the name authority record 150 England with LCCN n82068148. Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org.

The second issue using your example is that you want to find the "longest" matching heading. While the component parts are there, so is the enumerated authority heading:

150 __ $a Education $z England

as LCCN sh2008102746. So your heading is actually composed of the enumerated headings:

sh2008102746 150 __ $a Education $z England
sh2002007885 180 __ $x Finance

and not the separate headings:

sh85040989 150 __ $a Education
n82068148 150 __ $a England
sh2002007885 180 __ $x Finance

Although one could argue that either analysis is correct depending upon what you are trying to accomplish.

The matching algorithm I have used in the past contains two routines. The first, f(a), will accept a heading as a parameter, scrub the heading, e.g., remove unnecessary subfields like $0, $3, $6, $8, etc., and do any other pre-processing necessary on the heading, then call the second function, f(b). The f(b) function accepts a heading as a parameter and recursively calls itself until it builds up the list of LCCNs that comprise the heading. It first looks for the given heading; when it doesn't find it, it removes the *last* subfield and recursively calls itself, otherwise it appends the found LCCN to the returned list and exits. This strategy will find the longest match.

The headings are searched against an augmented LCSH database where the 781 name authority records have been transformed into 181 records keeping the LCCN of the name authority record. Not ideal, but it generally works well. Adjust the algorithm per need.

Hope this helps, Andy.
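The two-routine longest-match strategy described in Andy's message can be sketched roughly as follows. This is one reading of the description, not his actual code: the names `match_heading` (standing in for f(a)) and `longest_match` (for f(b)), the in-memory `INDEX` dictionary, and the choice to continue with the remaining subfields after a match are all assumptions. The LCCNs in the toy index are the ones quoted in the message, with the 781 for England already treated as a 181-style $z entry.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Subfield = Tuple[str, str]            # (code, value)
Heading = Tuple[Subfield, ...]

CONTROL_CODES = {"0", "3", "6", "8"}  # subfields removed before matching

# Toy stand-in for the augmented LCSH database (hypothetical structure).
INDEX: Dict[Heading, str] = {
    (("a", "Education"),): "sh85040989",
    (("a", "Education"), ("z", "England")): "sh2008102746",
    (("z", "England"),): "n82068148",
    (("x", "Finance"),): "sh2002007885",
}


def longest_match(heading: Heading, index: Dict[Heading, str]) -> List[Optional[str]]:
    """Find the longest matching prefix, then repeat on what is left."""
    if not heading:
        return []
    # Try the whole heading first, then drop the last subfield each time.
    for end in range(len(heading), 0, -1):
        lccn = index.get(heading[:end])
        if lccn is not None:
            return [lccn] + longest_match(heading[end:], index)
    # Not even the first subfield matched: record the gap and move on.
    return [None] + longest_match(heading[1:], index)


def match_heading(subfields: Sequence[Subfield], index: Dict[Heading, str]) -> List[Optional[str]]:
    """Scrub the heading (drop control subfields, trim spaces), then match."""
    scrubbed = tuple(
        (code, value.strip()) for code, value in subfields if code not in CONTROL_CODES
    )
    return longest_match(scrubbed, index)


field = [("a", "Education"), ("z", "England"), ("x", "Finance"), ("0", "ignored")]
print(match_heading(field, INDEX))
```

A `None` in the result marks a subfield with no match at all, so the caller can tell a partial decomposition from a complete one. Removing the enumerated Education $z England entry from the index would make the same call fall back to the three separate headings.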
Re: [CODE4LIB] LCSH and Linked Data
Andrew, please see [YZ] below.

181 __ $z England and you would NOT find this heading in LCSH. This is issue one. Unfortunately, LC does not create 181 in LCSH (actually I think there are some, but not if it's a name), instead they create a 781 in the name authority record.

[YZ] MARC/LCSH distinguishes between names (100) and geographic names (151) in their authority records. You'll find all geographic names if you look for 151 records.

So to find the corresponding $z England we need to go to the name authority record 150 England with LCCN n82068148.

[YZ] The LCCN n82068148 authority record is for 151 England. Also Andrew, are you indicating there is a difference between the form of geographic name in 151$a and 781$z?

Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org.

[YZ] viaf.org does not include geographic names. I just checked there for England. It makes little sense to mix personal/corporate names with geographic ones. Let's see what Ralph comments.

Ya'aqov
Re: [CODE4LIB] LCSH and Linked Data
Still digesting Andrew's response (thanks Andrew), but

On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:

Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org.

[YZ] viaf.org does not include geographic names. I just checked there England.

Is this not the relevant VIAF entry? http://viaf.org/viaf/142995804

-- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
If you look at the fields those names come from, I think they mean England as a corporation, not England as a place.

Ralph
Re: [CODE4LIB] LCSH and Linked Data
Actually, it appears to depend on whose Authority record you're looking at. The Canadians, Australians, and Israelis have it as a CorporateName (110), as do the French (210 - unimarc); LC and the Germans say it's a Geographic Name. In the case of LCSH, therefore, it would be a 151. Regardless, it is in VIAF.

Warmly, Kevin
Re: [CODE4LIB] LCSH and Linked Data
Ralph, Owen's pointing to a list where corporate (110) and geographic names (151) are mixed. Thanks Owen, I hadn't seen that the first time. I guess you got that mixed 110/151 when limiting to 'exact name'. Perhaps Andrew has a workaround.

Ya'aqov

-- ya'aqov ZISO | yaaq...@gmail.com | 856 217 3456
Re: [CODE4LIB] LCSH and Linked Data
I'm out of my depth here :) But... this is what I understood Andrew to be saying. In this instance (?because 'England' is a Name Authority?), rather than creating a separate LCSH authority record for 'England' (as the 151), the LCSH subdivision is recorded in the 781 of the existing Name Authority record.

Searching on http://authorities.loc.gov for England, I find an Authorised heading, marked as a LCSH - but when I go to that record what I get is the name authority record n 82068148 - the name authority record as represented on VIAF by http://viaf.org/viaf/142995804/ (which links to http://errol.oclc.org/laf/n%20%2082068148.html)

Just as this is getting interesting, time differences mean I'm about to head home :)

Owen

-- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
Kevin, England exists as a corporate body and also as a geographic name. BOTH entities exist in LCSH. This doesn't apply to all geographic names, only to some. Andrew pointed us to VIAF, but I expect his algorithm to limit the search for LCSH. Let's wait for his reply.

Ya'aqov
Re: [CODE4LIB] LCSH and Linked Data
More confusing yet, if you look at the raw XML for that record (add viaf.xml to the end of the URI and then view source) you'll see that the name type is indeed Geographic. My boss is puzzled.

Ralph
Re: [CODE4LIB] LCSH and Linked Data
On 4/7/2011 10:46 AM, Houghton,Andrew wrote: to go to the name authority record 150 England with LCCN n82068148. Currently under id.loc.gov you will not find name authority records, If this would change, so name authority record elements used in 6xx subject cataloging were in id.loc.gov, it would make powerful use of id.loc.gov much more feasible. Is there anyone at LC this suggestion/request could be sent to, possibly en masse? I do sort of have the impression it's been an item of contention inside LC. Jonathan
Re: [CODE4LIB] LCSH and Linked Data
Jonathan, hi and thanks,
1. I believe id.loc.gov includes a list of MARC countries and a list for geographic areas (based on the geographic names in 151 fields).
2. Cataloging rules instruct catalogers to use THOSE very name forms in 151 $a when a subject can be divided (limited) geographically using $z.
3. Not all subjects which can be divided geographically will have the geographical subdivision immediately after the subject. There could be 2 different sequences:
650 $a Picket lines $z Ohio
650 $a Picket lines $x Economic aspects $z Ohio
(whether or not the geographical subdivision immediately follows $a is part of the rules LC catalogers observe to the dot). There could also be two geographical subdivisions following each other:
650 $a Picket lines $z Ohio $z Columbus
Oh yeah, these record elements could be used powerfully for our users.
Ya'aqov

On Thu, Apr 7, 2011 at 11:29 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
On 4/7/2011 10:46 AM, Houghton,Andrew wrote:
to go to the name authority record 150 England with LCCN n82068148. Currently under id.loc.gov you will not find name authority records,

If this would change, so name authority record elements used in 6xx subject cataloging were in id.loc.gov, it would make powerful use of id.loc.gov much more feasible. Is there anyone at LC this suggestion/request could be sent to, possibly en masse? I do sort of have the impression it's been an item of contention inside LC.
Jonathan
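
The subfield sequences in point 3 can be pulled apart mechanically, whatever order the subdivisions arrive in. What follows is only a minimal Python sketch: it assumes the field arrives as a flattened string like the 650 examples in this message, and the names parse_subfields, facets and FACETS are made up for illustration rather than taken from any MARC library.

```python
import re

# Subfield codes as they are used in the 650 examples in this thread.
# The facet labels on the right are illustrative names, not MARC terms.
FACETS = {"a": "topic", "x": "general", "y": "chronological", "z": "geographic"}


def parse_subfields(field):
    """Split a flattened field string into (code, value) pairs.

    Tolerates a missing space after the subfield code, as in '$zOhio'.
    """
    pairs = re.findall(r"\$(\w)\s*([^$]*)", field)
    return [(code, value.strip()) for code, value in pairs]


def facets(field):
    """Group subfield values by facet, regardless of where they appear."""
    grouped = {}
    for code, value in parse_subfields(field):
        grouped.setdefault(FACETS.get(code, code), []).append(value)
    return grouped


print(facets("650 $a Picket lines $x Economic aspects $z Ohio"))
print(facets("650 $a Picket lines $zOhio $z Columbus"))
```

Both sequences end up with the same "geographic" entry for Ohio, which is the point: once the string is broken up, the position of $z relative to $x stops mattering.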
Re: [CODE4LIB] LCSH and Linked Data
1. No disagreement, except that some 151s appear in the name file and some appear in the subject file:
n82068148 008/11=a 008/14=a 151 _ _ $a England
sh2010015057 008/11=a 008/14=b 151 _ _ $a Tabasco Mountains (Mexico)
2. Yes, see n5359
151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)
3. Oops, my apologies to my VIAF colleagues, I believe that geographic names are in the works… or at least I was under the impression they were from a discussion I had last night.

From: Ya'aqov Ziso [mailto:yaaq...@gmail.com] Sent: Thursday, April 07, 2011 11:18 To: Code for Libraries; Houghton,Andrew Cc: LeVan,Ralph Subject: Re: [CODE4LIB] LCSH and Linked Data

Andrew, please see [YZ] below

181 __ $z England and you would NOT find this heading in LCSH. This is issue one. Unfortunately, LC does not create 181 in LCSH (actually I think there are some, but not if it’s a name), instead they create a 781 in the name authority record.
[YZ] MARC/LCSH distinguishes between names 100 and geographic names 151 in their authority record. You'll find all geographic names if you look for 151 records.

So to find the corresponding $z England we need to go to the name authority record 150 England with LCCN n82068148.
[YZ] LCCN n82068148 authority record is for 151 England. Also Andrew, are you indicating there is a difference between the form of geographic name in 151$a and 781$z -- ?

Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org.
[YZ] viaf.org does not include geographic names. I just checked England there. It makes little sense to mix personal/corporate names with geographic ones. Let's see what Ralph comments.
Ya'aqov
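
For point 1, the two LCCNs quoted above already hint at which file a 151 came from. Here is a rough Python sketch that sorts records by LCCN prefix. The rule it uses ("sh" for the subject file, anything else starting with "n" for the name file) is an assumption based on the two examples in this thread, and authority_file is a hypothetical helper, not part of any library.

```python
import re


def authority_file(lccn):
    """Guess the authority file from the alphabetic prefix of an LCCN.

    Assumption: 'sh' marks the subject file and prefixes starting
    with 'n' mark the name file. Anything else is reported as unknown.
    """
    prefix = re.match(r"[a-z]*", lccn.strip().lower()).group()
    if prefix == "sh":
        return "subject"
    if prefix.startswith("n"):
        return "name"
    return "unknown"


# The two 151 records quoted in this message.
for lccn, heading in [("n82068148", "England"),
                      ("sh2010015057", "Tabasco Mountains (Mexico)")]:
    print(lccn, heading, authority_file(lccn))
```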
Re: [CODE4LIB] LCSH and Linked Data
That is probably correct. England may appear as both a 110 *and* a 151 because the 110 signifies the concept for the country entity while the 151 signifies the concept for the geographic place. A subtle distinction...
Andy.

-----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ya'aqov Ziso Sent: Thursday, April 07, 2011 11:56 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LCSH and Linked Data

Ralph, Owen's pointing to a list where corporate (110) and geographic names (151) are mixed. Thanks Owen, I hadn't seen that the first time. I guess you got that mixed 110/151 when limiting to 'exact name'. Perhaps Andrew has a workaround.
Ya'aqov

On Thu, Apr 7, 2011 at 10:34 AM, LeVan,Ralph le...@oclc.org wrote:
If you look at the fields those names come from, I think they mean England as a corporation, not England as a place.
Ralph

-----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Owen Stephens Sent: Thursday, April 07, 2011 11:28 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LCSH and Linked Data

Still digesting Andrew's response (thanks Andrew), but

On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:
Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org.
[YZ] viaf.org does not include geographic names. I just checked England there.

Is this not the relevant VIAF entry? http://viaf.org/viaf/142995804
-- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
-- ya'aqovZISO | yaaq...@gmail.com | 856 217 3456
Re: [CODE4LIB] LCSH and Linked Data
Andrew, as always, most helpful news, kindest thanks! more [YZ] below:

1. No disagreement, except that some 151s appear in the name file and some appear in the subject file:
n82068148 008/11=a 008/14=a 151 _ _ $a England
sh2010015057 008/11=a 008/14=b 151 _ _ $a Tabasco Mountains (Mexico)
[YZ] would it be possible then to use both files as sources and create one file for geographical names for our purpose(s)?

2. Yes, see n5359
151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)
[YZ] Both stand for a distinct cataloging usage. Jonathan's suggestion to consult LC may answer the question of which field to use for geographical names, and when.

3. Oops, my apologies to my VIAF colleagues, I believe that geographic names are in the works…
[YZ] inshAllah!

4. That is probably correct. England may appear as both a 110 *and* a 151 because the 110 signifies the concept for the country entity while the 151 signifies the concept for the geographic place. A subtle distinction...
[YZ] Exactly. This distinction called for creating both a 110 AND a 151. But we are talking about 151. The case where there is both a 110 and a 151 does NOT apply to all geographic names, only to some.

[YZ] It would be helpful if VIAF provided a way to limit geographical names ONLY to 151 names and their cross references.
Re: [CODE4LIB] LCSH and Linked Data
On Thu, Apr 7, 2011 at 12:58 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:
1. I believe id.loc.gov includes a list of MARC countries and a list for geographic areas (based on the geographic names in 151 fields).
2. cataloging rules instruct catalogers to use THOSE very name forms in 151 $a when a subject can be divided (limited) geographically using $z.

Yeah, this could get ugly pretty fast. It's a bit unclear to me what the distinction is between identical terms in both the geographic areas and the country codes (http://id.loc.gov/vocabulary/geographicAreas/e-uk-en http://id.loc.gov/vocabulary/countries/enk). Well, in LC's current representation, there *is* no distinction, they're both just skos:Concepts that (by virtue of skos:exactMatch) are effectively interchangeable. See also http://id.loc.gov/vocabulary/geographicAreas/fa and http://id.loc.gov/authorities/sh85009230#concept.

You have a single institution minting multiple URIs for what is effectively the same thing (albeit in different vocabularies), although, ironically, nothing points at any actual real world objects. VIAF doesn't do much better in this particular case (there are lots of examples where it does, mind you): http://viaf.org/viaf/142995804 (see: http://viaf.org/viaf/142995804/rdf.xml). We have all of these triangulations around the concept of England or Atlas mountains, but we can't actually refer to England or the Atlas mountains.

Also, I am not somehow above this problem, either. With the linked MARC codes lists (http://purl.org/NET/marccodes/), I had to make a similar decision, I just chose to go the opposite route: define them as things, rather than concepts (http://purl.org/NET/marccodes/gacs/fa#location, http://purl.org/NET/marccodes/gacs/e-uk-en#location, http://purl.org/NET/marccodes/countries/enk#location, etc.), which presents its own set of problems (http://purl.org/NET/marccodes/gacs/h#location is not a SpatialThing no matter how liberal your definition).

At some point, it's worth addressing what these things actually *are* and, if they are indeed effectively the same thing, whether it's worth preserving these redundancies, because I think they'll cause grief in the future.
-Ross.
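
To make the redundancy concrete, here is a small Python sketch that groups URIs joined by skos:exactMatch into clusters of "effectively the same thing". The pairs are typed in by hand from the URIs mentioned in this message and nothing is fetched from id.loc.gov; treating the fa / sh85009230 pair as an exactMatch is an assumption read off the "see also" above. EXACT_MATCH and clusters are illustrative names only.

```python
from collections import defaultdict

# Hand-entered pairs, taken from the URIs quoted in the message above.
# The second pair is assumed to be linked the same way as the first.
EXACT_MATCH = [
    ("http://id.loc.gov/vocabulary/geographicAreas/e-uk-en",
     "http://id.loc.gov/vocabulary/countries/enk"),
    ("http://id.loc.gov/vocabulary/geographicAreas/fa",
     "http://id.loc.gov/authorities/sh85009230#concept"),
]


def clusters(pairs):
    """Group URIs connected by exactMatch links (connected components)."""
    neighbours = defaultdict(set)
    for left, right in pairs:
        neighbours[left].add(right)
        neighbours[right].add(left)

    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            uri = stack.pop()
            if uri in group:
                continue
            group.add(uri)
            stack.extend(neighbours[uri] - group)
        seen |= group
        groups.append(group)
    return groups


for group in clusters(EXACT_MATCH):
    print(sorted(group))
```

Each printed group is a set of concept URIs with no member that stands for the place itself, which is the gap described above.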
Re: [CODE4LIB] LCSH and Linked Data
My bad in (2): that should have been 781, and it’s LC’s way to indicate the geographic form used for a 181 when a heading may be geographically subdivided. The point is, when you are trying to do authority matching/mapping you have to match against the 181’s in LCSH *and* the 781’s in NAF. This is an oddity of the LC authority file that people may not be aware of, which is why I pointed it out. As I indicated, in my mapping projects I have taken LCSH and added new 181 records based on the 781’s found in NAF. This allows the matching process to work reasonably well without dragging in the entire NAF for searching and matching.

However, this still doesn’t give the complete picture, since in LCSH the *construction rules* allow you to use things in the name authority file as subjects, ugh. Effectively, LCSH isn’t useful by itself when trying to match/decompose 6XX in bibliographic records. You really need access to NAF as well. Things get worse when talking about the Children’s headings… since you can pull from both LCSH and NAF, ugh-ugh.

While LC would like us to think of the authority file as three separate authorities, LCSH, LCSHac, NAF, in reality the dependencies require you to ignore the thesaurus boundaries and just treat the entire authority file as one thesaurus. We struggled with this in the terminology services project, especially when the references in one thesaurus cross over into the other thesauri.
Andy.

From: Ya'aqov Ziso [mailto:yaaq...@gmail.com] Sent: Thursday, April 07, 2011 13:47 To: Code for Libraries; Houghton,Andrew Cc: Hickey,Thom; LeVan,Ralph Subject: Re: [CODE4LIB] LCSH and Linked Data

Andrew, as always, most helpful news, kindest thanks! more [YZ] below:

1. No disagreement, except that some 151s appear in the name file and some appear in the subject file:
n82068148 008/11=a 008/14=a 151 _ _ $a England
sh2010015057 008/11=a 008/14=b 151 _ _ $a Tabasco Mountains (Mexico)
[YZ] would it be possible then to use both files as sources and create one file for geographical names for our purpose(s)?

2. Yes, see n5359
151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)
[YZ] Both stand for a distinct cataloging usage. Jonathan's suggestion to consult LC may answer the question of which field to use for geographical names, and when.

3. Oops, my apologies to my VIAF colleagues, I believe that geographic names are in the works…
[YZ] inshAllah!

4. That is probably correct. England may appear as both a 110 *and* a 151 because the 110 signifies the concept for the country entity while the 151 signifies the concept for the geographic place. A subtle distinction...
[YZ] Exactly. This distinction called for creating both a 110 AND a 151. But we are talking about 151. The case where there is both a 110 and a 151 does NOT apply to all geographic names, only to some.

[YZ] It would be helpful if VIAF provided a way to limit geographical names ONLY to 151 names and their cross references.
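
Andy's approach (index the 181s already in LCSH, then add entries derived from the 781s in NAF) might look roughly like the Python sketch below. The record layout (a dict with "lccn" and a list of (tag, subfields) pairs) and the function names are hypothetical, invented for this illustration. The Sonora 781 is the one quoted in the thread; the 781 $z England on n82068148 is assumed from the discussion, not copied from the actual record.

```python
def geo_key(subfields):
    """Normalised tuple of the $z values in a list of (code, value) pairs."""
    return tuple(value.strip().lower() for code, value in subfields if code == "z")


def build_geo_index(lcsh_records, naf_records):
    """Map $z sequences to LCCNs, using LCSH 181s first and then NAF 781s."""
    index = {}
    for records, tag in ((lcsh_records, "181"), (naf_records, "781")):
        for record in records:
            for field_tag, subfields in record["fields"]:
                if field_tag == tag:
                    # setdefault keeps an existing LCSH 181 if NAF repeats it
                    index.setdefault(geo_key(subfields), record["lccn"])
    return index


def match_geo(index, heading_subfields):
    """Look up the geographic subdivisions of a 6XX heading; None if no match."""
    key = geo_key(heading_subfields)
    return index.get(key) if key else None


# Illustrative data only (see the assumptions in the lead-in).
naf = [
    {"lccn": "n5359",
     "fields": [("781", [("z", "Mexico"), ("z", "Sonora (State)")])]},
    {"lccn": "n82068148",
     "fields": [("781", [("z", "England")])]},
]
index = build_geo_index([], naf)
print(match_geo(index, [("a", "Dwellings"), ("z", "England")]))
```

This only covers the geographic subdivisions. As noted above, names used directly as subjects would still need the rest of NAF.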
Re: [CODE4LIB] LCSH and Linked Data
On 4/7/2011 1:21 PM, Houghton,Andrew wrote:
That is probably correct. England may appear as both a 110 *and* a 151 because the 110 signifies the concept for the country entity while the 151 signifies the concept for the geographic place. A subtle distinction...

This starts getting into categorization philosophy type issues, and reveals that LCSH isn't entirely consistent in its modelling (as virtually no classification will be without being extraordinarily complex, the world is a messy place), along the lines Ross was talking about too, but I think it can be explicated a bit.

I'm not sure it's quite true to say that a 151 (corresponding to a 6xx $z subdivision) is a geographic place as entirely distinct from a 'country entity'. I might instead say the 151 is meant to be a sort of geo-historical place, that does take into account, well, either political entities or general contemporary conceptions of place distinctions at particular historical times. While the 110 is about a collective-body _actor_, a government.

All of these are $z's, which presumably are authorized by authority 151s:
Soviet Union
Russia
Russia (Federation)
Former Soviet Republics
typically assigned for works about that area of the world at the time that area of the world was known as a particular thing, heh. Or:
Italy / Roman Empire
Byzantine Empire / Ottoman Empire / Turkey / Balkan Peninsula

Now, all those things aren't the _exact_ same longitude and latitude, but with significant overlap, different in different cases. At any rate, 151s aren't purely a name for a geographic boundary on the planet, they're some kind of, um, geo-political-historical concept.

Compare to the terms you can put in an 043, which ARE meant to be history and political entity free. e-ur == Russia. Russian Empire. Soviet Union. Former Soviet Republics. Yeah, all of em together. Nevermind they don't have exactly the same boundaries. (And of course the boundaries of any one of em can and did change over time).

At least 043's MOSTLY try to be purely geographical, free of historical/political context, but then sometimes they go ahead and add weird ones that can't possibly follow that principle, like d = Developing Countries or dd = Developed Countries.

But yeah, then we've got the 110 England, which isn't a geographical concept AT ALL, it refers really to the Government/political _actor_ (as a collective body) known as England. Which happens to have controlled or claimed certain geographic territory for itself at different times, but the 110 England isn't about the geographic territory, it's about the collective-body actor. (Does that even still exist? What is its contemporary or historical relationship to the concepts United Kingdom and Great Britain, are those political actors too?)

Somewhere I read an article about the particular messiness of geographic vocabularies, as discussed above, I forget where. Wish I could find it again, it would be helpful here. But modelling the real world with a subject vocabulary is inherently messy, especially so with geographic classification like this that is meant to somehow cover all of recorded human history too. The map is not the territory.