Re: [CODE4LIB] LCSH and Linked Data
2011/4/8 Karen Miller > I hope I'm not pointing out the obvious, That made me laugh so hard I almost ruptured something. Thank you so much for such a complete (please, god, tell me it's complete...) explanation. It's a little depressing, but at least now I know why I'm depressed :-) -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] LCSH and Linked Data
OK, as a cataloger who has been confused by the jurisdictional/place name distinction, I'm going to jump in here. Whether "England" means the free-floating geographic entity or the country is not quite unknowable -- it depends on the MARC codes that accompany it. The brief answer is this: a field used in a 651$a or a $z should match a 151 in the LC authorities. If the MARC field is 151 or 651 (let's just say x51), then the $a should match a 151 in the authority file. MARC subfield z ($z) is always a geographic subdivision and should match a 151. Here's where it gets tricky: If the MARC field is a x10 (110, 610, 710 – corporate bodies), then the $a should match a 110 or a 151 in the authority file. If the first indicator of such a MARC field is a 1, then it will probably match a 151 – first indicator "1" means that a heading is jurisdictional and may match a 151. For example: 110 1_ United States. ‡b Dept. of Agriculture There is a 151 United States in the LC authorities, but no 110 United States yet it can be used as a corporate body name in a bib. record with a 110 field. This is further confused by the VIAF, in which some national libraries have established the United States as a corporate body (110). At the risk of confusing things, I'd suggest looking at countries like the United States, Kenya or Canada as examples. England is not a great example because it's not a current jurisdiction name - there is a note in the LC authority record that reads "Heading for England valid as a jurisdiction before 1536 only. Use "(England)" as qualifier for places (23.4D) and for nongovernment bodies (24.4C2)." It is established as a 110 because it *used to be* a jurisdiction name and would be valid for works issued by the government prior to 1536. Obviously this note is of no use to a machine, but it explains why we aren't seeing it used as a jurisdiction (a corporate body) with subordinate bodies. 
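The field rules above can be sketched in code. This is only a rough illustration of the rules as stated in this message, not a complete treatment of MARC; the function name and the return convention (a list of authority tags to try, most likely first) are made up for the example.

```python
# Hypothetical helper: which authority-file tags to try for a bib heading,
# following the x51 / $z / x10 rules described above.
GEOGRAPHIC_TAGS = {"151", "651"}          # "x51" fields
CORPORATE_TAGS = {"110", "610", "710"}    # "x10" fields


def candidate_authority_tags(tag, ind1, subfield):
    """Return authority tags worth trying, most likely first."""
    if subfield == "z":
        # $z is always a geographic subdivision.
        return ["151"]
    if subfield == "a" and tag in GEOGRAPHIC_TAGS:
        return ["151"]
    if subfield == "a" and tag in CORPORATE_TAGS:
        # First indicator 1 marks a jurisdictional heading,
        # which will probably match a 151 rather than a 110.
        return ["151", "110"] if ind1 == "1" else ["110", "151"]
    return []


# 110 1_ United States. $b Dept. of Agriculture
print(candidate_authority_tags("110", "1", "a"))
```

An empty list simply means the rules in this message say nothing about that field and subfield.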
I hope I'm not pointing out the obvious, but the use of names that appear in 151 fields in the authority file as 110 fields in bibliographic records confused me for a very long time; our authorities librarian explained it to me at least twice before the proverbial light bulb went on for me. Karen Karen D. Miller Monographic/Digital Projects Cataloger Bibliographic Services Dept. Northwestern University Library Evanston, IL k-mill...@northwestern.edu 847-467-3462 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Bill Dueber Sent: Friday, April 08, 2011 1:40 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LCSH and Linked Data On Fri, Apr 8, 2011 at 1:50 PM, Shirley Lincicum wrote: > Ross is essentially correct. Education is an authorized subject term > that can be subdivided geographically. Finance is a free-floating > subdivision that is authorized for use under subject terms that > conform to parameters given in the scope notes in its authority record > (680 fields), but it cannot be subdivided geographically. England is > an authorized geographic subject term that can be added to any heading > that can be subdivided geographically. Wait, so is it possible to know if "England" means the free-floating geographic entity or the country? Or is that just plain unknowable. Suddenly, my mouth is hungering for something gun-flavored. I know OCLC did some work trying to dis-integrate different types of terms with the FAST stuff, but it's not clear to me how I can leverage that (or anything else) to make LCSH at all useful as a search target or (even better) facet. Has anyone done anything with it?
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 1:50 PM, Shirley Lincicum wrote: > Ross is essentially correct. Education is an authorized subject term > that can be subdivided geographically. Finance is a free-floating > subdivision that is authorized for use under subject terms that > conform to parameters given in the scope notes in its authority record > (680 fields), but it cannot be subdivided geographically. England is > an authorized geographic subject term that can be added to any heading > that can be subdivided geographically. Wait, so is it possible to know if "England" means the free-floating geographic entity or the country? Or is that just plain unknowable. Suddenly, my mouth is hungering for something gun-flavored. I know OCLC did some work trying to dis-integrate different types of terms with the FAST stuff, but it's not clear to me how I can leverage that (or anything else) to make LCSH at all useful as a search target or (even better) facet. Has anyone done anything with it?
Re: [CODE4LIB] LCSH and Linked Data
I'm a cataloger who has been following this discussion with interest, but not necessarily understanding all of it. I'll try to add what I can regarding the rules for constructing LCSH headings. > My understanding is that Education--England--Finance *is* authorized, > because Education--Finance is and England is a free-floating > geographic subdivision. Because it's also an authorized heading, > "Education--England--Finance" is, in fact, an authority. The problem > is that free-floating subdivisions cause an almost infinite number of > permutations, so there aren't LCCNs issued for them. Ross is essentially correct. Education is an authorized subject term that can be subdivided geographically. Finance is a free-floating subdivision that is authorized for use under subject terms that conform to parameters given in the scope notes in its authority record (680 fields), but it cannot be subdivided geographically. England is an authorized geographic subject term that can be added to any heading that can be subdivided geographically. Thus, Education -- England -- Finance is a valid LCSH heading, whereas Education -- Finance -- England would not be. This is wonky, and it's stuff like this that makes LCSH so unwieldy and difficult to validate, even for humans who actually have the capacity to learn and adjust to all of the various inconsistencies. I don't know how relevant it is to this particular discussion, but going forward I'm not sure how important it is to validate LCSH headings. I really appreciate developers who seek to preserve the semantic relationships present in the headings as much as possible; I believe many of them have value. But aren't there ways to preserve/extract that value without getting too bogged down in the inconsistent left-to-right structure of the existing headings? I hope this helps, at least a little bit. I'd be happy to answer additional questions. Shirley Shirley Lincicum Frustrated Cataloger
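For the developers on the list, the ordering rule described in this message might be sketched as follows. It is a deliberate simplification: the `MAY_SUBDIVIDE_GEOGRAPHICALLY` set is a hypothetical stand-in for information that really lives in the authority records, and the check only covers the single rule discussed here (a geographic subdivision must directly follow an element that can be subdivided geographically).

```python
# Hypothetical lookup; in practice this would come from authority records.
MAY_SUBDIVIDE_GEOGRAPHICALLY = {"Education"}


def geographic_placement_ok(parts):
    """parts is a list of (kind, term) pairs; kind is 'topic', 'geo' or 'subdiv'.

    The first pair is the main heading, the rest are subdivisions.
    """
    for i in range(1, len(parts)):
        kind, _term = parts[i]
        if kind != "geo":
            continue
        _prev_kind, prev_term = parts[i - 1]
        if prev_term not in MAY_SUBDIVIDE_GEOGRAPHICALLY:
            return False
    return True


valid = [("topic", "Education"), ("geo", "England"), ("subdiv", "Finance")]
invalid = [("topic", "Education"), ("subdiv", "Finance"), ("geo", "England")]
print(geographic_placement_ok(valid), geographic_placement_ok(invalid))
```

Real headings have further cases (indirect geographic subdivision, for one) that this sketch does not attempt.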
Re: [CODE4LIB] LCSH and Linked Data
Thanks Ross - I have been pushing some cataloguing folk to comment on some of this as well (and have some feedback) - but I take the point that wider consultation via autocat could be a good idea. (for some reason this makes me slightly nervous!) In terms of whether Education--England--Finance is authorised or not - I think I took from Andy's response that it wasn't, but also looking at it on authorities.loc.gov it isn't marked as 'authorised'. Anyway - the relevant thing for me at this stage is that I won't find a match via id.loc.gov - so I can't get a URI for it anyway. There are clearly quite a few issues with interacting with LCSH as Linked Data at the moment - I'm not that keen on how this currently works, and my reaction to the MADS/RDF ontology is similar to that of Bruce D'Arcus (see http://metadata.posterous.com/lcs-madsrdf-ontology-and-the-future-of-the-se), but on the other hand I want to embrace the opportunity to start joining some stuff up and seeing what happens :) Owen On Fri, Apr 8, 2011 at 3:10 PM, Ross Singer wrote: > On Fri, Apr 8, 2011 at 5:02 AM, Owen Stephens wrote: > > > Then obviously I lose the context of the full heading - so I also want to > > look for > > Education--England--Finance (which I won't find on id.loc.gov as not > > authorised) > > > > At this point I could stop, but my feeling is that it is useful to also > look > > for other combinations of the terms: > > > > Education--England (not authorised) > > Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008 > ) > > > > My theory is that as long as I stick to combinations that start with a > > topical term I'm not going to make startlingly inaccurate statements? > > I would definitely ask this question somewhere other than Code4lib > (autocat, maybe?), since I think the answer is more complicated than > this (although they could validate/invalidate your assumption about > whether or not this approach would get you "close enough"). 
> > My understanding is that Education--England--Finance *is* authorized, > because Education--Finance is and England is a free-floating > geographic subdivision. Because it's also an authorized heading, > "Education--England--Finance" is, in fact, an authority. The problem > is that free-floating subdivisions cause an almost infinite number of > permutations, so there aren't LCCNs issued for them. > > This is where things get super-wonky. It's also the reason I > initially created lcsubjects.org, specifically to give these (and, > ideally, locally controlled subject headings) a publishing > platform/centralized repository, but it quickly grew to be more than > "just a side project". There were issues of how the data would be > constructed (esp. since, at the time, I had no access to the NAF), how > to reconcile changes, provenance, etc. Add to the fact that 2 years > ago, there wasn't much linked library data going on, it was really > hard to justify the effort. > > But, yeah, it would be worth running your ideas by a few catalogers to > see what they think. > > -Ross. > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 10:10 AM, Ross Singer wrote: > But, yeah, it would be worth running your ideas by a few catalogers to > see what they think. > And if anyone does this...please please *please* write it up! -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] LCSH and Linked Data / Ross
Hi and thank you Ross, Jonathan, and Andy,

I do wish someone from LC would answer Jonathan's questions for all codes and geographic subdivision or subject implications. There's so much self-inflicted pain I can go through trying to revive my cataloging days. Here are some clarifications though:

List of Geographic Areas is the macro list, whereby List of countries includes only countries as a subset from the macro list.

MARC Code List for Countries [choice of a MARC code is generally related to information in field 260 (Publication, Distribution, etc. (Imprint)). The code recorded in 008/15-17 is used in conjunction with field 044 (Country of Producer Code) when more than one code is appropriate to an item.]

MARC Geographic Area Codes are codes entered (according to geographic names in the 6xx fields) in field 043.

The Country Codes and Geographic Area Codes are entered bureaucratically, bypassing Jonathan's refined distinctions. These tasks are outsourced to agencies separate from the catalogers assigning LCSH.

Now it starts getting uglier, since upkeep for these lists differs in time and agency. Possibly new territory names are done now by NATO ... You would expect to see the same name in a code list and in a geographic name (151). Sometimes you won't. Sometimes you'll see redundancies which confuse even more. So since:

1. LCSH has mistakes, inconsistencies
2. LC doesn't talk to CODE4LIB to answer our questions
3. OCLC will not talk to LC on our behalf

we can create the geographic name list(s) we need. Since we know that 6xx forms for geographic names appear in 151 and 781 fields, we can create an index for those names for matching to 6xx in LCSH. Andrew, please complete/comment-on this list.

Ya'aqov
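A rough sketch of the index proposed above, for anyone who wants to try it. Everything about the record representation here is an assumption made for illustration (a dict with an `id` and a list of `(tag, subfields)` pairs), as is the choice to take `$a` from 151 fields and `$z` from 781 fields; the sample records and their identifiers are invented.

```python
def normalize(name):
    """Lowercase, trim trailing punctuation and collapse whitespace."""
    return " ".join(name.lower().rstrip(" .,;").split())


def build_geo_index(records):
    """Map normalized geographic names to the ids of records that carry them."""
    index = {}
    for rec in records:
        for tag, subfields in rec["fields"]:
            if tag == "151":
                names = [v for code, v in subfields if code == "a"]
            elif tag == "781":
                zs = [v for code, v in subfields if code == "z"]
                names = ["--".join(zs)] if zs else []
            else:
                continue
            for name in names:
                index.setdefault(normalize(name), set()).add(rec["id"])
    return index


# Invented sample data, not real authority records.
SAMPLE = [
    {"id": "example-1", "fields": [("151", [("a", "England")]),
                                   ("781", [("z", "England")])]},
    {"id": "example-2", "fields": [("151", [("a", "London (England)")]),
                                   ("781", [("z", "England"), ("z", "London")])]},
]

index = build_geo_index(SAMPLE)
print(sorted(index))
```

A 6xx geographic string from a bib record would be passed through the same `normalize` before lookup.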
Re: [CODE4LIB] LCSH and Linked Data
On Fri, Apr 8, 2011 at 5:02 AM, Owen Stephens wrote: > Then obviously I lose the context of the full heading - so I also want to > look for > Education--England--Finance (which I won't find on id.loc.gov as not > authorised) > > At this point I could stop, but my feeling is that it is useful to also look > for other combinations of the terms: > > Education--England (not authorised) > Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008) > > My theory is that as long as I stick to combinations that start with a > topical term I'm not going to make startlingly inaccurate statements? I would definitely ask this question somewhere other than Code4lib (autocat, maybe?), since I think the answer is more complicated than this (although they could validate/invalidate your assumption about whether or not this approach would get you "close enough"). My understanding is that Education--England--Finance *is* authorized, because Education--Finance is and England is a free-floating geographic subdivision. Because it's also an authorized heading, "Education--England--Finance" is, in fact, an authority. The problem is that free-floating subdivisions cause an almost infinite number of permutations, so there aren't LCCNs issued for them. This is where things get super-wonky. It's also the reason I initially created lcsubjects.org, specifically to give these (and, ideally, locally controlled subject headings) a publishing platform/centralized repository, but it quickly grew to be more than "just a side project". There were issues of how the data would be constructed (esp. since, at the time, I had no access to the NAF), how to reconcile changes, provenance, etc. Add to the fact that 2 years ago, there wasn't much linked library data going on, it was really hard to justify the effort. But, yeah, it would be worth running your ideas by a few catalogers to see what they think. -Ross.
Re: [CODE4LIB] MARC magic for file
http://i.imgur.com/6WtA0.png (Sorry, it's Friday. Also, blame dchud for the idea.) -Sean On 4/6/11 4:53 PM, "Mike Taylor" wrote: > On 6 April 2011 19:53, Jonathan Rochkind wrote: >> On 4/6/2011 2:43 PM, William Denton wrote: >>> >>> "Validity" does mean something definite ... but Postel's Law is a good >>> guideline, especially with the swamp of bad MARC, old MARC, alternate >>> MARC, that's out there. Valid MARC is valid MARC, but if---for the sake >>> of file and its magic---we can identify technically invalid but still >>> usable MARC, that's good. >> >> Hmm, except in the case of Web Browsers, I think general consensus is >> Postel's law was not helpful. These days, most people seem to think that >> having different browsers be tolerant of invalid data in different ways was >> actually harmful rather than helpful to inter-operability (which is >> theoretically the goal of Postel's law), and that's not what people do >> anymore in web browser land, at least not to the extremes they used to do >> it. > > But the idea that browsers should be less permissive in what they > accept is a modern one that we now have the luxury of only because > adherence to Postel's law in the early days of the Web allowed it to > become ubiquitous. Though it's true, as Harvey Thompson has observed > that "it's difficult to retro-fit correctness", Clay Shirky was also > very right when he pointed out that "You cannot simultaneously have > mass adoption and rigor". If browsers in 1995 had been as pedantic as > the browsers of 2011 (rightly) are, we wouldn't even have the Web; or > if it existed at all it would just be a nichey thing that a few > scientists used to make their publications available to each other. > > So while I agree that in the case of HTML we are right to now be > moving towards more rigorous demands of what to accept (as well, of > course, as being conservative in what we emit), I don't think we could > have made the leap from nothing to modern rigour. > > -- Mike
[CODE4LIB] Win $450 for the best Personal Data Mashup!
Of possible interest. -Jodi Begin forwarded message: > From: Laura Dragan > Date: 8 April 2011 13:11:19 GMT+01:00 > To: deri.ie-resea...@lists.deri.org > Subject: [Deri.ie-research] Win 450USD for the best Personal Data Mashup! > > Personal Data Mashup Challenge > http://semanticweb.org/wiki/PSD2011Challenge > > > Your computer is overflowing with applications for managing your data: > your photos, your documents, your calendar, your email, etc. On the > other side of your Internet connection, the web is overflowing with > services for creating, managing, and sharing many of the same things. > > We believe that Semantic Technology can be used for linking, > categorizing and combining the data from all these sources, giving you > an overall view that no single application can match. > > If you agree - come to PSD2011 [1] - show us how and get rich(*) and > famous! > > The challenge prize is kindly sponsored by the Open Semantic > Collaboration Architecture Foundation (OSCAF) [2]. > > Submission deadline is 1st June 2011. The winner will be announced > on the 26th June, at PSD2011. > > > Regards, > The PSD2011 organizing committee > > > [1] http://semanticweb.org/wiki/PSD2011 > [2] http://www.oscaf.org/
Re: [CODE4LIB] LCSH and Linked Data
Thanks for all the information and discussion. I don't think I'm familiar enough with Authority file formats to completely comprehend - but I certainly understand the issues around the question of 'place' vs 'histo-geo-political entity'. Some of this makes me worry about the immediate applicability of the LC Authority files in the Linked Data space - someone said to me recently 'SKOS is just a way of avoiding dealing with the real semantics' :) Anyway - putting that to one side, the simplest approach for me at the moment seems to only look at authorised LCSH as represented on id.loc.gov. Picking up on Andy's first response: On Thu, Apr 7, 2011 at 3:46 PM, Houghton,Andrew wrote: > After having done numerous matching and mapping projects, there are some > issues that you will face with your strategy, assuming I understand it > correctly. Trying to match a heading starting at the left most subfield and > working forward will not necessarily produce correct results when matching > against the LCSH authority file. Using your example: > > > > 650 _0 $a Education $z England $x Finance > > > > is a good example of why processing the heading starting at the left will > not necessarily produce the correct results. Assuming I understand your > proposal you would first search for: > > > > 150 __ $a Education > > > > and find the heading with LCCN sh85040989. Next you would look for: > > > > 181 __ $z England > > > > and you would NOT find this heading in LCSH. > OK - ignoring the question of where the best place to look for this is - I can live with not matching it for now. Later (perhaps when I understand it better, or when these headings are added to id.loc.gov we can revisit this) > The second issue using your example is that you want to find the “longest” > matching heading. While the pieces parts are there, so is the enumerated > authority heading: > > > > 150 __ $a Education $z England > > > > as LCCN sh2008102746. 
So your heading is actually composed of the > enumerated headings: > > > > sh2008102746 150 __ $a Education $z England > > sh2002007885 180 __ $x Finance > > > > and not the separate headings: > > > > sh85040989 150 __ $a Education > > n82068148 150 __ $a England > > sh2002007885 180 __ $x Finance > > > > Although one could argue that either analysis is correct depending upon > what you are trying to accomplish. > > > What I'm interested in is representing the data as RDF/Linked Data in a way that opens up the best opportunities for both understanding and querying the data. Unfortunately at the moment there isn't a good way of representing LCSH directly in RDF (the MADS work may help I guess but to be honest at the moment I see that as overly complex - but that's another discussion). What I can do is make statements that an item is 'about' a subject (probably using dc:subject) and then point at an id.loc.gov URI. However, if I only express individual headings: Education England (natch) Finance Then obviously I lose the context of the full heading - so I also want to look for Education--England--Finance (which I won't find on id.loc.gov as not authorised) At this point I could stop, but my feeling is that it is useful to also look for other combinations of the terms: Education--England (not authorised) Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008) My theory is that as long as I stick to combinations that start with a topical term I'm not going to make startlingly inaccurate statements? > The matching algorithm I have used in the past contains two routines. The > first f(a) will accept a heading as a parameter, scrub the heading, e.g., > remove unnecessary subfields like $0, $3, $6, $8, etc. and do any other > pre-processing necessary on the heading, then call the second function f(b). > The f(b) function accepts a heading as a parameter and recursively calls > itself until it builds up the list of LCCNs that comprise the heading. 
It first > looks for the given heading; when it doesn’t find it, it removes the *last* > subfield and recursively calls itself, otherwise it appends the found > LCCN to the returned list and exits. This strategy will find the longest > match. > Unless I've misunderstood this, this strategy would not find 'Education--Finance'? Instead I need to remove each *subdivision* in turn (no matter where it appears in the heading order) and try all possible combinations checking each for a match on id.loc.gov. Again, I can do this without worrying about possible invalid headings, as these wouldn't have been authorised anyway... I can check the number of variations around this but I guess that in my limited set of records (only 30k) there will be a relatively small number of possible patterns to check. Does that make sense?
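To make the difference between the two strategies concrete, here is a minimal sketch. It is my reading of both descriptions, not anyone's actual implementation: `longest_prefix_match` only drops subfields from the right, while `candidate_headings` keeps the first (topical) term and tries every combination of the remaining subdivisions in their original order. The `AUTHORISED` dict is a stand-in for a real lookup against id.loc.gov and contains only the two identifiers quoted in this thread.

```python
from itertools import combinations

# Stand-in for an id.loc.gov lookup (label -> LCCN), using ids quoted above.
AUTHORISED = {
    "Education": "sh85040989",
    "Education--Finance": "sh85041008",
}


def longest_prefix_match(parts, lookup):
    """Drop subfields from the right until a label is found."""
    for end in range(len(parts), 0, -1):
        label = "--".join(parts[:end])
        if label in lookup:
            return label
    return None


def candidate_headings(parts):
    """Keep the first term; try all ordered subsets of the subdivisions."""
    head, rest = parts[0], parts[1:]
    labels = []
    for size in range(len(rest), -1, -1):
        for combo in combinations(rest, size):
            labels.append("--".join((head,) + combo))
    return labels


heading = ["Education", "England", "Finance"]
prefix_hit = longest_prefix_match(heading, AUTHORISED)
combo_hits = [(h, AUTHORISED[h]) for h in candidate_headings(heading) if h in AUTHORISED]
print(prefix_hit)
print(combo_hits)
```

With n subdivisions the second approach checks 2^n labels per heading, which stays small for typical headings with two or three subdivisions.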
[CODE4LIB] Mapping vocabularies (was: LCSH and Linked Data)
Hi, Any transformation of a controlled vocabulary, either in format (MARC to RDF) or in coverage (e.g. from LCSH to DDC, MeSH, GND, etc.) has to decide whether (a) there is a one-to-one (or one-to-zero) mapping between all concepts (b) you need n-to-m or even more complex mappings. Mapping name authority files in VIAF was one of (a) because we more or less agree that a person is always the same person. But it looks like mapping authority data in MARC from different institutions is an instance of (b). Not only are concepts like "England" more fuzzy than people, but they are also used in different contexts for different purposes, depending on the cataloging rules and their specific interpretation. It does not help to argue about MARC fields because there just is no easy one-to-one mapping between for instance:
- The Kingdom of England (927–1707)
- The area of the Kingdom of England (927–1707)
- The country England as today
- The area of England including the Principality of Sealand
- The area of England excluding the Principality of Sealand
- The whole Island Great Britain
- The Island Great Britain including Ireland
- The Island Great Britain including Northern Ireland
- The Kingdom of Great Britain (1707 to 1801)
- The United Kingdom of Great Britain and Ireland (1801 to 1922)
- etc.
I gave a talk about the fruitless attempt to put reality in terms of Semantic Web at Wikimania 2007 (starting with slide 12): http://www.slideshare.net/NCurse/jakob-voss-wikipedia2007 Instead of discussing how to map terms and concepts "the right way" you should think about how to express fuzzy and complex mappings. The SKOS mapping vocabulary provides some relations for this purpose. 
I can also recommend the DC2010 paper "Establishing a Multi-Thesauri-Scenario based on SKOS and Cross-Concordances" by Mayr, Zapilko, and Sure: http://dcpapers.dublincore.org/ojs/pubs/article/viewArticle/1031 If you do not want to bother with complex mappings but prefer one-to-one, you should not talk about differences like England as corporate body or as England as place or England as nationality etc. Sure you can put all these meanings into a broad and fuzzy term "England" but then stop complaining about semantic differences and use the term as unqualified subject heading with no specific meaning for anything that is related to any of the many ideas that anyone can call "England". This is the way that full text retrieval works. You just can't have both simple mappings and precise terms. Jakob -- Jakob Voß , skype: nichtich Verbundzentrale des GBV (VZG) / Common Library Network Platz der Goettinger Sieben 1, 37073 Göttingen, Germany +49 (0)551 39-10242, http://www.gbv.de
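As a small illustration of what "expressing fuzzy and complex mappings" could look like in practice, here is a sketch of an n-to-m mapping store that labels each link with one of the SKOS mapping relations (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch). The class, the concept identifiers and the particular relations chosen for "England" are all invented for the example and are not claims about how these concepts should actually be mapped.

```python
from collections import defaultdict

SKOS_MAPPING_RELATIONS = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


class MappingStore:
    """Holds typed n-to-m links between concepts of different vocabularies."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, source, relation, target):
        if relation not in SKOS_MAPPING_RELATIONS:
            raise ValueError("not a SKOS mapping relation: " + relation)
        self._by_source[source].append((relation, target))

    def targets(self, source, relation=None):
        """All targets of a concept, optionally restricted to one relation."""
        return [t for r, t in self._by_source.get(source, [])
                if relation is None or r == relation]


store = MappingStore()
# Invented identifiers; one source concept maps to several targets.
store.add("lcsh:England", "closeMatch", "ex:england-country-today")
store.add("lcsh:England", "relatedMatch", "ex:kingdom-of-england-927-1707")
store.add("lcsh:England", "broadMatch", "ex:island-great-britain")

print(store.targets("lcsh:England"))
print(store.targets("lcsh:England", "closeMatch"))
```

A consumer that only wants near-equivalents can filter on closeMatch or exactMatch, while a retrieval system that prefers recall can follow every relation.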
Re: [CODE4LIB] LCSH and Linked Data
On 07.04.2011 17:44, Ford, Kevin wrote: > Actually, it appears to depend on whose Authority record you're looking at. > The Canadians, Australians, and Israelis have it as a CorporateName (110), as > do the French (210 - unimarc); LC and the Germans say it's a Geographic Name. No, the original "England" record linked to VIAF in the German GND says it is a "Gebietskörperschaft", which is a corporate body in English. See http://d-nb.info/gnd/15138-5/about/html and the RDF representation at http://d-nb.info/gnd/15138-5/about/rdf Perhaps something went "wrong" in the mapping of the German authority record to MARC21, so "England" got into the 151 (or there might be good reasons to do it that way, ask metadata experts...). The original record is not maintained in MARC21, we don't do MARC21 (or any MARC at all) here, we are just starting to switch to it as future(!) exchange format... :-). Sorry for being pedantic, early morning and not enough coffee yet... Till -- Till Kinstler Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG) Platz der Göttinger Sieben 1, D 37073 Göttingen kinst...@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de