[CODE4LIB] LCSH and Linked Data
We are working on converting some MARC library records to RDF, and looking at how we handle links to LCSH (id.loc.gov) - and I'm looking for feedback on how we are proposing to do this...

I'm not 100% confident about the approach, and to some extent I'm trying to work around the nature of how LCSH interacts with RDF at the moment I guess... but here goes - I would very much appreciate feedback/criticism/being told why what I'm proposing is wrong:

I guess what I want to do is preserve aspects of the faceted nature of LCSH in a useful way, give useful links back to id.loc.gov where possible, and give access to a wide range of facets on which the data set could be queried. Because of this I'm proposing not just expressing the whole of the 650 field as a LCSH and checking for its existence on id.loc.gov, but also checking for various combinations of topical term and subdivisions from the 650 field.

So for any 650 field I'm proposing we should check on id.loc.gov for labels matching:

check(650$$a) -- topical term
check(650$$b) -- topical term
check(650$$v) -- Form subdivision
check(650$$x) -- General subdivision
check(650$$y) -- Chronological subdivision
check(650$$z) -- Geographic subdivision

Then using whichever elements exist (all as topical terms):

Check(650$$a--650$$b)
Check(650$$a--650$$v)
Check(650$$a--650$$x)
Check(650$$a--650$$y)
Check(650$$a--650$$z)
Check(650$$a--650$$b--650$$v)
Check(650$$a--650$$b--650$$x)
Check(650$$a--650$$b--650$$y)
Check(650$$a--650$$b--650$$z)
Check(650$$a--650$$b--650$$x--650$$v)
Check(650$$a--650$$b--650$$x--650$$y)
Check(650$$a--650$$b--650$$x--650$$z)
Check(650$$a--650$$b--650$$x--650$$z--650$$v)
Check(650$$a--650$$b--650$$x--650$$z--650$$y)
Check(650$$a--650$$b--650$$x--650$$z--650$$y--650$$v)

As an example, given:

650 00 $$aPopular music$$xHistory$$y20th century

We would be checking id.loc.gov for:

'Popular music' as a topical term (http://id.loc.gov/authorities/sh85088865)
'History' as a general subdivision (http://id.loc.gov/authorities/sh99005024)
'20th century' as a chronological subdivision (http://id.loc.gov/authorities/sh2002012476)
'Popular music--History and criticism' as a topical term (http://id.loc.gov/authorities/sh2008109787)
'Popular music--20th century' as a topical term (not authorised)
'Popular music--History and criticism--20th century' as a topical term (not authorised)

And expressing all matches in our RDF.

My understanding of LCSH isn't what it might be - but the ordering of terms in the combined string checking is based on what I understand to be the usual order - is this correct, and should we be checking for alternative orderings?

Thanks

Owen

--
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
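As a rough illustration of the per-subfield checks listed in the message above, here is a minimal Python sketch that splits a 650 field written in the `$$a...$$x...` notation used in this thread and pairs each value with the role it would be looked up under. The function names `parse_650` and `individual_checks` are hypothetical, the lookup against id.loc.gov itself is left out, and stripping a trailing full stop is an assumption about the input punctuation.

```python
import re

# Subfield roles exactly as listed in the message above (field 650).
SUBFIELD_ROLES = {
    "a": "topical term",
    "b": "topical term",
    "v": "form subdivision",
    "x": "general subdivision",
    "y": "chronological subdivision",
    "z": "geographic subdivision",
}


def parse_650(field_text):
    """Split '$$aPopular music$$xHistory' into [(code, value), ...] in field order."""
    pairs = re.findall(r"\$\$(\w)([^$]*)", field_text)
    # Assumption: a trailing full stop is punctuation, not part of the label.
    return [(code, value.strip().rstrip(".")) for code, value in pairs]


def individual_checks(field_text):
    """Return (label, role) pairs, one per subfield that has a known role."""
    return [
        (value, SUBFIELD_ROLES[code])
        for code, value in parse_650(field_text)
        if code in SUBFIELD_ROLES and value
    ]


checks = individual_checks("$$aPopular music$$xHistory$$y20th century")
for label, role in checks:
    print(f"{label!r} as {role}")
```

Each `(label, role)` pair would then be sent to whatever label lookup is used; that step is deliberately not shown because the thread does not specify it.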
Re: [CODE4LIB] utf8 \xC2 does not map to Unicode
yaz-marcdump does a really good job of charset and format conversion for MARC records, and is blindingly fast. But yaz-marcdump seems to think there are a lot of separators in the wrong place and bad indicator data, whether treating the records as UTF-8 or MARC-8.

The leaders in the records say they are UTF-8, but looking at the data, the byte sequences that Jon G. noticed remind me of UTF-8 data that was UTF-8-encoded a second time. I wonder if they got re-encoded in transmission somewhere along the way. Maybe just in the download from zoila.

-Tod

On Apr 6, 2011, at 4:11 PM, Jonathan Rochkind wrote:

That's hilarious, that Terry has had to do enough ugliness with Marc encodings that he indeed can recognize 0xC2 off the bat as the Marc8 encoding it represents! I am in awe, as well as sympathy.

If the record is in Marc8, then you need to know if Perl MARC::Batch can handle Marc8. If it's supposed to be able to handle it, you need to figure out why it's not. (leader byte says UTF-8 even though it's really Marc8?). If MARC::Batch can't handle Marc8, you need to convert to UTF-8 first. The only software package I know of that can convert from and to Marc8 encoding is Java Marc4J, but I wouldn't be shocked if there was something in Perl to do it. (But yes, as you can tell by the name, Marc8 is a character encoding ONLY used in Marc, nobody but library people write software for dealing with it).

On 4/6/2011 5:01 PM, Reese, Terry wrote:

I'd echo Jonathan's question -- the 0xC2 code is the sound recording marker in MARC-8. I'd guess the file isn't in UTF8.

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
Sent: Wednesday, April 06, 2011 1:28 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] utf8 \xC2 does not map to Unicode

I am not familiar with that Perl module. But I'm more familiar than I'd want with char encoding in Marc. I don't recognize the bytes 0xC2 (there are some bytes I became pathetically familiar with in past debugging, but I've forgotten em), but the first things to look at:

1. Is your Marc file encoded in Marc8 or UTF-8? I'm betting Marc8. Theoretically there is a Marc leader byte that tells you whether it's Marc8 or UTF-8, but the leader byte is often wrong in real world records. Is it wrong?

2. Does Perl MARC::Batch have a function to convert from Marc8 to UTF-8? If so, how does it decide whether to convert? Is it trying to do that? Is it assuming that the leader byte of the record accurately identifies the encoding, and if so, is the leader byte wrong? Is it trying to convert from Marc8 to UTF-8, when the source was UTF-8 in the first place? Or is it assuming the source was UTF-8 in the first place, when in fact it was Marc8?

Not the answer you wanted, maybe someone else will have that. Debugging char encoding is hands down the most annoying kind of debugging I ever do.

On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:

Ack! While using the venerable Perl MARC::Batch module I get the following error while trying to read a MARC record:

utf8 \xC2 does not map to Unicode

This is a real pain, and I'm hoping someone here can help me either: 1) trap this error allowing me to move on, or 2) figure out how to open the file correctly.

Tod Olson t...@uchicago.edu
Systems Librarian
University of Chicago Library
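Two of the questions raised in this thread can be checked mechanically: what the leader claims about the encoding, and whether the bytes look like UTF-8 that was encoded a second time. The Python sketch below is only a heuristic under stated assumptions: the function names are hypothetical, the second encoding pass is assumed to have treated the original bytes as Latin-1 (a Windows-1252 pass would need a different codec), and text that legitimately contains sequences like "Ã©" would be flagged wrongly. The leader check relies on MARC 21 leader position 09, where `a` means UCS/Unicode and a blank means MARC-8.

```python
def declared_encoding(leader: str) -> str:
    """Report what the leader claims: position 09 is 'a' for Unicode, blank for MARC-8."""
    return "UTF-8" if leader[9] == "a" else "MARC-8"


def looks_double_encoded(raw: bytes) -> bool:
    """Heuristic: True if `raw` decodes as UTF-8 and the result, taken back to
    bytes via Latin-1, decodes as UTF-8 a second time into different text."""
    try:
        once = raw.decode("utf-8")
        inner = once.encode("latin-1")  # fails if any character is above U+00FF
        twice = inner.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return twice != once


# Build a double-encoded sample from a known-good one.
good = "é".encode("utf-8")                           # b'\xc3\xa9'
doubled = good.decode("latin-1").encode("utf-8")     # b'\xc3\x83\xc2\xa9'

print(looks_double_encoded(good), looks_double_encoded(doubled))
```

Note that the doubled sample contains the byte 0xC2, which is one reason that byte shows up so often in damaged UTF-8; that alone does not prove double-encoding for the records discussed here.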
Re: [CODE4LIB] LCSH and Linked Data
Thanks Tom - very helpful

Perhaps this suggests that rather than using a fixed order we should check combinations while preserving the order of the original 650 field (I assume this should in theory be correct always - or at least done to the best of the cataloguer's knowledge)?

So for:

650 _0 $$a Education $$z England $$x Finance.

check:

Education
England (subdiv)
Finance (subdiv)
Education--England
Education--Finance
Education--England--Finance

While for:

650 _0 $$a Education $$x Economic aspects $$z England

we check:

Education
Economic aspects (subdiv)
England (subdiv)
Education--Economic aspects
Education--England
Education--Economic aspects--England

- It is possible for other orders in special circumstances, e.g. with language dictionaries which can go something like: 650 _0 $$a English language $$v Dictionaries $$x Albanian.

This possibility would also be covered by preserving the order - check:

English Language
Dictionaries (subdiv)
Albanian (subdiv)
English Language--Dictionaries
English Language--Albanian
English Language--Dictionaries--Albanian

Creating possibly invalid headings isn't necessarily a problem - as we won't get a match on id.loc.gov anyway. (Instinctively English Language--Albanian doesn't feel right)

- Some of these are repeatable, so you can have two $$vs following each other (e.g. Biography--Dictionaries); two $$zs (very common), as in Education--England--London; two $$xs (e.g. Biography--History and criticism).

OK - that's fine, we can use each individually and in combination for any repeated headings I think

- I'm not sure I've ever come across a lot of $$bs in 650s. Do you have a lot of them in the database?

Hadn't checked until you asked! We have 1 in the dataset in question (c.30k records) :)

I'm not sure how possible it would be to come up with a definitive list of (reasonable) possible combinations.
You are probably right - but I'm not too bothered about aiming at 'definitive' at this stage anyway - but I do want to get something relatively functional/useful

Tom

Thomas Meehan
Head of Current Cataloguing
University College London Library Services
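The order-preserving idea in this reply (keep the subfields in the order the cataloguer entered them, and try `$$a` followed by every subset of the subdivisions) can be sketched in Python as below. This is only an illustration: `candidate_headings` is a hypothetical name, and the sketch assumes the first subfield is always `$$a` and that labels are joined with `--`.

```python
from itertools import combinations


def candidate_headings(subfields):
    """Build combined labels from (code, value) pairs given in field order.

    The first pair is assumed to be $$a. itertools.combinations keeps the
    input order, so subdivisions are never reordered.
    """
    main = subfields[0][1]
    subdivisions = [value for _, value in subfields[1:]]
    candidates = []
    for size in range(1, len(subdivisions) + 1):
        for combo in combinations(subdivisions, size):
            candidates.append("--".join((main,) + combo))
    return candidates


education = [("a", "Education"), ("z", "England"), ("x", "Finance")]
for label in candidate_headings(education):
    print(label)
```

With n subdivisions this yields 2^n - 1 combined labels, so repeated `$$z` or `$$x` subfields are covered without special cases, at the cost of more lookups per field.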
Re: [CODE4LIB] LCSH and Linked Data
... Creating possibly invalid headings isn't necessarily a problem - as we won't get a match on id.loc.gov anyway ...

LCSH headings reflect materials cataloged by LC. You may have materials at your UK (or Albania, Tunisia, etc.) library which were not yet cataloged at LC, so there is nothing yet to match on.

Ya'aqov
Re: [CODE4LIB] LCSH and Linked Data
After having done numerous matching and mapping projects, there are some issues that you will face with your strategy, assuming I understand it correctly. Trying to match a heading starting at the leftmost subfield and working forward will not necessarily produce correct results when matching against the LCSH authority file. Using your example:

650 _0 $a Education $z England $x Finance

is a good example of why processing the heading starting at the left will not necessarily produce the correct results. Assuming I understand your proposal you would first search for:

150 __ $a Education

and find the heading with LCCN sh85040989. Next you would look for:

181 __ $z England

and you would NOT find this heading in LCSH. This is issue one. Unfortunately, LC does not create 181 in LCSH (actually I think there are some, but not if it’s a name), instead they create a 781 in the name authority record. So to find the corresponding $z England we need to go to the name authority record 150 England with LCCN n82068148. Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org.

The second issue using your example is that you want to find the “longest” matching heading. While the component parts are there, so is the enumerated authority heading:

150 __ $a Education $z England

as LCCN sh2008102746. So your heading is actually composed of the enumerated headings:

sh2008102746 150 __ $a Education $z England
sh2002007885 180 __ $x Finance

and not the separate headings:

sh85040989 150 __ $a Education
n82068148 150 __ $a England
sh2002007885 180 __ $x Finance

Although one could argue that either analysis is correct depending upon what you are trying to accomplish.

The matching algorithm I have used in the past contains two routines. The first, f(a), will accept a heading as a parameter, scrub the heading, e.g., remove unnecessary subfields like $0, $3, $6, $8, etc. and do any other pre-processing necessary on the heading, then call the second function f(b). The f(b) function accepts a heading as a parameter and recursively calls itself until it builds up the list of LCCNs that comprise the heading. It first looks for the given heading; when it doesn’t find it, it removes the *last* subfield and recursively calls itself, otherwise it appends the found LCCN to the returned list and exits. This strategy will find the longest match. The headings are searched against an augmented LCSH database where the 781 name authority records have been transformed into 181 records, keeping the LCCN of the name authority record. Not ideal, but it generally works well. Adjust algorithm per need.

Hope this helps, Andy.
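The two routines Andy describes can be sketched roughly as follows in Python. This is one reading of the prose, not his actual code: the names `scrub`, `match_prefix` and `match_heading` are hypothetical, the authority file is stood in for by a small in-memory dictionary built from the LCCNs quoted in his message (with England entered as a `$z` subdivision, as in his augmented database), and continuing with the leftover subfields after a prefix match, or skipping a subfield that matches nothing, is an assumption.

```python
DROP_CODES = {"0", "3", "6", "8"}  # control subfields named in the message

# Toy stand-in for the augmented authority file: heading tuple -> LCCN.
INDEX = {
    (("a", "Education"),): "sh85040989",
    (("a", "Education"), ("z", "England")): "sh2008102746",
    (("z", "England"),): "n82068148",
    (("x", "Finance"),): "sh2002007885",
}


def scrub(heading):
    """f(a): drop control subfields and tidy values before matching."""
    return tuple(
        (code, value.strip().rstrip("."))
        for code, value in heading
        if code not in DROP_CODES
    )


def match_prefix(heading, index):
    """f(b): look up the heading; if absent, drop the last subfield and recurse.

    Returns (lccn, number_of_subfields_used), or (None, 0) if nothing matches.
    """
    if not heading:
        return None, 0
    lccn = index.get(heading)
    if lccn is not None:
        return lccn, len(heading)
    return match_prefix(heading[:-1], index)


def match_heading(heading, index):
    """Collect the LCCNs of the longest matching pieces, left to right."""
    remaining = scrub(heading)
    lccns = []
    while remaining:
        lccn, used = match_prefix(remaining, index)
        if lccn is None:
            remaining = remaining[1:]  # assumption: skip an unmatched subfield
            continue
        lccns.append(lccn)
        remaining = remaining[used:]
    return lccns


field = [("a", "Education"), ("z", "England"), ("x", "Finance."), ("0", "ignored")]
print(match_heading(field, INDEX))
```

Because the longest prefix is tried first, the example resolves to the enumerated heading for Education--England plus the Finance subdivision rather than three separate records, which is the first of the two analyses in the message.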
Re: [CODE4LIB] LCSH and Linked Data
Andrew, please see [YZ] below

181 __ $z England and you would NOT find this heading in LCSH. This is issue one. Unfortunately, LC does not create 181 in LCSH (actually I think there are some, but not if it’s a name), instead they create a 781 in the name authority record.

[YZ] MARC/LCSH distinguishes between names 100 and geographic names 151 in their authority record. You'll find all geographic names if you look for 151 records.

So to find the corresponding $z England we need to go to the name authority record 150 England with LCCN n82068148.

[YZ] LCCN n82068148 authority record is for 151 England. Also Andrew, are you indicating there is a difference between the form of geographic name in 151$a and 781$z -- ?

Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org.

[YZ] viaf.org does not include geographic names. I just checked England there. It makes little sense to mix personal/corporate names with geographic ones. Let's see what Ralph comments.

Ya'aqov
Re: [CODE4LIB] LCSH and Linked Data
Still digesting Andrew's response (thanks Andrew), but

On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:

Currently under id.loc.gov you will not find name authority records, but you can find them at viaf.org.

[YZ] viaf.org does not include geographic names. I just checked there England.

Is this not the relevant VIAF entry? http://viaf.org/viaf/142995804

--
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
If you look at the fields those names come from, I think they mean England as a corporation, not England as a place.

Ralph
Re: [CODE4LIB] LCSH and Linked Data
Actually, it appears to depend on whose Authority record you're looking at. The Canadians, Australians, and Israelis have it as a CorporateName (110), as do the French (210 - unimarc); LC and the Germans say it's a Geographic Name. In the case of LCSH, therefore, it would be a 151. Regardless, it is in VIAF.

Warmly,

Kevin
Re: [CODE4LIB] LCSH and Linked Data
Ralph, Owen's pointing to a list where corporate (110) and geographic names (151) are mixed. Thanks Owen, I hadn't seen that the first time. I guess you got that mixed 110/151 when limiting to 'exact name'. Perhaps Andrew has a workaround.

Ya'aqov

--
ya'aqov ZISO | yaaq...@gmail.com | 856 217 3456
Re: [CODE4LIB] LCSH and Linked Data
I'm out of my depth here :) But... this is what I understood Andrew to be saying. In this instance (?because 'England' is a Name Authority?), rather than creating a separate LCSH authority record for 'England' (as the 151), the LCSH subdivision is recorded in the 781 of the existing Name Authority record.

Searching on http://authorities.loc.gov for England, I find an Authorised heading, marked as a LCSH - but when I go to that record what I get is the name authority record n 82068148 - the name authority record as represented on VIAF by http://viaf.org/viaf/142995804/ (which links to http://errol.oclc.org/laf/n%20%2082068148.html)

Just as this is getting interesting, time differences mean I'm about to head home :)

Owen

--
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
Kevin, England exists as a corporate body and also as a geographic name. BOTH entities exist in LCSH. This doesn't apply to all geographic names, only to some. Andrew pointed us to VIAF, but I expect his algorithm to limit the search for LCSH. Let's wait for his reply.

Ya'aqov
Re: [CODE4LIB] LCSH and Linked Data
More confusing yet, if you look at the raw XML for that record (add viaf.xml to the end of the URI and then view source) you’ll see that the name type is indeed Geographic. My boss is puzzled.

Ralph
Re: [CODE4LIB] LCSH and Linked Data
On 4/7/2011 10:46 AM, Houghton,Andrew wrote:

to go to the name authority record 150 England with LCCN n82068148. Currently under id.loc.gov you will not find name authority records,

If this were to change, so that name authority record elements used in 6xx subject cataloging were in id.loc.gov, it would make powerful use of id.loc.gov much more feasible. Is there anyone at LC this suggestion/request could be sent to, possibly en masse? I do sort of have the impression it's been an item of contention inside LC.

Jonathan
Re: [CODE4LIB] LCSH and Linked Data
Jonathan, hi and thanks,

1. I believe id.loc.gov includes a list of MARC countries and a list for geographic areas (based on the geographic names in 151 fields).

2. Cataloging rules instruct catalogers to use THOSE very name forms in 151 $a when a subject can be divided (limited) geographically using $z.

3. Not all subjects which can be divided geographically will have the geographical subdivision immediately after the subject. There could be 2 different sequences:

650 $a Picket lines $z Ohio
650 $a Picket lines $x Economic aspects $z Ohio

(Whether or not the geographical subdivision immediately follows $a is part of the rules LC catalogers observe to the letter.) There could also be two geographical subdivisions following each other:

650 $a Picket lines $z Ohio $z Columbus

Oh yeah, these record elements could be used powerfully for our users.

Ya'aqov
Re: [CODE4LIB] LCSH and Linked Data
1. No disagreement, except that some 151s appear in the name file and some appear in the subject file:

n82068148 008/11=a 008/14=a 151 _ _ $a England
sh2010015057 008/11=a 008/14=b 151 _ _ $a Tabasco Mountains (Mexico)

2. Yes, see n5359

151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)

3. Oops, my apologies to my VIAF colleagues, I believe that geographic names are in the works… or at least I was under the impression they were from a discussion I had last night.
Re: [CODE4LIB] LCSH and Linked Data
That is probably correct. England may appear as both a 110 *and* a 151 because the 110 signifies the concept for the country entity while the 151 signifies the concept for the geographic place. A subtle distinction...

Andy.
Re: [CODE4LIB] LCSH and Linked Data
Andrew, as always, most helpful news, kindest thanks! More [YZ] below:

1. No disagreement, except that some 151s appear in the name file and some appear in the subject file:

n82068148 008/11=a 008/14=a 151 _ _ $a England
sh2010015057 008/11=a 008/14=b 151 _ _ $a Tabasco Mountains (Mexico)

[YZ] Would it be possible then to use both files as sources and create one file for geographical names for our purpose(s)?

2. Yes, see n5359

151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)

[YZ] Both stand for a distinct cataloging usage. Jonathan's suggestion to consult LC may answer the question of which field to use for geographical names, and when.

3. Oops, my apologies to my VIAF colleagues, I believe that geographic names are in the works…

[YZ] inshAllah!

4. That is probably correct. England may appear as both a 110 *and* a 151 because the 110 signifies the concept for the country entity while the 151 signifies the concept for the geographic place. A subtle distinction...

[YZ] Exactly. This distinction called for creating both a 110 AND a 151. But we are talking about 151. The case where there is both a 110 and a 151 does NOT apply to all geographic names, only to some.

[YZ] It would be helpful if VIAF provided a way to limit geographical names ONLY to 151 names and their cross references.
Re: [CODE4LIB] LCSH and Linked Data
On Thu, Apr 7, 2011 at 12:58 PM, Ya'aqov Ziso yaaq...@gmail.com wrote: 1. I believe id.loc.gov includes a list of MARC countries and a list for geographic areas (based on the geographic names in 151 fields). 2. cataloging rules instruct catalogers to use THOSE very name forms in 151 $a when a subject can be divided (limited) geographically using $z. Yeah, this could get ugly pretty fast. It's a bit unclear to me what the distinction is between identical terms in both the geographic areas and the country codes (http://id.loc.gov/vocabulary/geographicAreas/e-uk-en and http://id.loc.gov/vocabulary/countries/enk). Well, in LC's current representation, there *is* no distinction: they're both just skos:Concepts that are (by virtue of skos:exactMatch) effectively interchangeable. See also http://id.loc.gov/vocabulary/geographicAreas/fa and http://id.loc.gov/authorities/sh85009230#concept. You have a single institution minting multiple URIs for what is effectively the same thing (albeit in different vocabularies), although, ironically, nothing points at any actual real world objects. VIAF doesn't do much better in this particular case (there are lots of examples where it does, mind you): http://viaf.org/viaf/142995804 (see: http://viaf.org/viaf/142995804/rdf.xml). We have all of these triangulations around the concept of England or the Atlas Mountains, but we can't actually refer to England or the Atlas Mountains. Also, I am not somehow above this problem, either. With the linked MARC codes lists (http://purl.org/NET/marccodes/), I had to make a similar decision; I just chose to go the opposite route: define them as things, rather than concepts (http://purl.org/NET/marccodes/gacs/fa#location, http://purl.org/NET/marccodes/gacs/e-uk-en#location, http://purl.org/NET/marccodes/countries/enk#location, etc.), which presents its own set of problems (http://purl.org/NET/marccodes/gacs/h#location is not a SpatialThing no matter how liberal your definition). 
At some point, it's worth addressing what these things actually *are* and, if they are indeed effectively the same thing, whether it's worth preserving these redundancies, because I think they'll cause grief in the future. -Ross.
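To make the redundancy Ross describes concrete, here is a minimal Python sketch that groups URIs into clusters connected by skos:exactMatch. The two exactMatch links in `TRIPLES` are typed in by hand from the pairs named in the message above; they are an assumption about what id.loc.gov publishes, not data fetched from it, and the function name is made up for illustration.

```python
from collections import defaultdict

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hand-entered (subject, predicate, object) triples; assumed from the
# URI pairs mentioned in the thread, not retrieved from id.loc.gov.
TRIPLES = [
    ("http://id.loc.gov/vocabulary/geographicAreas/e-uk-en",
     SKOS_EXACT_MATCH,
     "http://id.loc.gov/vocabulary/countries/enk"),
    ("http://id.loc.gov/vocabulary/geographicAreas/fa",
     SKOS_EXACT_MATCH,
     "http://id.loc.gov/authorities/sh85009230#concept"),
]


def exact_match_clusters(triples):
    """Group URIs linked (directly or transitively) by skos:exactMatch."""
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == SKOS_EXACT_MATCH:
            root_s, root_o = find(subject), find(obj)
            if root_s != root_o:
                parent[root_s] = root_o

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return [sorted(members) for members in clusters.values()]


if __name__ == "__main__":
    for cluster in exact_match_clusters(TRIPLES):
        print(cluster)
```

Each cluster is a set of URIs that a consumer would have to treat as one concept. None of them identifies the place itself, which is the gap being pointed out.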
Re: [CODE4LIB] LCSH and Linked Data
My bad: in (2) that should have been 781, which is LC’s way to indicate the geographic form used for a 181 when a heading may be geographically subdivided. The point is, when you are trying to do authority matching/mapping you have to match against the 181’s in LCSH *and* the 781’s in NAF. This is an oddity of the LC authority file that people may not be aware of, which is why I pointed it out. As I indicated, in my mapping projects I have taken LCSH and added new 181 records based on the 781’s found in NAF. This allows the matching process to work reasonably well without dragging in the entire NAF for searching and matching. However, this still doesn’t give the complete picture, since in LCSH the *construction rules* allow you to use things in the name authority file as subjects, ugh. Effectively, LCSH isn’t useful by itself when trying to match/decompose 6XX in bibliographic records. You really need access to NAF as well. Things get worse when talking about the Children’s headings… since you can pull from both LCSH and NAF, ugh-ugh. While LC would like us to think of the authority file as three separate authorities, LCSH, LCSHac, NAF, in reality the dependencies require you to ignore the thesaurus boundaries and just treat the entire authority file as one thesaurus. We struggled with this in the terminology services project, especially when the references in one thesaurus cross over into the other thesauri. Andy. From: Ya'aqov Ziso [mailto:yaaq...@gmail.com] Sent: Thursday, April 07, 2011 13:47 To: Code for Libraries; Houghton,Andrew Cc: Hickey,Thom; LeVan,Ralph Subject: Re: [CODE4LIB] LCSH and Linked Data Andrew, as always, most helpful news, kindest thanks! more [YZ] below: 1. 
No disagreement, except that some 151s appear in the name file and some appear in the subject file: n82068148 008/11=a 008/14=a 151 _ _ $a England sh2010015057 008/11=a 008/14=b 151 _ _ $a Tabasco Mountains (Mexico) [YZ] would it be possible then to use both files as sources and create one file for geographical names for our purpose(s)? 2. Yes, see n5359 151 _ _ $a Sonora (Mexico : State) 751 _ _ $z Mexico $z Sonora (State) [YZ] Both stand for a distinct cataloging usage. Jonathan's suggestion to consult LC may answer the question of which field/when to use for geographical names. 3. Oops, my apologies to my VIAF colleagues, I believe that geographic names are in the works… [YZ] inshAllah! 4. That is probably correct. England may appear as both a 110 *and* a 151 because the 110 signifies the concept for the country entity while the 151 signifies the concept for the geographic place. A subtle distinction... [YZ] Exactly. This distinction called for creating both a 110 AND a 151. But we are talking about 151. The case where there is both a 110 and a 151 does NOT apply to all geographic names, only to some. [YZ] It would be helpful if VIAF provided a way to limit geographical names ONLY to 151 names and their cross references.
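Andy's approach of matching against the 181s in LCSH together with the 781s in NAF can be sketched roughly as below, in Python. This is only an illustration under stated assumptions: the record layout (a dict with `id` and `fields`), the function names, and the LCSH sample record are all hypothetical, and the NAF sample reuses the Sonora record quoted above with the tag corrected to 781 as Andy notes. It is not how the terminology services project actually implemented it.

```python
def subdivision_key(subfields):
    """Build a normalized lookup key from the $z values of one field."""
    parts = [value.strip().lower() for code, value in subfields if code == "z"]
    return "--".join(parts)


def build_geo_index(lcsh_records, naf_records):
    """Index geographic subdivision forms from LCSH 181s and NAF 781s."""
    index = {}
    for record in lcsh_records:
        for tag, subfields in record["fields"]:
            if tag == "181":
                key = subdivision_key(subfields)
                if key:
                    index[key] = record["id"]
    for record in naf_records:
        for tag, subfields in record["fields"]:
            if tag == "781":
                key = subdivision_key(subfields)
                if key:
                    # Keep an existing LCSH entry if one already covers this form.
                    index.setdefault(key, record["id"])
    return index


def match_geo(index, z_values):
    """Look up the $z chain of a 6XX field; returns a record id or None."""
    key = "--".join(value.strip().lower() for value in z_values)
    return index.get(key)


# Hypothetical LCSH subdivision record, invented purely for illustration.
LCSH = [{"id": "sh-example", "fields": [("181", [("z", "Example Region")])]}]

# NAF record as quoted in the thread (id left exactly as it appears there).
NAF = [{
    "id": "n5359",
    "fields": [
        ("151", [("a", "Sonora (Mexico : State)")]),
        ("781", [("z", "Mexico"), ("z", "Sonora (State)")]),
    ],
}]

if __name__ == "__main__":
    geo_index = build_geo_index(LCSH, NAF)
    print(match_geo(geo_index, ["Mexico", "Sonora (State)"]))
```

The point of the merged index is that a 650 with $z Mexico $z Sonora (State) finds its authority without searching the whole of NAF, although, as Andy says, names used directly as subjects would still need NAF itself.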
Re: [CODE4LIB] LCSH and Linked Data
On 4/7/2011 1:21 PM, Houghton,Andrew wrote: That is probably correct. England may appear as both a 110 *and* a 151 because the 110 signifies the concept for the country entity while the 151 signifies the concept for the geographic place. A subtle distinction... This starts getting into categorization-philosophy type issues, and reveals that LCSH isn't entirely consistent in its modelling (as virtually no classification will be without being extraordinarily complex; the world is a messy place), along the lines Ross was talking about too, but I think it can be explicated a bit. I'm not sure it's quite true to say that a 151 (corresponding to a 6xx $z subdivision) is a geographic place as entirely distinct from a 'country entity'. I might instead say the 151 is meant to be a sort of geo-historical place, one that does take into account, well, either political entities or general contemporary conceptions of place distinctions at particular historical times, while the 110 is about a collective-body _actor_, a government. All of these are $z's, which presumably are authorized by authority 151s: Soviet Union; Russia; Russia (Federation); Former Soviet Republics. Each is typically assigned for works about that area of the world at the time that area of the world was known as a particular thing, heh. Or: Italy / Roman Empire; Byzantine Empire / Ottoman Empire / Turkey / Balkan Peninsula. Now, all those things aren't the _exact_ same longitude and latitude, but they overlap significantly, to different degrees in different cases. At any rate, 151s aren't purely a name for a geographic boundary on the planet; they're some kind of, um, geo-political-historical concept. Compare to the terms you can put in an 043, which ARE meant to be history and political entity free. e-ur == Russia. Russian Empire. Soviet Union. Former Soviet Republics. Yeah, all of 'em together. Never mind that they don't have exactly the same boundaries. (And of course the boundaries of any one of 'em can and did change over time.) 
At least 043s MOSTLY try to be purely geographical, free of historical/political context, but then sometimes they go ahead and add weird ones that can't possibly follow that principle, like d = Developing Countries or dd = Developed Countries. But yeah, then we've got the 110 England, which isn't a geographical concept AT ALL; it refers really to the Government/political _actor_ (as a collective body) known as England. It happens to have controlled or claimed certain geographic territory for itself at different times, but the 110 England isn't about the geographic territory, it's about the collective-body actor. (Does that even still exist? What is its contemporary or historical relationship to the concepts United Kingdom and Great Britain; are those political actors too?) Somewhere I read an article about the particular messiness of geographic vocabularies, as discussed above, I forget where. Wish I could find it again, it would be helpful here. But modelling the real world with a subject vocabulary is inherently messy, especially so with geographic classification like this that is meant to somehow cover all of recorded human history too. The map is not the territory.
Re: [CODE4LIB] [dpla-discussion] Rethinking the library part of DPLA
The DPLA listserv is probably too impractical for most of Code4Lib, but Nate Hill (who's on this list as well) made this contribution there, which I think deserves attention from library coders here. On Apr 5, 2011, at 11:15 AM, Nate Hill wrote: It is awesome that the Project Gutenberg stuff is out there, it is a great start. But libraries aren't using it right. There's been talk on this list about the changing role of the public library in people's lives, there's been talk about the library brand, and some talk about what 'local' might mean in this context. I'd suggest that we should find ways to make reading library ebooks feel local and connected to an immediate community. Brick and mortar library facilities are public spaces, and librarians are proud of that. We have collections of materials in there, and we host programs and events to give those materials context within the community. There's something special about watching a child find a good book, and then show it to his or her friend and talk about how awesome it is. There's also something special about watching a senior citizens' book group get together and discuss a new novel every month. For some reason, libraries really struggle with treating their digital spaces the same way. I'd love to see libraries creating online conversations around ebooks in much the same way. Take a title from Project Gutenberg: The Adventures of Huckleberry Finn. Why not host that book directly on my library website so that it can be found at an intuitive URL, www.sjpl.org/the-adventures-of-huckleberry-finn and then create a forum for it? The URL itself takes care of the 'local' piece; certainly my most likely visitors will be San Jose residents, especially if other libraries do this same thing. The brand remains intact: when I launch this web page that holds the book I can promote my library's identity. 
The interface is no problem because I can optimize the page to load well on any device and I can link to different formats of the book. Finally, and most importantly, I've created a local digital space for this book so that people can converse about it via comments, uploaded pictures, video, whatever. I really think this community conversation and context-creation around materials is a big part of what makes public libraries special. Eric Hellman President, Gluejar, Inc. http://www.gluejar.com/ Gluejar is hiring! e...@hellman.net http://go-to-hellman.blogspot.com/ @gluejar