[CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
We are working on converting some MARC library records to RDF, and looking
at how we handle links to LCSH (id.loc.gov) - and I'm looking for feedback
on how we are proposing to do this...

I'm not 100% confident about the approach, and to some extent I'm trying to
work around the nature of how LCSH interacts with RDF at the moment I
guess... but here goes - I would very much appreciate
feedback/criticism/being told why what I'm proposing is wrong:

I guess what I want to do is preserve aspects of the faceted nature of LCSH
in a useful way, give useful links back to id.loc.gov where possible, and
give access to a wide range of facets on which the data set could be
queried. Because of this I'm proposing not just expressing the whole of the
650 field as a LCSH and checking for its existence on id.loc.gov, but also
checking for various combinations of topical term and subdivisions from the
650 field. So for any 650 field I'm proposing we should check on
id.loc.gov for labels matching:

check(650$$a) -- topical term
check(650$$b) -- topical term
check(650$$v) -- Form subdivision
check(650$$x) -- General subdivision
check(650$$y) -- Chronological subdivision
check(650$$z) -- Geographic subdivision

Then using whichever elements exist (all as topical terms):
Check(650$$a--650$$b)
Check(650$$a--650$$v)
Check(650$$a--650$$x)
Check(650$$a--650$$y)
Check(650$$a--650$$z)
Check(650$$a--650$$b--650$$v)
Check(650$$a--650$$b--650$$x)
Check(650$$a--650$$b--650$$y)
Check(650$$a--650$$b--650$$z)
Check(650$$a--650$$b--650$$x--650$$v)
Check(650$$a--650$$b--650$$x--650$$y)
Check(650$$a--650$$b--650$$x--650$$z)
Check(650$$a--650$$b--650$$x--650$$z--650$$v)
Check(650$$a--650$$b--650$$x--650$$z--650$$y)
Check(650$$a--650$$b--650$$x--650$$z--650$$y--650$$v)


As an example given:

650 00 $$aPopular music$$xHistory$$y20th century

We would be checking id.loc.gov for

'Popular music' as a topical term (http://id.loc.gov/authorities/sh85088865)
'History' as a general subdivision (http://id.loc.gov/authorities/sh99005024)
'20th century' as a chronological subdivision (
http://id.loc.gov/authorities/sh2002012476)
'Popular music--History and criticism' as a topical term (
http://id.loc.gov/authorities/sh2008109787)
'Popular music--20th century' as a topical term (not authorised)
'Popular music--History and criticism--20th century' as a topical term (not
authorised)


And expressing all matches in our RDF.

My understanding of LCSH isn't what it might be - but the ordering of terms
in the combined string checking is based on what I understand to be the
usual order - is this correct, and should we be checking for alternative
orderings?

Thanks

Owen


-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] utf8 \xC2 does not map to Unicode

2011-04-07 Thread Tod Olson
yaz-marcdump does a really good job of charset and format conversion for MARC 
records, and is blindingly fast.

But yaz-marcdump seems to think there are a lot of separators in the wrong 
place and bad indicator data, whether treating the records as UTF-8 or MARC-8.  
The leaders in the records say they are UTF-8, but looking at the data, the 
byte sequences that Jon G. noticed remind me of UTF-8 data that was 
UTF-8-encoded a second time.  I wonder if they got re-encoded in transmission 
somewhere along the way.  Maybe just in the download from zoila.
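
To make the double-encoding suspicion concrete, here is a minimal Python sketch (not something anyone in this thread ran). The function names are hypothetical, and it assumes the second encoding pass misread the UTF-8 bytes as Latin-1; a real pipeline might have used Windows-1252 or something else instead.

```python
def double_encode(text: str) -> bytes:
    """Simulate UTF-8 data that was mistakenly UTF-8-encoded a second time.

    Assumption: the second pass treated the UTF-8 bytes as Latin-1 characters.
    """
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(raw: bytes) -> bytes:
    """Try to reverse one round of double encoding; return raw unchanged on failure."""
    try:
        candidate = raw.decode("utf-8").encode("latin-1")
        candidate.decode("utf-8")  # the repaired bytes must themselves be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return raw
    return candidate


mangled = double_encode("é")
print(mangled)                        # the two-byte "é" has become four bytes
print(undo_double_encoding(mangled))  # back to the original UTF-8 bytes
```

Note the stray 0xC2 byte in the mangled output: bytes in the 0x80-0xBF range pick up a 0xC2 prefix when re-encoded this way, which would be consistent with the kind of sequence that prompted the original error.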

-Tod

On Apr 6, 2011, at 4:11 PM, Jonathan Rochkind wrote:

 That's hilarious, that Terry has had to do enough ugliness with Marc 
 encodings that he indeed can recognize 0xC2 off the bat as the Marc8 
 encoding it represents!  I am in awe, as well as sympathy.
 
 If the record is in Marc8, then you need to know if Perl MARC::Batch can 
 handle Marc8.  If it's supposed to be able to handle it, you need to 
 figure out why it's not. (Does the leader byte say UTF-8 even though it's 
 really Marc8?)
 
 If MARC::Batch can't handle Marc8, you need to convert to UTF-8 first. 
 The only software package I know of that can convert from and to Marc8 
 encoding is Java Marc4J, but I wouldn't be shocked if there was 
 something in Perl to do it. (But yes, as you can tell by the name, 
 Marc8 is a character encoding ONLY used in Marc, nobody but library 
 people write software for dealing with it).
 
 On 4/6/2011 5:01 PM, Reese, Terry wrote:
 I'd echo Jonathan's question -- the 0xC2 code is the sound recording marker 
 in MARC-8.  I'd guess the file isn't in UTF8.
 
 --TR
 
 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Jonathan Rochkind
 Sent: Wednesday, April 06, 2011 1:28 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] utf8 \xC2 does not map to Unicode
 
 I am not familiar with that Perl module. But I'm more familiar than I'd want
 with char encoding in Marc.
 
 I don't recognize the byte 0xC2 (there are some bytes I became pathetically
 familiar with in past debugging, but I've forgotten them), but the first
 things to look at:
 
 1. Is your Marc file encoded in Marc8 or UTF-8?  I'm betting Marc8.
 Theoretically there is a Marc leader byte that tells you whether it's
 Marc8 or UTF-8, but the leader byte is often wrong in real world records.
 Is it wrong?
 
 2. Does Perl MARC::Batch have a function to convert from Marc8 to
 UTF-8?   If so, how does it decide whether to convert? Is it trying to
 do that?  Is it assuming that the leader byte of the record accurately
 identifies the encoding, and if so, is the leader byte wrong?   Is it
 trying to convert from Marc8 to UTF-8, when the source was UTF-8 in the
 first place?  Or is it assuming the source was UTF-8 in the first place,
 when in fact it was Marc8?
 
 Not the answer you wanted, maybe someone else will have that. Debugging
 char encoding is hands down the most annoying kind of debugging I ever do.
 
 On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:
 Ack! While using the venerable Perl MARC::Batch module I get the
 following error while trying to read a MARC record:
utf8 \xC2 does not map to Unicode
 
 This is a real pain, and I'm hoping someone here can help me either: 1)
 trap this error allowing me to move on, or 2) figure out how to open the file
 correctly.

Tod Olson t...@uchicago.edu
Systems Librarian
University of Chicago Library


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
Thanks Tom - very helpful

Perhaps this suggests that rather than using a fixed order we should check
combinations while preserving the order of the original 650 field (I assume
this should in theory always be correct - or at least done to the best of
the cataloguer's knowledge)?

So for:

650 _0 $$a Education $$z England $$x Finance.

check:

Education
England (subdiv)
Finance (subdiv)
Education--England
Education--Finance
Education--England--Finance

While for 650 _0 $$a Education $$x Economic aspects $$z England we check

Education
Economic aspects (subdiv)
England (subdiv)
Education--Economic aspects
Education--England
Education--Economic aspects--England
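
As a rough illustration of the order-preserving check described above, the candidate labels can be generated mechanically. This is a sketch only: the function name `candidate_headings` and the list-of-(code, value)-tuples representation of a 650 field are assumptions, and the actual lookup against id.loc.gov is left out.

```python
from itertools import combinations


def candidate_headings(subfields):
    """Build candidate labels from one 650 field, keeping the original subfield order.

    `subfields` is a list of (code, value) tuples, e.g. [("a", "Education"), ...].
    """
    # Trim whitespace and the trailing full stop that often ends a 650 field.
    cleaned = [(code, value.strip().rstrip(".")) for code, value in subfields]
    topical = next(value for code, value in cleaned if code == "a")
    subdivisions = [value for code, value in cleaned if code != "a"]

    candidates = [topical]
    candidates += [f"{value} (subdiv)" for value in subdivisions]
    # combinations() keeps elements in input order, so the 650 order is preserved.
    for size in range(1, len(subdivisions) + 1):
        for combo in combinations(subdivisions, size):
            candidates.append("--".join((topical,) + combo))
    return candidates


field = [("a", "Education"), ("z", "England"), ("x", "Finance.")]
for label in candidate_headings(field):
    print(label)
```

For the first example field this yields the same six labels listed above; each would then be checked for a matching label on id.loc.gov, and non-matches simply dropped.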


 - It is possible for other orders in special circumstances, e.g. with
 language dictionaries which can go something like:

 650 _0 $$a English language $$v Dictionaries $$x Albanian.


This possibility would also be covered by preserving the order - check:

English language
Dictionaries (subdiv)
Albanian (subdiv)
English language--Dictionaries
English language--Albanian
English language--Dictionaries--Albanian

Creating possibly invalid headings isn't necessarily a problem - as we won't
get a match on id.loc.gov anyway. (Instinctively English Language--Albanian
doesn't feel right)



 - Some of these are repeatable, so you can have two $$vs following each
 other (e.g. Biography--Dictionaries); two $$zs (very common), as in
 Education--England--London; two $$xs (e.g. Biography--History and criticism).

 OK - that's fine, we can use each individually and in combination for any
repeated headings I think


 - I'm not sure I've ever come across a lot of $$bs in 650s. Do you have a lot of
 them in the database?

 Hadn't checked until you asked! We have 1 in the dataset in question (c.30k
records) :)


 I'm not sure how possible it would be to come up with a definitive list of
 (reasonable) possible combinations.

 You are probably right - but I'm not too bothered about aiming at
'definitive' at this stage anyway - but I do want to get something
relatively functional/useful


 Tom

 Thomas Meehan
 Head of Current Cataloguing
 University College London Library Services


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
 ... Creating possibly invalid headings isn't necessarily a problem - as we
 won't get a match on id.loc.gov anyway ...

LCSH headings reflect materials cataloged by LC. You may have materials in
the UK (or Albania, Tunisia, etc.) which have not yet been cataloged at LC, so
there is nothing yet to match on.
Ya'aqov


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Houghton,Andrew
After having done numerous matching and mapping projects, there are some issues 
that you will face with your strategy, assuming I understand it correctly. 
Trying to match a heading starting at the left most subfield and working 
forward will not necessarily produce correct results when matching against the 
LCSH authority file. Using your example:

 

650 _0 $a Education $z England $x Finance

 

is a good example of why processing the heading starting at the left will not 
necessarily produce the correct results.  Assuming I understand your proposal 
you would first search for:

 

150 __ $a Education

 

and find the heading with LCCN sh85040989. Next you would look for:

 

181 __ $z England

 

and you would NOT find this heading in LCSH. This is issue one. Unfortunately, 
LC does not create 181 in LCSH (actually I think there are some, but not if 
it’s a name), instead they create a 781 in the name authority record. So to 
find the corresponding $z England we need to go to the name authority record 
150 England with LCCN n82068148. Currently under id.loc.gov you will not find 
name authority records, but you can find them at viaf.org. The second issue 
using your example is that you want to find the “longest” matching heading. 
While the component parts are there, so is the enumerated authority heading:

 

150 __ $a Education $z England

 

as LCCN sh2008102746. So your heading is actually composed of the enumerated 
headings:

 

sh2008102746  150 __ $a Education $z England

sh2002007885  180 __ $x Finance

 

and not the separate headings:

 

sh85040989    150 __ $a Education

n82068148     150 __ $a England

sh2002007885  180 __ $x Finance

 

Although one could argue that either analysis is correct depending upon what 
you are trying to accomplish.

 

The matching algorithm I have used in the past contains two routines. The first, 
f(a), accepts a heading as a parameter, scrubs the heading (e.g., removes 
unnecessary subfields like $0, $3, $6, $8, etc.) and does any other pre-processing 
necessary on the heading, then calls the second function, f(b). The f(b) function 
accepts a heading as a parameter and recursively calls itself until it builds 
up the list of LCCNs that comprise the heading. It first looks for the given 
heading; when it doesn’t find it, it removes the *last* subfield and recursively 
calls itself, otherwise it appends the found LCCN to the returned list and 
exits. This strategy will find the longest match. The headings are searched 
against an augmented LCSH database where the 781 name authority records have 
been transformed into 181 records keeping the LCCN of the name authority 
record. Not ideal, but it generally works well. Adjust algorithm per need.
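
A minimal Python sketch of the two routines, under stated assumptions: the names `scrub`, `match_longest` and `match_heading` are hypothetical, the authority file is stood in for by a small dictionary keyed on subfield tuples, and the handling of the subfields left over after a match (recursing on the remainder) is one reading of the description, not necessarily how the original code worked.

```python
CONTROL_SUBFIELDS = {"0", "3", "6", "8"}  # subfields stripped before matching


def scrub(heading):
    """f(a) part 1: drop control subfields and trailing punctuation."""
    return [(code, value.strip().rstrip("."))
            for code, value in heading if code not in CONTROL_SUBFIELDS]


def match_longest(heading, index):
    """f(b): find the longest leading match, then recurse on what is left over."""
    if not heading:
        return []
    # Try the whole heading first, then keep removing the *last* subfield.
    for end in range(len(heading), 0, -1):
        lccn = index.get(tuple(heading[:end]))
        if lccn is not None:
            return [lccn] + match_longest(heading[end:], index)
    # Assumption: if even the first subfield is unknown, skip it and carry on.
    return match_longest(heading[1:], index)


def match_heading(heading, index):
    """f(a): scrub the heading, then hand it to the recursive matcher."""
    return match_longest(scrub(heading), index)


# Toy stand-in for the augmented LCSH database, using the LCCNs quoted above.
index = {
    (("a", "Education"),): "sh85040989",
    (("a", "Education"), ("z", "England")): "sh2008102746",
    (("z", "England"),): "n82068148",
    (("x", "Finance"),): "sh2002007885",
}
field = [("a", "Education"), ("z", "England"), ("x", "Finance.")]
print(match_heading(field, index))
```

With this toy index the example heading resolves to the enumerated Education--England record plus the Finance subdivision, rather than to three separate records.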

 

Hope this helps, Andy.

 

 


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Andrew, please see *[YZ]* below

*181 __ $z England  and you would NOT find this heading in LCSH. This is
issue one. Unfortunately, LC does not create 181 in LCSH (actually I think
there are some, but not if it’s a name), instead they create a 781 in the
name authority record.*
*[YZ]*  MARC/LCSH distinguishes between personal names (100) and geographic
names (151) in their authority records. You'll find all geographic names if
you look for 151 records.

*So to find the corresponding $z England we need to go to the name authority
record 150 England with LCCN n82068148.*
*[YZ]*  The authority record with LCCN n82068148 is for 151 England.
Also Andrew, are you indicating there is a difference between the form of
the geographic name in 151$a and 781$z?

*Currently under id.loc.gov you will not find name authority records, but
you can find them at viaf.org.*
*[YZ]*  viaf.org does not include geographic names. I just checked for
England there. It makes little sense to mix personal/corporate names with
geographic ones. Let's see what Ralph comments.

*Ya'aqov*


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
Still digesting Andrew's response (thanks Andrew), but

On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:

 *Currently under id.loc.gov you will not find name authority records, but
 you can find them at viaf.org*.
 *[YZ]*  viaf.org does not include geographic names. I just checked there
 England.


Is this not the relevant VIAF entry?
http://viaf.org/viaf/142995804


-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread LeVan,Ralph
If you look at the fields those names come from, I think they mean
England as a corporation, not England as a place.

Ralph



Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ford, Kevin
Actually, it appears to depend on whose Authority record you're looking at.  
The Canadians, Australians, and Israelis have it as a CorporateName (110), as 
do the French (210 - unimarc); LC and the Germans say it's a Geographic Name.

In the case of LCSH, therefore, it would be a 151.  Regardless, it is in VIAF.

Warmly,

Kevin






Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Ralph, Owen's pointing to a list where corporate (110) and geographic names
(151) are mixed.

Thanks Owen, I hadn't seen that the first time. I guess you got that mixed
110/151 list when limiting to 'exact name'. Perhaps Andrew has a workaround.

*Ya'aqov*





-- 
Ya'aqov Ziso | yaaq...@gmail.com | 856 217 3456


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
I'm out of my depth here :)

But... this is what I understood Andrew to be saying. In this instance
(?because 'England' is a Name Authority?) rather than creating a separate LCSH
authority record for 'England' (as the 151), the LCSH subdivision is
recorded in the 781 of the existing Name Authority record.

Searching on http://authorities.loc.gov for England, I find an Authorised
heading, marked as a LCSH - but when I go to that record what I get is the
name authority record n 82068148 - the name authority record as represented
on VIAF by http://viaf.org/viaf/142995804/ (which links to
http://errol.oclc.org/laf/n%20%2082068148.html)

Just as this is getting interesting time differences mean I'm about to head
home :)

Owen




-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Kevin,

England exists as a corporate body and also as a geographic name. BOTH
entities exist in LCSH. This doesn't apply to all geographic names, only to
some.

Andrew pointed us to VIAF, but I expect his algorithm to limit the search
for LCSH. Let's wait for his reply.

*Ya'aqov*



Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread LeVan,Ralph
More confusing yet, if you look at the raw XML for that record (add viaf.xml to 
the end of the URI and then view source) you’ll see that the name type is 
indeed Geographic.

 

My boss is puzzled.

 

Ralph

 






Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Jonathan Rochkind

On 4/7/2011 10:46 AM, Houghton,Andrew wrote:

to go to the name authority record 150 England with LCCN n82068148. Currently 
under id.loc.gov you will not find name authority records,


If this would change, so name authority record elements used in 6xx 
subject cataloging were in id.loc.gov, it would make powerful use of 
id.loc.gov much more feasible.


Is there anyone at LC this suggestion/request could be sent to, possibly 
en masse?  I do sort of have the impression it's been an item of 
contention inside LC.


Jonathan


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Jonathan, hi and thanks,

1. I believe id.loc.gov includes a list of MARC countries and a list of
geographic areas (based on the geographic names in 151 fields).
2. Cataloging rules instruct catalogers to use THOSE very name forms in 151
$a when a subject can be divided (limited) geographically using $z.
3. Not all subjects which can be divided geographically will have the
geographical subdivision immediately after the subject. There could be 2
different sequences:

650 $a Picket lines $z Ohio
650 $a Picket lines $x Economic aspects $z Ohio
(Whether the geographical subdivision follows $a immediately or not
is part of the rules LC catalogers observe to the dot.)

There could also be two geographical subdivisions following each other:
650 $a Picket lines $z Ohio $z Columbus

Oh yeah, these record elements could be used powerfully for our users.
*Ya'aqov*



Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Houghton,Andrew
1.   No disagreement, except that some 151s appear in the name file and 
some appear in the subject file:
n82068148     008/11=a 008/14=a  151 _ _ $a England
sh2010015057  008/11=a 008/14=b  151 _ _ $a Tabasco Mountains (Mexico)



2.   Yes, see n5359
151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)



3.   Oops, my apologies to my VIAF colleagues, I believe that geographic 
names are in the works… or at least I was under the impression they were from a 
discussion I had last night.



 




Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Houghton,Andrew
That is probably correct. England may appear as both a 110 *and* a 151 because 
the 110 signifies the concept for the country entity while the 151 signifies 
the concept for the geographic place. A subtle distinction...

Andy.

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Ya'aqov Ziso
 Sent: Thursday, April 07, 2011 11:56
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] LCSH and Linked Data
 
 Ralph, Owen's pointing to a list where corporate (110) and geographic
 names
 (151) are mixed.
 
 Thanks Owen, I haven't seen that the first time. I guess you got that
 mixed
 110/151 when limiting to 'exact name'. Perhaps Andrew has a workaround.
 
 *Ya'aqov*
 
 
 
 
 
 On Thu, Apr 7, 2011 at 10:34 AM, LeVan,Ralph le...@oclc.org wrote:
 
  If you look at the fields those names come from, I think they mean
  England as a corporation, not England as a place.
 
  Ralph
 
   -Original Message-
   From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On
 Behalf
  Of
   Owen Stephens
   Sent: Thursday, April 07, 2011 11:28 AM
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] LCSH and Linked Data
  
   Still digesting Andrew's response (thanks Andrew), but
  
   On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:
  
    Currently under id.loc.gov you will not find name authority records, but
    you can find them at viaf.org.
    [YZ]  viaf.org does not include geographic names. I just checked there
    England.
   
  
   Is this not the relevant VIAF entry
   http://viaf.org/viaf/142995804
  
  
   --
   Owen Stephens
   Owen Stephens Consulting
   Web: http://www.ostephens.com
   Email: o...@ostephens.com
 
 
 
 
 --
 *ya'aqov**ZISO | **yaaq...@gmail.com **| 856 217 3456
 
 *


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Andrew, as always, most helpful news, kindest thanks! more [YZ] below:

1.   No disagreement, except that some 151 appears in the name file and
some appear in the subject file:
n82068148      008/11=a 008/14=a 151 _ _ $a England
sh2010015057   008/11=a 008/14=b 151 _ _ $a Tabasco Mountains (Mexico)
[YZ] would it be possible then to use both files as sources and create one
file for geographical names for our purpose(s)?

2.   Yes, see n5359
151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)
[YZ]  Both stand for a distinct cataloging usage. Jonathan's suggestion
to consult LC may answer the question of which field/when to use for
geographical names

3.   Oops, my apologies to my VIAF colleagues, I believe that
geographic names are in the works…
[YZ] inshAllah!

4. That is probably correct. England may appear as both a 110 *and* a 151
because the 110 signifies the concept for the country entity while the 151
signifies the concept for the geographic place. A subtle distinction...
[YZ] Exactly. This distinction called for creating both a 110 AND a 151.
But we are talking about 151. The case where there is both a 110 and a 151
does NOT apply to all geographic names, only to some.

[YZ] VIAF would be helpful to provide a way to limit geographical names
ONLY to 151 names and their cross references.


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ross Singer
On Thu, Apr 7, 2011 at 12:58 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:

 1. I believe id.loc.gov includes a list of MARC countries and a list for
 geographic areas (based on the geographic names in 151 fields).
 2. cataloging rules instruct catalogers to use THOSE very name forms in 151
 $a when a subject can be divided (limited)  geographically using $z.

Yeah, this could get ugly pretty fast.  It's a bit unclear to me what
the distinction is between identical terms in both the geographic
areas and the country codes
(http://id.loc.gov/vocabulary/geographicAreas/e-uk-en and
http://id.loc.gov/vocabulary/countries/enk).  Well, in LC's current
representation, there *is* no distinction, they're both just
skos:Concepts that are (by virtue of skos:exactMatch) effectively
interchangeable.
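
To make the "effectively interchangeable" point concrete, here is a minimal
Python sketch of how a consumer might collapse URIs joined by skos:exactMatch
into one cluster. The two id.loc.gov URIs are the ones quoted above; the
hard-coded triple, the direction of the link, and the function name
exact_match_clusters are assumptions for illustration, not LC's actual data
or anyone's API. It treats skos:exactMatch as symmetric and transitive, as
the SKOS reference defines it.

```python
from collections import defaultdict

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Illustrative triple only: the URIs come from the thread, the link
# direction is an assumption.
TRIPLES = [
    ("http://id.loc.gov/vocabulary/geographicAreas/e-uk-en",
     SKOS_EXACT_MATCH,
     "http://id.loc.gov/vocabulary/countries/enk"),
]


def exact_match_clusters(triples):
    """Group URIs linked by skos:exactMatch (symmetric and transitive)."""
    neighbours = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SKOS_EXACT_MATCH:
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)

    seen, clusters = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk over the undirected exactMatch graph.
        stack, cluster = [start], set()
        while stack:
            uri = stack.pop()
            if uri in cluster:
                continue
            cluster.add(uri)
            stack.extend(neighbours[uri] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


clusters = exact_match_clusters(TRIPLES)
```

Any code that follows the links this way ends up with a single bucket holding
both the geographic-area URI and the country-code URI, which is the sense in
which the two concepts stop being distinguishable.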

See also http://id.loc.gov/vocabulary/geographicAreas/fa and
http://id.loc.gov/authorities/sh85009230#concept.  You have a single
institution minting multiple URIs for what is effectively the same
thing (albeit in different vocabularies), although, ironically,
nothing points at any actual real world objects.

VIAF doesn't do much better in this particular case (there are lots of
examples where it does, mind you):  http://viaf.org/viaf/142995804
(see: http://viaf.org/viaf/142995804/rdf.xml).  We have all of these
triangulations around the concept of England or Atlas mountains,
but we can't actually refer to England or the Atlas mountains.

Also, I am not somehow above this problem, either.  With the linked
MARC codes lists (http://purl.org/NET/marccodes/), I had to make a
similar decision, I just chose to go the opposite route:  define them
as things, rather than concepts
(http://purl.org/NET/marccodes/gacs/fa#location,
http://purl.org/NET/marccodes/gacs/e-uk-en#location,
http://purl.org/NET/marccodes/countries/enk#location, etc.), which
presents its own set of problems
(http://purl.org/NET/marccodes/gacs/h#location is not a SpatialThing
no matter how liberal your definition).

At some point, it's worth addressing what these things actually *are*
and if, indeed, they are effectively the same thing, if it's worth
preserving these redundancies, because I think they'll cause grief in
the future.

-Ross.


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Houghton,Andrew
My bad in (2) that should have been 781 and it’s LC’s way to indicate the 
geographic form used for a 181 when a heading may be geographically subdivided. 
The point is, when you are trying to do authority matching/mapping you have to 
match against the 181’s in LCSH *and* the 781’s in NAF.  This is an oddity of 
the LC authority file that people may not be aware of, hence why I pointed it 
out.  As I indicated, in my mapping projects I have taken LCSH and added new 
181 records based on the 781’s found in NAF.  This allows the matching process 
to work reasonably well without dragging in the entire NAF for searching and 
matching.  However, this still doesn’t give the complete the picture since in 
LCSH the *construction rules* allow you to use things in the name authority 
file as subjects, ugh.  Effectively, LCSH isn’t useful by itself when trying to 
match/decompose 6XX in bibliographic records.  You really need access to NAF as 
well.  Things get worst when talking about the Children’s headings… since you 
can pull from both LCSH and NAF, ugh-ugh.  While LC would like us to think of 
the authority file as three separate authorities, LCSH, LCSHac, NAF, in reality 
the dependencies require you to ignore the thesaurus boundaries and just treat 
the entire authority file as one thesauri.  We struggled with this in the 
terminology services project, especially when the references in one thesaurus 
cross over into the other thesauri.
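
A rough Python sketch of the matching index described above, assuming
heavily simplified records: each record is a dict with an "id" and a list of
(tag, subfields) pairs. The record layout, the function names, and the "--"
join are hypothetical; the Sonora example and the id n5359 are copied as they
appear in this thread (with the 781 tag as corrected above), and real MARC
parsing is left out entirely.

```python
def subdivision_key(subfields):
    """Join the $z values of a field, e.g. 'Mexico--Sonora (State)'."""
    return "--".join(value for code, value in subfields if code == "z")


def build_geo_index(lcsh_records, naf_records):
    """Map geographic subdivision forms to record ids.

    Takes 181 headings from the subject file and 781 linking entries
    from the name file, so one lookup covers both sources.
    """
    index = {}
    for records, wanted_tag in ((lcsh_records, "181"), (naf_records, "781")):
        for record in records:
            for tag, subfields in record["fields"]:
                if tag != wanted_tag:
                    continue
                key = subdivision_key(subfields)
                if key:
                    index.setdefault(key, []).append(record["id"])
    return index


# Simplified stand-in for the name authority record quoted in the thread.
naf = [{
    "id": "n5359",
    "fields": [
        ("151", [("a", "Sonora (Mexico : State)")]),
        ("781", [("z", "Mexico"), ("z", "Sonora (State)")]),
    ],
}]
lcsh = []  # no 181 example is quoted in the thread, so this stays empty

index = build_geo_index(lcsh, naf)
```

A bibliographic 650 with $z Mexico $z Sonora (State) could then be looked up
under the key "Mexico--Sonora (State)" without loading the whole NAF, which is
the trade-off described above.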

 

Andy.

 

From: Ya'aqov Ziso [mailto:yaaq...@gmail.com] 
Sent: Thursday, April 07, 2011 13:47
To: Code for Libraries; Houghton,Andrew
Cc: Hickey,Thom; LeVan,Ralph
Subject: Re: [CODE4LIB] LCSH and Linked Data

 

Andrew, as always, most helpful news, kindest thanks! more [YZ] below:

 

1.   No disagreement, except that some 151 appears in the name file and 
some appear in the subject file:
n82068148      008/11=a 008/14=a 151 _ _ $a England
sh2010015057   008/11=a 008/14=b 151 _ _ $a Tabasco Mountains (Mexico)
[YZ] would it be possible then to use both files as sources and create one file 
for geographical names for our purpose(s)?

2.   Yes, see n5359
151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)

[YZ]  Both stand for a distinct cataloging usage. Jonathan's suggestion to 
consult LC may answer the question of which field/when to use for geographical 
names

3.   Oops, my apologies to my VIAF colleagues, I believe that geographic 
names are in the works… 

[YZ] inshAllah!

 

4. That is probably correct. England may appear as both a 110 *and* a 151 
because the 110 signifies the concept for the country entity while the 151 
signifies the concept for the geographic place. A subtle distinction...

[YZ] Exactly. This distinction called for creating both a 110 AND a 151. But we 
are talking about 151. The case where there is both a 110 and a 151 does NOT 
apply to all geographic names, only to some.

 

[YZ] VIAF would be helpful to provide a way to limit geographical names ONLY to 
151 names and their cross references.



Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Jonathan Rochkind

On 4/7/2011 1:21 PM, Houghton,Andrew wrote:

That is probably correct. England may appear as both a 110 *and* a 151 because 
the 110 signifies the concept for the country entity while the 151 signifies 
the concept for the geographic place. A subtle distinction...


This starts getting into categorization philosophy type issues, and 
reveals that LCSH isn't entirely consistent in its modelling (as 
virtually no classification will be without being extraordinarily 
complex, the world is a messy place), along the lines Ross was talking 
about too, but I think it can be explicated a bit.


I'm not sure it's quite true to say that a 151 (corresponding to a 
6xx $z subdivision) is a geographic place as entirely distinct from a 
'country entity'.   I might instead say the 151 is meant to be a sort of 
geo-historical place, one that does take into account, well, either 
political entities or general contemporary conceptions of place 
distinctions at particular historical times.  While the 110 is about a 
collective-body _actor_, a government.


All of these are $z's, which presumably are authorized by authority 151s:

Soviet Union
Russia
Russia (Federation)
Former Soviet Republics

typically assigned for works about that area of the world at the time 
that area of the world was known as a particular thing, heh.


Or: Italy / Roman Empire
Byzantine Empire / Ottoman Empire / Turkey / Balkan Peninsula

Now, all those things aren't the _exact_ same longitude and latitude, 
but with significant overlap, different in different cases. At any rate, 
151s aren't  purely a name for a geographic boundary on the planet, 
they're some kind of, um, geo-political-historical concept.


Compare to the terms you can put in an 043, which ARE meant to be 
history and political entity free. e-ur == Russia. Russian Empire. 
Soviet Union. Former Soviet Republics. Yeah, all of em together. 
Never mind they don't have exactly the same boundaries. (And of course 
the boundaries of any one of em can and did change over time).  At least 
043's MOSTLY try to be purely geographical, free of historical/political 
context, but then sometimes they go ahead and add weird ones that can't 
possibly follow that principle, like d=Developing Countries or 
dd=Developed Countries.


But yeah, then we've got the 110 England, which isn't a geographical 
concept AT ALL, it refers really to the Government/political _actor_  
(as a collective body)  known as England. Which happens to have 
controlled or claimed certain geographic territory for itself at 
different times, but the 110 England isn't about the geographic 
territory, it's about the collective-body actor. (Does that even still 
exist? What is its contemporary or historical relationship to the 
concepts United Kingdom and Great Britain, are those political 
actors too?)


Somewhere I read an article about the particular messiness of geographic 
vocabularies, as discussed above, I forget where.  Wish I could find it 
again, it would be helpful here.  But modelling the real world with a 
subject vocabulary is inherently messy, especially so with geographic 
classification like this that is meant to somehow cover all of recorded 
human history too. The map is not the territory.


Re: [CODE4LIB] [dpla-discussion] Rethinking the library part of DPLA

2011-04-07 Thread Eric Hellman
The DPLA listserv is probably too impractical for most of Code4Lib, but Nate 
Hill (who's on this list as well) made this contribution there, which I think 
deserves attention from library coders here.

On Apr 5, 2011, at 11:15 AM, Nate Hill wrote:

 It is awesome that the Project Gutenberg stuff is out there, it is a great 
 start.  But libraries aren't using it right.  There's been talk on this list 
 about the changing role of the public library in people's lives, there's been 
 talk about the library brand, and some talk about what 'local' might mean in 
 this context.  I'd suggest that we should find ways to make reading library 
 ebooks feel local and connected to an immediate community.  Brick and mortar 
 library facilities are public spaces, and librarians are proud of that.  We 
 have collections of materials in there, and we host programs and events to 
 give those materials context within the community.  There's something special 
 about watching a child find a good book, and then show it to his or her 
 friend and talk about how awesome it is.  There's also something special 
 about watching a senior citizens' book group get together and discuss a new 
 novel every month.  For some reason, libraries really struggle with treating 
 their digital spaces the same way.
 
 I'd love to see libraries creating online conversations around ebooks in much 
 the same way.  Take a title from Project Gutenberg: The Adventures of 
 Huckleberry Finn.  Why not host that book directly on my library website so 
 that it can be found at an intuitive URL, 
 www.sjpl.org/the-adventures-of-huckleberry-finn and then create a forum for 
 it?  The URL itself takes care of the 'local' piece; certainly my most likely 
 visitors will be San Jose residents- especially if other libraries do this 
 same thing.  The brand remains intact, when I launch this web page that holds 
 the book I can promote my library's identity.  The interface is no problem 
 because I can optimize the page to load well on any device and I can link to 
 different formats of the book.  Finally, and most importantly, I've created a 
 local digital space for this book so that people can converse about it via 
 comments, uploaded pictures, video, whatever.  I really think this community 
 conversation and context-creation around materials is a big part of what 
 makes public libraries special.

Eric Hellman
President, Gluejar, Inc.
http://www.gluejar.com/   Gluejar is hiring!

e...@hellman.net 
http://go-to-hellman.blogspot.com/
@gluejar