Re: [CODE4LIB] LCSH and Linked Data

2011-04-21 Thread Kyle Banerjee
The short version of this lengthy post is that there's really no value in
worrying about how to handle precoordinated strings except for purposes of
busting them up.

The Rube Goldberg style precoordination rules that cause so many headaches
were developed to address challenges brought about by paper card catalogs.
The physicality of paper required a mechanism to ensure a limited number of
cards would file together. Unless you still use a paper catalog, they're as
relevant as spurs are to race car drivers.

The order you see in the MARC record mimics the paper rules exactly (because
MARC was used mostly for card printing for decades) and has also led to
literally tens of millions of unique subject strings as there are so many
permutations.  As a practical matter, even highly trained librarians cannot
guess how these were put together without going through a substantial
research process.

I hate to dig up stuff written in the 1920's that's rammed down the throats
of first semester library school students. However, in the case at hand,
logic from these works has direct application for purposes of making MARC
data usable.

To summarize, the concept is that subjects can be broken down into aspects
(i.e. facets) with the primary ones time, place, action, material, and
personality -- you can think of this last category as natural groupings of
the type that standardized subdivisions can be applied to such as materials,
animals, corporate entities, diseases, body parts, etc.

It's much better to think of the facets (time, place, etc) as attributes
rather than occurring in any particular order as this allows interactive and
relatively precise drilling through huge amounts of data. You'll notice that
good search engines effectively do just that.
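
As a rough sketch of what "busting up" a precoordinated string could look
like: the snippet below splits a heading on its subfield codes and files each
piece under a facet. The mapping from subfield codes to facet names
(FACET_BY_CODE) and the function name bust_heading are assumptions made for
illustration, not an established crosswalk.

```python
import re
from collections import defaultdict

# Assumed mapping from 6XX subfield codes to facet names (illustrative only).
FACET_BY_CODE = {"a": "topic", "x": "topic", "z": "place", "y": "time", "v": "form"}


def bust_heading(heading: str) -> dict:
    """Split a string like 'Dwellings $z Australia $x History' into facets."""
    facets = defaultdict(list)
    # re.split with a capturing group keeps the subfield codes in the result:
    # ['Dwellings', 'z', 'Australia', 'x', 'History', ...]
    parts = re.split(r"\s*\$([a-z0-9])\s*", heading.strip())
    first = parts[0].strip()
    if first:
        # Text before the first '$' is treated as subfield $a.
        facets["topic"].append(first)
    for code, value in zip(parts[1::2], parts[2::2]):
        facets[FACET_BY_CODE.get(code, "other")].append(value.strip())
    return dict(facets)


if __name__ == "__main__":
    print(bust_heading("Dwellings $z Australia $x History $y 20th century"))
```

Once the pieces are stored as attributes, the order in which they appeared
in the string no longer matters for retrieval.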

kyle




One of the challenges for pre-coordinated strings at least as currently
 implemented (that facets evade) is that no order will suit everyone. Which
 of the following is better?

 Dwellings $z Australia $x History $y 20th century
 Dwellings $z Indonesia $x Economic aspects
 Dwellings $z Indonesia $x Psychological aspects
 Dwellings $z Indonesia $x Social aspects
 Dwellings $z Ireland $x Economic aspects
 Dwellings $z Ireland $x Psychological aspects
 Dwellings $z Ireland $x Social aspects
 Dwellings $z Japan $x Economic aspects
 Dwellings $z Japan $x Psychological aspects
 Dwellings $z Japan $x Social aspects

 OR (mostly current practice)

 *Dwellings $z Australia $x History $y 20th century  **Current practice
 Dwellings $x Economic aspects $z Indonesia
 Dwellings $x Economic aspects $z Ireland
 Dwellings $x Economic aspects $z Japan
 *Dwellings $x History $z Australia $y 20th century  **Airlie recommendation
 Dwellings $x Psychological aspects $z Indonesia
 Dwellings $x Psychological aspects $z Ireland
 Dwellings $x Psychological aspects $z Japan
 Dwellings $x Social aspects $z Indonesia
 Dwellings $x Social aspects $z Ireland
 Dwellings $x Social aspects $z Japan



Re: [CODE4LIB] LCSH and Linked Data [cataloging]

2011-04-18 Thread Eric Lease Morgan
On Apr 17, 2011, at 10:58 AM, Bill Dueber wrote:

 OK, so I've been trying to follow all of this, and have to say, I'm finding
 it all very interesting. I want to give a special shout-out to the catalogers
 who have joined in; I (and, I think, much of code4lib) need this kind of
 input on a much more regular basis than we've been getting it.


I concur. It is nice to have the balance of traditional cataloging mixed with 
21st Century hacking. At the same time, it behooves some of us Code4Libbers to 
be a part of hardcore mailing lists like AUTOCAT. Think Hillmann's talk at the 
most recent conference.

-- 
Eric Needs To Practice What He Preaches Morgan
University of Notre Dame


Re: [CODE4LIB] LCSH and Linked Data

2011-04-18 Thread Jonathan Rochkind

On 4/17/2011 10:58 AM, Bill Dueber wrote:

At the same time, I'm finding it hard to determine if we're converging on
"when trying to turn LCSH into reasonable facets, here's what you need to
do" or "when trying to turn LCSH into reasonable facets, you haven't got
a freakin' prayer."  Can someone help me here?


FAST has done it somehow -- turned LCSH into reasonable facets.  But I'm 
not sure if there's a good overview available of how.


reasonable is certainly something of degree too. I don't think you can 
do 'reasonably' with a pretty rough and ready approach. I need to blog 
about the just _few_ things I've done to normalize LCSH for facetting in 
my blacklight-based catalog, which I think gets it pretty close to 
'reasonable'. Heck, some people think just taking the subdivisions on 
marc subfields and splitting em into facets is 'reasonable', although it 
results in many oddities.


But, I think the ultimate answer to your question with full precision 
and based on full knowledge of LCSH is not yet determined -- unless the 
FAST people have figured it out and can share what they've figured out 
in a useful way.


Re: [CODE4LIB] LCSH and Linked Data

2011-04-18 Thread Simon Spero
For FAST, see Chan and O'Neill (2010).  There are large parts of FAST where
the editors wisely opted to  punt on the more intractable parts.

Simon

Chan, Lois Mai and O'Neill, Ed (2010). FAST, Faceted Application of Subject
Terminology: Principles and Application. Libraries Unlimited. ISBN:
9781591587224

On Mon, Apr 18, 2011 at 11:07 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

 On 4/17/2011 10:58 AM, Bill Dueber wrote:

 At the same time, I'm finding it hard to determine if we're converging on
 "when trying to turn LCSH into reasonable facets, here's what you need to
 do" or "when trying to turn LCSH into reasonable facets, you haven't got
 a freakin' prayer."  Can someone help me here?


 FAST has done it somehow -- turned LCSH into reasonable facets.  But I'm
 not sure if there's a good overview available of how.

 reasonable is certainly something of degree too. I don't think you can do
 'reasonably' with a pretty rough and ready approach. I need to blog about
 the just _few_ things I've done to normalize LCSH for facetting in my
 blacklight-based catalog, which I think gets it pretty close to
 'reasonable'. Heck, some people think just taking the subdivisions on marc
 subfields and splitting em into facets is 'reasonable', although it results
 in many oddities.

 But, I think the ultimate answer to your question with full precision and
 based on full knowledge of LCSH is not yet determined -- unless the FAST
 people have figured it out and can share what they've figured out in a
 useful way.



Re: [CODE4LIB] LCSH and Linked Data [cataloging]

2011-04-18 Thread Diane I. Hillmann
 Oh jeez, I'm not sure I'd suggest AutoCat.  Even I can't bear that. 
 But the RDA-L list has a fair amount of discussion that still dusts 
off the traditional issues and tries to figure out what still matters.


Diane Hillmann


On Apr 17, 2011, at 10:58 AM, Bill Dueber wrote:


OK, so I've been trying to follow all of this, and have to say, I'm finding
it all very interesting. I want to give a special shout-out to the catalogers
who have joined in; I (and, I think, much of code4lib) need this kind of
input on a much more regular basis than we've been getting it.


I concur. It is nice to have the balance of traditional cataloging mixed with 
21st Century hacking. At the same time, it behooves some of us Code4Libbers to 
be a part of hardcore mailing lists like AUTOCAT. Think Hillmann's talk at the 
most recent conference.



Re: [CODE4LIB] LCSH and Linked Data

2011-04-18 Thread Kelley McGrath

On Sun, Apr 17, 2011 at 7:40 AM, Simon Spero s...@unc.edu wrote:


The main study on this subject was the Michigan study performed/led by Karen
Markey (some reports were written as Karen M. Drabenstott). The final report
of the project is available at
http://deepblue.lib.umich.edu/handle/2027.42/57992 .  The work took place in
the mid to late 90s, after Airlie.

...

The most perplexing results were those that showed that measured
understanding was lower when headings were displayed in the context of a
bibliographic record rather than on their own. This indicates either a
problem in the measurement process, or an even more fundamental problem
with subdivided headings that may so negate the significant theoretical
advantages of pre-coordination that the value of the whole practice is
thrown into doubt.


That is fascinating. And disturbing. I don't think I ever read the 
original study, but now I'll have to.


Touching on another topic, I believe that the movement of geographical
subdivisions to follow the rightmost geographically sub-dividable
subdivision can sometimes be interrupted by the interposition of a $x
topical subdivision, but I haven't determined whether this is a legacy
exception (the ones that came to mind were related to subtopics of the US
Civil War, which seems inevitable given that the first elements are United
States--History--Civil War, 1861-1865--).

I think the key here is partly "In 1992, it was decided to adopt that 
order where it could be applied," so LC didn't promise to do them all. 
$x History is probably the biggest one that hasn't been made 
geographically subdividable, but it's hard to say if that's on principle 
or because of practical concerns about the huge amount of disruption 
that would cause in individual systems. It's interesting that some of 
the biggies like economic aspects are more recent.


One of the challenges for pre-coordinated strings at least as currently 
implemented (that facets evade) is that no order will suit everyone. 
Which of the following is better?


Dwellings $z Australia $x History $y 20th century
Dwellings $z Indonesia $x Economic aspects
Dwellings $z Indonesia $x Psychological aspects
Dwellings $z Indonesia $x Social aspects
Dwellings $z Ireland $x Economic aspects
Dwellings $z Ireland $x Psychological aspects
Dwellings $z Ireland $x Social aspects
Dwellings $z Japan $x Economic aspects
Dwellings $z Japan $x Psychological aspects
Dwellings $z Japan $x Social aspects

OR (mostly current practice)

*Dwellings $z Australia $x History $y 20th century  **Current practice
Dwellings $x Economic aspects $z Indonesia
Dwellings $x Economic aspects $z Ireland
Dwellings $x Economic aspects $z Japan
*Dwellings $x History $z Australia $y 20th century  **Airlie recommendation
Dwellings $x Psychological aspects $z Indonesia
Dwellings $x Psychological aspects $z Ireland
Dwellings $x Psychological aspects $z Japan
Dwellings $x Social aspects $z Indonesia
Dwellings $x Social aspects $z Ireland
Dwellings $x Social aspects $z Japan

Probably not helpful to have history be an outlier, though.
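
To illustrate how facets sidestep the ordering question, here is a small
sketch that selects headings by subfield value no matter where the subfield
sits in the string. The helper names (subfields, select) are made up for the
example, and the text before the first $ is assumed to be $a.

```python
import re


def subfields(heading: str) -> list:
    """Return [(code, value), ...]; text before the first '$' is taken as $a."""
    parts = re.split(r"\s*\$([a-z])\s*", heading.strip())
    return [("a", parts[0])] + list(zip(parts[1::2], parts[2::2]))


def select(headings: list, **wanted: str) -> list:
    """Keep headings that contain every requested code=value pair, in any order."""
    hits = []
    for heading in headings:
        pairs = set(subfields(heading))
        if all((code, value) in pairs for code, value in wanted.items()):
            hits.append(heading)
    return hits


HEADINGS = [
    "Dwellings $x Economic aspects $z Japan",
    "Dwellings $z Japan $x Social aspects",
    "Dwellings $x Economic aspects $z Ireland",
]

if __name__ == "__main__":
    print(select(HEADINGS, z="Japan"))
    print(select(HEADINGS, x="Economic aspects"))
```

Either browse order above can then be generated on demand by sorting on
whichever subfield the user cares about.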

Kelley


Re: [CODE4LIB] LCSH and Linked Data

2011-04-17 Thread Simon Spero
On Fri, Apr 15, 2011 at 7:21 PM, Kelley McGrath kell...@uoregon.edu wrote:


 It used to be that geographical subdivision was much more flexible and was
 supposed to convey different meanings depending on where it occurred in the
 string. Then there was some research showing that not only did users not
 know how to interpret this, but catalogers did not understand these rules
 and were constructing inconsistent headings.


The main study on this subject was the Michigan study performed/led by Karen
Markey (some reports were written as Karen M. Drabenstott). The final report
of the project is available at
http://deepblue.lib.umich.edu/handle/2027.42/57992 .  The work took place in
the mid to late 90s, after Airlie.

This study had serious methodological problems; these became apparent during
the course of the study, and were partly due to the results being so
unexpected. Unfortunately, there have not been any follow up studies at
scale that would correct for these methodological issues.  Some of the
scoring approaches used by the Gleitmans for  Phrase and Paraphrase might
be revealing.

The most perplexing results were those that showed that measured
understanding was lower when headings were displayed in the context of a
bibliographic record rather than on their own. This indicates either a
problem in the measurement process, or an even more fundamental problem
with subdivided headings that may so negate the significant theoretical
advantages of pre-coordination that the value of the whole practice is
thrown into doubt.
(Incidentally, this year is the diamond anniversary of the  pre- v. post-
debate)

Touching on another topic, I believe that the movement of geographical
subdivisions to follow the rightmost geographically sub-dividable
subdivision can sometimes be interrupted by the interposition of a $x
topical subdivision, but I haven't determined whether this is a legacy
exception (the ones that came to mind were related to subtopics of the US
Civil War, which seems inevitable given that  the first elements are United
States--History--Civil War, 1861-1865--).

Simon


Re: [CODE4LIB] LCSH and Linked Data

2011-04-15 Thread Kelley McGrath
A few belated ramblings from a cataloger:

 

1) GEOGRAPHICAL SUBDIVISION

 

It used to be that geographical subdivision was much more flexible and was 
supposed to convey different meanings depending on where it occurred in the 
string. Then there was some research showing that not only did users not know 
how to interpret this, but catalogers did not understand these rules and were 
constructing inconsistent headings. This led to a movement for simplification. 
From LC's Subject Heading Manual:

 

The Subject Subdivisions Conference that took place at Airlie, Virginia, in 
1991 recommended that the standard order of subdivisions be 
[topic]–[place]–[chronology]–[form].  In 1992, it was decided to adopt that 
order where it could be applied. 

 

This leaves a standard order of $a, $b [rare], $x, $z, $y, $v with some 
exceptions.

 

As was pointed out earlier, the current rule is to put the geographic 
subdivision ($z) as near the end as is legal. This can be mechanically 
determined based on a fixed field in the authority record. Although fixed 
fields in bib records are often unreliable, those in authority records are 
probably as accurate as they can reasonably be made to be, allowing for human 
error. This is both because LC coordinates training and reviews records and 
because the fixed fields are used as decision points so there are short-term 
consequences for later catalogers if they're not done right.

 

The fixed field (008/06) in LCSH authority records tells you whether a 
geographic subdivision can come after the heading 
(http://www.loc.gov/marc/authority/ad008.html). Id.loc.gov doesn't seem to give 
you that info, but it might be nice if it did.
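
For anyone wanting to read that byte programmatically, a minimal sketch
follows. It assumes you already have the authority record's 008 field as a
40-character string. The sample 008 value is synthetic and the function names
are made up for this example.

```python
# Values of authority 008/06 (direct or indirect geographic subdivision).
GEO_SUBDIVISION = {
    " ": "Not subdivided geographically",   # blank, usually displayed as '#'
    "d": "Subdivided geographically-direct",
    "i": "Subdivided geographically-indirect",
    "n": "Not applicable",
    "|": "No attempt to code",
}


def geo_subdivision_code(field_008: str) -> str:
    """Return the single character at position 06 of an authority 008 field."""
    if len(field_008) < 7:
        raise ValueError("008 field is too short to contain position 06")
    return field_008[6]


def may_subdivide_geographically(field_008: str) -> bool:
    """True when a $z may follow the heading (codes 'd' or 'i')."""
    return geo_subdivision_code(field_008) in {"d", "i"}


if __name__ == "__main__":
    sample = "860211" + "i" + "|" * 33  # synthetic 008, only position 06 is meaningful
    code = geo_subdivision_code(sample)
    print(code, "=", GEO_SUBDIVISION[code])
```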

 

650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided 
geographically-indirect] $z England [n  82068148] $x Finance [sh2002007885, Geo 
Subd = # = Not subdivided geographically]

 

650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided 
geographically-indirect] $x Economic aspects [sh 99005484 Geo Subd = i = 
Subdivided geographically-indirect] $z England [n  82068148].
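
The two examples above can be reproduced with a simple placement rule: put
the $z after the rightmost element whose authority record allows geographic
subdivision. This is only a sketch of that rule. It takes the Geo Subd flag
for each element as given (True/False) and ignores the exceptions mentioned
elsewhere in this thread. The function names are hypothetical.

```python
from typing import List, Tuple

Element = Tuple[str, str, bool]  # (subfield code, term, may be subdivided geographically)


def place_geographic(elements: List[Element], places: List[str]) -> List[Tuple[str, str]]:
    """Insert $z subfields after the rightmost geographically subdividable element."""
    subdividable = [i for i, (_, _, ok) in enumerate(elements) if ok]
    if not subdividable:
        raise ValueError("no element may be subdivided geographically")
    cut = subdividable[-1] + 1
    plain = [(code, term) for code, term, _ in elements]
    return plain[:cut] + [("z", place) for place in places] + plain[cut:]


def format_heading(subfields: List[Tuple[str, str]]) -> str:
    return " ".join(f"${code} {term}" for code, term in subfields)


if __name__ == "__main__":
    finance = [("a", "Education", True), ("x", "Finance", False)]
    economic = [("a", "Education", True), ("x", "Economic aspects", True)]
    print(format_heading(place_geographic(finance, ["England"])))
    print(format_heading(place_geographic(economic, ["England"])))
```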

 

One reason not to rely on found order is that LC has been moving in the 
direction of the Airlie House recommendation so in addition to the usual 
mistakes, you'll probably come across a lot of older forms if you take data 
from the wild. For example, until somewhat recently, the economic aspects 
record above looked like the finance one so you'll probably still see records 
like 

 

650 _0 $a Education $z England $x Economic aspects.

 

A) Indirect Subdivision

 

In general, when a heading string starts with a geographic name, it is in 
direct order:

 

651 _0 $a London (England) [n  79005665] $x Economic conditions [sh 99005736].

 

If a geographic name is modifying a topical heading, it is given in indirect 
order:

 

650 _0 $a Education [sh 85040989] $z England $z London [n  79005665; covers 
both $z subfields].

 

Thanks to a project that OCLC did for FAST (which uses only the indirect 
style), in most cases both of these can be extracted from the authority record, 
which will have a 781 with the indirect form added:

 

n  79005665

151  $a London (England)

451  $a Londinium (England)

...

781 0 $z England $z London

 

Some records (usually for geographic areas within cities) cannot be used to 
modify topical headings, but can be used in 651$a as the main term in a heading 
string. These are identified by a note and the lack of a 781.

 

n  85192245

151  $a Hackney (London, England)

667  $a SUBJECT USAGE: This heading is not valid for use as a geographic 
subdivision.
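
A sketch of how the indirect form might be pulled out in code. The record
structure here (a list of tag/subfield pairs) is a simplification invented
for the example rather than any particular MARC library's API, and the data
is just the two records shown above.

```python
from typing import List, Optional, Tuple

Field = Tuple[str, List[Tuple[str, str]]]  # (tag, [(subfield code, value), ...])

LONDON: List[Field] = [
    ("151", [("a", "London (England)")]),
    ("451", [("a", "Londinium (England)")]),
    ("781", [("z", "England"), ("z", "London")]),
]

HACKNEY: List[Field] = [
    ("151", [("a", "Hackney (London, England)")]),
    ("667", [("a", "SUBJECT USAGE: This heading is not valid for use as a "
                   "geographic subdivision.")]),
]


def indirect_form(record: List[Field]) -> Optional[List[str]]:
    """Return the $z values from the first 781 field, or None if there is no 781."""
    for tag, subfields in record:
        if tag == "781":
            return [value for code, value in subfields if code == "z"]
    return None


if __name__ == "__main__":
    print(indirect_form(LONDON))   # usable as $z England $z London
    print(indirect_form(HACKNEY))  # no 781, so not usable as a subdivision
```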

 

B) Geographic Entities and Name vs. Subject Headings

 

Notice that in the above example, the control number/identifier for Education 
starts with sh while the one for London starts with n. This is an important 
distinction. Heading identifiers that start with sh are LCSH terms found in the 
subject authority file and are available from id.loc.gov. I think these all 
fall into FRBR's group 3 bib entities. Heading identifiers that start with n 
are stored in the LC NAF (Name Authority File) and are not available as linked 
data. These are the FRBR group 1 and 2 entities and maybe some from group 3. 
Most of these can also be used as subjects in LCSH. So you can't actually get 
at all the building blocks of LCSH strings nor use linked data for all subjects.
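
In code, the distinction comes down to checking the identifier prefix. A
small sketch, assuming identifiers arrive with the irregular spacing seen in
the examples here (the function name is made up):

```python
import re


def authority_file(lccn: str) -> str:
    """Guess which LC authority file an identifier belongs to from its prefix."""
    compact = re.sub(r"\s+", "", lccn).lower()
    if compact.startswith("sh"):
        return "subject"  # LCSH subject authority file
    if compact.startswith("n"):
        return "name"     # LC name authority file (NAF)
    return "unknown"


if __name__ == "__main__":
    for identifier in ("sh 85040989", "n  79005665", "sh2002007885"):
        print(identifier, "->", authority_file(identifier))
```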

 

Named geographic features (e.g., mountains, lakes, continents) are established 
in the subject authority file using the rules in the Subject Cataloging Manual 
for LCSH. The headings are tagged 151 and can be found at id.loc.gov.

 

sh 85082617 

151  $a McKinley, Mount (Alaska)

 

sh 85044620 

151  $a Erie, Lake

 

sh 85008606

151 $a Asia

 

Geographic features appear in bib records only as 651 or 650+ $z subject terms.

 

Jurisdiction names (e.g., cities, states, countries) are established in the 
name authority file using descriptive cataloging rules (e.g., AACR2 ch 23 and 
the NACO Participants' Manual).  They 

Re: [CODE4LIB] LCSH and Linked Data

2011-04-15 Thread Ross Singer
On Fri, Apr 15, 2011 at 7:21 PM, Kelley McGrath kell...@uoregon.edu wrote:

 I’m sure this is way too much info for most (or all) on this list, but in 
 case it is helpful, I thought I’d throw it out there.

I disagree.  I think this was fantastic and most enlightening.  Most
of us deal with this stuff all the time, yet we (obviously) have zero
idea how it actually works, so it's nice to be schooled (and have this
mini-lesson in LCSH contextually in the mailing list archives).

Thanks for putting this out there, Kelley.
-Ross.


Re: [CODE4LIB] LCSH and Linked Data

2011-04-11 Thread Kyle Banerjee
 There is a lot of redundant data in MARC that is an encoded form of
 something that elsewhere is expressed as text -- somewhat controlled text,
 but text. [...] Much of this redundant input (think of the time!) could
 be eliminated if we quit keying text strings but allowed the display to
 derive from the coded data.

 ..because it does not get input consistently, it's hard to base any
 functionality on it since that functionality would apply only to a somewhat
 random subset of the records in the database.


The reality with fixed fields is that few are used by *any* system. That
provides a disincentive to spend loads of time (i.e. money) mucking about
with them, particularly since they lack expressivity and practical use cases
are not that compelling.

In any case, even if everything suddenly started getting entered
consistently today, you'd still have to deal with all the legacy data.
Cataloging practices change. For example, the form subdivisions mentioned
early in this thread have only been stored in |v for a few years now.
Thoroughness of records is highly variable.

This means that systems need to be built around the assumption that data are
only somewhat consistent. As a result parsing and normalizing text is a far
more realistic approach than messing with fixed fields.
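
As a rough idea of what "parsing and normalizing text" can mean in practice,
here is a minimal sketch that reduces a heading string to a comparison key.
The specific rules (NFKC, collapsing whitespace, dropping a trailing period,
case folding) are assumptions chosen for illustration. Real data will need
more than this.

```python
import re
import unicodedata


def normalize_heading(text: str) -> str:
    """Reduce a subject heading string to a key suitable for matching."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s*--\s*", "--", text)    # unify spacing around subdivision dashes
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    text = text.rstrip(".")                   # drop trailing full stop(s)
    return text.casefold()


if __name__ == "__main__":
    print(normalize_heading("Education -- England --  Finance."))
    print(normalize_heading(" education--England--Finance"))
```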

kyle


Re: [CODE4LIB] LCSH and Linked Data

2011-04-10 Thread Karen Coyle

Quoting Ross Singer rossfsin...@gmail.com:




Yeah, this could get ugly pretty fast.  It's a bit unclear to me what
the distinction is between identical terms in both the geographic
areas and the country codes
(http://id.loc.gov/vocabulary/geographicAreas/e-uk-en and
http://id.loc.gov/vocabulary/countries/enk).  Well, in LC's current
representation, there *is* no distinction, they're both just
skos:Concepts that are (by virtue of skos:exactMatch) effectively
interchangeable.


The distinction is MARC-based. There is a lot of redundant data in  
MARC that is an encoded form of something that elsewhere is expressed  
as text -- somewhat controlled text, but text. The geographic area  
code is input in the coded data area of MARC (0XX) to make up for  
the fact that figuring out a geographic area from LC subject headings  
is difficult. This is not unlike having publication dates as text in  
the 260 $c and again in a fixed format in the 008 field. Much of this  
redundant input (think of the time!) could be eliminated if we quit  
keying text strings but allowed the display to derive from the coded  
data.


The existence of all of the coded data fields in MARC is proof that  
there is some consciousness that text is not sufficient for some of  
the functionality that we would like to have in our systems.  
Unfortunately, because the coded data is not human-friendly AND is  
redundant, it does not get input consistently. And because it does not  
get input consistently, it's hard to base any functionality on it  
since that functionality would apply only to a somewhat random subset  
of the records in the database. So... here we are.
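
To make the redundancy concrete, here is a small sketch that compares the
coded date (bibliographic 008/07-10, Date 1) with whatever year can be found
in the 260 $c text. The sample values are invented and the function names are
hypothetical. It simply reports whether the two agree.

```python
import re
from typing import Optional


def year_from_008(field_008: str) -> Optional[int]:
    """Date 1 lives in positions 07-10 of a bibliographic 008 field."""
    date1 = field_008[7:11]
    return int(date1) if len(date1) == 4 and date1.isdigit() else None


def year_from_260c(text: str) -> Optional[int]:
    """Pick the first standalone four-digit number out of strings like 'c1998.'."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", text)
    return int(match.group(1)) if match else None


def dates_agree(field_008: str, subfield_c: str) -> Optional[bool]:
    """True/False when both dates are readable, None when either is missing."""
    coded, keyed = year_from_008(field_008), year_from_260c(subfield_c)
    if coded is None or keyed is None:
        return None
    return coded == keyed


if __name__ == "__main__":
    print(dates_agree("980512s1998    nyu", "c1998."))
    print(dates_agree("980512s19uu    nyu", "[19--?]"))
```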


kc



See also http://id.loc.gov/vocabulary/geographicAreas/fa and
http://id.loc.gov/authorities/sh85009230#concept.  You have a single
institution minting multiple URIs for what is effectively the same
thing (albeit in different vocabularies), although, ironically,
nothing points at any actual real world objects.

VIAF doesn't do much better in this particular case (there are lots of
examples where it does, mind you):  http://viaf.org/viaf/142995804
(see: http://viaf.org/viaf/142995804/rdf.xml).  We have all of these
triangulations around the concept of England or Atlas mountains,
but we can't actually refer to England or the Atlas mountains.

Also, I am not somehow above this problem, either.  With the linked
MARC codes lists (http://purl.org/NET/marccodes/), I had to make a
similar decision, I just chose to go the opposite route:  define them
as things, rather than concepts
(http://purl.org/NET/marccodes/gacs/fa#location,
http://purl.org/NET/marccodes/gacs/e-uk-en#location,
http://purl.org/NET/marccodes/countries/enk#location, etc.), which
presents its own set of problems
(http://purl.org/NET/marccodes/gacs/h#location is not a SpatialThing
no matter how liberal your definition).

At some point, it's worth addressing what these things actually *are*
and if, indeed, they are effectively the same thing, if it's worth
preserving these redundancies, because I think they'll cause grief in
the future.

-Ross.





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] LCSH and Linked Data

2011-04-10 Thread Ya'aqov Ziso
Karen Miller works at Northwestern University where an authorities librarian
has been maintaining, to the dot, the authority related records (headings,
subdivisions, encoding, etc.) for over 20 years. If a cataloger there makes
a mistake, that will be fixed by the refined set of procedures run
consistently on their bibliographic vis-a-vis authorities files. There is no
other such institution catalog in the US. LC have often invited that
authorities librarian to come fix their collection as well.

Bill, which new evidence have you found for almost rupture or depression
regarding reflection on geographic names?

The fact AUTOCAT librarians started to assist our discussions is in fact
grounds for rapture (pun intended) as we improve analysis, *Ya'aqov*


On Fri, Apr 8, 2011 at 5:07 PM, Bill Dueber b...@dueber.com wrote:

 2011/4/8 Karen Miller k-mill...@northwestern.edu

  I hope I'm not pointing out the obvious,


 That made me laugh so hard I almost ruptured something.

 Thank you so much for such a complete (please, god, tell me
 it's complete...) explanation. It's a little depressing, but at least now I
 know why I'm depressed :-)


 --
 Bill Dueber
 Library Systems Programmer
 University of Michigan Library


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Till Kinstler
Am 07.04.2011 17:44, schrieb Ford, Kevin:

 Actually, it appears to depend on whose Authority record you're looking at.  
 The Canadians, Australians, and Israelis have it as a CorporateName (110), as 
 do the French (210 - unimarc); LC and the Germans say it's a Geographic Name.

No, the original England record linked to VIAF in the German GND says
it is a Gebietskörperschaft, which is a corporate body in English.
See http://d-nb.info/gnd/15138-5/about/html and the RDF representation
at http://d-nb.info/gnd/15138-5/about/rdf
Perhaps something went wrong in the mapping of the German authority
record to MARC21, so England got into the 151 (or there might be good
reasons to do it that way, ask metadata experts...). The original record
is not maintained in MARC21, we don't do MARC21 (or any MARC at all)
here, we are just starting to switch to it as future(!) exchange
format... :-).
Sorry for being pedantic, early morning and not enough coffee yet...

Till

-- 
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kinst...@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Owen Stephens
Thanks for all the information and discussion.

I don't think I'm familiar enough with Authority file formats to completely
comprehend - but I certainly understand the issues around the question of
'place' vs 'histo-geo-political entity'. Some of this makes me worry about
the immediate applicability of the LC Authority files in the Linked Data
space - someone said to me recently 'SKOS is just a way of avoiding dealing
with the real semantics' :)

Anyway - putting that to one side, the simplest approach for me at the
moment seems to only look at authorised LCSH as represented on id.loc.gov.
Picking up on Andy's first response:

On Thu, Apr 7, 2011 at 3:46 PM, Houghton,Andrew hough...@oclc.org wrote:

 After having done numerous matching and mapping projects, there are some
 issues that you will face with your strategy, assuming I understand it
 correctly. Trying to match a heading starting at the left most subfield and
 working forward will not necessarily produce correct results when matching
 against the LCSH authority file. Using your example:



 650 _0 $a Education $z England $x Finance



 is a good example of why processing the heading starting at the left will
 not necessarily produce the correct results.  Assuming I understand your
 proposal you would first search for:



 150 __ $a Education



 and find the heading with LCCN sh85040989. Next you would look for:



 181 __ $z England



 and you would NOT find this heading in LCSH.


OK - ignoring the question of where the best place to look for this is - I
can live with not matching it for now. Later (perhaps when I understand it
better, or when these headings are added to id.loc.gov) we can revisit this.


 The second issue using your example is that you want to find the “longest”
 matching heading. While the pieces parts are there, so is the enumerated
 authority heading:



 150 __ $a Education $z England



 as LCCN sh2008102746. So your heading is actually composed of the
 enumerated headings:



 sh2008102746  150 __ $a Education $z England

 sh2002007885  180 __ $x Finance



 and not the separate headings:



 sh85040989    150 __ $a Education

 n82068148     150 __ $a England

 sh2002007885  180 __ $x Finance



 Although one could argue that either analysis is correct depending upon
 what you are trying to accomplish.




What I'm interested in is representing the data as RDF/Linked Data in a way
that opens up the best opportunities for both understanding and querying the
data. Unfortunately at the moment there isn't a good way of representing
LCSH directly in RDF (the MADS work may help I guess but to be honest at the
moment I see that as overly complex - but that's another discussion).

What I can do is make statements that an item is 'about' a subject (probably
using dc:subject) and then point at an id.loc.gov URI. However, if I only
express individual headings:
Education
England (natch)
Finance

Then obviously I lose the context of the full heading - so I also want to
look for
Education--England--Finance (which I won't find on id.loc.gov as not
authorised)

At this point I could stop, but my feeling is that it is useful to also look
for other combinations of the terms:

Education--England (not authorised)
Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008)

My theory is that as long as I stick to combinations that start with a
topical term I'm not going to make startlingly inaccurate statements?
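
For what it's worth, the kind of statement I have in mind would look like the
output of the sketch below: one dc:subject triple per matched heading,
serialised as N-Triples. The item URI is a made-up example, and the
id.loc.gov URI pattern is simply copied from the Education--Finance link
above.

```python
from typing import List

DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
ID_LOC_BASE = "http://id.loc.gov/authorities/"  # pattern as quoted in this thread


def subject_triples(item_uri: str, lccns: List[str]) -> List[str]:
    """Build one N-Triples line per LCCN, linking the item to the heading URI."""
    return [f"<{item_uri}> <{DC_SUBJECT}> <{ID_LOC_BASE}{lccn}> ." for lccn in lccns]


if __name__ == "__main__":
    # http://example.org/item/1 is a placeholder for a real record URI.
    for line in subject_triples("http://example.org/item/1", ["sh85040989", "sh85041008"]):
        print(line)
```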


 The matching algorithm I have used in the past contains two routines. The
 first f(a) will accept a heading as a parameter, scrub the heading, e.g.,
 remove unnecessary subfields like $0, $3, $6, $8, etc. and do any other
 pre-processing necessary on the heading, then call the second function f(b).
 The f(b) function accepts a heading as a parameter and recursively calls
 itself until it builds up the list of LCCNs that comprise the heading. It
 first looks for the given heading; when it doesn't find it, it removes the
 *last* subfield and recursively calls itself, otherwise it appends the found
 LCCN to the returned list and exits. This strategy will find the longest
 match.
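
One possible reading of the two routines, sketched in Python. The description
leaves some details open, so this is an interpretation rather than the actual
implementation: it finds the longest authorised prefix, then carries on with
whatever subfields remain, recording None for a piece that cannot be matched.
The AUTHORITY table is a stand-in for a real authority-file lookup, filled
with the LCCNs quoted in this message.

```python
from typing import Dict, List, Optional, Tuple

Subfield = Tuple[str, str]      # (code, value), e.g. ("a", "Education")
Heading = Tuple[Subfield, ...]

# Stand-in for an authority-file lookup (LCCNs as quoted in this thread).
AUTHORITY: Dict[Heading, str] = {
    (("a", "Education"),): "sh85040989",
    (("a", "Education"), ("z", "England")): "sh2008102746",
    (("x", "Finance"),): "sh2002007885",
}
CONTROL_CODES = {"0", "3", "6", "8"}


def scrub(subfields: List[Subfield]) -> Heading:
    """f(a): drop control subfields and tidy values before matching."""
    return tuple((c, v.strip()) for c, v in subfields if c not in CONTROL_CODES)


def longest_prefix(heading: Heading) -> Tuple[Optional[str], int]:
    """f(b): look up the heading; on a miss, drop the last subfield and recurse."""
    if not heading:
        return None, 0
    lccn = AUTHORITY.get(heading)
    if lccn is not None:
        return lccn, len(heading)
    return longest_prefix(heading[:-1])


def match_parts(heading: Heading) -> List[Optional[str]]:
    """Collect LCCNs for successive longest matches; None marks an unmatched piece."""
    if not heading:
        return []
    lccn, used = longest_prefix(heading)
    if lccn is None:
        return [None] + match_parts(heading[1:])
    return [lccn] + match_parts(heading[used:])


if __name__ == "__main__":
    raw = [("a", "Education"), ("z", "England"), ("x", "Finance"), ("0", "ignored")]
    print(match_parts(scrub(raw)))
```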


Unless I've misunderstood this, this strategy would not find
'Education--Finance'? Instead I need to remove each *subdivision* in turn
(no matter where it appears in the heading order) and try all possible
combinations checking each for a match on id.loc.gov. Again, I can do this
without worrying about possible invalid headings, as these wouldn't have
been authorised anyway...
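
Roughly, the approach would look like the sketch below: keep the main
heading, generate every order-preserving combination of the subdivisions, and
check each candidate against a lookup. The AUTHORISED table is a stand-in for
a real id.loc.gov check and contains only the two headings discussed here.
The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

# Stand-in for an id.loc.gov lookup: label -> LCCN.
AUTHORISED: Dict[str, str] = {
    "Education": "sh85040989",
    "Education--Finance": "sh85041008",
}


def candidate_headings(main: str, subdivisions: List[str]) -> List[str]:
    """All combinations of subdivisions (original order kept), longest first."""
    candidates = []
    for size in range(len(subdivisions), -1, -1):
        for combo in combinations(subdivisions, size):
            candidates.append("--".join((main,) + combo))
    return candidates


def authorised_matches(main: str, subdivisions: List[str]) -> Dict[str, str]:
    """Return the candidate headings that the lookup recognises."""
    return {h: AUTHORISED[h] for h in candidate_headings(main, subdivisions) if h in AUTHORISED}


if __name__ == "__main__":
    print(candidate_headings("Education", ["England", "Finance"]))
    print(authorised_matches("Education", ["England", "Finance"]))
```

The number of candidates doubles with each extra subdivision, which should be
fine for headings with only a handful of subfields.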

I can check the number of variations around this but I guess that in my
limited set of records (only 30k) there will be a relatively small number of
possible patterns to check.

Does that make sense?


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Ross Singer
On Fri, Apr 8, 2011 at 5:02 AM, Owen Stephens o...@ostephens.com wrote:

 Then obviously I lose the context of the full heading - so I also want to
 look for
 Education--England--Finance (which I won't find on id.loc.gov as not
 authorised)

 At this point I could stop, but my feeling is that it is useful to also look
 for other combinations of the terms:

 Education--England (not authorised)
 Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008)

 My theory is that as long as I stick to combinations that start with a
 topical term I'm not going to make startlingly inaccurate statements?

I would definitely ask this question somewhere other than Code4lib
(autocat, maybe?), since I think the answer is more complicated than
this (although they could validate/invalidate your assumption about
whether or not this approach would get you close enough).

My understanding is that Education--England--Finance *is* authorized,
because Education--Finance is and England is a free-floating
geographic subdivision.  Because it's also an authorized heading,
Education--England--Finance is, in fact, an authority.  The problem
is that free-floating subdivisions cause an almost infinite number of
permutations, so there aren't LCCNs issued for them.

This is where things get super-wonky.  It's also the reason I
initially created lcsubjects.org, specifically to give these (and,
ideally, locally controlled subject headings) a publishing
platform/centralized repository, but it quickly grew to be more than
just a side project.  There were issues of how the data would be
constructed (esp. since, at the time, I had no access to the NAF), how
to reconcile changes, provenance, etc.  Add to the fact that 2 years
ago, there wasn't much linked library data going on, it was really
hard to justify the effort.

But, yeah, it would be worth running your ideas by a few catalogers to
see what they think.

-Ross.


[CODE4LIB] LCSH and Linked Data / Ross

2011-04-08 Thread Ya'aqov Ziso
Hi and thank you Ross, Jonathan, and Andy,

I do wish someone from LC would answer Jonathan's questions for all codes
and geographic subdivision or subject implications. There's so much
self-inflicted pain I can go through trying to revive my cataloging days.
Here are some clarifications though:

List of Geographic Areas is the macro list, whereby List of countries
includes only countries as a subset from the macro list.

MARC Code List for Countries [choice of a MARC code is generally related to
information in field 260 (Publication, Distribution, etc. (Imprint)).  The
code recorded in 008/15-17 is used in conjunction with field 044 (Country of
Producer Code) when more than one code is appropriate to an item.]

MARC Geographic Area Codes are codes entered (according to geographic names
in the 6xx fields) in field 043.

The Country Codes and Geographic Area Codes are entered bureaucratically,
bypassing Jonathan's refined distinctions. These tasks are outsourced to
agencies separate from the catalogers assigning LCSH.

Now it starts getting uglier, since upkeep for these lists differs in time
and agency. Possibly new territory names are done now by NATO ... You would
expect to see the same name in a code list and in a geographic name (151).
Sometimes you won't. Sometimes you'll see redundancies which confuse even
more.

So since:

   1. LCSH has mistakes, inconsistencies
   2. LC doesn't talk to CODE4LIB to answer our questions
   3. OCLC will not talk to LC on our behalf

we can create the geographic name list(s) we need. Since we know that 6xx
forms for geographic names appear in 151 and 781 fields, we can create an
index for those names for matching to 6xx in LCSH. Andrew, please
complete/comment-on this list.
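
A rough sketch of such an index, in Python. Everything here is an
assumption made for illustration: the record layout (an LCCN plus a list
of (tag, subfields) pairs) is hypothetical and stands in for whatever a
real MARC parser would give you, and only the first $z of a 781 is
considered.

```python
def build_geographic_index(authority_records):
    """Map each geographic name form to the LCCN of its authority record.

    authority_records: iterable of (lccn, fields), where fields is a list
    of (tag, subfields) pairs and subfields is a dict such as {"a": "England"}.
    This layout is a hypothetical simplification of a MARC authority record.
    """
    index = {}
    for lccn, fields in authority_records:
        for tag, subfields in fields:
            if tag == "151" and "a" in subfields:
                # Established geographic name.
                index.setdefault(subfields["a"], lccn)
            elif tag == "781" and "z" in subfields:
                # Form of the name used as a geographic subdivision.
                index.setdefault(subfields["z"], lccn)
    return index


# Sample record using the LCCN quoted elsewhere in this thread.
records = [
    ("n82068148", [("151", {"a": "England"}), ("781", {"z": "England"})]),
]
geo_index = build_geographic_index(records)
print(geo_index.get("England"))
```

A $z value from a 6xx field can then be looked up in the index directly;
names that are missing simply return None.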

Ya'aqov


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Bill Dueber
On Fri, Apr 8, 2011 at 10:10 AM, Ross Singer rossfsin...@gmail.com wrote:

 But, yeah, it would be worth running your ideas by a few catalogers to
 see what they think.



And if anyone does this...please please *please* write it up!

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Owen Stephens
Thanks Ross - I have been pushing some cataloguing folk to comment on some
of this as well (and have some feedback) - but I take the point that wider
consultation via autocat could be a good idea. (for some reason this makes
me slightly nervous!)

In terms of whether Education--England--Finance is authorised or not - I
think I took from Andy's response that it wasn't, but also looking at it on
authorities.loc.gov it isn't marked as 'authorised'. Anyway - the relevant
thing for me at this stage is that I won't find a match via id.loc.gov - so
I can't get a URI for it anyway.

There are clearly quite a few issues with interacting with LCSH as Linked
Data at the moment - I'm not that keen on how this currently works, and my
reaction to the MADS/RDF ontology is similar to that of Bruce D'Arcus (see
http://metadata.posterous.com/lcs-madsrdf-ontology-and-the-future-of-the-se),
but on the other hand I want to embrace the opportunity to start joining some
stuff up and seeing what happens :)

Owen


-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Shirley Lincicum
I'm a cataloger who has been following this discussion with interest,
but not necessarily understanding all of it. I'll try to add what I
can regarding the rules for constructing LCSH headings.

 My understanding is that Education--England--Finance *is* authorized,
 because Education--Finance is and England is a free-floating
 geographic subdivision.  Because it's also an authorized heading,
 Education--England--Finance is, in fact, an authority.  The problem
 is that free-floating subdivisions cause an almost infinite number of
 permutations, so there aren't LCCNs issued for them.

Ross is essentially correct. Education is an authorized subject term
that can be subdivided geographically. Finance is a free-floating
subdivision that is authorized for use under subject terms that
conform to parameters given in the scope notes in its authority record
(680 fields), but it cannot be subdivided geographically. England is
an authorized geographic subject term that can be added to any heading
that can be subdivided geographically. Thus, Education -- England --
Finance is a valid LCSH heading, whereas Education -- Finance --
England would not be. This is wonky, and it's stuff like this that
makes LCSH so unwieldy and difficult to validate, even for humans who
actually have the capacity to learn and adjust to all of the various
inconsistencies.
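
As a very rough illustration of the placement rule described above, here
is a Python sketch. It is a simplification under stated assumptions, not
the real validation logic: the "may subdivide geographically" flags are
typed in by hand (hypothetical data; real values would come from the
authority records), and it only checks that a geographic subdivision
directly follows an element that allows one.

```python
# Hypothetical, hand-entered flags for the two terms discussed above.
MAY_SUBDIVIDE_GEOGRAPHICALLY = {"Education": True, "Finance": False}


def geographic_placement_ok(parts, geographic_names, may_subdivide):
    """Check that each geographic subdivision follows an element that allows it.

    parts: heading elements in order, e.g. ["Education", "England", "Finance"].
    geographic_names: set of names known to be geographic.
    may_subdivide: dict mapping a term to True if it may be subdivided
    geographically.
    """
    for i, part in enumerate(parts):
        if i == 0 or part not in geographic_names:
            continue
        preceding = parts[i - 1]
        if preceding in geographic_names:
            # A second geographic level, e.g. England--London.
            continue
        if not may_subdivide.get(preceding, False):
            return False
    return True


print(geographic_placement_ok(
    ["Education", "England", "Finance"], {"England"}, MAY_SUBDIVIDE_GEOGRAPHICALLY))
print(geographic_placement_ok(
    ["Education", "Finance", "England"], {"England"}, MAY_SUBDIVIDE_GEOGRAPHICALLY))
```

The first call accepts Education--England--Finance; the second rejects
Education--Finance--England because Finance cannot be subdivided
geographically.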

I don't know how relevant it is to this particular discussion, but
going forward I'm not sure how important it is to validate LCSH
headings. I really appreciate developers who seek to preserve the
semantic relationships present in the headings as much as possible; I
believe many of them have value. But aren't there ways to
preserve/extract that value without getting too bogged down in the
inconsistent left-to-right structure of the existing headings?

I hope this helps, at least a little bit. I'd be happy to answer
additional questions.

Shirley

Shirley Lincicum
Frustrated Cataloger


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Bill Dueber
On Fri, Apr 8, 2011 at 1:50 PM, Shirley Lincicum shirley.linci...@gmail.com
 wrote:

 Ross is essentially correct. Education is an authorized subject term
 that can be subdivided geographically. Finance is a free-floating
 subdivision that is authorized for use under subject terms that
 conform to parameters given in the scope notes in its authority record
 (680 fields), but it cannot be subdivided geographically. England is
 an authorized geographic subject term that can be added to any heading
 that can be subdivided geographically.


Wait, so is it possible to know if England means the free-floating
geographic entity or the country? Or is that just plain unknowable.

Suddenly, my mouth is hungering for something gun-flavored.

I know OCLC did some work trying to dis-integrate different types of terms
with the FAST stuff, but it's not clear to me how I can leverage that (or
anything else) to make LCSH at all useful as a search target or (even
better) facet.  Has anyone done anything with it?


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Karen Miller
OK, as a cataloger who has been confused by the jurisdictional/place name
distinction, I'm going to jump in here. 

Whether England means the free-floating geographic entity or the country
is not quite unknowable -- it depends on the MARC codes that accompany it. 

The brief answer is this: a field used in a 651$a or a $z should match a 151
in the LC authorities.

If the MARC field is 151 or 651 (let's just say x51), then the $a should
match a 151 in the authority file.
MARC subfield z ($z) is always a geographic subdivision and should match a
151.

Here's where it gets tricky: 
If the MARC field is a x10 (110, 610, 710 – corporate bodies), then the $a
should match a 110 or a 151 in the authority file. If the first indicator of
such a MARC field is a 1, then it will probably match a 151 – first
indicator 1 means that a heading is jurisdictional and may match a 151.

For  example:

110 1_ United States. ‡b Dept. of Agriculture

There is a 

151 United States 

in the LC authorities, but no 

110 United States

yet it can be used as a corporate body name in a bib. record with a 110
field. 
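
The rules above could be written down roughly as follows. This is only a
sketch in Python: the function name and the idea of returning candidate
authority tags in order of likelihood are my own illustration, and it
covers only the cases described in this message.

```python
def expected_authority_tags(tag, subfield, first_indicator=" "):
    """Return the authority-file tags a bibliographic subfield should match.

    tag: bibliographic field tag such as "651" or "110".
    subfield: subfield code such as "a" or "z".
    first_indicator: first indicator of the field ("1" means jurisdictional
    for x10 fields).
    Candidates are listed most likely first; an empty list means the
    rules above do not cover the case.
    """
    if subfield == "z":
        # $z is always a geographic subdivision.
        return ["151"]
    if tag.endswith("51") and subfield == "a":
        return ["151"]
    if tag.endswith("10") and subfield == "a":
        if first_indicator == "1":
            # Jurisdictional heading: will probably match a 151.
            return ["151", "110"]
        return ["110", "151"]
    return []


print(expected_authority_tags("650", "z"))
print(expected_authority_tags("110", "a", first_indicator="1"))
```

For the 110 1_ United States example, the sketch tries the 151 first and
falls back to a 110.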

This is further confused by the VIAF, in which some national libraries have
established the United States as a corporate body (110).

At the risk of confusing things, I'd suggest looking at countries like the
United States, Kenya or Canada as examples. England is not a great example
because it's not a current jurisdiction name - there is a note in the LC
authority record that reads Heading for England valid as a jurisdiction
before 1536 only. Use (England) as qualifier for places (23.4D) and for
nongovernment bodies (24.4C2). It is established as a 110 because it *used
to be* a jurisdiction name and would be valid for works issued by the
government prior to 1536. Obviously this note is of no use to a machine, but
it explains why we aren't seeing it used as a jurisdiction (a corporate
body) with subordinate bodies.

I hope I'm not pointing out the obvious, but the use of names that appear in
151 fields in the authority file as 110 fields in bibliographic records
confused me for a very long time; our authorities librarian explained it to
me at least twice before the proverbial light bulb went on for me. 

Karen

Karen D. Miller
Monographic/Digital Projects Cataloger
Bibliographic Services Dept.
Northwestern University Library
Evanston, IL 
k-mill...@northwestern.edu
847-467-3462




Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Bill Dueber
2011/4/8 Karen Miller k-mill...@northwestern.edu

 I hope I'm not pointing out the obvious,


That made me laugh so hard I almost ruptured something.

Thank you so much for such a complete (please, god, tell me it's
complete...) explanation. It's a little depressing, but at least now I know
why I'm depressed :-)


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


[CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
We are working on converting some MARC library records to RDF, and looking
at how we handle links to LCSH (id.loc.gov) - and I'm looking for feedback
on how we are proposing to do this...

I'm not 100% confident about the approach, and to some extent I'm trying to
work around the nature of how LCSH interacts with RDF at the moment I
guess... but here goes - I would very much appreciate
feedback/criticism/being told why what I'm proposing is wrong:

I guess what I want to do is preserve aspects of the faceted nature of LCSH
in a useful way, give useful links back to id.loc.gov where possible, and
give access to a wide range of facets on which the data set could be
queried. Because of this I'm proposing not just expressing the whole of the
650 field as a LCSH and checking for its existence on id.loc.gov, but also
checking for various combinations of topical term and subdivisions from the
650 field. So for any 650 field I'm proposing we should check on
id.loc.gov for labels matching:

check(650$$a) -- topical term
check(650$$b) -- topical term
check(650$$v) -- Form subdivision
check(650$$x) -- General subdivision
check(650$$y) -- Chronological subdivision
check(650$$z) -- Geographic subdivision

Then using whichever elements exist (all as topical terms):
Check(650$$a--650$$b)
Check(650$$a--650$$v)
Check(650$$a--650$$x)
Check(650$$a--650$$y)
Check(650$$a--650$$z)
Check(650$$a--650$$b--650$$v)
Check(650$$a--650$$b--650$$x)
Check(650$$a--650$$b--650$$y)
Check(650$$a--650$$b--650$$z)
Check(650$$a--650$$b--650$$x--650$$v)
Check(650$$a--650$$b--650$$x--650$$y)
Check(650$$a--650$$b--650$$x--650$$z)
Check(650$$a--650$$b--650$$x--650$$z--650$$v)
Check(650$$a--650$$b--650$$x--650$$z--650$$y)
Check(650$$a--650$$b--650$$x--650$$z--650$$y--650$$v)


As an example given:

650 00 $$aPopular music$$xHistory$$y20th century

We would be checking id.loc.gov for

'Popular music' as a topical term (http://id.loc.gov/authorities/sh85088865)
'History' as a general subdivision (http://id.loc.gov/authorities/sh99005024
)
'20th century' as a chronological subdivision (
http://id.loc.gov/authorities/sh2002012476)
'Popular music--History and criticism' as a topical term (
http://id.loc.gov/authorities/sh2008109787)
'Popular music--20th century' as a topical term (not authorised)
'Popular music--History and criticism--20th century' as a topical term (not
authorised)


And expressing all matches in our RDF.

My understanding of LCSH isn't what it might be - but the ordering of terms
in the combined string checking is based on what I understand to be the
usual order - is this correct, and should we be checking for alternative
orderings?

Thanks

Owen


-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
Thanks Tom - very helpful

Perhaps this suggests that rather than using a fixed order we should check
combinations while preserving the order of the original 650 field (I assume
this should in theory be correct always - or at least done to the best of
the cataloguer's knowledge)?

So for:

650 _0 $$a Education $$z England $$x Finance.

check:

Education
England (subdiv)
Finance (subdiv)
Education--England
Education--Finance
Education--England--Finance

While for 650 _0 $$a Education $$x Economic aspects $$z England we check

Education
Economic aspects (subdiv)
England (subdiv)
Education--Economic aspects
Education--England
Education--Economic aspects--England


 - It is possible for other orders in special circumstances, e.g. with
 language dictionaries which can go something like:

 650 _0 $$a English language $$v Dictionaries $$x Albanian.


This possibility would also be covered by preserving the order - check:

English Language
Dictionaries (subdiv)
Albanian (subdiv)
English Language--Dictionaries
English Language--Albanian
English Language--Dictionaries--Albanian

Creating possibly invalid headings isn't necessarily a problem - as we won't
get a match on id.loc.gov anyway. (Instinctively English Language--Albanian
doesn't feel right)
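
A minimal Python sketch of the order-preserving checks listed above. The
function names are made up for illustration, and the lookup is passed in
as a plain callable (here a dict's get method standing in for a query
against id.loc.gov), so nothing below is a real id.loc.gov API. The
separate checks of each subdivision on its own are not covered.

```python
from itertools import combinations


def candidate_headings(main_term, subdivisions):
    """List the main term followed by every subset of the subdivisions.

    itertools.combinations keeps the input order, so the subdivisions
    always appear in the order given in the original 650 field.
    """
    labels = []
    for size in range(len(subdivisions) + 1):
        for chosen in combinations(subdivisions, size):
            labels.append("--".join([main_term, *chosen]))
    return labels


def find_matches(main_term, subdivisions, lookup):
    """Return {label: uri} for every candidate the lookup recognises."""
    matches = {}
    for label in candidate_headings(main_term, subdivisions):
        uri = lookup(label)
        if uri is not None:
            matches[label] = uri
    return matches


# Stand-in for id.loc.gov, holding only the match mentioned in this thread.
KNOWN = {"Education--Finance": "http://id.loc.gov/authorities/sh85041008"}

print(candidate_headings("Education", ["England", "Finance"]))
print(find_matches("Education", ["England", "Finance"], KNOWN.get))
```

Candidates with no match are simply dropped, which is why generating
possibly invalid headings does no harm here.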



 - Some of these are repeatable, so you can have two $$vs following each
 other (e.g. Biography--Dictionaries); two $$zs (very common), as in
 Education--England--London; two $$xs (e.g. Biography--History and criticism).

 OK - that's fine, we can use each individually and in combination for any
repeated headings I think


 - I'm not sure I've ever come across a lot of $$bs in 650s. Do you have a lot of
 them in the database?

 Hadn't checked until you asked! We have 1 in the dataset in question (c.30k
records) :)


 I'm not sure how possible it would be to come up with a definitive list of
 (reasonable) possible combinations.

 You are probably right - but I'm not too bothered about aiming at
'definitive' at this stage anyway - but I do want to get something
relatively functional/useful


 Tom

 Thomas Meehan
 Head of Current Cataloguing
 University College London Library Services


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
 ... Creating possibly invalid headings isn't necessarily a problem - as we
 won't get a match on id.loc.gov anyway ...


LCSH headings reflect materials cataloged by LC. You may have materials at
your UK (or Albania, Tunisia, etc.) which were not cataloged yet at LC, thus
nothing yet to match on.
Ya'aqov


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Houghton,Andrew
After having done numerous matching and mapping projects, there are some issues 
that you will face with your strategy, assuming I understand it correctly. 
Trying to match a heading starting at the left most subfield and working 
forward will not necessarily produce correct results when matching against the 
LCSH authority file. Using your example:

 

650 _0 $a Education $z England $x Finance

 

is a good example of why processing the heading starting at the left will not 
necessarily produce the correct results.  Assuming I understand your proposal 
you would first search for:

 

150 __ $a Education

 

and find the heading with LCCN sh85040989. Next you would look for:

 

181 __ $z England

 

and you would NOT find this heading in LCSH. This is issue one. Unfortunately, 
LC does not create 181 in LCSH (actually I think there are some, but not if 
it’s a name), instead they create a 781 in the name authority record. So to 
find the corresponding $z England we need to go to the name authority record 
150 England with LCCN n82068148. Currently under id.loc.gov you will not find 
name authority records, but you can find them at viaf.org. The second issue 
using your example is that you want to find the “longest” matching heading. 
While the pieces parts are there, so is the enumerated authority heading:

 

150 __ $a Education $z England

 

as LCCN sh2008102746. So your heading is actually composed of the enumerated 
headings:

 

sh2008102746 150 __ $a Education $z England

sh2002007885 180 __ $x Finance

 

and not the separate headings:

 

sh85040989 150 __ $a Education

n82068148   150 __ $a England

sh2002007885 180 __ $x Finance

 

Although one could argue that either analysis is correct depending upon what 
you are trying to accomplish.

 

The matching algorithm I have used in the past contains two routines. The
first, f(a), accepts a heading as a parameter, scrubs the heading (e.g.,
removes unnecessary subfields like $0, $3, $6, $8, etc.), does any other
pre-processing necessary on the heading, and then calls the second function,
f(b). The f(b) function accepts a heading as a parameter and recursively calls
itself until it builds up the list of LCCNs that comprise the heading. It
first looks for the given heading; when it doesn't find it, it removes the
*last* subfield and recursively calls itself; otherwise it appends the found
LCCN to the returned list and exits. This strategy will find the longest
match. The headings are searched against an augmented LCSH database where the
781 name authority records have been transformed into 181 records, keeping the
LCCN of the name authority record. Not ideal, but it generally works well.
Adjust the algorithm per need.
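
A hedged Python sketch of the two routines, for anyone who wants
something concrete to adjust. The names, the (code, value) tuple layout
and the in-memory dict index are all assumptions made for illustration;
in particular, carrying on with the leftover subfields after a match,
and recording None for a subfield that matches nothing, is one reading
of the description above rather than a statement of the original code.

```python
IGNORED_SUBFIELDS = {"0", "3", "6", "8"}


def scrub(heading):
    """f(a): drop control subfields and tidy whitespace."""
    return [(code, value.strip()) for code, value in heading
            if code not in IGNORED_SUBFIELDS]


def longest_match(subfields, index):
    """f(b): find the longest leading run of subfields present in the index.

    Returns (lccn, number_of_subfields_used), or (None, 0) if nothing matches.
    """
    if not subfields:
        return None, 0
    lccn = index.get(tuple(subfields))
    if lccn is not None:
        return lccn, len(subfields)
    # Not found: remove the *last* subfield and try again.
    return longest_match(subfields[:-1], index)


def match_heading(heading, index):
    """Return the list of LCCNs that make up a heading."""
    remaining = scrub(heading)
    lccns = []
    while remaining:
        lccn, used = longest_match(remaining, index)
        lccns.append(lccn)  # None marks an unmatched subfield
        remaining = remaining[max(used, 1):]
    return lccns


# Tiny stand-in for the augmented LCSH database, using LCCNs from this thread.
INDEX = {
    (("a", "Education"),): "sh85040989",
    (("a", "Education"), ("z", "England")): "sh2008102746",
    (("z", "England"),): "n82068148",
    (("x", "Finance"),): "sh2002007885",
}

heading = [("a", "Education"), ("z", "England"), ("x", "Finance"), ("0", "ignored")]
print(match_heading(heading, INDEX))
```

With this index the example heading resolves to the enumerated
Education--England record followed by the Finance subdivision record.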

 

Hope this helps, Andy.

 

 


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Andrew, please see *[YZ]* below

*181 __ $z England  and you would NOT find this heading in LCSH. This is
issue one. Unfortunately, LC does not create 181 in LCSH (actually I think
there are some, but not if it’s a name), instead they create a 781 in the
name authority record. *
*[YZ]*  MARC/LCSH distinguishes between names 100 and geographic names 151
in their authority record. You'll find all geographic names if you look for
151 records.

*So to find the corresponding $z England we need to go to the name authority
record 150 England with LCCN n82068148.*
*[YZ]*  *LCCN n82068148* authority record is  for 151 England.
Also Andrew, are you indicating there is a difference between the form of
geographic name in 151$a and 781$z   -- ?

*Currently under id.loc.gov you will not find name authority records, but
you can find them at viaf.org*.
*[YZ]*  viaf.org does not include geographic names. I just checked there
for England. It makes little sense to mix personal/corporate names with
geographic ones. Let's see what Ralph comments.

*Ya'aqov*


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
Still digesting Andrew's response (thanks Andrew), but

On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:

 *Currently under id.loc.gov you will not find name authority records, but
 you can find them at viaf.org*.
 *[YZ]*  viaf.org does not include geographic names. I just checked there
 England.


Is this not the relevant VIAF entry?
http://viaf.org/viaf/142995804


-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread LeVan,Ralph
If you look at the fields those names come from, I think they mean
England as a corporation, not England as a place.

Ralph



Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ford, Kevin
Actually, it appears to depend on whose Authority record you're looking at.  
The Canadians, Australians, and Israelis have it as a CorporateName (110), as 
do the French (210 - unimarc); LC and the Germans say it's a Geographic Name.

In the case of LCSH, therefore, it would be a 151.  Regardless, it is in VIAF.

Warmly,

Kevin






Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Ralph, Owen's pointing to a list where corporate (110) and geographic names
(151) are mixed.

Thanks Owen, I hadn't seen that the first time. I guess you got that mixed
110/151 when limiting to 'exact name'. Perhaps Andrew has a workaround.

*Ya'aqov*








-- 
ya'aqov ZISO | yaaq...@gmail.com | 856 217 3456


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
I'm out of my depth here :)

But... this is what I understood Andrew to be saying. In this instance
(?because 'England' is a Name Authority?), rather than creating a separate LCSH
authority record for 'England' (as the 151), the LCSH subdivision is
recorded in the 781 of the existing Name Authority record.

Searching on http://authorities.loc.gov for England, I find an Authorised
heading, marked as a LCSH - but when I go to that record what I get is the
name authority record n 82068148 - the name authority record as represented
on VIAF by http://viaf.org/viaf/142995804/ (which links to
http://errol.oclc.org/laf/n%20%2082068148.html)

Just as this is getting interesting time differences mean I'm about to head
home :)

Owen





-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Kevin,

England exists as a corporate body and also as a geographic name. BOTH
entities exist in LCSH. This doesn't apply to all geographic names, only to
some.

Andrew pointed us to VIAF, but I expect his algorithm to limit the search
for LCSH. Let's wait for his reply.

*Ya'aqov*

On Thu, Apr 7, 2011 at 10:44 AM, Ford, Kevin k...@loc.gov wrote:

 Actually, it appears to depend on whose Authority record you're looking
 at.  The Canadians, Australians, and Israelis have it as a CorporateName
 (110), as do the French (210 - unimarc); LC and the Germans say it's a
 Geographic Name.

 In the case of LCSH, therefore, it would be a 151.  Regardless, it is in
 VIAF.

 Warmly,

 Kevin


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread LeVan,Ralph
More confusing yet, if you look at the raw XML for that record (add viaf.xml to 
the end of the URI and then view source) you’ll see that the name type is 
indeed Geographic.

 

My boss is puzzled.

 

Ralph

 






Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Jonathan Rochkind

On 4/7/2011 10:46 AM, Houghton,Andrew wrote:

to go to the name authority record 150 England with LCCN n82068148. Currently 
under id.loc.gov you will not find name authority records,


If this would change, so name authority record elements used in 6xx 
subject cataloging were in id.loc.gov, it would make powerful use of 
id.loc.gov much more feasible.


Is there anyone at LC this suggestion/request could be sent to, possibly 
en masse?  I do sort of have the impression it's been an item of 
contention inside LC.


Jonathan


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Jonathan, hi and thanks,

1. I believe id.loc.gov includes a list of MARC countries and a list of
geographic areas (based on the geographic names in 151 fields).
2. Cataloging rules instruct catalogers to use THOSE very name forms in 151
$a when a subject can be divided (limited) geographically using $z.
3. Not all subjects which can be divided geographically will have the
geographical subdivision immediately after the subject. There could be 2
different sequences:

650 $a Picket lines $z Ohio
650 $a Picket lines $x Economic aspects $z Ohio
(Whether or not the geographical subdivision immediately follows $a
is governed by rules that LC catalogers observe to the letter.)

There could also be two geographical subdivisions following each other:
650 $a Picket lines $z Ohio $z Columbus

Oh yeah, these record elements could be used powerfully for our users.
*Ya'aqov*
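
The subfield sequences in the message above can be pulled apart mechanically. What follows is only a minimal sketch, assuming the heading arrives as a flat text string in the "$a ... $z ..." notation used in this thread (real MARC records would normally be read with a MARC library instead). The function names parse_subfields and geographic_subdivisions are hypothetical, chosen for illustration.

```python
import re

def parse_subfields(field: str):
    """Split a flat field string such as '650 $a Picket lines $z Ohio'
    into its tag and an ordered list of (code, value) pairs."""
    tag, _, rest = field.strip().partition(" ")
    # A subfield is '$', one code character, then everything up to the next '$'.
    pairs = re.findall(r"\$(\w)\s*([^$]*)", rest)
    return tag, [(code, value.strip()) for code, value in pairs]

def geographic_subdivisions(field: str):
    """Return the $z values in the order they appear, wherever they sit
    relative to $a or $x."""
    _, subfields = parse_subfields(field)
    return [value for code, value in subfields if code == "z"]

if __name__ == "__main__":
    for heading in (
        "650 $a Picket lines $z Ohio",
        "650 $a Picket lines $x Economic aspects $z Ohio",
        "650 $a Picket lines $zOhio $z Columbus",
    ):
        print(parse_subfields(heading)[1], geographic_subdivisions(heading))
```

Treating $z as an attribute to collect, rather than a position to expect, sidesteps the question of whether the geographic subdivision comes directly after $a or after an intervening $x.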



Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Houghton,Andrew
1.   No disagreement, except that some 151s appear in the name file and 
some appear in the subject file:
n82068148     008/11=a 008/14=a  151 _ _ $a England
sh2010015057  008/11=a 008/14=b  151 _ _ $a Tabasco Mountains (Mexico)



2.   Yes, see n5359
151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)



3.   Oops, my apologies to my VIAF colleagues, I believe that geographic 
names are in the works… or at least I was under the impression they were from a 
discussion I had last night.
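
As a rough illustration of point 1, the two values shown for each record can be read straight out of the 008 fixed field. This is a sketch under stated assumptions: that 008/11 is the subject heading system code and 008/14 the "heading use, main or added entry" code, as in the MARC 21 authority format, and that an LCCN starting with "n" comes from the name file while "sh" comes from the subject file. The helper make_008 builds a synthetic 40-character field for demonstration only; it is not real record data, and both function names are hypothetical.

```python
def make_008(subject_system: str, main_entry_use: str) -> str:
    """Build a synthetic 40-character 008 with only positions 11 and 14 set.
    Illustration only; every other position is filled with '|'."""
    chars = ["|"] * 40
    chars[11] = subject_system
    chars[14] = main_entry_use
    return "".join(chars)

def describe_authority(lccn: str, field_008: str) -> dict:
    """Summarise which file a 151 record lives in and how it may be used."""
    if lccn.startswith("sh"):
        source = "subject"
    elif lccn.startswith("n"):
        source = "name"
    else:
        source = "unknown"
    return {
        "file": source,
        "subject_system": field_008[11],            # 008/11, zero-based
        "main_or_added_entry_ok": field_008[14] == "a",  # 008/14
    }

if __name__ == "__main__":
    print(describe_authority("n82068148", make_008("a", "a")))
    print(describe_authority("sh2010015057", make_008("a", "b")))
```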



 

From: Ya'aqov Ziso [mailto:yaaq...@gmail.com] 
Sent: Thursday, April 07, 2011 11:18
To: Code for Libraries; Houghton,Andrew
Cc: LeVan,Ralph
Subject: Re: [CODE4LIB] LCSH and Linked Data

 

Andrew, please see [YZ] below

 

181 __ $z England  and you would NOT find this heading in LCSH. This is issue 
one. Unfortunately, LC does not create 181 in LCSH (actually I think there are 
some, but not if it’s a name), instead they create a 781 in the name authority 
record. 

[YZ]  MARC/LCSH distinguishes between names 100 and geographic names 151 in 
their authority record. You'll find all geographic names if you look for 151 
records.

 

So to find the corresponding $z England we need to go to the name authority 
record 150 England with LCCN n82068148. 

[YZ]  LCCN n82068148 authority record is  for 151 England.

Also Andrew, are you indicating there is a difference between the form of 
geographic name in 151$a and 781$z   -- ?

 

Currently under id.loc.gov you will not find name authority records, but you 
can find them at viaf.org. 

[YZ]  viaf.org does not include geographic names. I just checked there for 
England. It makes little sense to mix personal/corporate names with geographic 
ones. Let's see what Ralph comments.

 

Ya'aqov



Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Houghton,Andrew
That is probably correct. England may appear as both a 110 *and* a 151 because 
the 110 signifies the concept for the country entity while the 151 signifies 
the concept for the geographic place. A subtle distinction...

Andy.



Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ya'aqov Ziso
Andrew, as always, most helpful news, kindest thanks! more [YZ] below:

1.   No disagreement, except that some 151s appear in the name file and
some appear in the subject file:
n82068148     008/11=a 008/14=a  151 _ _ $a England
sh2010015057  008/11=a 008/14=b  151 _ _ $a Tabasco Mountains (Mexico)
[YZ] would it be possible then to use both files as sources and create one
file for geographical names for our purpose(s)?

2.   Yes, see n5359
151 _ _ $a Sonora (Mexico : State)
751 _ _ $z Mexico $z Sonora (State)
[YZ]  Both stand for a distinct cataloging usage. Jonathan's suggestion
to consult LC may answer the question of which field/when to use for
geographical names.

3.   Oops, my apologies to my VIAF colleagues, I believe that
geographic names are in the works…
[YZ] inshAllah!

4. That is probably correct. England may appear as both a 110 *and* a 151
because the 110 signifies the concept for the country entity while the 151
signifies the concept for the geographic place. A subtle distinction...
[YZ] Exactly. This distinction called for creating both a 110 AND a 151.
But we are talking about 151. The case where there is both a 110 and a 151
does NOT apply to all geographic names, only to some.

[YZ] VIAF would be helpful to provide a way to limit geographical names
ONLY to 151 names and their cross references.


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Ross Singer
On Thu, Apr 7, 2011 at 12:58 PM, Ya'aqov Ziso yaaq...@gmail.com wrote:

 1. I believe id.loc.gov includes a list of MARC countries and a list for
 geographic areas (based on the geographic names in 151 fields.
 2. cataloging rules instruct catalogers to use THOSE very name forms in 151
 $a when a subject can be divided (limited)  geographically using $z.

Yeah, this could get ugly pretty fast.  It's a bit unclear to me what
the distinction is between identical terms in both the geographic
areas and the country codes
(http://id.loc.gov/vocabulary/geographicAreas/e-uk-en and
http://id.loc.gov/vocabulary/countries/enk).  Well, in LC's current
representation, there *is* no distinction, they're both just
skos:Concepts that (by virtue of skos:exactMatch) are effectively
interchangeable.

See also http://id.loc.gov/vocabulary/geographicAreas/fa and
http://id.loc.gov/authorities/sh85009230#concept.  You have a single
institution minting multiple URIs for what is effectively the same
thing (albeit in different vocabularies), although, ironically,
nothing points at any actual real world objects.

VIAF doesn't do much better in this particular case (there are lots of
examples where it does, mind you):  http://viaf.org/viaf/142995804
(see: http://viaf.org/viaf/142995804/rdf.xml).  We have all of these
triangulations around the concept of England or Atlas mountains,
but we can't actually refer to England or the Atlas mountains.

Also, I am not somehow above this problem, either.  With the linked
MARC codes lists (http://purl.org/NET/marccodes/), I had to make a
similar decision, I just chose to go the opposite route:  define them
as things, rather than concepts
(http://purl.org/NET/marccodes/gacs/fa#location,
http://purl.org/NET/marccodes/gacs/e-uk-en#location,
http://purl.org/NET/marccodes/countries/enk#location, etc.), which
presents its own set of problems
(http://purl.org/NET/marccodes/gacs/h#location is not a SpatialThing
no matter how liberal your definition).

At some point, it's worth addressing what these things actually *are*
and if, indeed, they are effectively the same thing, if it's worth
preserving these redundancies, because I think they'll cause grief in
the future.

-Ross.


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Houghton,Andrew
My bad in (2) that should have been 781 and it’s LC’s way to indicate the 
geographic form used for a 181 when a heading may be geographically subdivided. 
The point is, when you are trying to do authority matching/mapping you have to 
match against the 181’s in LCSH *and* the 781’s in NAF.  This is an oddity of 
the LC authority file that people may not be aware of, hence why I pointed it 
out.  As I indicated, in my mapping projects I have taken LCSH and added new 
181 records based on the 781’s found in NAF.  This allows the matching process 
to work reasonably well without dragging in the entire NAF for searching and 
matching.  However, this still doesn’t give the complete picture since in 
LCSH the *construction rules* allow you to use things in the name authority 
file as subjects, ugh.  Effectively, LCSH isn’t useful by itself when trying to 
match/decompose 6XX in bibliographic records.  You really need access to NAF as 
well.  Things get worse when talking about the Children’s headings… since you 
can pull from both LCSH and NAF, ugh-ugh.  While LC would like us to think of 
the authority file as three separate authorities, LCSH, LCSHac, NAF, in reality 
the dependencies require you to ignore the thesaurus boundaries and just treat 
the entire authority file as one thesaurus.  We struggled with this in the 
terminology services project, especially when the references in one thesaurus 
cross over into the other thesauri.

 

Andy.
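
The matching approach described above, looking up a 6XX $z sequence against the 181s in LCSH plus the 781s in NAF, could be sketched roughly as follows. This is not the code from the mapping projects mentioned; it is an illustration that assumes each authority record has already been reduced to a dict with an "lccn" and a list of (tag, subfields) pairs. The record layout, the function names, and the "sh-example" record are all hypothetical; only the England LCCN and its 781 come from this thread.

```python
def subdivision_key(subfields):
    """Normalise the ordered $z values of a field into a lookup key."""
    return tuple(value.strip().lower() for code, value in subfields if code == "z")

def build_geo_index(lcsh_records, naf_records):
    """Index 181 fields from LCSH and 781 fields from NAF by their $z sequence,
    so a 6XX geographic subdivision can be matched without searching all of NAF."""
    index = {}
    for records, wanted_tag in ((lcsh_records, "181"), (naf_records, "781")):
        for record in records:
            for tag, subfields in record["fields"]:
                key = subdivision_key(subfields)
                if tag == wanted_tag and key:
                    # First record seen for a key wins; LCSH is loaded first.
                    index.setdefault(key, record["lccn"])
    return index

def match_geo(index, subfields_6xx):
    """Return the LCCN whose 181/781 matches the $z sequence, or None."""
    return index.get(subdivision_key(subfields_6xx))

if __name__ == "__main__":
    naf = [{"lccn": "n82068148",
            "fields": [("151", [("a", "England")]),
                       ("781", [("z", "England")])]}]
    lcsh = [{"lccn": "sh-example",  # hypothetical record, not a real LCCN
             "fields": [("181", [("z", "Example Region")])]}]
    index = build_geo_index(lcsh, naf)
    print(match_geo(index, [("a", "Picket lines"), ("z", "England")]))
```

Keying on the whole $z sequence rather than a single value lets two-level subdivisions such as $z Mexico $z Sonora (State) match as a unit.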

 




Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Jonathan Rochkind

On 4/7/2011 1:21 PM, Houghton,Andrew wrote:

That is probably correct. England may appear as both a 110 *and* a 151 because 
the 110 signifies the concept for the country entity while the 151 signifies 
the concept for the geographic place. A subtle distinction...


This starts getting into categorization philosophy type issues, and 
reveals that LCSH isn't entirely consistent in its modelling (as 
virtually no classification will be without being extraordinarily 
complex, the world is a messy place), along the lines Ross was talking 
about too, but I think it can be explicated a bit.


I'm not sure it's quite true to say that a 151 (corresponding to a 
6xx $z subdivision) is a geographic place as entirely distinct from a 
'country entity'.   I might instead say the 151 is meant to be a sort of 
geo-historical place, one that does take into account, well, either 
political entities or general contemporary conceptions of place 
distinctions at particular historical times.  While the 110 is about a 
collective-body _actor_, a government.


All of these are $z's, which presumably are authorized by authority 151s:

Soviet Union
Russia
Russia (Federation)
Former Soviet Republics

typically assigned for works about that area of the world at the time 
that area of the world was known as a particular thing, heh.


Or: Italy / Roman Empire
Byzantine Empire / Ottoman Empire / Turkey / Balkan Peninsula

Now, all those things aren't the _exact_ same longitude and latitude, 
but with significant overlap, different in different cases. At any rate, 
151s aren't purely a name for a geographic boundary on the planet, 
they're some kind of, um, geo-political-historical concept.


Compare to the terms you can put in an 043, which ARE meant to be 
history and political entity free. e-ur == Russia. Russian Empire. 
Soviet Union. Former Soviet Republics. Yeah, all of em together. 
Nevermind they don't have exactly the same boundaries. (And of course 
the boundaries of any one of em can and did change over time).  At least 
043's MOSTLY try to be purely geographical, free of historical/political 
context, but then sometimes they go ahead and add weird ones that can't 
possibly follow that principle, like d=Developing Countries or 
dd=Developed Countries.


But yeah, then we've got the 110 England, which isn't a geographical 
concept AT ALL, it refers really to the Government/political _actor_  
(as a collective body)  known as England. Which happens to have 
controlled or claimed certain geographic territory for itself at 
different times, but the 110 England isn't about the geographic 
territory, it's about the collective-body actor. (Does that even still 
exist? What is its contemporary or historical relationship to the 
concepts United Kingdom and Great Britain, are those political 
actors too?)


Somewhere I read an article about the particular messiness of geographic 
vocabularies, as discussed above, I forget where.  Wish I could find it 
again, it would be helpful here.  But modelling the real world with a 
subject vocabulary is inherently messy, especially so with geographic 
classification like this that is meant to somehow cover all of recorded 
human history too. The map is not the territory.