Re: [CODE4LIB] source of marc geographic code?

2011-06-23 Thread Ford, Kevin
The GeographicArea codes have been available from [1] in XML [2] since at least 
late 2007 [3].  I can't say with 100% certainty that the XML structure has 
remained perfectly consistent since 2007, but eyeballing the 2007 version and 
comparing it to the currently available file suggests that the structure has 
remained consistent.

The GACS codes are also available from ID, as has been pointed out.  The entire 
list is available for download at [4].  Let me acknowledge, though, that the 
labels for the URIs (incidentally, the GACS code is the last token of the URI)  
are not part of the RDF/N-triples/JSON at [5].  This sounds like a feature 
request - and a useful one at that.  Would that be an accurate interpretation 
of this thread?

Cordially,

Kevin

--
Network Development  MARC Standards Office

[1] http://www.loc.gov/marc/geoareas/gacshome.html
[2] http://www.loc.gov/standards/codelists/gacs.xml
[3] http://web.archive.org/web/20071129170212/http://www.loc.gov/marc/geoareas/
[4] http://id.loc.gov/download/
[5] http://id.loc.gov/vocabulary/geographicAreas.html



From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan 
Rochkind [rochk...@jhu.edu]
Sent: Wednesday, June 22, 2011 21:43
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] source of marc geographic code?

 The result was that a few meetings later LC announced that they
 had coded the MARC online pages in XML, and were generating the HTML
 from that. I think I was mis-understood.

No doubt, but man if they'd then just SHARE that XML with us at a persistent 
URL, and keep the structure of that XML the same, that'd be really useful!


Re: [CODE4LIB] source of marc geographic code?

2011-06-23 Thread Jonathan Rochkind
Huh, that does look like it's got what I need, although it's a bit 
confusing. I wasn't able to find a URL to a file with the format Karen 
cites below.  I'm probably dense. Can anyone give me the URL that 
returns a list of all terms with each term having the XML Karen quotes 
below?


It looks like I'd still have to drill down into what should be an 
opaque identifier to get the actual MARC code: extract "fq" from 
http://id.loc.gov/vocabulary/geographicAreas/fq, hard-coding the 
assumption that those URLs will always be of that form, with the last 
term being the actual MARC code.  But that's _probably_ a safe 
assumption. (Although it wouldn't hurt if they added a data element, 
marcCode or something, with the actual literal "fq" in it.)
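
A minimal sketch of the extraction described above, assuming (as the
message itself does) that every concept URI has the form
http://id.loc.gov/vocabulary/geographicAreas/<code>. The names
gac_from_uri and GACS_PREFIX are made up for illustration; nothing here
is an id.loc.gov API.

```python
from urllib.parse import urlparse

# Assumed path prefix shared by all geographic area concept URIs.
GACS_PREFIX = "/vocabulary/geographicAreas/"


def gac_from_uri(uri):
    """Return the last path token of a geographicAreas URI, or None."""
    path = urlparse(uri).path
    if not path.startswith(GACS_PREFIX):
        return None
    code = path[len(GACS_PREFIX):].strip("/")
    # An empty tail, or a tail with further segments, is not a code URI.
    if not code or "/" in code:
        return None
    return code


print(gac_from_uri("http://id.loc.gov/vocabulary/geographicAreas/fq"))
```

Returning None for anything that does not match keeps the hard-coded
assumption in one place, so a change in URI layout fails loudly instead
of silently yielding wrong codes.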


On 6/22/2011 10:35 PM, Karen Coyle wrote:

Quoting Jonathan Rochkind rochk...@jhu.edu:




Right, so like I keep saying, as far as I can tell, those files are 
lists of URLs, one for each code. (Or technically lists of 
RDF triples, but two parts of each triple are identical in every 
triple, just saying "this URL is part of the MARC geographic 
vocabulary", and then each triple has a unique URL representing a code.)


And I'd need to do a separate HTTP request for each code (a couple 
hundred?) to actually get the label(s).


I'm not sure why you see it as separate requests, unless the 
downloaded file doesn't work for you -- but maybe I don't understand 
what you are trying to do. The downloaded full file has the display 
data and the codes:


<rdf:Description rdf:about="http://id.loc.gov/vocabulary/geographicAreas/fq">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Resource"/>
  <owl:sameAs rdf:resource="info:lc/vocabulary/gacs/fq"/>
  <skos:prefLabel xml:lang="en">Africa, French-speaking Equatorial</skos:prefLabel>
  <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">fq</skos:notation>
  <skos:inScheme rdf:resource="http://id.loc.gov/vocabulary/geographicAreas"/>
  <skos:altLabel xml:lang="en">Africa, Equatorial</skos:altLabel>
  <skos:narrower>
    <skos:Concept>
      <skos:prefLabel xml:lang="en">Chad, Lake</skos:prefLabel>
      <skos:broader rdf:resource="http://id.loc.gov/vocabulary/geographicAreas/fq"/>
    </skos:Concept>
  </skos:narrower>
  <skos:altLabel xml:lang="en">French Equatorial Africa</skos:altLabel>
  <skos:altLabel xml:lang="en">French-speaking Equatorial Africa</skos:altLabel>
  <skos:exactMatch rdf:resource="http://id.loc.gov/authorities/sh85001608#concept"/>
  <skos:broader rdf:resource="http://id.loc.gov/vocabulary/geographicAreas/f"/>
  <vs:term_status>stable</vs:term_status>
  <skos:changeNote rdf:nodeID="fq"/>

so if you pick out the value in prefLabel and the value in notation 
you have what you need, no? (admittedly, this is NOT the same as a 
simple, comma delimited list!)


<skos:prefLabel xml:lang="en">Chad, Lake</skos:prefLabel>
<skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">fq</skos:notation>




Am I missing something? That's not a very convenient way to get the 
data for the very common use case of wanting to construct a mapping 
from code to label, right? Or that's just me?


What would be nice would be a simple XSLT transform that turns out a 
CSV on the fly, always getting the latest values.


No?

kc









Re: [CODE4LIB] source of marc geographic code?

2011-06-23 Thread Keith Jenkins
On Thu, Jun 23, 2011 at 10:59 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 On 6/22/2011 11:25 PM, Ross Singer wrote:
 Can't you use:
 http://www.loc.gov/standards/codelists/gacs.xml

 Yes, I can! I didn't know about/hadn't found that one; it hadn't been
 mentioned until now. Thanks! Where did you find that?

That XML file is linked from near the bottom of this page:
http://www.loc.gov/marc/geoareas/

Keith


Re: [CODE4LIB] source of marc geographic code?

2011-06-23 Thread Jonathan Rochkind

On 6/22/2011 11:25 PM, Ross Singer wrote:

Can't you use:
http://www.loc.gov/standards/codelists/gacs.xml
?
It's what I used to make marccodes.heroku.com/gacs/


Yes, I can! I didn't know about/hadn't found that one; it hadn't been 
mentioned until now. Thanks! Where did you find that?


That's potentially an even more convenient format for my use case than 
the RDF version.




Although like Karen pointed out, not sure why you can't use the
RDF/XML from id.loc.gov

-Ross.

On Wed, Jun 22, 2011 at 5:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

Can anyone remind me if there's a machine readable copy of the MARC
geographic codes available at any persistent URL?

They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I
actually had a script that automatically downloaded from there and scraped
the HTML -- but sometime since I wrote the script, the HTML structure on the
page changed and it broke.

(I kind of thought that was unlikely since that HTML page itself was machine
generated -- but I guess they changed the software that generated it.
Certainly I knew that scraping HTML was a bad thing to rely on... which is
why I hope LC provides this in some format less likely to change?)



[CODE4LIB] source of marc geographic code?

2011-06-22 Thread Jonathan Rochkind
Can anyone remind me if there's a machine readable copy of the MARC 
geographic codes available at any persistent URL?


They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I 
actually had a script that automatically downloaded from there and 
scraped the HTML -- but sometime since I wrote the script, the HTML 
structure on the page changed and it broke.


(I kind of thought that was unlikely since that HTML page itself was 
machine generated -- but I guess they changed the software that 
generated it. Certainly I knew that scraping HTML was a bad thing to 
rely on... which is why I hope LC provides this in some format less 
likely to change?)


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Kyle Banerjee
I went through a process similar to what you describe sometime back for a
tool I made (i.e. I could find no easily downloadable info). You can
download something that will be easier to parse from

http://calculate.alptown.com/gac.js

It's probably not 100% accurate as I haven't downloaded for quite awhile.
But catalogers have me correct errors they discover and there are about 800
unique visitors per day so I assume they notice most things.

It would be nice if this kind of data could be provided in a straightforward
format.

kyle



On Wed, Jun 22, 2011 at 2:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Can anyone remind me if there's a machine readable copy of the MARC
 geographic codes available at any persistent URL?

 They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html .
 I actually had a script that automatically downloaded from there and
 scraped the HTML -- but sometime since I wrote the script, the HTML
 structure on the page changed and it broke.

 (I kind of thought that was unlikely since that HTML page itself was
 machine generated -- but I guess they changed the software that generated
 it. Certainly I knew that scraping HTML was a bad thing to rely on... which
 is why I hope LC provides this in some format less likely to change?)




-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.877.9773


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Kyle Banerjee
And yes, I realize the structure of the data in the file referenced below is
idiotic even if it is easier to parse than HTML.

But this was part of the first javascript program I ever wrote, and that was
back in 1997 when getting real time interaction with browsers was harder
(and I never bothered to rewrite).

kyle

On Wed, Jun 22, 2011 at 2:57 PM, Kyle Banerjee baner...@uoregon.edu wrote:

 I went through a process similar to what you describe sometime back for a
 tool I made (i.e. I could find no easily downloadable info). You can
 download something that will be easier to parse from

 http://calculate.alptown.com/gac.js

 It's probably not 100% accurate as I haven't downloaded for quite awhile.
 But catalogers have me correct errors they discover and there are about 800
 unique visitors per day so I assume they notice most things.

 It would be nice if this kind of data could be provided in a
 straightforward format.

 kyle




 On Wed, Jun 22, 2011 at 2:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Can anyone remind me if there's a machine readable copy of the MARC
 geographic codes available at any persistent URL?

 They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html .
 I actually had a script that automatically downloaded from there and
 scraped the HTML -- but sometime since I wrote the script, the HTML
 structure on the page changed and it broke.

 (I kind of thought that was unlikely since that HTML page itself was
 machine generated -- but I guess they changed the software that generated
 it. Certainly I knew that scraping HTML was a bad thing to rely on... which
 is why I hope LC provides this in some format less likely to change?)




 --
 --
 Kyle Banerjee
 Digital Services Program Manager
 Orbis Cascade Alliance
 baner...@uoregon.edu / 503.877.9773




-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.877.9773


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Jonathan Rochkind

Man, I figured it was there somewhere I just didn't know it.

If it's really not there, can we like start a campaign to convince LC 
that part of maintaining the MARC vocabularies is making them available 
at a persistent URL, in machine-readable fashion, updated and maintained 
by them as vocabularies change?


Or else, how is any software supposed to use them?  Counting on 
developers to manually review notices of update and manually update 
local lists is inefficient and entirely unrealistic.


On 6/22/2011 5:57 PM, Kyle Banerjee wrote:

I went through a process similar to what you describe sometime back for a
tool I made (i.e. I could find no easily downloadable info). You can
download something that will be easier to parse from

http://calculate.alptown.com/gac.js

It's probably not 100% accurate as I haven't downloaded for quite awhile.
But catalogers have me correct errors they discover and there are about 800
unique visitors per day so I assume they notice most things.

It would be nice if this kind of data could be provided in a straightforward
format.

kyle



On Wed, Jun 22, 2011 at 2:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:


Can anyone remind me if there's a machine readable copy of the MARC
geographic codes available at any persistent URL?

They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html .
I actually had a script that automatically downloaded from there and
scraped the HTML -- but sometime since I wrote the script, the HTML
structure on the page changed and it broke.

(I kind of thought that was unlikely since that HTML page itself was
machine generated -- but I guess they changed the software that generated
it. Certainly I knew that scraping HTML was a bad thing to rely on... which
is why I hope LC provides this in some format less likely to change?)






Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Jonathan Rochkind
PS: Kyle, that's your own version? That's... sort of kind of machine 
readable. Well, not really. I can't figure out quite what's going on 
there; the label/value pairs are just stuffed into single JavaScript 
string literals, separated by newlines, or sometimes (but sometimes not) 
by "Assigned code:" strings, etc.


That's in fact a little bit harder to parse than what I'm doing against 
LC. I'm running CSS selectors against the HTML; I'm not having any 
difficulty parsing, the problem is that the format can change without 
notice. But yours seems harder to parse to me, am I missing something?


In the end, all I need is a list of pairs, code to label. I'll be 
looking up from code, so I don't even care about alternate labels, 
really.


On 6/22/2011 5:57 PM, Kyle Banerjee wrote:

I went through a process similar to what you describe sometime back for a
tool I made (i.e. I could find no easily downloadable info). You can
download something that will be easier to parse from

http://calculate.alptown.com/gac.js

It's probably not 100% accurate as I haven't downloaded for quite awhile.
But catalogers have me correct errors they discover and there are about 800
unique visitors per day so I assume they notice most things.

It would be nice if this kind of data could be provided in a straightforward
format.

kyle



On Wed, Jun 22, 2011 at 2:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:


Can anyone remind me if there's a machine readable copy of the MARC
geographic codes available at any persistent URL?

They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html .
I actually had a script that automatically downloaded from there and
scraped the HTML -- but sometime since I wrote the script, the HTML
structure on the page changed and it broke.

(I kind of thought that was unlikely since that HTML page itself was
machine generated -- but I guess they changed the software that generated
it. Certainly I knew that scraping HTML was a bad thing to rely on... which
is why I hope LC provides this in some format less likely to change?)






Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Stephen Hearn
Have you looked at id.loc.gov? One of its vocabularies defines URLs
for each of the MARC geographic area codes.

Stephen


On Wed, Jun 22, 2011 at 4:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Can anyone remind me if there's a machine readable copy of the MARC
 geographic codes available at any persistent URL?

 They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I
 actually had a script that automatically downloaded from there and scraped
 the HTML -- but sometime since I wrote the script, the HTML structure on the
 page changed and it broke.

 (I kind of thought that was unlikely since that HTML page itself was machine
 generated -- but I guess they changed the software that generated it.
 Certainly I knew that scraping HTML was a bad thing to rely on... which is
 why I hope LC provides this in some format less likely to change?)




-- 
Stephen Hearn, Metadata Strategist
Technical Services, University Libraries
University of Minnesota
160 Wilson Library
309 19th Avenue South
Minneapolis, MN 55455
Ph: 612-625-2328
Fx: 612-625-3428


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Kyle Banerjee
Yes -- it is something I created out of thin air.

It was originally designed for catalogers who wanted a visual display to
duplicate the print, and achieving adequate performance on interactive
search, retrieval, and rendering on the computers/browsers at the time made
me have to include all the formatting.

To bust it up, split on '@'. That will give you individual records. The
labels will tell you the role of information. For example,

'Assigned code(s):\n '

will be followed by newline delimited codes for the rest of the field

'\n  USE  '

indicates a SEE reference, while any line that does not contain newlines
simply contains a single code. I realize it sounds nuts, but there aren't
that many variations so it's not as bad as it looks.

Since you just want pairs, you might want to load values that have codes
into a dictionary so when you encounter a SEE reference, you can create a
key value pair. The issue with ignoring alternate names is that there are a
number of nonintuitive connections that people wouldn't be able to make.
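
A rough sketch of the splitting rules just described, for anyone who
wants to try them. It only classifies records; where the label sits
inside each record is not spelled out above, so that part is left
alone. The function names and the sample string are invented for
illustration and are not taken from gac.js, and the direction of the
USE reference (text before the marker points to text after it) is an
assumption.

```python
# Markers quoted from the description above.
ASSIGNED_MARKER = "Assigned code(s):\n "
USE_MARKER = "\n  USE  "


def split_records(blob):
    """Split the raw string on '@' and drop empty pieces."""
    return [record for record in blob.split("@") if record.strip()]


def classify_record(record):
    """Return a (kind, payload) tuple for one '@'-delimited record."""
    if ASSIGNED_MARKER in record:
        # The rest of the field is a newline-delimited list of codes.
        tail = record.split(ASSIGNED_MARKER, 1)[1]
        codes = [line.strip() for line in tail.split("\n") if line.strip()]
        return ("assigned", codes)
    if USE_MARKER in record:
        # SEE reference: assumed to read "<unused name> USE <used name>".
        source, target = record.split(USE_MARKER, 1)
        return ("see", (source.strip(), target.strip()))
    if "\n" not in record:
        # A record without newlines carries a single code.
        return ("single", record.strip())
    return ("unknown", record)


# Invented sample data, not real content from the file.
sample = "Place A\nAssigned code(s):\n aa\n bb@Old Name\n  USE  New Name@Place C cc"
for rec in split_records(sample):
    print(classify_record(rec))
```

The "unknown" bucket is there so that any variation not covered by the
three rules shows up explicitly instead of being misfiled.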

kyle

On Wed, Jun 22, 2011 at 3:11 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 **
 PS: Kyle, that's your own version? That's... sort of kind of machine
 readable. Well, not really. I can't figure out quite what's going on there;
 the label/value pairs are just stuffed into single JavaScript string
 literals, separated by newlines, or sometimes (but sometimes not) by
 "Assigned code:" strings, etc.

 That's in fact a little bit harder to parse than what I'm doing against
 LC. I'm running CSS selectors against the HTML; I'm not having any
 difficulty parsing, the problem is that the format can change without
 notice. But yours seems harder to parse to me, am I missing something?

 In the end, all I need is a list of pairs, code to label. I'll be looking
 up from code, so I don't even care about alternate labels, really.

 On 6/22/2011 5:57 PM, Kyle Banerjee wrote:

 I went through a process similar to what you describe sometime back for a
 tool I made (i.e. I could find no easily downloadable info). You can
 download something that will be easier to parse from
 http://calculate.alptown.com/gac.js

 It's probably not 100% accurate as I haven't downloaded for quite awhile.
 But catalogers have me correct errors they discover and there are about 800
 unique visitors per day so I assume they notice most things.

 It would be nice if this kind of data could be provided in a straightforward
 format.

 kyle



 On Wed, Jun 22, 2011 at 2:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:


  Can anyone remind me if there's a machine readable copy of the MARC
 geographic codes available at any persistent URL?

 They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html .
 I actually had a script that automatically downloaded from there and
 scraped the HTML -- but sometime since I wrote the script, the HTML
 structure on the page changed and it broke.

 (I kind of thought that was unlikely since that HTML page itself was
 machine generated -- but I guess they changed the software that generated
 it. Certainly I knew that scraping HTML was a bad thing to rely on... which
 is why I hope LC provides this in some format less likely to change?)





-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.877.9773


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Jonathan Rochkind
Aha, that's probably what I need. And now I remember Ross probably 
pointed that out to me before.


I'm still having trouble figuring out how to get from the rdf-triples 
it's got there to a hash of codes (as they appear in marc records, not 
URIs), to labels.


It seems like it will in fact be a lot more work than the scraping I'm 
doing of the HTML page now, but of course the problem with the HTML page 
is that its structure is not reliable; it changes.  So the structured 
data from id.loc.gov is the way to go, but I'm still getting confused 
figuring out how to get what I want out of it. If anyone wants to give 
me any hints, they'd be appreciated.


It kind of looks like I FIRST have to get the complete list from one of 
the structured forms (RDF/XML, N-triples, etc.), and THEN make a separate 
HTTP request for _each_ term in the list to get the code as found 
in the MARC record and the label.  That's a pretty slow process, as well 
as requiring more code than a task like this seems like it 
should take. Is there anything on that site that can give me the 
code/label pairs in one single download?



On 6/22/2011 6:38 PM, Stephen Hearn wrote:

Have you looked at id.loc.gov? One of its vocabularies defines URLs
for each of the MARC geographic area codes.

Stephen


On Wed, Jun 22, 2011 at 4:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

Can anyone remind me if there's a machine readable copy of the MARC
geographic codes available at any persistent URL?

They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I
actually had a script that automatically downloaded from there and scraped
the HTML -- but sometime since I wrote the script, the HTML structure on the
page changed and it broke.

(I kind of thought that was unlikely since that HTML page itself was machine
generated -- but I guess they changed the software that generated it.
Certainly I knew that scraping HTML was a bad thing to rely on... which is
why I hope LC provides this in some format less likely to change?)






Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Kyle Banerjee
I wasn't aware of this, but it definitely didn't exist way back when I
started. You can download all the GACs in XML from that page.

kyle

On Wed, Jun 22, 2011 at 3:38 PM, Stephen Hearn s-h...@umn.edu wrote:

 Have you looked at id.loc.gov? One of its vocabularies defines URLs
 for each of the MARC geographic area codes.

 Stephen


 On Wed, Jun 22, 2011 at 4:44 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:
  Can anyone remind me if there's a machine readable copy of the MARC
  geographic codes available at any persistent URL?
 
  They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I
  actually had a script that automatically downloaded from there and
 scraped
  the HTML -- but sometime since I wrote the script, the HTML structure on
 the
  page changed and it broke.
 
  (I kind of thought that was unlikely since that HTML page itself was
 machine
  generated -- but I guess they changed the software that generated it.
  Certainly I knew that scraping HTML was a bad thing to rely on... which
 is
  why I hope LC provides this in some format less likely to change?)
 



 --
 Stephen Hearn, Metadata Strategist
 Technical Services, University Libraries
 University of Minnesota
 160 Wilson Library
 309 19th Avenue South
 Minneapolis, MN 55455
 Ph: 612-625-2328
 Fx: 612-625-3428




-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.877.9773


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Karen Coyle

Quoting Jonathan Rochkind rochk...@jhu.edu:

Can anyone remind me if there's a machine readable copy of the MARC  
geographic codes available at any persistent URL?


Not sure how persistent, but here's Ross's version:

http://marccodes.heroku.com/gacs/

I often made the point at MARBI meetings (before I just gave up going  
to them) that all of MARC (tags, subfields, codes) should be available  
in a machine-readable form.[1] I go NUTS when I see those email  
notices come around, the idea that all over the world people are  
manually keying in codes into a local table. Please help us make sure  
that does not happen in any future formats. Make noise now!


kc

[1] The result was that a few meetings later LC announced that they  
had coded the MARC online pages in XML, and were generating the HTML  
from that. I think I was mis-understood.




They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html .  
I actually had a script that automatically downloaded from there and  
scraped the HTML -- but sometime since I wrote the script, the  
HTML structure on the page changed and it broke.


(I kind of thought that was unlikely since that HTML page itself was  
machine generated -- but I guess they changed the software that  
generated it. Certainly I knew that scraping HTML was a bad thing to  
rely on... which is why I hope LC provides this in some format less  
likely to change?)






--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Tom Keays
It can be found at
http://id.loc.gov/vocabulary/geographicAreas.html

Look near the bottom of the page for links to the codes as RDF, N-triples,
and JSON.

Tom

On Wed, Jun 22, 2011 at 6:38 PM, Stephen Hearn s-h...@umn.edu wrote:

 Have you looked at id.loc.gov? One of its vocabularies defines URLs
 for each of the MARC geographic area codes.

 Stephen


 On Wed, Jun 22, 2011 at 4:44 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:
  Can anyone remind me if there's a machine readable copy of the MARC
  geographic codes available at any persistent URL?
 
  They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I
  actually had a script that automatically downloaded from there and
 scraped
  the HTML -- but sometime since I wrote the script, the HTML structure on
 the
  page changed and it broke.
 
  (I kind of thought that was unlikely since that HTML page itself was
 machine
  generated -- but I guess they changed the software that generated it.
  Certainly I knew that scraping HTML was a bad thing to rely on... which
 is
  why I hope LC provides this in some format less likely to change?)
 



 --
 Stephen Hearn, Metadata Strategist
 Technical Services, University Libraries
 University of Minnesota
 160 Wilson Library
 309 19th Avenue South
 Minneapolis, MN 55455
 Ph: 612-625-2328
 Fx: 612-625-3428



Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Jonathan Rochkind
 It can be found at
 http://id.loc.gov/vocabulary/geographicAreas.html

 Look near the bottom of the page for links to the codes as RDF, N-triples,
 and JSON.
 
Right, so like I keep saying, as far as I can tell, those files are lists of 
URLs, one for each code. (Or technically lists of RDF triples, but two 
parts of each triple are identical in every triple, just saying "this URL is 
part of the MARC geographic vocabulary", and then each triple has a unique URL 
representing a code.) 

And I'd need to do a separate HTTP request for each code (a couple hundred?) 
to actually get the label(s). 

Am I missing something? That's not a very convenient way to get the data for 
the very common use case of wanting to construct a mapping from code to label, 
right? Or that's just me?


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Jonathan Rochkind
 The result was that a few meetings later LC announced that they
 had coded the MARC online pages in XML, and were generating the HTML
 from that. I think I was mis-understood.

No doubt, but man if they'd then just SHARE that XML with us at a persistent 
URL, and keep the structure of that XML the same, that'd be really useful!  


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Karen Coyle

Quoting Jonathan Rochkind rochk...@jhu.edu:




Right, so like I keep saying, as far as I can tell, those files are  
lists of URLs, one for each code. (Or technically lists of  
RDF triples, but two parts of each triple are identical in  
every triple, just saying "this URL is part of the MARC geographic  
vocabulary", and then each triple has a unique URL representing a  
code.)


And I'd need to do a separate HTTP request for each code (a couple  
hundred?) to actually get the label(s).


I'm not sure why you see it as separate requests, unless the  
downloaded file doesn't work for you -- but maybe I don't understand  
what you are trying to do. The downloaded full file has the display  
data and the codes:


<rdf:Description rdf:about="http://id.loc.gov/vocabulary/geographicAreas/fq">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Resource"/>
  <owl:sameAs rdf:resource="info:lc/vocabulary/gacs/fq"/>
  <skos:prefLabel xml:lang="en">Africa, French-speaking Equatorial</skos:prefLabel>
  <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">fq</skos:notation>
  <skos:inScheme rdf:resource="http://id.loc.gov/vocabulary/geographicAreas"/>
  <skos:altLabel xml:lang="en">Africa, Equatorial</skos:altLabel>
  <skos:narrower>
    <skos:Concept>
      <skos:prefLabel xml:lang="en">Chad, Lake</skos:prefLabel>
      <skos:broader rdf:resource="http://id.loc.gov/vocabulary/geographicAreas/fq"/>
    </skos:Concept>
  </skos:narrower>
  <skos:altLabel xml:lang="en">French Equatorial Africa</skos:altLabel>
  <skos:altLabel xml:lang="en">French-speaking Equatorial Africa</skos:altLabel>
  <skos:exactMatch rdf:resource="http://id.loc.gov/authorities/sh85001608#concept"/>
  <skos:broader rdf:resource="http://id.loc.gov/vocabulary/geographicAreas/f"/>
  <vs:term_status>stable</vs:term_status>
  <skos:changeNote rdf:nodeID="fq"/>

so if you pick out the value in prefLabel and the value in notation  
you have what you need, no? (admittedly, this is NOT the same as a  
simple, comma delimited list!)


<skos:prefLabel xml:lang="en">Chad, Lake</skos:prefLabel>
<skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">fq</skos:notation>
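
Picking out notation and prefLabel could look roughly like the sketch
below. It assumes the downloaded file is RDF/XML with one
rdf:Description per code, shaped like the excerpt above and wrapped in
the usual rdf:RDF root; the function names code_label_pairs and to_csv
are made up for illustration. Only direct children of each
rdf:Description are read, so the prefLabel of a nested narrower concept
(such as "Chad, Lake" above) is not mistaken for the label of the
enclosing code.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def code_label_pairs(rdf_xml):
    """Map each skos:notation to the skos:prefLabel of the same description."""
    root = ET.fromstring(rdf_xml)
    pairs = {}
    for desc in root.iter("{%s}Description" % NS["rdf"]):
        # find() without './/' looks at direct children only.
        notation = desc.find("skos:notation", NS)
        label = desc.find("skos:prefLabel", NS)
        if notation is None or label is None:
            continue
        code = (notation.text or "").strip()
        # Collapse line breaks and runs of spaces inside the label.
        pairs[code] = " ".join((label.text or "").split())
    return pairs


def to_csv(pairs):
    """Render the mapping as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["code", "label"])
    for code in sorted(pairs):
        writer.writerow([code, pairs[code]])
    return buf.getvalue()
```

Going through the csv module matters here because many labels contain
commas ("Africa, French-speaking Equatorial"), which a naive join would
break.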




Am I missing something? That's not a very convenient way to get the  
data for the very common use case of wanting to construct a mapping  
from code to label, right? Or that's just me?


What would be nice would be a simple XSLT transform that turns out a  
CSV on the fly, always getting the latest values.


No?

kc







--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] source of marc geographic code?

2011-06-22 Thread Ross Singer
Can't you use:
http://www.loc.gov/standards/codelists/gacs.xml
?
It's what I used to make marccodes.heroku.com/gacs/

Although like Karen pointed out, not sure why you can't use the
RDF/XML from id.loc.gov

-Ross.

On Wed, Jun 22, 2011 at 5:44 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Can anyone remind me if there's a machine readable copy of the MARC
 geographic codes available at any persistent URL?

 They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I
 actually had a script that automatically downloaded from there and scraped
 the HTML -- but sometime since I wrote the script, the HTML structure on the
 page changed and it broke.

 (I kind of thought that was unlikely since that HTML page itself was machine
 generated -- but I guess they changed the software that generated it.
 Certainly I knew that scraping HTML was a bad thing to rely on... which is
 why I hope LC provides this in some format less likely to change?)