Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Simon Brown
[mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ethan Gruber Sent: Friday, September 26, 2014 3:54 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Reconciling corporate names? I would check with the developers of SNAC ( http://socialarchive.iath.virginia.edu/), as they've spent a lot

Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Trail, Nate
To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Reconciling corporate names? You could always web scrape, or download and then search the LCNAF with some script that looks like: #Build query for webscraping query = paste("http://id.loc.gov/search/?q=", URLencode("corporate name here"), q=cs
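
For readers who would rather not use R, the query-building step quoted above can be sketched in Python. This is only an illustration: the base URL comes from the message, the function name `build_query` and the sample name are made up, and the extra parameters that are cut off in the snippet are deliberately left out rather than guessed.

```python
from urllib.parse import quote

# Base search URL as quoted in the message; any further query
# parameters are truncated in the archive and are not reproduced here.
BASE = "http://id.loc.gov/search/?q="

def build_query(name):
    """Return a search URL for one corporate name (hypothetical helper)."""
    # Percent-encode everything, including spaces and slashes.
    return BASE + quote(name, safe="")

# "Example Widget Company" is a placeholder, not a real heading.
url = build_query("Example Widget Company")
print(url)
```

The sketch only builds the URL; fetching and parsing the results page is left to whichever HTTP client you prefer.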

Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Jonathan Rochkind
...@loc.gov -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Brown Sent: Monday, September 29, 2014 9:38 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Reconciling corporate names? You could always web scrape, or download

Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Kyle Banerjee
for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ethan Gruber Sent: Friday, September 26, 2014 3:54 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Reconciling corporate names? I would check with the developers of SNAC ( http://socialarchive.iath.virginia.edu/), as they've

Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Jean Roth
-Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Brown Sent: Monday, September 29, 2014 9:38 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Reconciling corporate names? You could always web scrape, or download

Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Kyle Banerjee
After a quick search, http://id.loc.gov/download/ looks like the place to go. I haven't downloaded it myself, but the file sizes make it look like the right stuff. kyle On Mon, Sep 29, 2014 at 10:55 AM, Jean Roth jr...@nber.org wrote: What is the link to the downloadable LCNAF data? -- Jean

Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Jean Roth
Thank you! It looks like the files are available as RDF/XML, Turtle, or N-triples files. Any examples or suggestions for reading any of these formats? The MARC Countries file is small, 31-79 kb. I assume a script that would read a small file like that would at least be a start for the LCNAF
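
As one possible answer to the question above, N-Triples is the easiest of the three formats to read without extra libraries, because each statement sits on its own line. The following is a rough sketch, not a full parser: it assumes IRI subjects, skips blank nodes and comments, and does not decode escape sequences inside literals. The example.org URIs and the label are invented sample data.

```python
import re

# One statement per line: <subject> <predicate> object .
# The object is either an IRI or a quoted literal with an optional
# language tag or datatype. Lines that do not fit are skipped.
TRIPLE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)

def iter_triples(lines):
    """Yield (subject, predicate, object) tuples from an iterable of lines."""
    for line in lines:
        m = TRIPLE.match(line)
        if m is None:
            continue  # comment, blank line, or unsupported statement
        subj, pred, obj_iri, obj_lit = m.groups()
        yield subj, pred, obj_iri if obj_iri is not None else obj_lit

# Invented sample data standing in for a downloaded file.
sample = [
    '<http://example.org/n1> <http://example.org/label> "Example Corporation"@en .',
    '<http://example.org/n1> <http://example.org/type> <http://example.org/CorporateName> .',
    '# a comment line',
]
triples = list(iter_triples(sample))
for t in triples:
    print(t)
```

Because `iter_triples` accepts any iterable of lines, passing it an open file handle reads one line at a time, which matters for multigigabyte downloads. For RDF/XML or Turtle a dedicated RDF library is probably the safer route.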

Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Kyle Banerjee
The best way to handle them depends on what you want to do. You need to actually download the NAF files rather than countries or other small files as different kinds of data will be organized differently. Just don't try to read multigigabyte files in a text editor :) If you start with one of the

[CODE4LIB] Reconciling corporate names?

2014-09-26 Thread Galligan, Patrick
I'm looking to reconcile about 40,000 corporate names against LCNAF to see whether they are authorized strings or not, but I'm drawing a blank about how to get it done. I've used http://freeyourmetadata.org/ for reconciling subject headings before, but I can't get it to work for LCNAF. Has
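
One way to frame the task described above, assuming the authorized headings have already been extracted from a download into a plain list, is an exact lookup followed by a looser normalized lookup. The sketch below is an assumption-laden illustration, not anyone's production method: the normalization rules, the function names, and all sample names are hypothetical.

```python
import unicodedata

def normalize(name):
    """Fold case, accents, and punctuation so near-identical strings compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped.lower())
    return " ".join(cleaned.split())

def reconcile(local_names, authorized_headings):
    """Map each local name to ("exact" | "normalized" | "none", matched heading)."""
    exact = set(authorized_headings)
    by_key = {}
    for heading in authorized_headings:
        # Keep the first heading seen for each normalized key.
        by_key.setdefault(normalize(heading), heading)
    report = {}
    for name in local_names:
        if name in exact:
            report[name] = ("exact", name)
        else:
            match = by_key.get(normalize(name))
            report[name] = ("normalized", match) if match else ("none", None)
    return report

# Placeholder data; real headings would come from the authority file.
authorized = ["Example Widget Company", "Société Exemple (Firm)"]
local = ["Example Widget Company", "societe exemple firm", "Unknown Works Ltd."]
report = reconcile(local, authorized)
for name, result in report.items():
    print(name, "->", result)
```

Only the "exact" results answer the question of whether a string is the authorized form; "normalized" hits are candidates for manual review. Set and dictionary lookups keep this fast for 40,000 names.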

Re: [CODE4LIB] Reconciling corporate names?

2014-09-26 Thread Ethan Gruber
I would check with the developers of SNAC ( http://socialarchive.iath.virginia.edu/), as they've spent a lot of time developing named entity recognition scripts for personal and corporate names. They might have something you can reuse. Ethan On Fri, Sep 26, 2014 at 3:47 PM, Galligan, Patrick

Re: [CODE4LIB] Reconciling corporate names?

2014-09-26 Thread Karen Hanson
@LISTSERV.ND.EDU] On Behalf Of Ethan Gruber Sent: Friday, September 26, 2014 3:54 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Reconciling corporate names? I would check with the developers of SNAC ( http://socialarchive.iath.virginia.edu/), as they've spent a lot of time developing named entity