Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Justin Hopkins
Seems like a lively party, so I'll join in. I think all the matchpoints are bad. Ultimately we used many different factors to arrive at a sort of "match score". This seemed to be a good approach. Obviously, we can't plan for all the possible variations and inconsistencies in MARC, but we can do a
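The "match score" approach Justin describes can be sketched as a weighted comparison of several normalized fields. His message does not say which factors or weights were used, so the field names, the weights in `WEIGHTS`, the normalization, and the `THRESHOLD` cutoff below are all hypothetical, chosen only to show the shape of the technique.

```python
# Hypothetical weights: the thread does not list the real factors or values.
WEIGHTS = {
    "title": 0.35,
    "author": 0.20,
    "date1": 0.15,
    "isbn": 0.15,
    "record_type": 0.10,
    "item_form": 0.05,
}


def normalize(value):
    """Lowercase, drop punctuation and collapse whitespace so trivial variants compare equal."""
    if not value:
        return ""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Sum the weights of fields that are non-empty in both records and agree after normalizing."""
    score = 0.0
    for field, weight in weights.items():
        a = normalize(rec_a.get(field))
        b = normalize(rec_b.get(field))
        if a and a == b:
            score += weight
    return score


# Hypothetical cutoff: pairs at or above it become merge candidates.
THRESHOLD = 0.75

a = {"title": "The Hobbit", "author": "Tolkien, J. R. R.", "date1": "1937", "record_type": "a"}
b = {"title": "The hobbit.", "author": "Tolkien, J. R. R.", "date1": "1937", "record_type": "a",
     "isbn": "9780261102217"}
score = match_score(a, b)
print(round(score, 2), score >= THRESHOLD)  # 0.8 True
```

A field missing from either record simply contributes nothing, so one absent or bad field (an ISBN, say) lowers the score without vetoing the match on its own.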

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Jason Etheridge
> We liked your fingerprinting idea. We expanded it a bit: Awesome. There was another idea we had (and implemented) back when I worked for PINES, though I don't know how worthwhile it is these days: A dedupe interface that can allow/expedite user processing of proposed merges from algorithms sim

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Blake Henderson
All, I meant to share the results (the fruits of our labor): 1. 169,206 bibs were deduped. 2. We had 1,000,234 non-deleted bibs and now we have 832,915 (there were new bibs getting added to the system for the duration of the dedupe) 3. 16.9% duplication resolution 4. 12,381 bibs were NOT merge
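The figures in Blake's results are consistent with each other. The 16.9% rate is the merged count divided by the starting count, and the gap between the expected and reported final totals matches his note that new bibs arrived during the run (assuming nothing else was deleted in the meantime):

```python
# Figures quoted in Blake Henderson's message.
bibs_before = 1_000_234  # non-deleted bibs when the dedupe started
bibs_merged = 169_206    # bibs merged away as duplicates
bibs_after = 832_915     # non-deleted bibs afterwards

# Share of the starting catalog resolved as duplicates.
resolution_rate = bibs_merged / bibs_before

# Size the catalog would have shrunk to if nothing else had changed.
expected_after = bibs_before - bibs_merged

# Net bibs added while the dedupe ran (assumes no other deletions).
net_added_during_run = bibs_after - expected_after

print(f"{resolution_rate:.1%}")  # 16.9%
print(expected_after)            # 831028
print(net_added_during_run)      # 1887
```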

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Blake Henderson
Jason, We liked your fingerprinting idea. We expanded it a bit: $fingerprints{alternate} = join("\t", $marc{item_form}, $marc{date1}, $marc{record_type}, $marc{bib_lvl}, $marc{title}, $marc{subtitle}.$marc{subtitlep}, $marc{author} ? $marc{author} : '', $marc{audioformat}, $m
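Blake's Perl line builds a fingerprint by tab-joining several MARC-derived values and substituting an empty string when the author is missing. A rough Python equivalent follows; it covers only the fields visible before the preview is cut off, and it assumes the values in each `marc` dict were already normalized upstream. The `duplicate_groups` helper is a hypothetical illustration of how records sharing a fingerprint would be grouped as duplicate candidates.

```python
from collections import defaultdict


def fingerprint(marc):
    """Tab-join selected fields, using '' for anything missing."""
    parts = [
        marc.get("item_form") or "",
        marc.get("date1") or "",
        marc.get("record_type") or "",
        marc.get("bib_lvl") or "",
        marc.get("title") or "",
        # The Perl concatenates subtitle and subtitlep into one slot.
        (marc.get("subtitle") or "") + (marc.get("subtitlep") or ""),
        marc.get("author") or "",
        marc.get("audioformat") or "",
    ]
    return "\t".join(parts)


def duplicate_groups(records):
    """Group record ids by fingerprint and keep only groups with more than one member."""
    groups = defaultdict(list)
    for rec_id, marc in records.items():
        groups[fingerprint(marc)].append(rec_id)
    return [ids for ids in groups.values() if len(ids) > 1]


records = {
    101: {"date1": "1937", "record_type": "a", "bib_lvl": "m", "title": "hobbit", "author": "tolkien"},
    102: {"date1": "1937", "record_type": "a", "bib_lvl": "m", "title": "hobbit", "author": "tolkien"},
    103: {"date1": "1966", "record_type": "a", "bib_lvl": "m", "title": "hobbit", "author": "tolkien"},
}
print(duplicate_groups(records))  # [[101, 102]]
```

Because every field is an exact-match component of the key, a fingerprint like this is conservative: a differing Date1 (record 103) is enough to keep records apart.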

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Jason Etheridge
For what it's worth, this is the fairly conservative algorithm used by the default fingerprinter in the migration-tools repository: https://docs.google.com/document/d/1tvuA0Os3W0B2Fl_GvO_Z6ZG6ZHecg8JtTRMz3QUktK8/edit?usp=sharing Comments welcome. -- Jason Etheridge | Community and Migration Man

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Rogan Hamby
We will have to agree to disagree. On Tue, Apr 26, 2016 at 2:32 PM, Elaine Hardy wrote: > For cataloging, ISBN is not a match point. For data cleanup and migration, > it is, at the very least, a bad match point . There are too many potential > errors with 020s to use it as a main match point. We

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Elaine Hardy
For cataloging, ISBN is not a match point. For data cleanup and migration, it is, at the very least, a bad match point. There are too many potential errors with 020s to use it as a main match point. We still have mismatches in our catalog where a vendor used it as the main match point -- as a resu

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Rogan Hamby
I disagree that the 020 can't be used as a match point. I don't think it should be used as the only match point. It is possible to generate errors with the method described in that code. In my experience, the benefits of the high number of accurate matches outweighed the bad matches. CiL publish

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-26 Thread Elaine Hardy
Keep in mind that an ISBN (MARC field 020) is not a match point. It is a finding aid. Publishers do reuse ISBNs or use a different ISBN for what is a new printing rather than a new publication (meaning no change in information). Not only can ISBNs for all formats of a title be present on a bib reco
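One narrow class of 020 error can be caught mechanically: keying mistakes that break the ISBN check digit. The sketch below validates ISBN-10 and ISBN-13 checksums; it assumes any qualifier such as "(pbk.)" has already been stripped from the subfield. A passing checksum says nothing about the problems Elaine raises (reused ISBNs, or ISBNs for other formats sitting on the same bib), so this is at most a pre-filter, not a reason to trust an ISBN match.

```python
DIGITS = "0123456789"


def isbn_checksum_ok(raw):
    """True if raw is a checksum-valid ISBN-10 or ISBN-13 (hyphens and spaces ignored)."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and all(c in DIGITS for c in s[:9]) and (s[9] in DIGITS or s[9] == "X"):
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # ISBN-10: weights 10 down to 1; the weighted sum must be divisible by 11.
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if len(s) == 13 and all(c in DIGITS for c in s):
        # ISBN-13: weights alternate 1, 3; the weighted sum must be divisible by 10.
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0
    return False


print(isbn_checksum_ok("0-306-40615-2"))      # True
print(isbn_checksum_ok("978-0-306-40615-7"))  # True
print(isbn_checksum_ok("0-306-40615-3"))      # False (one-digit typo)
```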

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Rogan Hamby
That is one thing to point out: when it was originally written, electronic records were still fairly rare. The consortium it was written for still only uses them in very small numbers, and I set those up as distinct bib sources and modified the bib selection code to exclude them. Those are things to l
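Rogan's workaround, loading electronic records under their own bib sources and excluding those sources from candidate selection, amounts to a filter applied before any matching runs. The source labels in `EXCLUDED_SOURCES` and the dict-based records are hypothetical; the thread does not name the actual sources or show the selection code. Skipping deleted bibs mirrors Blake's counts, which were of non-deleted bibs.

```python
# Hypothetical source labels; the thread doesn't give the real names.
EXCLUDED_SOURCES = {"ebook_vendor_a", "ebook_vendor_b"}


def dedupe_candidates(bibs, excluded_sources=EXCLUDED_SOURCES):
    """Return ids of non-deleted bibs whose source is not on the exclusion list."""
    return [
        bib["id"]
        for bib in bibs
        if not bib.get("deleted") and bib.get("source") not in excluded_sources
    ]


bibs = [
    {"id": 1, "source": "local"},
    {"id": 2, "source": "ebook_vendor_a"},
    {"id": 3, "source": "local", "deleted": True},
    {"id": 4},  # no source recorded
]
print(dedupe_candidates(bibs))  # [1, 4]
```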

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Blake Henderson
Whatever method you use, I heartily recommend running it on a testing system and having catalogers look over the results first. You may have already done all the due diligence, but I say it for anyone reading along as well. I've never had problems with this method and heard back from others with pos

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Jim Taylor
> Hi Jim, It is available. To be clear, I helped create the de-duplication algorithm but the actual coding was done by Galen Charlton of Equinox. You can find it here: http

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Rogan Hamby
Hi Jim, It is available. To be clear, I helped create the de-duplication algorithm but the actual coding was done by Galen Charlton of Equinox. You can find it here: http://git.esilibrary.com/?p=migration-tools.git;h=300a04108fc6a3d14424c6d365329be334114f7d The full scope of the script goes a

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Jason Stephenson
On 04/25/2016 02:45 PM, Jim Taylor wrote: > Yes. Thank you. I should have written that down right after we talked but didn't. > I assume this will also take care of any holds and other related links? It does. > Thanks. > Jim You're welcome.

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread swills beyond-print.com
Rogan Hamby shared his work with me. It's a set of SQL procedures that produce a 'best bib' and then identify the less interesting duplicate, and it seems to work well. I modified it so that it produces the candidates but doesn't actually do the merge since we like to have that personal touch
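The candidate-only workflow described here (pick a "best bib" per duplicate group, pair it with the less interesting records, and stop short of merging) might look roughly like this. The ranking rule in `best_bib` (most MARC fields wins, lowest id breaks ties) is a hypothetical stand-in; the actual criteria live in the SQL procedures and are not described in the message.

```python
def best_bib(group):
    """Pick the lead record: most MARC fields wins, lowest id breaks ties (hypothetical rule)."""
    return max(group, key=lambda rec: (rec["field_count"], -rec["id"]))


def merge_candidates(groups):
    """List (lead_id, duplicate_id) pairs for a cataloger to review; nothing is merged."""
    pairs = []
    for group in groups:
        if len(group) < 2:
            continue
        lead = best_bib(group)
        pairs.extend((lead["id"], rec["id"]) for rec in group if rec["id"] != lead["id"])
    return pairs


groups = [
    [{"id": 10, "field_count": 25}, {"id": 11, "field_count": 40}, {"id": 12, "field_count": 40}],
    [{"id": 20, "field_count": 5}],
]
print(merge_candidates(groups))  # [(11, 10), (11, 12)]
```

Emitting pairs rather than performing merges keeps a human in the loop, which is also what the review interface Jason Etheridge mentions would consume.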

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Jim Taylor
lf Of Jason Stephenson Sent: Monday, April 25, 2016 1:42 PM To: Evergreen Discussion Group Subject: Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records On 04/25/2016 02:24 PM, Jim Taylor wrote: > I raised the question at the conference regarding the ability to merge > records

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Jason Stephenson
On 04/25/2016 02:24 PM, Jim Taylor wrote: I raised the question at the conference regarding the ability to merge records outside the program interface and was told there was a procedure/function that would allow this to be done. Does anyone know where I can find this function? My searching has

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Jim Taylor
-boun...@list.georgialibraries.org] On Behalf Of Janet Schrader Sent: Monday, April 25, 2016 1:35 PM To: 'Evergreen Discussion Group' Subject: Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records Jim, Do you mean something other than record buckets? You can put bib re

Re: [OPEN-ILS-GENERAL] Programmatic Merging of Bibliographic Records

2016-04-25 Thread Janet Schrader
Jim, Do you mean something other than record buckets? You can put bib records in a bucket and then select to merge them. The records open tiled vertically and you select the lead (retained) one. In this interface you can edit the record if you want to include something from the merged record in