Re: RFC: bibupload --merge for WebSubmit

Cristian Bacchi Tue, 31 May 2011 15:40:22 +0200

Thanks very much!

The interactive workflow you described seems to completely cover the problem
of the “back-office merge”. If I understood correctly: harvesting (or some
batch-importing method) > BibUpload > holding pen area > ID matching
check (probably configurable?) > RT ticketing > manual validation of
preselected records, using BibMerge.


It looks like the solution used by some Integrated Library Systems to keep
collaborative networks in sync (especially for merging multiple
localizations of the same book record).

Please take this as an expression of interest in BibMerge and any other
administrative web interface :-)

Cheers,
Cristian

On Tue, May 31, 2011 at 1:14 PM, Samuele Kaplun <[email protected]> wrote:

> Ciao Cristian!
>
> On Tue, 31/05/2011 at 11:43 +0200, Cristian Bacchi wrote:
> > May I try to shift your interesting use-case from [user > submission]
> > scenario to [library-collaborative-network > import] scenario?
>
> Indeed, my focus was originally on WebSubmit, as we are
> currently implementing at CDS a generic submission interface that might
> modify records of heterogeneous origin.
>
> > It should become: “Imagine a [import or harvesting task] that might
> > allow a [library of the collaborative network] to modify records that
> > might have not been created by the same [task] (say they are of other
> > types, or they have been imported on the fly from another [library of
> > the collaborative network])”
> >
> > Please, don't consider this scenario too strange: for example, it's
> > the case of “the incoming modification B” (proposed by one library)
> > centered only on the descriptive section of “the original record
> > A” (created by another library, and already present in Invenio); the
> > administrative part of the record (say: holdings, localizations, …)
> > has to be preserved merging A and B.
>
> If you would accept an interactive solution to this issue (i.e.
> not automatically solved by an algorithm), this is addressed by recent
> development in the form of the BibMerge module and the integration with
> the RT ticketing system. I am not the best one to explain these two, so
> other developers might wish to step into this discussion and correct me.
> BibMerge is a new administrative web interface that lets a cataloguer
> merge two records by comparing them side by side at the MARC level,
> showing differences (like the UNIX utility diff). It lets you choose the
> correct field/subfield from either record and optionally create new merged
> ones. Moreover, this can be integrated with the RT ticketing system,
> allowing a worldwide network of cataloguers to share the
> burden of this task and to coordinate, so as not to step on each
> other's toes. This can be integrated with the BibUpload holding pen, in
> the following workflow: new records being harvested are bibuploaded into
> a separate area (called the holding pen). If a record happens to
> match (because of some IDs) already existing records, cataloguers are
> notified through RT and will manually merge the record via BibMerge.
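
The matching step of that workflow can be sketched roughly as follows. This is only an illustration in plain Python, not Invenio code: the function name, the dictionary layout and the example IDs are all hypothetical, and the real holding pen and RT integration may work quite differently.

```python
def matches_needing_tickets(holding_pen, existing_ids):
    """Return (incoming record label, existing record id) pairs to review.

    holding_pen  -- list of dicts: {"label": str, "ids": set of external IDs}
    existing_ids -- dict mapping an external ID to an existing record id
    Both structures are hypothetical, chosen only for this illustration.
    """
    pairs = []
    for incoming in holding_pen:
        # Collect every existing record that shares at least one ID.
        matched = {existing_ids[i] for i in incoming["ids"] if i in existing_ids}
        for recid in sorted(matched):
            pairs.append((incoming["label"], recid))
    return pairs


holding_pen = [
    {"label": "harvested-1", "ids": {"ext:123"}},
    {"label": "harvested-2", "ids": {"ext:999"}},
]
existing_ids = {"ext:123": 42}

for label, recid in matches_needing_tickets(holding_pen, existing_ids):
    # In the described workflow, this is the point where cataloguers
    # would be notified through RT and then merge by hand in BibMerge.
    print("review %s against record %d" % (label, recid))
```

What happens to harvested records that match nothing is not covered above, so the sketch simply reports no pair for them.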
>
> > Now, remaining in your example of “handles the author field as
> > composed by a name and affiliation”, and dealing with the case “for
> > every field in A that has a subfield $8 (i.e. origin), which happens
> > to be mentioned in a corresponding field in B, [..action..]”, my
> > question is: do you mean by “happens to be mentioned” that you also
> > compare the content of the field/subfield in addition to the
> > tag/character?
>
> So, the algorithm I was proposing works by considering one MARC field at
> a time, side by side. If the record A has five 700 fields, and record B
> has four 700 fields, the two sets are aligned (thus the last field of A
> is discarded), and subfields are merged:
>
> A       B               RESULT
> -----------------------------------
> $afoo   $abar $abaz     $abar $abaz
> $afoo   $bbar           $afoo $bbar
>
> I mentioned $8 above in order to align fields having the same value
> in $8 both in A and in B. This is because often, in Invenio, we
> mark a field with a specific value in $8 when it was
> added to the record through some external source (e.g. keywords
> extracted by BibClassify, or references extracted by RefExtract).
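
One possible reading of the $8 alignment, as a minimal sketch: fields are modelled as lists of (code, value) tuples, and each field of B carrying a $8 is paired with the first not-yet-used field of A that has the same $8 value. The helper names and the data layout are assumptions made for illustration, and fields without $8 are left unpaired here (the proposal aligns those by position).

```python
def subfield_value(field, code):
    """Return the first value of subfield `code` in `field`, or None."""
    for c, v in field:
        if c == code:
            return v
    return None


def align_by_origin(fields_a, fields_b, code="8"):
    """Pair each field of B with a field of A sharing the same $8 value.

    Returns a list of (field_from_a_or_None, field_from_b) tuples.
    """
    unused = list(fields_a)
    pairs = []
    for fb in fields_b:
        origin = subfield_value(fb, code)
        match = None
        if origin is not None:
            for fa in unused:
                if subfield_value(fa, code) == origin:
                    match = fa
                    break
        if match is not None:
            unused.remove(match)  # each field of A is paired at most once
        pairs.append((match, fb))
    return pairs


fields_a = [[("8", "bibclassify"), ("a", "old keyword")], [("a", "manual")]]
fields_b = [[("a", "manual, fixed")], [("8", "bibclassify"), ("a", "new keyword")]]
pairs = align_by_origin(fields_a, fields_b)
```

Here the second field of B is paired with the first field of A because both carry the same $8 value, regardless of their position.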
>
> > And, if yes: will the comparison (between the same field/subfield of
> > records A and B) be made also evaluating all the occurrences of
> > repeatable field/subfield in both sides?
>
> Yes, no special care is needed in distinguishing repeatable subfields.
> Simply, all the subfields with a given code from A are replaced by
> all the subfields with the same code from B.
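
The positional merge described above could look roughly like this. It is a sketch under assumptions, not the actual bibupload code: fields are lists of (code, value) tuples, the function names are made up, kept subfields of A are placed before those of B, and fields of B with no counterpart in A are taken as they are (the thread does not say what should happen in that case).

```python
def merge_subfields(field_a, field_b):
    """Merge one field of A with the aligned field of B.

    Every subfield code present in B replaces all subfields with the
    same code from A; codes present only in A are preserved.
    """
    codes_in_b = {code for code, _ in field_b}
    kept_from_a = [(c, v) for c, v in field_a if c not in codes_in_b]
    return kept_from_a + list(field_b)


def merge_fields(fields_a, fields_b):
    """Align the fields of one tag by position and merge them pairwise.

    Fields of A beyond the length of B are discarded.
    """
    merged = []
    for i, field_b in enumerate(fields_b):
        field_a = fields_a[i] if i < len(fields_a) else []
        merged.append(merge_subfields(field_a, field_b))
    return merged


# The two rows of the table above.
row1 = merge_subfields([("a", "foo")], [("a", "bar"), ("a", "baz")])
row2 = merge_subfields([("a", "foo")], [("b", "bar")])
```

With these definitions, `row1` holds `$abar $abaz` and `row2` holds `$afoo $bbar`, matching the RESULT column.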
>
> > To give some context for my question: in a library collaborative network
> > that kind of merging (and comparison) is made on fields which also
> > contain some subfield with coded data or even an ID (like the
> > authority-list ID of the name, or the code of the library).
>
> Indeed, this is the tricky part, where an ID is contained in a subfield
> and identifies the whole field (e.g. an author ID). The proposed
> algorithm can only work if the incoming record has preserved the
> existing field (in the worst case it should only correct typos). But it
> would not work in the case where, e.g., an author FOO was in the first
> occurrence of record A (and an ID for FOO was in a specific subfield),
> and the incoming record wants to put an author BAR in the first position
> without specifying an ID. In this case, the above algorithm would
> preserve the ID of FOO and store it with the author BAR.
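
The FOO/BAR situation can be reproduced with a few lines of Python. This is only an illustration: the subfield code `i` used for the author ID and the value `ID-FOO` are hypothetical, and `merge_subfields` is a stand-in for the proposed subfield merge, not real Invenio code.

```python
def merge_subfields(field_a, field_b):
    """Subfield codes present in B replace the same codes from A;
    codes present only in A are preserved."""
    codes_in_b = {code for code, _ in field_b}
    kept_from_a = [(c, v) for c, v in field_a if c not in codes_in_b]
    return kept_from_a + list(field_b)


# First author field of record A: FOO, with an ID in a hypothetical $i.
first_author_a = [("a", "FOO"), ("i", "ID-FOO")]
# First author field of incoming record B: BAR, with no ID at all.
first_author_b = [("a", "BAR")]

merged = merge_subfields(first_author_a, first_author_b)
# $i is absent from B, so it survives and ends up attached to BAR.
```

The merged field carries the name BAR together with the ID that belonged to FOO, which is exactly the mismatch described above.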
>
> This would rarely happen in case of WebSubmit where I am imagining a
> submitter having a form pre-filled with all the understood data of
> record A and providing additional information/corrections to create
> record B, but might well happen in your case when merging two
> pre-existing records coming from two different sources. In this case I
> think the only solution would be a (computer aided) manual merging by
> the above mentioned tools.
>
> Cheers!
>        Sam
> --
> Samuele Kaplun
> Invenio Developer ** <http://invenio-software.org/>
>
>
