#548: BibMatch: match validation
--------------------------+--------------------------------
Reporter: jlavik | Owner: jlavik
Type: enhancement | Status: closed
Priority: major | Milestone:
Component: BibMatch | Version:
Resolution: fixed | Keywords: matching, workflow
--------------------------+--------------------------------
Changes (by Jan Aage Lavik <jan.age.lavik@…>):
* status: in_merge => closed
* resolution: => fixed
Comment:
In [d39a330c306e3dcb2f70bca11c143df91747bc4e]:
{{{
#!CommitTicketReference repository=""
revision="d39a330c306e3dcb2f70bca11c143df91747bc4e"
BibMatch: match validation
* Adds a new sub-module for comparing records after searching
for potentially matching records, called the match validation
step. (fixes #548)
* Various methods are used when comparing records, for example
special metrics for comparing authors, titles and identifiers.
These comparison methods are configurable per (sub-)field and
act as rules for matching records. These rules can be grouped
in rulesets using regular expressions, allowing records to
be compared differently based on content. (fixes #183)
* For an exact match to happen all defined comparison rules must
succeed. If they do not all succeed, but the ratio of success
is above a certain (configurable) limit, the match is considered
fuzzy. Two or more matching fields MUST be found, unless
certain MARC fields have been configured as 'final' or 'joker'
types, i.e. identifier fields such as DOI or ISBN.
* Another configuration option is added to limit the maximum
number of search results compared for a single search query.
* Both match validation and fuzzy searching can be disabled using the
CLI options '--no-valid' and '--no-fuzzy' respectively.
* New option available, '--ascii', for transliterating record values
to ASCII before they are used in searching and matching. XML entities,
like &amp;, are transformed to UTF-8 before searches.
* Adds a configuration module specific to BibMatch internal globals.
* Enables automatic logging of BibMatch runs, providing information
about record matching results.
* Also adds applicable regression tests, a new unit-test module and
brand new admin and hacking guides.
* Detects if any input records are badly parsed by BibRecord.
}}}
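The exact/fuzzy decision described in the commit message can be illustrated with a minimal sketch. This is not the BibMatch implementation: `RuleResult`, `classify`, `fuzzy_limit` and `min_fields` are hypothetical names, the default limit of 0.5 is arbitrary, and the sketch assumes that a successful rule on a 'final' or 'joker' field simply waives the two-field minimum.

```python
from typing import NamedTuple, Sequence


class RuleResult(NamedTuple):
    """Outcome of one comparison rule (hypothetical structure)."""
    field: str           # MARC (sub-)field the rule was applied to
    passed: bool         # whether the comparison succeeded
    final: bool = False  # assumed flag for 'final'/'joker' identifier fields


def classify(results: Sequence[RuleResult],
             fuzzy_limit: float = 0.5,
             min_fields: int = 2) -> str:
    """Return 'exact', 'fuzzy' or 'none' for one candidate record."""
    if not results:
        return "none"
    passed = [r for r in results if r.passed]
    # Assumption: a passing 'final' rule waives the two-field minimum.
    waived = any(r.final for r in passed)
    if len(passed) < min_fields and not waived:
        return "none"
    # Exact match: every defined rule succeeded.
    if len(passed) == len(results):
        return "exact"
    # Fuzzy match: success ratio is above the configurable limit.
    ratio = len(passed) / len(results)
    return "fuzzy" if ratio > fuzzy_limit else "none"
```

Under these assumptions, two passing rules out of three give a fuzzy match, while a single passing rule on an identifier field flagged as final is accepted on its own.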
--
Ticket URL: </ticket/548#comment:3>
Invenio <http://invenio-software.org>