Any project for record linkage, fuzzy grouping, and deduplication based on Solr/Lucene?

Mobius ReX Mon, 17 Mar 2014 11:00:33 -0700

For example, suppose a new, large department is formed by merging three
departments. A few employees worked for two or three of those departments
before the merger, so one person's attributes might be stored in several
departments' databases. An additional problem is that one person can
appear under different first names or nicknames.


The attributes of a person include
first name, last name, email, home phone, cell phone, SSN, address, etc.

Because any of these values can be empty, there is no unique primary
key.
Hence, we need an intelligent way to classify record pairs as matches or
non-matches, with weights assigned to the different matching rules.
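
To make the idea of weighted matching rules concrete, here is a rough
Python sketch. Everything in it is an assumption for illustration: the
field names, the weights, the tiny nickname table, and the match_score
helper are hypothetical and not taken from Solr/Lucene or any existing
project. Fields that are empty in either record are skipped, so a missing
value neither helps nor hurts the score.

```python
# Hypothetical weights: a higher value means the field is stronger
# evidence that two records describe the same person.
WEIGHTS = {
    "ssn": 5.0,
    "email": 4.0,
    "cell_phone": 3.0,
    "home_phone": 2.0,
    "last_name": 1.5,
    "first_name": 1.0,
    "address": 1.0,
}

# Illustrative nickname table only; a real one would be far larger.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize(field, value):
    """Lowercase, collapse whitespace, and map known nicknames."""
    if not value:
        return ""
    text = " ".join(value.lower().split())
    if field == "first_name":
        text = NICKNAMES.get(text, text)
    return text


def match_score(a, b, weights=WEIGHTS):
    """Weighted agreement in [0, 1] over fields present in both records.

    Returns None when the two records share no non-empty field.
    """
    agreed = 0.0
    compared = 0.0
    for field, weight in weights.items():
        va = normalize(field, a.get(field))
        vb = normalize(field, b.get(field))
        if not va or not vb:
            continue  # an empty value is not evidence either way
        compared += weight
        if va == vb:
            agreed += weight
    return agreed / compared if compared else None


if __name__ == "__main__":
    rec_a = {"first_name": "Bob", "last_name": "Smith",
             "email": "bsmith@example.com", "ssn": ""}
    rec_b = {"first_name": "Robert", "last_name": "Smith",
             "email": "BSmith@example.com", "cell_phone": "555-0100"}
    print(match_score(rec_a, rec_b))
```

A pair would then be treated as a duplicate when the score exceeds some
threshold. Exact equality per field is only a placeholder here; fuzzy
comparison per field would slot into the same loop.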

Any tips for handling such deduplication tasks quickly at runtime on big
data (about 100 million records)?
Is there an open-source project working on this?
