https://bugzilla.wikimedia.org/show_bug.cgi?id=72430

            Bug ID: 72430
           Summary: Re-implement uniqueness constraint in a consistent and
                    efficient way
           Product: MediaWiki extensions
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: WikidataRepo
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
       Web browser: ---
   Mobile Platform: ---

At present, we have several kinds of uniqueness constraints:

1) a property's label must be unique (per language, among properties)

2) an item's combination of label and description must be unique (per language,
among items, if a description is given)

3) an item's sitelink must be unique (per target wiki).

Each of these is implemented separately; some are checked on every save, some
are checked only on creation and modification of the respective part of the
entity. Some checks are expensive or awkward (the label+description uniqueness
requires a self-join on a big table with a very complex condition).


Proposed solution:

* We add a "fingerprint" table with two columns: fp_entity and fp_identity.
fp_entity holds the id of the entity that the fingerprint belongs to; fp_identity
holds the fingerprint (as a string, which may be a hash). There is a composite
unique key over both columns, and a separate index on the fp_identity column.

* To check for conflicts, we compute all fingerprints of the candidate and
check whether any of them is already in the database for a different entity.
If so, there is a conflict.

* Fingerprints can be made as sensitive or insensitive as we like (e.g. by
converting to lower case or stripping whitespace before hashing).

* As an added bonus, we can look up any entity by a fingerprint (e.g. items by
sitelink or properties by label) without touching the terms table.
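
The points above could be sketched roughly as follows. This is only an
illustration in Python with an in-memory SQLite database, not a proposed
implementation: the table name "fingerprint", the index name, the use of text
ids such as "Q1", and all function names are assumptions made for this sketch.

```python
import sqlite3

# Assumed schema: column names follow the proposal, everything else is made up.
SCHEMA = """
CREATE TABLE fingerprint (
    fp_entity   TEXT NOT NULL,
    fp_identity TEXT NOT NULL,
    UNIQUE (fp_entity, fp_identity)
);
CREATE INDEX fp_identity_idx ON fingerprint (fp_identity);
"""


def find_conflicts(conn, entity_id, fingerprints):
    """Return (fp_entity, fp_identity) rows that belong to other entities."""
    fingerprints = list(fingerprints)
    if not fingerprints:
        return []
    placeholders = ",".join("?" for _ in fingerprints)
    query = (
        "SELECT fp_entity, fp_identity FROM fingerprint "
        f"WHERE fp_identity IN ({placeholders}) AND fp_entity != ? "
        "ORDER BY fp_entity, fp_identity"
    )
    return conn.execute(query, [*fingerprints, entity_id]).fetchall()


def save_fingerprints(conn, entity_id, fingerprints):
    """Replace the stored fingerprints of one entity in a single transaction."""
    with conn:
        conn.execute("DELETE FROM fingerprint WHERE fp_entity = ?", (entity_id,))
        conn.executemany(
            "INSERT INTO fingerprint (fp_entity, fp_identity) VALUES (?, ?)",
            [(entity_id, fp) for fp in fingerprints],
        )


def entities_by_fingerprint(conn, fingerprint):
    """Look up entity ids by fingerprint, using only the fingerprint table."""
    rows = conn.execute(
        "SELECT fp_entity FROM fingerprint WHERE fp_identity = ? ORDER BY fp_entity",
        (fingerprint,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_fingerprints(conn, "Q1", ["sitelink:enwiki:abc"])
print(find_conflicts(conn, "Q2", ["sitelink:enwiki:abc"]))
# prints [('Q1', 'sitelink:enwiki:abc')]
```

Note that the check and the write are separate steps in this sketch. How to
make them atomic under concurrent saves is not covered by the proposal and
would need to be decided separately.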

Example fingerprints for the three use cases mentioned above:

1) property label:  $fp_identity = "label:{$language}:{$hashOfLabel}";

2) item label+description:  $fp_identity =
"label+desc:{$language}:{$hashOfLabelAndDescription}";
// Note: if there is no description in that language, no fingerprint is
// generated for that language

3) item sitelinks:  $fp_identity = "sitelink:{$site}:{$hashOfPageTitle}";
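
For illustration, the three formats above might be computed along these lines.
This is a sketch in Python, not the intended implementation. The hash function
(SHA-1), the normalization (collapse whitespace, fold case), the NUL separator
between label and description, and all function names are assumptions. Page
titles are hashed verbatim here on the assumption that they should stay
case-sensitive.

```python
import hashlib


def normalize(text):
    """One possible normalization: collapse whitespace and fold case."""
    return " ".join(text.split()).casefold()


def digest(*parts, fold=True):
    """SHA-1 hex digest over the parts, joined by a NUL separator.

    The separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    """
    if fold:
        parts = [normalize(part) for part in parts]
    return hashlib.sha1("\x00".join(parts).encode("utf-8")).hexdigest()


def property_fingerprints(labels):
    """Case 1: one fingerprint per language label of a property."""
    return [f"label:{lang}:{digest(label)}" for lang, label in sorted(labels.items())]


def item_fingerprints(labels, descriptions, sitelinks):
    """Cases 2 and 3: label+description and sitelink fingerprints of an item."""
    fingerprints = []
    for lang, label in sorted(labels.items()):
        description = descriptions.get(lang)
        if description:  # no description in this language: no fingerprint
            fingerprints.append(f"label+desc:{lang}:{digest(label, description)}")
    for site, title in sorted(sitelinks.items()):
        fingerprints.append(f"sitelink:{site}:{digest(title, fold=False)}")
    return fingerprints


# Labels that differ only in case and whitespace yield the same fingerprint.
print(property_fingerprints({"en": "Instance of"})
      == property_fingerprints({"en": "  instance  OF "}))
# prints True
```

With this normalization, two properties whose English labels differ only in
case or spacing would be reported as conflicting. Dropping the normalize() step
would make the check exact instead.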

