https://bugzilla.wikimedia.org/show_bug.cgi?id=72430
Bug ID: 72430
Summary: Re-implement uniqueness constraint in a consistent and
efficient way
Product: MediaWiki extensions
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: WikidataRepo
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Web browser: ---
Mobile Platform: ---
At present, we have several kinds of uniqueness constraints:
1) a property's label must be unique (per language, among properties)
2) an item's combination of label and description must be unique (per language,
among items, if a description is given)
3) an item's sitelink must be unique (per target wiki).
Each of these is implemented separately; some are checked on every save, some
only on creation or modification of the respective part of the entity. Some
checks are expensive or awkward (e.g. the label+description uniqueness check
requires a self-join on a big table with a very complex condition).
Proposed solution:
* We add a "fingerprint" table with two columns: fp_entity and fp_identity.
fp_entity holds the id of the entity the fingerprint belongs to; fp_identity
holds the fingerprint (as a string, which may include a hash). There's a composite
unique key over both columns, and a separate index on the fp_identity column.
* To check for conflicts, we compute all fingerprints of the candidate entity and
check whether any of them is already in the database for a different entity. If
so, there is a conflict.
* Fingerprints can be made as sensitive or insensitive as we like (e.g. by
converting to lower case or stripping whitespace before hashing).
* As an added bonus, we can look up any entity by a fingerprint (e.g. items by
sitelink or properties by label) without touching the terms table.
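A minimal sketch of the proposed table, the conflict check and the lookup, assuming SQLite via Python's built-in sqlite3 module purely for illustration (the real repo would use MediaWiki's database layer and MySQL). Only the table and column names come from this proposal; the helper names create_schema, find_conflicts, save_fingerprints, lookup_entities and the index name fp_identity_idx are hypothetical. The fingerprint strings in the demo are made up.

```python
import sqlite3
from typing import Iterable, List, Optional, Set


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the fingerprint table as sketched in the proposal."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS fingerprint (
            fp_entity   TEXT NOT NULL,       -- entity id, e.g. 'Q42' or 'P31'
            fp_identity TEXT NOT NULL,       -- the fingerprint string
            UNIQUE (fp_entity, fp_identity)  -- composite unique key over both columns
        );
        -- separate index so lookups by fingerprint alone are cheap
        -- (index name is hypothetical)
        CREATE INDEX IF NOT EXISTS fp_identity_idx ON fingerprint (fp_identity);
        """
    )


def find_conflicts(
    conn: sqlite3.Connection, entity_id: Optional[str], fingerprints: Iterable[str]
) -> Set[str]:
    """Return ids of *other* entities that already hold any of the fingerprints.

    entity_id is the entity being saved (None for a not-yet-created entity);
    its own stored fingerprints are ignored, so re-saving it is not a conflict.
    """
    fps = sorted(set(fingerprints))
    if not fps:
        return set()
    placeholders = ",".join("?" * len(fps))  # "?,?,...,?"
    sql = f"SELECT DISTINCT fp_entity FROM fingerprint WHERE fp_identity IN ({placeholders})"
    params: List[object] = list(fps)
    if entity_id is not None:
        sql += " AND fp_entity != ?"
        params.append(entity_id)
    return {row[0] for row in conn.execute(sql, params)}


def save_fingerprints(
    conn: sqlite3.Connection, entity_id: str, fingerprints: Iterable[str]
) -> None:
    """Replace all stored fingerprints of an entity (e.g. after a successful save)."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("DELETE FROM fingerprint WHERE fp_entity = ?", (entity_id,))
        conn.executemany(
            "INSERT INTO fingerprint (fp_entity, fp_identity) VALUES (?, ?)",
            [(entity_id, fp) for fp in set(fingerprints)],
        )


def lookup_entities(conn: sqlite3.Connection, fingerprint: str) -> List[str]:
    """Look up entities by fingerprint (served by the fp_identity index)."""
    rows = conn.execute(
        "SELECT fp_entity FROM fingerprint WHERE fp_identity = ? ORDER BY fp_entity",
        (fingerprint,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    save_fingerprints(conn, "Q1", ["sitelink:enwiki:abc123"])
    print(find_conflicts(conn, "Q2", ["sitelink:enwiki:abc123"]))  # {'Q1'}
    print(find_conflicts(conn, "Q1", ["sitelink:enwiki:abc123"]))  # set()
    print(lookup_entities(conn, "sitelink:enwiki:abc123"))         # ['Q1']
```

Because the unique key covers both columns, the database itself does not stop two entities from sharing a fingerprint; uniqueness is enforced by the check in application code, so in this sketch concurrent saves would still need the check and the write to happen inside one transaction.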
Example fingerprints for the three use cases mentioned above:
1) property label: $fp_identity = "label:{$language}:{$hashOfLabel}";
2) item label+description: $fp_identity = "label+desc:{$language}:{$hashOfLabelAndDescription}";
// Note: if there is no description in that language, no fingerprint is generated for that language
3) item sitelinks: $fp_identity = "sitelink:{$site}:{$hashOfPageTitle}";
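The three fingerprint formats above could be computed roughly as follows. This is a sketch under assumptions the proposal leaves open: SHA-1 as the hash, case folding plus whitespace collapsing as the normalization for labels and descriptions, trimming only (no case folding) for page titles, and a newline as the separator between label and description. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


def normalize(text: str) -> str:
    """Assumption: case- and whitespace-insensitive form (one possible choice)."""
    return " ".join(text.split()).casefold()


def digest(text: str) -> str:
    """Assumption: SHA-1 hex digest; any stable hash would do."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def property_label_fingerprint(language: str, label: str) -> str:
    return f"label:{language}:{digest(normalize(label))}"


def item_label_desc_fingerprint(
    language: str, label: str, description: Optional[str]
) -> Optional[str]:
    # No description in this language -> no fingerprint for this language.
    if not description:
        return None
    # normalize() collapses all whitespace to single spaces, so "\n" cannot
    # occur in either part and works as an unambiguous separator
    # ("ab" + "c" and "a" + "bc" hash differently).
    combined = normalize(label) + "\n" + normalize(description)
    return f"label+desc:{language}:{digest(combined)}"


def sitelink_fingerprint(site: str, page_title: str) -> str:
    # Assumption: titles are only trimmed here, not case-folded.
    return f"sitelink:{site}:{digest(page_title.strip())}"


def item_fingerprints(
    labels: Dict[str, str], descriptions: Dict[str, str], sitelinks: Dict[str, str]
) -> List[str]:
    """All fingerprints of an item: label+description per language, plus sitelinks."""
    fps: List[str] = []
    for language, label in labels.items():
        fp = item_label_desc_fingerprint(language, label, descriptions.get(language))
        if fp is not None:
            fps.append(fp)
    for site, title in sitelinks.items():
        fps.append(sitelink_fingerprint(site, title))
    return fps
```

For example, `item_label_desc_fingerprint("en", "Berlin", None)` returns `None`, while `property_label_fingerprint("en", "Instance  Of")` and `property_label_fingerprint("en", " instance of ")` yield the same string, so the two labels would be treated as a conflict.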
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l