Smalyshev created this task.
Smalyshev claimed this task.
Smalyshev added subscribers: Smalyshev, Manybubbles, daniel.
Smalyshev added projects: Wikidata, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION
Right now, when the dump is generated, references are identified by a content hash. This means that a reference to German Wikipedia always produces "ref:004ec6fbee857649acdbdbad4f97b2c8571df97". However, since there are many such references, the data for this reference is repeated over and over, potentially creating thousands of copies of the same information. We need to remove the duplicates from the dump, or change the way the hash is generated (how?).

Additionally, we may encounter the same problem when importing updates, so we must account for this when we design the update procedure.

TASK DETAIL
https://phabricator.wikimedia.org/T92586

To: Smalyshev
Cc: daniel, Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, GWicke, JanZerebecki
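One possible way to remove the duplicates while keeping the content hash as the identifier is sketched below. This is only an illustration under stated assumptions, not the actual dump code: the `DedupingReferenceWriter` class, the `reference_hash` helper, the SHA-1 scheme, and the triple layout are all hypothetical. The idea is to always emit the link from a statement to its reference, but to write the reference's own data only the first time a given hash is seen.

```python
import hashlib


def reference_hash(snaks):
    """Hypothetical content hash: SHA-1 over the sorted (property, value) pairs.

    The real dump may compute its hash differently; this is only a stand-in.
    """
    canonical = "|".join(f"{prop}={value}" for prop, value in sorted(snaks))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class DedupingReferenceWriter:
    """Collects output lines, writing each reference body at most once."""

    def __init__(self):
        self.seen = set()   # reference ids whose data was already written
        self.lines = []     # stand-in for the dump output stream

    def write_reference(self, statement_id, snaks):
        ref_id = "ref:" + reference_hash(snaks)
        # The statement -> reference link is needed for every statement.
        self.lines.append(f"{statement_id} prov:wasDerivedFrom {ref_id} .")
        if ref_id in self.seen:
            # Same content hash seen before: skip the repeated reference data.
            return ref_id
        self.seen.add(ref_id)
        for prop, value in sorted(snaks):
            self.lines.append(f"{ref_id} {prop} {value} .")
        return ref_id


if __name__ == "__main__":
    writer = DedupingReferenceWriter()
    # Two statements that both cite the same (illustrative) source.
    writer.write_reference("s:Q1-a", [("P143", "Q48183")])
    writer.write_reference("s:Q2-b", [("P143", "Q48183")])
    print("\n".join(writer.lines))
```

The set of seen hashes grows with the number of distinct references, so for a full dump it would have to fit in memory or be bounded in some way. For updates, the same check would need to run against what is already stored rather than against an in-memory set.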
