Smalyshev created this task.
Smalyshev claimed this task.
Smalyshev added subscribers: Smalyshev, Manybubbles, daniel.
Smalyshev added projects: Wikidata, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Right now, when the dump is generated, references are identified by content 
hash. This means a reference to German Wikipedia always produces 
"ref:004ec6fbee857649acdbdbad4f97b2c8571df97". However, since there are many 
such references, the data for this reference is repeated over and over, 
potentially creating thousands of copies of the same information. We need to 
remove the duplicates from the dump - or change the way the hash is generated 
(how?).
  
  Additionally, we may encounter the same problem when importing updates, so we 
must account for this when we design the update procedure.
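
  One possible way to drop the duplicates, sketched in Python under loud 
assumptions: the dump is treated as a stream of "reference blocks", each a 
hash-based reference id plus the triples that describe it. The names 
dedupe_references, ReferenceBlock, the "ref:aaa"-style ids and the 
"importedFrom" predicate are all hypothetical and do not reflect the actual 
dump format or any existing code. A block is emitted only the first time its 
id is seen, and the same "seen" set can be passed in again when processing 
updates.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical shapes, for illustration only.
Triple = Tuple[str, str, str]
ReferenceBlock = Tuple[str, List[Triple]]  # (reference id, triples describing it)


def dedupe_references(
    blocks: Iterable[ReferenceBlock],
    seen: Optional[Set[str]] = None,
) -> Iterator[ReferenceBlock]:
    """Yield each reference block only the first time its id appears.

    Because the id is a content hash, two blocks with the same id are
    assumed to carry identical data, so later copies can be skipped.
    """
    if seen is None:
        seen = set()
    for ref_id, triples in blocks:
        if ref_id in seen:
            continue  # already emitted, skip the repeated copy
        seen.add(ref_id)
        yield ref_id, triples


# Initial dump: the same reference appears twice.
dump = [
    ("ref:aaa", [("ref:aaa", "importedFrom", "German Wikipedia")]),
    ("ref:bbb", [("ref:bbb", "importedFrom", "English Wikipedia")]),
    ("ref:aaa", [("ref:aaa", "importedFrom", "German Wikipedia")]),
]
seen_ids: Set[str] = set()
unique = list(dedupe_references(dump, seen_ids))

# Later update: reuse the same set so known references are not re-imported.
update = [
    ("ref:aaa", [("ref:aaa", "importedFrom", "German Wikipedia")]),
    ("ref:ccc", [("ref:ccc", "importedFrom", "French Wikipedia")]),
]
new_in_update = list(dedupe_references(update, seen_ids))
```

  This keeps only the set of ids in memory, not the reference data itself. 
For a real update procedure the set would have to be persisted, or replaced 
by a lookup against the store, which this sketch does not attempt.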

TASK DETAIL
  https://phabricator.wikimedia.org/T92586

To: Smalyshev
Cc: daniel, Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, 
aude, GWicke, JanZerebecki


