Addshore created this task.
Addshore added projects: Analytics, Dumps-Generation, Wikidata, wdwb-tech.

TASK DESCRIPTION
  Wikidata dumps currently come directly from the SQL servers.
  The general process is to iterate through all pages and slowly write all 
content to files (possibly in multiple threads).
  
  An alternative solution could be for Wikidata to produce two event streams, 
one of RDF output and one of JSON output, into Hadoop, once T120242: Consistent 
MediaWiki state change events | MediaWiki events as source of truth 
<https://phabricator.wikimedia.org/T120242> and T215001: Revisions missing from 
mediawiki_revision_create <https://phabricator.wikimedia.org/T215001> are 
complete.
  To avoid waiting for T120242 <https://phabricator.wikimedia.org/T120242> or 
T215001 <https://phabricator.wikimedia.org/T215001>, this could be implemented 
differently: a service would take a reliable and consistent input (such as 
MediaWiki recent changes) and populate a reliable stream of content in Kafka 
by making requests to Wikidata for that content.
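  As a rough sketch of that service (not an existing implementation), the 
core loop could turn each change into one content record per output format. 
The field names `entity_id` and `rev_id`, the record layout, and the injected 
`fetch` callable are assumptions made for illustration; a real service would 
pass an HTTP client as `fetch` and hand each record to a Kafka producer.

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence

# Base URL of Special:EntityData on Wikidata.
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData"


def entity_data_url(entity_id: str, rev_id: int, fmt: str) -> str:
    """Build the Special:EntityData URL for one entity revision and format."""
    return f"{ENTITY_DATA}/{entity_id}.{fmt}?revision={rev_id}"


def content_records(
    changes: Iterable[Dict],
    fetch: Callable[[str], str],
    formats: Sequence[str] = ("json", "ttl"),
) -> Iterator[Dict]:
    """Yield one record per (change, format) pair.

    `changes` is assumed to be a reliable, ordered input in which every item
    carries the hypothetical keys `entity_id` and `rev_id`.
    `fetch` is injected so the sketch makes no network calls itself.
    """
    for change in changes:
        for fmt in formats:
            url = entity_data_url(change["entity_id"], change["rev_id"], fmt)
            yield {
                "entity_id": change["entity_id"],
                "rev_id": change["rev_id"],
                "format": fmt,
                "url": url,
                "content": fetch(url),
            }
```

  Keeping `fetch` as a parameter means retries, rate limiting and the choice 
of HTTP library stay outside the loop, and the loop can be tested with a stub.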
  
  Dumps could then be created directly from Hadoop, which I imagine would take 
far less time, allowing users to get fresher data and also benefiting services 
such as #wikidata-query-service 
<https://phabricator.wikimedia.org/tag/wikidata-query-service/>, which sometimes 
have to reload from dumps.
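  To illustrate why a dump built from the stream could be cheap, here is a 
minimal sketch of the compaction step: keep only the newest revision of each 
entity and write those out. It is plain Python over an in-memory list, with 
the hypothetical record keys `entity_id`, `rev_id` and `content`; on Hadoop 
this would be a distributed job, and deletions are not handled here.

```python
from typing import Dict, Iterable, Iterator


def latest_per_entity(records: Iterable[Dict]) -> Dict[str, Dict]:
    """Reduce a stream of records to the highest-revision record per entity."""
    latest: Dict[str, Dict] = {}
    for rec in records:
        current = latest.get(rec["entity_id"])
        if current is None or rec["rev_id"] > current["rev_id"]:
            latest[rec["entity_id"]] = rec
    return latest


def dump_lines(records: Iterable[Dict]) -> Iterator[str]:
    """Yield the content of each entity's latest revision, one per line.

    Entities are emitted in sorted ID order only to make output deterministic.
    """
    latest = latest_per_entity(records)
    for entity_id in sorted(latest):
        yield latest[entity_id]["content"]
```

  Because only the latest record per entity matters, out-of-order arrival in 
the stream does not change the result of this step.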
  If we could quickly push this data to Kafka too, we would likely see some 
reduction in load on the s8 DB servers, as dump generation would no longer need 
to run against them. I'm sure #dba <https://phabricator.wikimedia.org/tag/dba/> 
would appreciate this.
  The new query service Flink updater could also make use of the RDF stream, 
instead of consuming MediaWiki revision-create events and then requesting 
Special:EntityData.

TASK DETAIL
  https://phabricator.wikimedia.org/T291089

To: Addshore
Cc: Addshore, Invadibot, maantietaja, jannee_e, Akuckartz, 4748kitoko, 
holger.knust, Nandana, Akovalyov, Lahi, Gq86, GoranSMilovanovic, Lunewa, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, JAllemandou, 
terrrydactyl, Wikidata-bugs, aude, Mbch331, jeremyb
