Addshore added a comment.
When T120242: Consistent MediaWiki state change events | MediaWiki events as source of truth <https://phabricator.wikimedia.org/T120242> is ready, we could probably change some of the architecture and process around dumping for Wikidata.org.

We would likely keep the existing scripts as they are for the Wikibase use cases, and may still want to create a script that generates TTL/RDF from a JSON dump.

For Wikidata.org we could move towards
`edit --> kafka --> streaming job --> WMF-API(get content) --> store on HDFS`
and then generate the JSON, RDF, and TTL dumps from HDFS much faster and more consistently.

This will likely tie into the ongoing Wikidata / Wikibase subsetting discussions too, as subsetting dumps from HDFS will be much easier than with the existing systems. See T46581: Partial dumps <https://phabricator.wikimedia.org/T46581> etc.

But most of this probably lives in separate tickets.

TASK DETAIL
  https://phabricator.wikimedia.org/T94019

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: dcausse, Addshore, toan, Tonina_Zhelyazkova_WMDE, JAllemandou, Pintoch, Smalyshev, hoo, Liuxinyu970226, mkroetzsch, Aklapper, daniel, Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
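
As a rough illustration of the `edit --> kafka --> streaming job --> WMF-API(get content) --> store on HDFS` idea from the comment, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the event fields (`entity_id`, `rev_id`), the `fetch_content` callable standing in for the WMF API, and the plain dict standing in for HDFS are all hypothetical and do not reflect any real Kafka, API, or HDFS interface.

```python
import json
from typing import Callable, Dict, Iterable, List


def consume_edits(
    events: Iterable[dict],
    fetch_content: Callable[[str, int], dict],
    store: Dict[str, dict],
) -> None:
    """Hypothetical streaming job: keep the newest revision per entity.

    `events` stands in for a Kafka topic, `fetch_content` for the content
    API, and `store` for HDFS. All three are illustrative assumptions.
    """
    for event in events:
        entity_id = event["entity_id"]
        rev_id = event["rev_id"]
        current = store.get(entity_id)
        # Skip stale or duplicate events so late arrivals cannot
        # overwrite a newer revision that is already stored.
        if current is not None and current["rev_id"] >= rev_id:
            continue
        store[entity_id] = {
            "rev_id": rev_id,
            "content": fetch_content(entity_id, rev_id),
        }


def json_dump_lines(store: Dict[str, dict]) -> List[str]:
    """Generate a JSON-lines dump from the store, one entity per line."""
    return [
        json.dumps({"id": entity_id, **store[entity_id]}, sort_keys=True)
        for entity_id in sorted(store)
    ]
```

Under these assumptions, a dump becomes a read over already-stored content rather than a fresh pass over the wiki, and a subset dump would amount to filtering the store before serialising it.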
_______________________________________________
Wikidata-bugs mailing list -- [email protected]
To unsubscribe send an email to [email protected]
