dcausse added a comment.
Indeed, the RDF data is available in the Hive table `discovery.wikibase_rdf`, but it is generated by reading the TTL dumps, so it might not help for this particular task.

Using Hadoop would indeed allow processing the JSON efficiently, but it has drawbacks, as already pointed out:
- it requires maintaining the Wikibase -> RDF projection in multiple codebases (PHP Wikibase and Spark);
- once created on the Hadoop cluster, the output will have to be pushed back to the labstore machine for public consumption, which might add extra delay.

TASK DETAIL
https://phabricator.wikimedia.org/T94019
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
