thalhamm added a comment.

@Smalyshev, I think we should first check whether the type of output is of any use to you. You can find most of the information (e.g. the output/input format) at http://people.aifb.kit.edu/ath/#Wikidata_PageRank. It does not run on Hadoop and needs fairly few resources (it could in fact be optimized to run on a laptop with 16 GB of RAM). Currently there are no optimizations in place and we use about 200 GB of RAM (processing power doesn't matter). If good use cases exist and it has been verified that the current output is useful, I would consider the following next steps:

  • transform the actual link datasets of Wikipedia to a processable format (similar to the output of DBpedia pagelinks)
  • develop a processing pipeline as a docker file and make all source code available under a free license
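
For illustration only, here is a minimal sketch of the kind of computation such a pipeline would run once the links are in a processable format. It is not the code used for the published scores. It assumes the links have already been reduced to plain (source, target) pairs; the function name, the damping factor of 0.85, the fixed iteration count, and the toy link set are all assumptions made for the example.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) link pairs.

    Hypothetical helper for illustration; parameters are assumed defaults.
    """
    out_links = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        out_links[src].append(dst)
        nodes.update((src, dst))
    if not nodes:
        return {}

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each page passes an equal share of its rank to every page it links to.
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy link set, not real Wikipedia data.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
scores = pagerank(links)
```

Keeping everything in in-memory dictionaries like this is the simple, unoptimized approach; a memory-conscious version would map page titles to integer IDs and stream the link file instead.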

TASK DETAIL
https://phabricator.wikimedia.org/T143424

To: thalhamm
Cc: Smalyshev, thalhamm, thiemowmde, Sjoerddebruin, Glorian_Yapinus, Aklapper, QZanden, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331