thalhamm added a comment.
@Smalyshev, I think we should first check whether the type of output is of any use to you. Most of the information (e.g. input/output format) is available at http://people.aifb.kit.edu/ath/#Wikidata_PageRank. It does not run on Hadoop and needs fairly few resources (it could actually be optimized to run on a laptop with 16 GB of RAM). Currently, no optimizations are in place and we use about 200 GB of RAM (processing power doesn't matter). If good use cases exist and it has been verified that the current output is useful, I would consider the following as next steps:
- transform the actual link datasets of Wikipedia to a processable format (similar to the output of DBpedia pagelinks)
- develop a processing pipeline as a docker file and make all source code available under a free license
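To make the discussion of the output more concrete, here is a minimal sketch of what the core computation amounts to once the links are in a processable form. It is not the code behind the published dataset: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the use of plain (source, target) pairs with item IDs such as "Q1" are all assumptions made for illustration.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over an iterable of (source, target) pairs.

    Hypothetical helper for illustration; parameters are assumed defaults.
    """
    out_links = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        out_links[src].add(dst)
        nodes.update((src, dst))
    if not nodes:
        return {}

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every page passes an equal share of its rank to each page it links to.
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Tiny made-up link graph; the IDs are placeholders, not real items.
example_edges = [("Q1", "Q2"), ("Q2", "Q1"), ("Q3", "Q1")]
ranks = pagerank(example_edges)
for item, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(item, round(score, 4))
```

The sketch keeps the whole graph in memory as Python dictionaries, which is fine for a toy example but is exactly the part that would need optimizing to bring memory use down for the full link dataset.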
To: thalhamm
Cc: Smalyshev, thalhamm, thiemowmde, Sjoerddebruin, Glorian_Yapinus, Aklapper, QZanden, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
