Lucas_Werkmeister_WMDE updated the task description.

CHANGES TO TASK DESCRIPTION
...
As for the processing time, on my system 9% of the dump was processed in 23 minutes, so the full conversion would probably take a few hours (roughly 23 min / 0.09 ≈ 256 minutes, or a bit over four hours, if the rate stays constant), but not days. The CPU time as reported by Bash’s `time` builtin was actually less than the wall-clock time, so it doesn’t look like the tool is multi-threaded (a multi-threaded tool would normally accumulate more CPU time than wall-clock time). But of course it’s possible that there is some additional phase of processing after the tool is done reading the file, and I have no idea how long that could take.
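
For reference, a minimal sketch of how such a timed run might look, assuming rdf2hdt from hdt-cpp takes the input and output files as positional arguments; the file names are placeholders, and the exact options may differ depending on the build (the tool prints its usage when run without arguments):

```
# Hypothetical invocation; file names are placeholders.
time rdf2hdt wikidata-truthy.nt wikidata-truthy.hdt

# Extrapolating from the partial run: 9% in 23 minutes suggests
# roughly 23 / 0.09 ≈ 256 minutes (about 4¼ hours) for the full dump,
# assuming the processing rate stays constant.
```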

See also [rdfhdt/hdt-cpp#119](https://github.com/rdfhdt/hdt-cpp/issues/119) for some discussion on converting large datasets. For now, it seems that the large memory requirement is expected. The discussion also points to [a MapReduce-based implementation](https://github.com/rdfhdt/hdt-mr), but there haven’t been any commits to it for a year, and I have no idea if it’s currently possible to use it (there seems to be some [build failure](https://github.com/rdfhdt/hdt-mr/issues/6), at least).

TASK DETAIL
https://phabricator.wikimedia.org/T179681


To: Lucas_Werkmeister_WMDE
Cc: Arkanosis, Tarrow, Lucas_Werkmeister_WMDE, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
