Lucas_Werkmeister_WMDE updated the task description.
CHANGES TO TASK DESCRIPTION
See also [rdfhdt/hdt-cpp#119](https://github.com/rdfhdt/hdt-cpp/issues/119) for some discussion on converting large datasets. For now, it seems that the large memory requirement is expected. The discussion also points to [a MapReduce-based implementation](https://github.com/rdfhdt/hdt-mr), but there haven’t been any commits to it for a year, and I have no idea if it’s currently possible to use it (there seems to be some [build failure](https://github.com/rdfhdt/hdt-mr/issues/6), at least).
...
As for the processing time, on my system 9% of the dump was processed in 23 minutes, so the full conversion would probably take some hours, but not days. The CPU time as reported by Bash’s `time` builtin was actually less than the wall-clock time, so it doesn’t look like the tool is multi-threaded. But of course it’s possible that there is some additional phase of processing after the tool is done reading the file, and I have no idea how long that could take.
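For reference, the “some hours, but not days” estimate follows from a simple linear extrapolation of the figures above (9% in 23 minutes). This is only a back-of-the-envelope sketch and assumes the processing rate stays roughly constant for the rest of the dump, which may well not hold:

```python
# Linear extrapolation of the conversion time (illustrative only;
# assumes a constant processing rate across the whole dump).
fraction_done = 0.09     # 9% of the dump processed so far
minutes_elapsed = 23     # wall-clock minutes for that 9%

estimated_total_minutes = minutes_elapsed / fraction_done
estimated_hours = estimated_total_minutes / 60

print(f"estimated total: {estimated_total_minutes:.0f} min "
      f"(~{estimated_hours:.1f} h)")
```

That works out to roughly four to five hours of wall-clock time, consistent with “hours, not days”, though any post-read processing phase would come on top of this.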
TASK DETAIL
EMAIL PREFERENCES
To: Lucas_Werkmeister_WMDE
Cc: Arkanosis, Tarrow, Lucas_Werkmeister_WMDE, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs