Lucas_Werkmeister_WMDE added a comment.

> I ran the conversion directly from the ttl.gz file

Interesting, I couldn’t get that to work and had to pipe gunzip output into the program.
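For anyone hitting the same problem, here is a small self-contained sketch of two workarounds. It assumes the converter only accepts an input *path* and won't decompress a .gz itself. The first pipes gunzip into the program and passes /dev/stdin as the path. The second uses bash process substitution, which hands the program a /dev/fd/N path instead. `convert_path` is a hypothetical stand-in (it just counts lines with awk), and the sample file is made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo input: a tiny gzip-compressed Turtle file in a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' \
    '<http://example.org/a> <http://example.org/p> "1" .' \
    '<http://example.org/b> <http://example.org/p> "2" .' |
    gzip > "$tmp/sample.ttl.gz"

# Hypothetical stand-in for a converter that only takes an input path:
# it reads the file named by $1 and prints its line count.
convert_path() {
    awk 'END { print NR }' "$1"
}

# Workaround 1: decompress in a pipeline, pass /dev/stdin as the path.
via_stdin=$(gunzip -c "$tmp/sample.ttl.gz" | convert_path /dev/stdin)

# Workaround 2: process substitution passes a /dev/fd/N path instead.
via_procsub=$(convert_path <(gunzip -c "$tmp/sample.ttl.gz"))

echo "$via_stdin $via_procsub"  # prints "2 2"
```

Both variants stream the decompressed data, so the uncompressed dump never has to be written to disk.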

I also tried converting the latest dump. Since I don’t have access to any system with that much RAM, I thought I could perhaps trade some execution time for swap space. Bad idea :) The process got through 20% of the input file and then slowed to a crawl, processing only single-digit kilobytes per second. At that rate, it would’ve taken half a year to finish.

But FWIW, here’s the command I used, with a healthy dose of systemd sandboxing since it’s a completely unknown program I’m running:

time pv latest-all.ttl.gz |
    gunzip |
    sudo systemd-run --wait --pipe --unit rdf2hdt \
        -p CapabilityBoundingSet=CAP_DAC_OVERRIDE \
        -p ProtectSystem=strict -p PrivateNetwork=yes -p ProtectHome=yes -p PrivateDevices=yes \
        -p ProtectKernelTunables=yes -p ProtectControlGroups=yes \
        -p NoNewPrivileges=yes -p RestrictNamespaces=yes \
        -p MemoryAccounting=yes -p CPUAccounting=yes -p BlockIOAccounting=yes -p IOAccounting=yes -p TasksAccounting=yes \
        /usr/local/bin/rdf2hdt -i -f ttl -B 'http://wikiba.se/ontology-beta#Dump' /dev/stdin /dev/stdout \
    >| wikidata-2017-11-01.hdt

I had to run make install, because the libtoolized dev build doesn’t really support being run like that. (See systemd/systemd#7254 for the CapabilityBoundingSet part; knowing what I know now, -p User=$USER would’ve been the better choice.)
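A sketch, untested against the full dump, of the same sandboxed conversion running the unit as the invoking user. It uses systemd’s User= property in place of the CapabilityBoundingSet=CAP_DAC_OVERRIDE workaround. The wrapper function `rdf2hdt_sandboxed` and its two arguments are hypothetical; the sandboxing properties are copied from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: convert a gzipped Turtle dump ($1) to HDT ($2)
# inside a transient systemd unit that runs as the invoking user.
rdf2hdt_sandboxed() {
    local input=$1 output=$2
    # $USER expands in the calling shell, before sudo runs, so User= gets
    # the invoking user's name rather than root.
    pv "$input" |
        gunzip |
        sudo systemd-run --wait --pipe --unit rdf2hdt \
            -p User="$USER" \
            -p ProtectSystem=strict -p PrivateNetwork=yes -p ProtectHome=yes -p PrivateDevices=yes \
            -p ProtectKernelTunables=yes -p ProtectControlGroups=yes \
            -p NoNewPrivileges=yes -p RestrictNamespaces=yes \
            -p MemoryAccounting=yes -p CPUAccounting=yes -p BlockIOAccounting=yes -p IOAccounting=yes -p TasksAccounting=yes \
            /usr/local/bin/rdf2hdt -i -f ttl -B 'http://wikiba.se/ontology-beta#Dump' /dev/stdin /dev/stdout \
        >| "$output"
}
```

Usage would be something like `time rdf2hdt_sandboxed latest-all.ttl.gz wikidata-2017-11-01.hdt`. The pipes connecting the unit to the caller are then owned by the same user the converter runs as. That is presumably why the DAC override capability should no longer be needed to reopen /dev/stdin and /dev/stdout.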

> @Smalyshev we discussed, during WikidataCon, dumping the JNL files used by Blazegraph directly at certain points.
> I'm aware that isn't an HDT dump, but I'm wondering if this would help in any way.

Can we reliably get a consistent snapshot of those files when Blazegraph is constantly writing updates to them?


TASK DETAIL
https://phabricator.wikimedia.org/T179681

To: Lucas_Werkmeister_WMDE
Cc: Addshore, Smalyshev, Ladsgroup, Arkanosis, Tarrow, Lucas_Werkmeister_WMDE, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs