gerritbot added a comment.
Change 430585 merged by ArielGlenn:
[operations/puppet@production] Create RDF dumps in batches, not all at once
https://gerrit.wikimedia.org/r/430585
TASK DETAIL
https://phabricator.wikimedia.org/T190513
gerritbot added a comment.
Change 430395 merged by ArielGlenn:
[operations/puppet@production] Wikidata entity dumps: Move generic parts into functions
https://gerrit.wikimedia.org/r/430395
gerritbot added a comment.
Change 430585 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/puppet@production] Create RDF dumps in batches, not all at once
https://gerrit.wikimedia.org/r/430585
gerritbot added a comment.
Change 430395 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/puppet@production] Wikidata entity dumps: Move generic parts into functions
https://gerrit.wikimedia.org/r/430395
gerritbot added a comment.
Change 425926 merged by ArielGlenn:
[operations/puppet@production] Wikidata JSON dump: Only dump batches of ~400,000 pages at once
https://gerrit.wikimedia.org/r/425926
gerritbot added a comment.
Change 425926 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/puppet@production] [WIP] Wikidata JSON dump: Only dump batches of ~400,000 pages at once
https://gerrit.wikimedia.org/r/425926
hoo added a comment.
The dump script calls will basically look like this soon:
php repo/maintenance/dumpJson.php --wiki wikidatawiki --first-page-id `expr $i \* 40 \* $shards + 1` --last-page-id `expr \( $i + 1 \) \* 40 \* $shards`
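The range arithmetic in the call above can be sketched on its own. This is a minimal illustration, not the production script: `batch_range` is a hypothetical helper, the shard count of 4 is an assumption, and the factor 40 is simply taken from the command as quoted.

```shell
#!/bin/sh
# Hypothetical helper: print the first and last page id of batch $1,
# given $2 shards and $3 page ids per shard per batch.
# Mirrors: first = i * N * shards + 1, last = (i + 1) * N * shards
batch_range() {
    i=$1; shards=$2; per_shard=$3
    first=$(( i * per_shard * shards + 1 ))
    last=$(( (i + 1) * per_shard * shards ))
    echo "$first $last"
}

# Assumed values for illustration: 4 shards, factor 40 as in the quoted call.
for i in 0 1 2; do
    batch_range "$i" 4 40
done
```

Consecutive batches produced this way are contiguous and do not overlap: each batch starts one id after the previous batch's last id.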
ArielGlenn added a comment.
Works for me.
hoo added a comment.
I just noticed that we could also use:
php maintenance/sql.php --wiki wikidatawiki --json --query 'SELECT MAX(page_id) AS max_page_id FROM page' | grep max_page_id | grep -oP '\d+'
That's maybe simpler for just getting this one bit of information.
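As a rough sketch of how that value could feed the batching, the snippet below extracts the number from a sample of the JSON output and rounds up to a batch count. The sample output shape, the made-up id 52000123, and the helpers `extract_max_page_id` and `batch_count` are all assumptions for illustration. It uses `grep -oE` rather than `grep -oP` so it also runs where PCRE support is missing.

```shell
#!/bin/sh
# Assumed shape of the sql.php --json output; the real output may differ.
# The id below is a made-up value.
sample_output='[
    {
        "max_page_id": "52000123"
    }
]'

# Same idea as the pipeline above: keep the matching line, then its digits.
extract_max_page_id() {
    grep max_page_id | grep -oE '[0-9]+'
}

# Number of batches needed to cover page ids 1..$1 with $2 ids per batch,
# rounding up so the final partial batch is included.
batch_count() {
    max_id=$1; batch_size=$2
    echo $(( (max_id + batch_size - 1) / batch_size ))
}

max_page_id=$(printf '%s\n' "$sample_output" | extract_max_page_id)
batch_count "$max_page_id" 400000
```

Here a batch is taken to span 400,000 page ids in total. With sharding, the span per batch would be multiplied by the shard count, as in the dumpJson.php call quoted in this thread.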
ArielGlenn added a comment.
Looks like a good first estimate to me. Remember these things can always be tweaked later.
hoo added a comment.
Velocity (taken as average from the three runs listed above):
JSON: 214k entities/hour
TTL: 189k entities/hour
truthy-nt: 157k entities/hour
Due to this, I suggest always running roughly 400k page ids per script run (considering there are possibly missing ones, pages
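The JSON and TTL averages can be reproduced from the per-shard figures reported for each run in this thread (entities in millions, hours very rough). This is a small sketch; `avg_velocity` is a hypothetical helper that averages the per-run rates rather than dividing total entities by total hours.

```shell
#!/bin/sh
# Hypothetical helper: average velocity in thousand entities per hour.
# Input: one "entities_in_millions hours" pair per line.
avg_velocity() {
    awk '{ sum += ($1 * 1000) / $2; n++ } END { printf "%.0f\n", sum / n }'
}

# JSON runs: 7.70m in 40h, 7.65m in 35h, 7.63m in 33h
printf '%s\n' '7.70 40' '7.65 35' '7.63 33' | avg_velocity

# TTL runs: 8.01m in 49h, 7.95m in 35h, 7.91m in 45h
printf '%s\n' '8.01 49' '7.95 35' '7.91 45' | avg_velocity
```

At these rates, a run over 400k page ids would take very roughly two to two and a half hours if every page id mapped to an entity, and less where ids are missing.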
hoo added a comment.
20180405 truthy-nt dump: Each shard dumped about 8.04m entities in (very roughly) 60h.
20180328 truthy-nt dump: Each shard dumped about 7.96m entities in (very roughly) 47h.
hoo added a comment.
20180402 TTL dump: Each shard dumped about 8.01m entities in (very roughly) 49h.
20180326 TTL dump: Each shard dumped about 7.95m entities in (very roughly) 35h.
20180319 TTL dump: Each shard dumped about 7.91m entities in (very roughly) 45h.
hoo added a comment.
20180402 JSON dump: Each shard dumped about 7.70m entities in (very roughly) 40h.
20180326 JSON dump: Each shard dumped about 7.65m entities in (very roughly) 35h.
20180326 JSON dump: Each shard dumped about 7.63m entities in (very roughly) 33h.