[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-05-03 Thread gerritbot
gerritbot added a comment. Change 430585 merged by ArielGlenn: [operations/puppet@production] Create RDF dumps in batches, not all at once https://gerrit.wikimedia.org/r/430585

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-05-03 Thread gerritbot
gerritbot added a comment. Change 430395 merged by ArielGlenn: [operations/puppet@production] Wikidata entity dumps: Move generic parts into functions https://gerrit.wikimedia.org/r/430395

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-05-03 Thread gerritbot
gerritbot added a comment. Change 430585 had a related patch set uploaded (by Hoo man; owner: Hoo man): [operations/puppet@production] Create RDF dumps in batches, not all at once https://gerrit.wikimedia.org/r/430585

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-05-02 Thread gerritbot
gerritbot added a comment. Change 430395 had a related patch set uploaded (by Hoo man; owner: Hoo man): [operations/puppet@production] Wikidata entity dumps: Move generic parts into functions https://gerrit.wikimedia.org/r/430395

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-29 Thread gerritbot
gerritbot added a comment. Change 425926 merged by ArielGlenn: [operations/puppet@production] Wikidata JSON dump: Only dump batches of ~400,000 pages at once https://gerrit.wikimedia.org/r/425926

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-12 Thread gerritbot
gerritbot added a comment. Change 425926 had a related patch set uploaded (by Hoo man; owner: Hoo man): [operations/puppet@production] [WIP] Wikidata JSON dump: Only dump batches of ~400,000 pages at once https://gerrit.wikimedia.org/r/425926

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-10 Thread hoo
hoo added a comment. The dump script calls will basically look like this soon: php repo/maintenance/dumpJson.php --wiki wikidatawiki --first-page-id `expr $i \* 40 \* $shards + 1` --last-page-id `expr \( $i + 1 \) \* 40 \* $shards`
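To make the arithmetic in that call concrete, here is a minimal wrapper sketch of the same pattern; the shard count, per-shard page count and maximum page id are made-up placeholders, not the values from the actual puppet change:

    # Illustrative only: each iteration dumps one fixed page-id window;
    # the window spans perShardPages * shards ids, mirroring the expr formula above.
    shards=8            # assumed shard count, not the production value
    perShardPages=50000 # assumed pages handled per shard per run
    window=$(( perShardPages * shards ))  # 400000 here, per the ~400k page ids per run suggested elsewhere in this task
    maxPageId=55000000  # hypothetical; could come from the MAX(page_id) query in the comment below
    i=0
    while [ $(( i * window )) -lt "$maxPageId" ]; do
        php repo/maintenance/dumpJson.php --wiki wikidatawiki \
            --first-page-id $(( i * window + 1 )) \
            --last-page-id $(( (i + 1) * window ))
        i=$(( i + 1 ))
    done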

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-10 Thread ArielGlenn
ArielGlenn added a comment. Works for me.

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-10 Thread hoo
hoo added a comment. I just noticed that we could also use: php maintenance/sql.php --wiki wikidatawiki --json --query 'SELECT MAX(page_id) AS max_page_id FROM page' | grep max_page_id | grep -oP '\d+' That's maybe simpler for just getting this one bit of information.
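A rough sketch of how that one-liner's output could feed the batch calculation; the surrounding variables are hypothetical glue, not the production script:

    # Hypothetical: derive how many fixed-size batches cover the current page-id space.
    maxPageId=$(php maintenance/sql.php --wiki wikidatawiki --json \
        --query 'SELECT MAX(page_id) AS max_page_id FROM page' \
        | grep max_page_id | grep -oP '\d+')
    pagesPerRun=400000
    numRuns=$(( (maxPageId + pagesPerRun - 1) / pagesPerRun ))  # ceiling division
    echo "Covering page ids 1..$maxPageId takes $numRuns runs"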

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-10 Thread ArielGlenn
ArielGlenn added a comment. Looks like a good first estimate to me. Remember these things can always be tweaked later.

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-10 Thread hoo
hoo added a comment. Velocity (taken as average from the three runs listed above): JSON: 214k entities/hour; TTL: 189k entities/hour; truthy-nt: 157k entities/hour. Due to this, I suggest always running roughly 400k page ids per script run (considering there are possibly missing ones, pages …
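As a back-of-the-envelope check against those velocities: 400,000 / 214,000 ≈ 1.9 h (JSON), 400,000 / 189,000 ≈ 2.1 h (TTL) and 400,000 / 157,000 ≈ 2.5 h (truthy-nt) per run, as upper bounds that assume every page id in the window is an existing entity; since some page ids in any window are deleted or not entity pages, actual runs should come in somewhat below that, presumably close to the 1-2 hour target.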

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-10 Thread hoo
hoo added a comment. 20180405 truthy-nt dump: Each shard dumped about 8.04m entities in (very roughly) 60h. 20180328 truthy-nt dump: Each shard dumped about 7.96m entities in (very roughly) 47h.

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-10 Thread hoo
hoo added a comment. 20180402 TTL dump: Each shard dumped about 8.01m entities in (very roughly) 49h. 20180326 TTL dump: Each shard dumped about 7.95m entities in (very roughly) 35h. 20180319 TTL dump: Each shard dumped about 7.91m entities in (very roughly) 45h.
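Rounded, those TTL runs work out to roughly 8.01m/49h ≈ 163k, 7.95m/35h ≈ 227k and 7.91m/45h ≈ 176k entities/hour, i.e. about 189k entities/hour on average, which is the TTL velocity figure quoted earlier in this task (the hour figures are themselves only very rough).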

[Wikidata-bugs] [Maniphest] [Commented On] T190513: Make sure Wikidata entity dump scripts run for only about 1-2 hours

2018-04-10 Thread hoo
hoo added a comment. 20180402 JSON dump: Each shard dumped about 7.70m entities in (very roughly) 40h. 20180326 JSON dump: Each shard dumped about 7.65m entities in (very roughly) 35h. 20180326 JSON dump: Each shard dumped about 7.63m entities in (very roughly) 33h.
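Rounded, those JSON runs work out to roughly 7.70m/40h ≈ 193k, 7.65m/35h ≈ 219k and 7.63m/33h ≈ 231k entities/hour, i.e. about 214k entities/hour on average, matching the JSON velocity figure quoted earlier in this task.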