[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-20 Thread jijiki
jijiki updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T260281

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-20 Thread Joe
Joe closed this task as "Resolved". Joe added a comment. Reporting here in brief:
- We confirmed the problem had to do with activating firejail for all executions of external programs. That triggered a kernel bug.
- This kernel bug can be bypassed by disabling kernel memory accounting in
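The comment above is cut off before it says where kernel memory accounting was disabled, so the exact mechanism is not shown here. One mechanism documented by the Linux kernel is the `cgroup.memory=nokmem` boot parameter. Assuming that parameter was the one used (the excerpt does not confirm it), a minimal Python sketch for checking a host's kernel command line might look like this; the function name is hypothetical.

```python
def kmem_accounting_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains cgroup.memory=nokmem.

    Assumption: kernel memory accounting was disabled through this boot
    parameter; the thread excerpt does not say so explicitly.
    """
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # The option takes a comma-separated list, e.g. "nosocket,nokmem".
        if key == "cgroup.memory" and "nokmem" in value.split(","):
            return True
    return False
```

On a running Linux host the input would typically be the contents of `/proc/cmdline`, for example `kmem_accounting_disabled(open("/proc/cmdline").read())`.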

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-19 Thread Joe
Joe claimed this task.

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-18 Thread Joe
Joe closed subtask Restricted Task as "Resolved".

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-18 Thread eprodromou
eprodromou moved this task from Inbox to Tracking/Watching on the Platform Engineering board. eprodromou added a comment. We're tracking this, but are unsure of the next steps. Let us know if more active investigation from the Platform team is needed.

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-17 Thread ops-monitoring-bot
ops-monitoring-bot added a comment. Completed auto-reimage of hosts: ['mw1359.eqiad.wmnet'] and were **ALL** successful.

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-17 Thread ops-monitoring-bot
ops-monitoring-bot added a comment. Script wmf-auto-reimage was launched by cdanis on cumin1001.eqiad.wmnet for hosts: mw1359.eqiad.wmnet. The log can be found in `/var/log/wmf-auto-reimage/202008171607_cdanis_15670_mw1359_eqiad_wmnet.log`.

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-14 Thread RhinosF1
RhinosF1 added a comment. In T260281#6385334, @NullPointer wrote: > I suggest setting this a security issue since this may cause people to //intentionally// make memory leaks to damage servers using this software. If it's related t

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-14 Thread NullPointer
NullPointer added a comment. I suggest setting this a security issue since this may cause people to //intentionally// make memory leaks to damage servers using this software.

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-13 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2020-08-13T14:45:55Z] repool mw1382 with kernel memory accounting disabled T260281

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-13 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2020-08-13T14:38:52Z] reboot mw1382 with kernel memory accounting disabled T260281

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-13 Thread ema
ema added a comment. `node_vmstat_nr_slab_unreclaimable` is going up indefinitely on nodes affected by the issue, following a pattern that matches the general memory usage. However, the actual amount of "lost" memory does not match the size of unreclaimable slabs, which is only ~2G on mw1357
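For context on the comparison above: `node_vmstat_nr_slab_unreclaimable` is the Prometheus node exporter's view of the `nr_slab_unreclaimable` counter in `/proc/vmstat`, which the kernel reports in pages rather than bytes. A minimal Python sketch for converting it to bytes, so it can be set against the amount of "lost" memory, is shown below. The function name is hypothetical, and the 4 KiB default page size is an assumption that holds on typical x86-64 hosts.

```python
def slab_unreclaimable_bytes(vmstat_text: str, page_size: int = 4096) -> int:
    """Return unreclaimable slab memory in bytes from /proc/vmstat text.

    Assumption: page_size defaults to 4096 bytes; pass the real value
    (e.g. os.sysconf("SC_PAGE_SIZE")) on hosts where it differs.
    """
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each /proc/vmstat line has the form "<counter name> <value>".
        if len(parts) == 2 and parts[0] == "nr_slab_unreclaimable":
            return int(parts[1]) * page_size
    raise KeyError("nr_slab_unreclaimable not found")
```

With a 4 KiB page size, a counter value of 524288 pages corresponds to 2 GiB, the order of magnitude quoted for mw1357.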

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-13 Thread ema
ema added a comment. In T260281#6382529, @ema wrote: > I've installed systemtap on mw1357 Never mind, I only just noticed that mw1357 is depooled. Here are some preliminary results from mw1359: P12251

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-13 Thread ema
ema added a comment. In T260281#6381768, @CDanis wrote: > attach a tracepoint to `memcg_schedule_kmem_cache_create` and gather calling stacktraces. That's the function that creates the work item that results in a worker thread calling

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-13 Thread ema
ema updated the task description.

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-13 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2020-08-13T08:43:31Z] <_joe_> downgrading imagemagick on mw1378 T260281

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-12 Thread Joe
Joe triaged this task as "Unbreak Now!" priority. Joe added projects: Platform Engineering, Wikidata. Joe added a comment. I'm not 100% sure that slabs are the problem here, but I'll try to follow up later. In the meantime, the servers we rebooted yesterday are definitely showing the s