akosiaris created this task.
akosiaris added projects: Wikidata, wikiba.se, Operations.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION

The SRE team noticed that a specific host (mc1023) is close to saturating its network uplink [1]. Further investigation of the Grafana graphs for the entire cluster [2] showed that this is a recurring pattern that seems to move from host to host. Running memkeys on mc1023, we found that the key

wikibase_shared/1_32_0-wmf_20-wikidatawiki-hhvm:CacheAwarePropertyInfoStore

accounts for more than 600 Mbps of traffic. The train version is encoded in the key name, which supports the theory that the key name changes with each train and is then hashed to a different server. That would explain why the traffic seems to move from host to host.
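The theory can be illustrated with a minimal sketch. It assumes a simple hash-modulo mapping of keys to servers. The pool names, the pool size, the neighbouring train versions, and the hashing scheme are all hypothetical and chosen only for illustration; the distribution actually used by the production memcached cluster is not described in this task and may differ.

```python
import hashlib

# Hypothetical pool of cache hosts; not the real cluster layout.
SERVERS = [f"cache-host-{n}" for n in range(18)]


def cache_key(train_version: str) -> str:
    """Build the key name with the train version embedded, as seen in memkeys."""
    return f"wikibase_shared/{train_version}-wikidatawiki-hhvm:CacheAwarePropertyInfoStore"


def server_for(key: str, servers=SERVERS) -> str:
    """Map a key to a server by hashing it (assumed scheme: MD5 modulo pool size)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    # Each train changes the key name, so the hot key may land on a different host.
    for version in ("1_32_0-wmf_19", "1_32_0-wmf_20", "1_32_0-wmf_21"):
        print(version, "->", server_for(cache_key(version)))
```

Under any scheme of this kind, a single very hot key concentrates all of its traffic on one server, and renaming the key with each train moves that load to whichever server the new name hashes to.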

This will cause an outage soon and needs to be fixed.

[1] https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?panelId=8&fullscreen&orgId=1&var-server=mc1023&var-datasource=eqiad%20prometheus%2Fops&from=now-7d&to=now-1m

[2] https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?orgId=1&from=now-30m&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=All


TASK DETAIL
https://phabricator.wikimedia.org/T204083

To: akosiaris
Cc: Aklapper, mark, Krinkle, Joe, akosiaris, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, LawExplorer, Zppix, Wong128hk, Wikidata-bugs, aude, faidon, Mbch331, Jay8g, fgiunchedi