Krinkle added a subscriber: akosiaris.
Krinkle added a comment.

At T204083, @akosiaris wrote:

The SRE team noticed that a specific host (mc1023) is close to saturating its network uplink [1]. Further investigation of the Grafana graphs for the entire cluster [2] showed that this is a recurring pattern that seems to move from host to host. Running memkeys on mc1023, we found that the key

wikibase_shared/1_32_0-wmf_20-wikidatawiki-hhvm:CacheAwarePropertyInfoStore

accounts for more than 600 Mbps of traffic. The train version is encoded in the key name, which supports the theory that the key name changes with each train and is then hashed to a different server. That would explain why the traffic seems to follow hosts around.
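
The theory above can be illustrated with a minimal sketch. It assumes a simple hash-modulo placement over a made-up host pool; the report does not say which hashing scheme or host list the memcached cluster actually uses, so `HOSTS`, `pick_host` and `property_info_key` are hypothetical names, and the versions other than 1_32_0-wmf_20 are invented for illustration.

```python
import hashlib

# Hypothetical host pool; the real pool and hashing scheme are not given in the report.
HOSTS = [f"mc10{n}" for n in range(19, 37)]


def pick_host(key: str, hosts=HOSTS) -> str:
    """Map a key to a host by hashing the full key string (simplified modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


def property_info_key(train_version: str) -> str:
    """Build the cache key in the shape seen in the memkeys output."""
    return f"wikibase_shared/{train_version}-wikidatawiki-hhvm:CacheAwarePropertyInfoStore"


if __name__ == "__main__":
    # Only 1_32_0-wmf_20 comes from the report; the other versions are illustrative.
    for version in ("1_32_0-wmf_19", "1_32_0-wmf_20", "1_32_0-wmf_21"):
        print(version, "->", pick_host(property_info_key(version)))
```

Because the whole key string is hashed, changing only the version segment generally selects a different host, while every request for the same version lands on a single host. That is consistent with one server at a time absorbing all the traffic for this key.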

This will cause an outage soon and needs to be fixed.

[1] https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?panelId=8&fullscreen&orgId=1&var-server=mc1023&var-datasource=eqiad%20prometheus%2Fops&from=now-7d&to=now-1m

[2] https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?orgId=1&from=now-30m&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=All

https://grafana.wikimedia.org/dashboard/db/t204083?orgId=1 shows the excessive traffic moving around the various memcached hosts over the past year.


TASK DETAIL
https://phabricator.wikimedia.org/T97368
