JMeybohm updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: bking, JMeybohm
Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE,
Isabelladantes1983
JMeybohm added a comment.
In T362084#9700057 <https://phabricator.wikimedia.org/T362084#9700057>,
@Lucas_Werkmeister_WMDE wrote:
> Can someone clarify what the problem here is? From WBQC’s perspective, it’s
totally expected that some of these regex checks will fail (though the
JMeybohm updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: JMeybohm
Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1,
AWesterinen, BTullis
JMeybohm added a comment.
In T293063#8582600 <https://phabricator.wikimedia.org/T293063#8582600>,
@dcausse wrote:
> Hey, clarified this a bit, renamed it to "Hard depool/re-pool", yes in this
method the jobs should start right after the helm deploy, the jar is
JMeybohm added a comment.
Hey @dcausse, I'm reading this again because of the upcoming k8s 1.23 upgrade
and was wondering:
In "To restore:" section of "Alternate actions (not fully untested):" - do we
need to start the job somehow as well, specifying w
JMeybohm added a project: serviceops-radar.
Restricted Application added a project: wdwb-tech.
TASK DETAIL
https://phabricator.wikimedia.org/T326409
To: JMeybohm
Cc: BTullis, JMeybohm, gmodena, Ottomata
JMeybohm updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T326409
To: JMeybohm
Cc: BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse,
Themindcoder, Adamm71, Jersione
JMeybohm updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T293063
To: JMeybohm
Cc: RKemper, Gehel, bking, JMeybohm, Jelto, Aklapper, jijiki, dcausse,
Astuthiodit_1, AWesterinen
JMeybohm added a comment.
In T301147#7821813 <https://phabricator.wikimedia.org/T301147#7821813>,
@dcausse wrote:
> The additional PODs won't be used as a flink job does not automatically
scale so it would be a pure waste of resources (2.5G of reserved mem per
additional POD).
JMeybohm added a comment.
> To be discussed with service ops:
>
> - Investigate and address the reasons why after a node failure k8s did not
fulfill its promise of making sure that the rdf-streaming-updater deployment
have 6 working replicas
The problem was more that the
JMeybohm added a comment.
In T301147#7689837 <https://phabricator.wikimedia.org/T301147#7689837>,
@dcausse wrote:
> @JMeybohm we're still investigating why the application did not properly
recover while kubernetes1014 went down but if you have ideas on the two
questions in t
JMeybohm added a comment.
I'd opt for "reuse the same [flink] cluster" from the perspective that we
treat this snowflaky-ish in the k8s clusters. So fewer flink-clusters means fewer
snowflakes (at some point it does become a snowball, right?).
JMeybohm added subscribers: Jelto, JMeybohm.
JMeybohm added a comment.
@dcausse IIRC we said that "something in the areas of hours" would be
considered a "short maintenance" and thus would not need any additional actions
to be carried out, right?
As pa
JMeybohm closed this task as "Resolved".
JMeybohm added a comment.
Thanks, closing then.
TASK DETAIL
https://phabricator.wikimedia.org/T287443
To: JMeybohm
Cc: JMeybohm, dcausse, Aklapper
JMeybohm added a comment.
That is because your application is reading default kubernetes environment
variables which carry the ClusterIP of `kubernetes.default.svc.cluster.local`
instead of its name. The ClusterIP we unfortunately don't have in the
certificate on the actual servers
JMeybohm claimed this task.
JMeybohm added a comment.
Looking into this.
Problem is that we currently do not allow Pods to access the Kubernetes API
servers (Egress rule is missing) and it's not super trivial to allow that in a
transparent way (e.g. without having to declare the API
JMeybohm added a subscriber: RLazarus.
JMeybohm added a comment.
Picking up from the IRC conversation yesterday @RLazarus figured that the
response body looks like it is
https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/master/errorpages/503.html
At the time this issue
JMeybohm added a comment.
I do see that using the configmap election method is appealing as it is built
in and does not require additional software to function. Unfortunately I was
not able to understand (by briefly reading the docs) if this uses a separate
configmap or the one
JMeybohm added a comment.
It was more a matter of a day than a month (as we just upgraded the kubernetes
version in staging). Also we don't enable monitoring for staging in general,
but of course errors like that should be caught at deploy time. This can
currently be done by running `helmfile
JMeybohm triaged this task as "Medium" priority.
TASK DETAIL
https://phabricator.wikimedia.org/T264821
To: JMeybohm
Cc: Michael, RhinosF1, Joe, LSobanski, Addshore, Ladsgroup, RLazarus,
JMeybohm added a comment.
Looking at the values today it's pretty clear that mw1382 wins and mw1381
takes the second place.
The overall memory usage looks like it's safe to leave it this way over the
weekend. On Monday we should reboot the clusters again, with
"cgroup.memory=n
JMeybohm updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T260329
To: JMeybohm
Cc: Ladsgroup, Tarrow, Addshore, CDanis, Aklapper, jijiki, ArielGlenn,
RhinosF1, Joe, lmata
JMeybohm added a comment.
@Michael thanks for writing this up!
So, if it is safe to assume the MW -> termbox timeout is 3s I would suggest
we configure the envoys accordingly by setting `tls.upstream_timeout: "3s"` in
termbox values.yaml as well as `timeout: "3s"