Participants
    1. davido
    2. guilhem
    3. Brett
    4. Christian

Agenda
    * Upgrade Gerrit to 2.13.10
        + Issue filed in Redmine
        + Who is working on it
        + https://redmine.documentfoundation.org/issues/2463
            - G. Do we need to go through that again? The issue was filed and assigned, we'll follow up from there (no update there = no news)
        + Next steps
        + Set up staging gerrit instance and sync production data
            - Assigned target to Q1
    * Monitoring update
        + Brett: I believe Prometheus to be the better solution for infra monitoring.
            - Prometheus is actively maintained by the community, whereas TICK relies on InfluxDB. Exporters (data collection binaries) are already available in debian stable and debian backports (even the shiny, new 2.0 release). TICK requires external repositories and would require some auditing (see: Chronograf phoning home by default).
                . G. yay, agreed :-)
                . What's up with the docker containers on vm213? Brett: I had just installed a bunch of throwaway services for metrics testing.
                . grafana: http://localhost:3000/dashboard/db/node-exporter-full (forward 3000/TCP to vm213 first)
                . prometheus: http://localhost:9090/consoles/node-overview.html (forward 9090/TCP to vm213 first)
            - Prometheus does not have a useful built-in dashboard, only ad-hoc query input: they recommend using Grafana for that. Presently, Debian only has a Grafana package in Sid.
                . Package was removed from testing during the freeze due to two RC bugs, and was subsequently orphaned by its maintainer <https://bugs.debian.org/876648>
                . G. Not a blocker for us: I'd refrain from using third party repos when possible, but we need a single installation of that package (installing prometheus/telegraf from a third party repo on every single host would be another story…). Might even step in and adopt the package if its maintenance is not a burden :-)
            - https://prometheus.io/docs/introduction/faq/#how-does-prometheus-compare-against-other-monitoring-systems?
            - Debian only has a small number of exporters available in the repos
            - We'd have to manually install/configure any additional exporters from https://prometheus.io/docs/instrumenting/exporters/
                . Not a blocker, we can build ourselves and tell salt to install the .deb
            - G. Confidentiality and integrity protection
                . exporters installed to vm191 and vm213 for now; they currently communicate via intranet (private IP range)
                . monitoring server needs to be offsite (eg, reuse monitoring.tdf), and metrics need to be protected: either with SSL/TLS, or with IPsec/VPN/… → go for TLS tunnels, most hosts have an nginx instance anyway
                . client auth: HTTP digest or client certs (ECDSA for minimum overhead)
                . server (exporter) auth: server certs (ECDSA for minimum overhead)
                . SSL/TLS protection needs an extra SSL termination proxy (eg nginx, or stunnel4); unfortunately both client and server insist on verifying the chain for mutual auth, instead of pinning the key material…
            - Dashboards (prometheus & grafana) protection: refrain from using SSO here so admins are not locked out if the SAML IdP or LDAP server is down.
            - Each exporter listens on a different INET port (91XY) and speaks HTTP with ‘/metrics’ as entry path. Do we want to open a gazillion ports, or a single port with multiple entry paths behind the reverse proxy? (eg, ‘/MySQL/metrics’)
            - Status of monitoring.documentfoundation.org
                . Ubuntu 14.04.5 LTS root server, hosted at filoo
                . guilhem: Would prefer a Debian (stretch) box instead for the sake of uniformity, ok to wipe and recycle?
                . AI guilhem: get in touch with filoo if there is no rescue boot
            - Users have requested a public dashboard / status page, with basic info (no graph) such as service/host up/down and a custom field we (admin team) can fill to tell them we're aware of the problem and are working on it
                . Probably doable with prometheus API(?)
                . Exposing blackbox exporter metrics from our various services would probably be enough (HTTP return code and timing)
                . Example: https://status.lineageos.org/ , powered by https://hund.io/ → Sponsored service?
        + Is there a desire for just infra monitoring or application-level monitoring as well? If we need application-level monitoring, the ELK stack is recommended: https://www.elastic.co/webinars/introduction-elk-stack
            - At least mail queue, database (MySQL, PostgreSQL, slapd) operations, HTTPd response code & timing
                . Brett: These can be handled by prometheus, though AFAIK apache/nginx need a module to get in-depth stats.
            - ELK: ElasticSearch + LogStash + Kibana
            - LogStash vs. graylog pros & cons? (We already have an instance of the latter)
                . Brett: I forgot about graylog, sorry. I see no reason to switch from it.
                . OK let's keep it then
        + Alerting
            - The alert system of our current Icinga-based monitoring system is (mostly) not working, and having working alerts is an incentive for refactoring the monitoring system; so we really want that one to work :-)
            - Threshold-based is good enough
            - As discussed a few calls ago, needs to be schedulable so volunteers aren't awoken during their vacation
                . https://github.com/prometheus/alertmanager/issues/876 is a feature request from last year with no priority :(
                . Possible workaround at https://github.com/prometheus/alertmanager/issues/517#issuecomment-250918957. There are mentions that using the prometheus API/webhooks could work.
                . Wouldn't it be up to the volunteer to silence alarms when on vacation? → It's also about week-ends and nights: we don't want to give all infra volunteers the feeling that they are on duty
            - Need (at least) SMS *and* mail
                . Prometheus' alertmanager can be bridged to an SMS provider https://github.com/messagebird/sachet
                . Brett: I've had success with using email as provided by telecoms. e.g. I use T-Mobile (Deutsche Telekom) and can email 1231231...@tmomail.net to get a text. https://support.t-mobile.com/docs/DOC-3309
                . How about other countries? Not aware of a Swedish provider offering a similar service
                . also sipgate has API (RPS & REST) to send sms (https://teamhelp.sipgate.de/hc/de/articles/207867549-Die-sipgate-APIs-im-%C3%9Cberblick German entry page to various API docs (spec in English))
                . so does pingdom.com
    * SSO adoption <https://user.documentfoundation.org>:
        + 572 accounts in total (72 since the last infra call)
        + Nextcloud is now using SAML (unauthenticated users are redirected to auth.tdf); accounts not in LDAP yet are *locked out*
        + All MC members now have an LDAP account, shared creds (HTTP digest auth) are now deprecated
        + governance: 1/10 board member missing; 40/190 (21%) TDF members missing
        + contributors: 84/175 (48%) recent (last 90 days) wiki editors missing
        + Need to resume the redmine migration to SAML
    * Next call: Tuesday March 20 2018 at 18:30 Berlin time (17:30 UTC)

-- 
Guilhem.
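
Re the "schedulable" requirement under Alerting (no notifications during vacations, nights or week-ends): a minimal sketch of how a filter in front of the SMS/mail step might decide who gets notified. This is not an alertmanager feature; it assumes a separate hook (eg, a webhook receiver) calls it. All names (Volunteer, accepts_alert, recipients) and the default duty hours are hypothetical, and timestamps are taken to be in each volunteer's local time for simplicity.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time


@dataclass
class Volunteer:
    """Hypothetical on-call preferences for one infra volunteer."""
    name: str
    start: time = time(9, 0)    # assumed default: alerts accepted from 09:00
    end: time = time(18, 0)     # ... until 18:00 (exclusive)
    duty_weekdays: frozenset = frozenset({0, 1, 2, 3, 4})  # Monday=0, no week-ends
    # Inclusive (first_day, last_day) date ranges during which nothing is sent.
    vacations: list = field(default_factory=list)

    def accepts_alert(self, now: datetime) -> bool:
        """Return True if this volunteer may be notified at `now`."""
        today = now.date()
        if any(first <= today <= last for first, last in self.vacations):
            return False
        if now.weekday() not in self.duty_weekdays:
            return False
        return self.start <= now.time() < self.end


def recipients(volunteers, now: datetime) -> list:
    """Names of the volunteers who should receive an alert fired at `now`."""
    return [v.name for v in volunteers if v.accepts_alert(now)]
```

With two volunteers, one of them on vacation from 2018-03-19 to 2018-03-23, an alert fired on Tuesday 2018-03-20 at 10:00 would only reach the other one, and an alert fired at 03:00 or on a Saturday would reach nobody. The latter case shows that such a filter needs a fallback (eg, mail only, or escalation for critical alerts).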