Hi everyone,

TL;DR

SRE will be co-owning, together with Thomas Chin of Data Platform Engineering, the service-utils <https://www.mediawiki.org/wiki/Service-utils> Node.js utilities library, the spiritual successor to and replacement for service-runner <https://www.mediawiki.org/wiki/Service-runner>.

If you are thinking about starting a new Node.js service to be deployed in production, using service-utils is expected to greatly reduce friction, speed up development and let you focus on your business needs rather than on how to satisfy SRE requirements for getting the service deployed.

If you already own one or more service-runner powered services, consider migrating to service-utils at your convenience.

Background

Service-runner <https://github.com/wikimedia/service-runner> is a library that provides generalized runtime facilities for Node.js services, including:

- a standard worker cluster setup with restarts,
- a generalized YAML config format with support for running multiple services in a single process,
- runtime facilities for
  - logging
  - metrics reporting
  - rate limiting.

Its use in Foundation-produced code and microservices is thought to have increased the productivity of developer and SRE teams by abstracting and automating away the requirements mentioned above, which are important but unrelated to the developer teams' actual goals. Ongoing support from the owning Services team, which no longer exists, also played a pivotal role. Out of the 20 Node.js apps (powering 25 services), 15 currently use it; those that do not are either already pioneering service-utils or trying out other solutions.

Unfortunately, it is also abandoned. Node.js developers have noticed, and the Phabricator workboard <https://phabricator.wikimedia.org/project/view/1062/> makes this evident, with task titles like "service-runner has vulnerable and outdated dependencies" and "service-runner depends on preq, a wrapper of request, which is deprecated".

Some efforts have been made to find a replacement. Thomas Chin has been kind enough to create service-utils, a library designed to be compatible with service-runner, if not a drop-in replacement in many cases. However, maintaining this library is neither Thomas' nor the Data Platform Engineering team's mission. This has, unsurprisingly, led to lower adoption than wished for (by SRE at least).

Next steps

Starting in Q2 2025-2026, SRE will be:

- Becoming more familiar with the code base
- Updating documentation as needed
- Helping with ongoing maintenance of the library
- Providing help and guidance for migration from service-runner to service-utils
- Going through the backlog of work tracked in the corresponding Phabricator workboards for the 2 projects and implementing, resolving, declining or stalling tasks as deemed appropriate, always after discussions with relevant stakeholders
- Contributing the following features that SRE is interested in:
  - Abstracting away talking to the service mesh
  - Finishing testing and rolling out support for OpenTelemetry
- Announcing the full deprecation of service-runner, archival of its code repositories and removal from library repositories (i.e. npm) once all in-scope code bases have been migrated over

FAQ

Is this a full drop-in replacement?

No. It's close enough for most use cases. However, service-runner is a decade-old code base. Some of the functionality it provides no longer makes sense in 2025 (e.g. a worker cluster setup is not needed in a Kubernetes environment), makes assumptions about the infrastructure that no longer apply (e.g. rate limiting) or relies on entirely abandoned and non-salvageable libraries (e.g. kad, which implements rate limiting). Those are not and will not be supported in service-utils.

When should my team migrate to/adopt this?

If you are starting a new Node.js service at the WMF, go for this from day 1. If you already run a service-runner powered service, migrate at your earliest convenience.

Who do I talk to if I want some functionality implemented before I adopt this?

The fastest path forward is probably to come to either #wikimedia-sre on IRC or #talk-to-sre on Slack. SRE Service Operations will respond.

Is this tracked in the annual plan?

Yes, most work is already under KR WE6.2, aka Production Readiness <https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2025-2026/Product_%26_Technology_OKRs#Accelerate_Path_to_Product_Outcomes_(WE6)>. Some parts of the work described above will be ongoing and tracked under Annual Essential Work.

_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
