GWicke created this task. GWicke added a project: wikidata-query-service.
TASK DESCRIPTION

We need a reliable way to distribute a variety of update events emitted from MediaWiki core (and other services) to various consumers. Currently we use the job queue for this (for example, in the Parsoid extension), but it is fairly complex, not very reliable, and does not support multiple consumers without setting up separate job types. We are looking for a solution that decouples producers from consumers and gives us better reliability than the current job queue.

## Event type candidates

- Wikidata updates: a summary of changes (ideally with details of the actual changes)
  - use case: keeping the #wikidata-query-service up to date
- Page edits, moves and visibility changes (page / revision deletion / suppression); essentially what is tracked in [the Parsoid extension](https://github.com/wikimedia/mediawiki-extensions-Parsoid/blob/817a7581f1ba554415128449b7a0a6a00248a443/Parsoid.hooks.php#L66)
  - use case: keeping RESTBase content and caches up to date

## Requirements for an implementation

- persistent: state does not disappear on power failure, and individual consumers can lag by large delays (on the order of days)
- no single point of failure
- supports pub/sub consumers running at varying speeds
- ideally, lets various producers enqueue new events (not just MediaWiki core)
  - example use case: RESTBase scheduling dependent updates for content variants after HTML was updated

## Option 1: Kafka

Kafka is a persistent, replicated queue that supports both pub/sub and job-queue use cases. We already use it at high volume for request log queueing, so we have operational experience and a working puppetization. This makes it a promising candidate.

Rough tasks for an implementation:

- Set up a Kafka instance
- Figure out good producer & consumer interfaces
  - We could use raw Kafka, but there might be a benefit in some abstraction: could we use HTTP / websockets?
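As a rough illustration of what a thin producer abstraction over the queue could look like, here is a minimal sketch. Everything in it is hypothetical rather than an agreed design: the event fields, the `make_page_event` helper, and the topic name `mediawiki.page-edit` are placeholders, and the Kafka client is replaced by an in-memory stub so the example is self-contained.

```python
import json
import time

# Hypothetical event envelope; field names are illustrative, not a settled schema.
def make_page_event(event_type, wiki, title, rev_id):
    return {
        "type": event_type,   # e.g. "edit", "move", "delete"
        "wiki": wiki,         # project the event belongs to
        "title": title,
        "rev_id": rev_id,
        "ts": time.time(),    # producer-side timestamp
    }

class InMemoryKafka:
    """Stand-in for a real Kafka client: one append-only log per topic."""
    def __init__(self):
        self.topics = {}

    def send(self, topic, key, value):
        # Real Kafka preserves order per partition; keying by page title
        # would keep all events for a given page in order.
        self.topics.setdefault(topic, []).append((key, value))

producer = InMemoryKafka()

def emit_edit(wiki, title, rev_id):
    """What a synchronous producer called from a MediaWiki hook might do."""
    event = make_page_event("edit", wiki, title, rev_id)
    producer.send("mediawiki.page-edit", key=title, value=json.dumps(event))
    return event
```

The choice of message key matters: keying by page title (or page ID) is one way to express a relative-order requirement, since events for the same page then stay ordered even when the topic is partitioned.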
  - See also: [RESTBase queueing notes](https://github.com/wikimedia/restbase-cassandra/blob/master/doc/QueueBucket.md)
- Define events & relative-order requirements
- Hook up a synchronous producer to the relevant MediaWiki hooks

## Open questions

- Should we abstract over the raw queue interface?
- How can we scale this down for third-party users?
- Can we build on the existing job queue as a fall-back?

TASK DETAIL
https://phabricator.wikimedia.org/T84923
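To make the "pub/sub consumers with varying speed" and "large delays" requirements concrete, the sketch below (again an in-memory stand-in, not real Kafka; the consumer names are made up) shows the key property a persistent log gives us: each consumer tracks its own offset, so a consumer that lags by days neither blocks faster consumers nor loses events.

```python
class Log:
    """Append-only log; each consumer tracks its own read offset."""
    def __init__(self):
        self.entries = []
        self.offsets = {}  # consumer name -> next index to read

    def append(self, event):
        self.entries.append(event)

    def poll(self, consumer, max_events=10):
        """Return the next batch for this consumer and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.entries[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch

log = Log()
for rev in range(5):
    log.append({"rev_id": rev})

# A fast consumer drains the whole log; a slow one reads a little at a time.
fast = log.poll("restbase", max_events=10)
slow = log.poll("wdqs", max_events=2)
# The slow consumer's position is independent of the fast consumer's,
# and the events it has not yet read remain in the log.
```

This is the property the current job queue lacks: with per-consumer offsets, adding a new consumer is just a new offset, not a new job type.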