GWicke created this task.
GWicke added a subscriber: GWicke.
GWicke added a project: wikidata-query-service.
GWicke changed Security from none to none.

TASK DESCRIPTION
  We need a reliable way to distribute a variety of update events emitted from 
MediaWiki core (and other services) to various consumers. Currently we use the 
job queue for this (for example, in the Parsoid extension), but the job queue 
is fairly complex, not very reliable, and does not support multiple consumers 
without setting up separate job types.
  
  We are looking for a solution that decouples producers from consumers and 
gives us better reliability than the current job queue.
  
  ## Event type candidates
  
  - Wikidata updates: a summary of changes (ideally with details of the actual 
changes)
    - use case: keeping the #wikidata-query-service up to date
  - Page edits, moves and visibility changes (page / revision deletion / 
suppression); pretty much what is tracked in [the Parsoid 
extension](https://github.com/wikimedia/mediawiki-extensions-Parsoid/blob/817a7581f1ba554415128449b7a0a6a00248a443/Parsoid.hooks.php#L66)
    - use case: keeping RESTBase content and caches up to date (a sketch of a 
possible event payload follows below)
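  
  For illustration, a page-edit event of the kind listed above might carry 
roughly the following fields. This is only a sketch; the topic name, field 
names and values are hypothetical, not a settled schema:
  
  ```python
  # Hypothetical shape of a page-edit event; none of these field names are
  # settled -- they just illustrate the kind of data consumers would need.
  page_edit_event = {
      'meta': {
          'topic': 'mediawiki.page_edit',        # event type / queue topic
          'wiki': 'enwiki',                      # originating wiki
          'timestamp': '2015-01-05T12:34:56Z',   # when the edit happened
      },
      'page_title': 'Main_Page',
      'page_id': 15580374,
      'rev_id': 641000000,            # new revision
      'rev_parent_id': 640999999,     # revision it replaces
      'user': 'ExampleUser',
      'comment': 'example edit summary',
  }
  ```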
  
  ## Requirements for an implementation
  
  - persistent: state does not disappear on power failure, and individual 
consumers can lag by large delays (on the order of days)
  - no single point of failure
  - supports pub/sub consumers running at varying speeds (see the consumer 
sketch below)
  - ideally, lets various producers enqueue new events (not just MediaWiki core)
    - example use case: RESTBase scheduling dependent updates for content 
variants after the HTML has been updated
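  
  As a concrete illustration of the pub/sub requirement: Kafka-style consumer 
groups let each consumer track its own offset into a shared topic, so a slow 
consumer (or one that was down for days) resumes where it left off without 
affecting the others. A minimal sketch using the kafka-python client; the 
topic, group and broker names are made up:
  
  ```python
  import json
  
  from kafka import KafkaConsumer
  
  
  def handle_event(event):
      # Stand-in for real update logic (e.g. re-render, purge caches).
      print(event['meta']['topic'], event.get('page_title'))
  
  
  # Each consumer group keeps an independent offset into the same topic, so
  # e.g. RESTBase and the query service can consume at different speeds.
  consumer = KafkaConsumer(
      'mediawiki.page_edit',                  # hypothetical topic name
      group_id='restbase-updater',            # hypothetical group name
      bootstrap_servers=['kafka1001:9092'],   # hypothetical broker address
      auto_offset_reset='earliest',           # new groups start at the oldest retained event
      value_deserializer=lambda v: json.loads(v.decode('utf-8')),
  )
  
  for message in consumer:
      # The group's committed offset is durable broker-side state: after a
      # crash or a multi-day backlog, consumption resumes from that offset.
      handle_event(message.value)
  ```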
  
  ## Option 1: Kafka
  
  Kafka is a persistent, replicated queue with support for both pub/sub and 
job-queue use cases. We already use it at high volume for request log queueing, 
so we have operational experience and a working puppetization. This makes it a 
promising candidate.
  
  Rough tasks for an implementation:
  
  - Set up a Kafka instance
  - Figure out good producer & consumer interfaces
    - We could use raw Kafka, but there might be a benefit in some abstraction: 
could we use HTTP / WebSockets? See also: [RESTBase queueing 
notes](https://github.com/wikimedia/restbase-cassandra/blob/master/doc/QueueBucket.md)
  - Define the events & their relative ordering requirements
  - Hook up a synchronous producer to the relevant MediaWiki hooks (see the 
producer sketch below)
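  
  MediaWiki hooks are PHP, so the real producer would live there; this Python 
sketch only illustrates the shape of a synchronous produce call, again with 
made-up topic and broker names:
  
  ```python
  import json
  
  from kafka import KafkaProducer
  
  producer = KafkaProducer(
      bootstrap_servers=['kafka1001:9092'],   # hypothetical broker address
      value_serializer=lambda v: json.dumps(v).encode('utf-8'),
      acks='all',   # wait for full replication before treating the send as done
  )
  
  
  def on_page_edit(event):
      """Called from the edit path; blocks until the broker acknowledges."""
      future = producer.send('mediawiki.page_edit', event)
      # Raise if the event could not be queued, so the caller can fall back
      # (e.g. to the existing job queue).
      future.get(timeout=10)
  ```
  
  Blocking on the acknowledgement keeps the producer synchronous with the 
edit, at the cost of some latency on the save path.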
  
  ## Open questions
  
  - Should we abstract over the raw queue interface?
  - How can we scale this down for third-party users?
  - Can we keep the existing job queue as a fall-back?

TASK DETAIL
  https://phabricator.wikimedia.org/T84923
