https://bugzilla.wikimedia.org/show_bug.cgi?id=62860

--- Comment #1 from Krinkle <krinklem...@gmail.com> ---
The CVNBot software used for #cvn-sw, #cvn-commons, #cvn-meta etc. can't
provide this because of limitations in the source feed, irc.wikimedia.org,
which doesn't expose revision change tags.

Realistically, the fastest way to make this happen is probably a small
Node.js project run from Tool Labs that monitors wikis using one of these
approaches:

1a) Open a socket to irc.wikimedia.org, join all channels, filter for lines
that look like edits/page creations (any line that includes a URL with rcid),
extract the rcid from the URL, then make an API request to retrieve the change
tags.
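
A minimal sketch of the filter/extract step, assuming feed lines carry mIRC
formatting codes and a URL of the form ...index.php?...&rcid=N. The line
format and the names extractRcRef and RcRef are assumptions for illustration,
not something CVNBot or the feed documents:

```typescript
// Matches mIRC formatting: colour (\x03 plus optional fg[,bg] digits)
// and the bold/reset/reverse/underline control characters.
const IRC_FORMATTING = /\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1f]/g;

interface RcRef {
  origin: string; // scheme + host, e.g. "https://en.wikipedia.org"
  rcid: number;
}

// Returns the wiki origin and rcid if the line contains a URL with an
// rcid parameter, or null for any other line (joins, log entries, ...).
function extractRcRef(line: string): RcRef | null {
  const clean = line.replace(IRC_FORMATTING, "");
  const url = clean.match(/(https?:\/\/[^\/\s]+)\/\S*[?&]rcid=(\d+)/);
  if (url === null) return null;
  return { origin: url[1], rcid: Number(url[2]) };
}
```

Lines without an rcid simply yield null, so the same function doubles as the
"does this look like an edit" filter.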

Pros:
 - Only one socket for events.
 - No API polling.

Cons: 
 - It's friggin IRC
 - Still requires API requests, lots of them (one for every edit/newpage
across all of Wikimedia). They could be batched by holding events in a short
local delay/buffer before outputting them, but that's still a ton of requests.
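
The local buffer mentioned above could look roughly like this. It is only a
sketch: RcidBuffer and its methods are made-up names, and the actual API
request is left out because nothing here settles which API module would be
used to look up tags for an rcid.

```typescript
interface Batch {
  origin: string;
  rcids: number[];
}

// Collects rcids per wiki and hands them back in fixed-size batches.
class RcidBuffer {
  private pending: Record<string, number[]> = {};

  add(origin: string, rcid: number): void {
    let list = this.pending[origin];
    if (list === undefined) {
      list = [];
      this.pending[origin] = list;
    }
    list.push(rcid);
  }

  // Empties the buffer and returns batches of at most batchSize rcids,
  // never mixing wikis within one batch.
  drain(batchSize: number): Batch[] {
    if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
    const out: Batch[] = [];
    for (const origin of Object.keys(this.pending)) {
      const ids = this.pending[origin];
      for (let i = 0; i < ids.length; i += batchSize) {
        out.push({ origin, rcids: ids.slice(i, i + batchSize) });
      }
    }
    this.pending = {};
    return out;
  }
}
```

The idea would be to call drain() on a short timer and issue one request per
returned batch, trading a small output delay for fewer requests.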

1b) Alternative: like 1a, but retrieve the change tags via an SQL query to
labsdb instead of an API request.

Pros:
 - Only one socket for events.
 - No API polling.
 - No API requests at all.
Cons:
 - dbreplag might cause problems.
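
For illustration, the SQL lookup in 1b might be built like this. The column
names ct_rc_id and ct_tag are an assumption based on the MediaWiki change_tag
table as I understand it at the time of writing, and the ? placeholder style
assumes a MySQL-type client; changeTagQuery and groupTags are hypothetical
helper names.

```typescript
interface SqlQuery {
  text: string;
  values: number[];
}

interface TagRow {
  ct_rc_id: number;
  ct_tag: string;
}

// Builds one parameterized query for a whole batch of rcids.
function changeTagQuery(rcids: number[]): SqlQuery {
  if (rcids.length === 0) throw new RangeError("need at least one rcid");
  const placeholders = rcids.map(() => "?").join(", ");
  return {
    text: "SELECT ct_rc_id, ct_tag FROM change_tag WHERE ct_rc_id IN (" + placeholders + ")",
    values: rcids.slice(),
  };
}

// Groups result rows into rcid -> list of tags.
function groupTags(rows: TagRow[]): Record<number, string[]> {
  const out: Record<number, string[]> = {};
  for (const row of rows) {
    let tags = out[row.ct_rc_id];
    if (tags === undefined) {
      tags = [];
      out[row.ct_rc_id] = tags;
    }
    tags.push(row.ct_tag);
  }
  return out;
}
```

This is where dbreplag bites: an rcid with no rows either has no tags or has
not been replicated yet, and this query alone cannot tell the two apart.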


2) Have the app generate a list of API entry points for all Wikimedia wikis
(using either operations/mediawiki-config data or the centralauth/sitematrix
API), and poll all of these periodically with action=query&list=recentchanges,
taking care to ensure we don't miss edits (a lower limit is faster/cheaper,
but if more than limit N edits happen since the last query, you miss out).

Pros:
 - Edit information included in main event stream (ApiRecentChanges).

Cons:
 - API polling.
 - One API request for each wiki, at an interval.
 - Not missing events is going to be hard.
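
A rough sketch of one poll in approach 2, assuming each wiki's api.php URL is
already known. The query parameters are the standard list=recentchanges ones;
recentChangesUrl and mayHaveMissed are hypothetical helper names, and the HTTP
request itself is left out.

```typescript
// Builds the URL for one poll: everything from sinceTimestamp onwards,
// oldest first, with change tags included in each entry.
function recentChangesUrl(apiBase: string, sinceTimestamp: string, limit: number): string {
  const params: Array<[string, string]> = [
    ["action", "query"],
    ["list", "recentchanges"],
    ["rcprop", "ids|title|timestamp|tags"],
    ["rcdir", "newer"],
    ["rcstart", sinceTimestamp],
    ["rclimit", String(limit)],
    ["format", "json"],
  ];
  const query = params
    .map((pair) => pair[0] + "=" + encodeURIComponent(pair[1]))
    .join("&");
  return apiBase + "?" + query;
}

// A full page means more changes may be waiting beyond the limit, so the
// caller should keep fetching instead of sleeping until the next interval.
function mayHaveMissed(resultCount: number, limit: number): boolean {
  return resultCount >= limit;
}
```

Walking forward from the last seen timestamp (rcdir=newer) rather than always
asking for the newest N is one way to avoid silently dropping edits on a busy
wiki, at the price of extra requests when a page comes back full.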

3) Have the app fetch a list of wikis from labsdb.meta.wiki, open 1 connection
for each db shard, and start polling recentchanges for each wiki (using WHERE
query to find everything since the last poll, potentially LIMIT still to keep
things rate limited)

Pros:
 - Edit information included in main event stream (recentchanges table).
 - Only a few sockets needed (7 or 8) to be able to query all 100s of wikis.
 - No API polling.
 - No API requests at all.

Cons:
 - Slight delay due to dbreplag to labs, but we don't use anything else, so
it's consistent and the app should be blind to it.
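
A sketch of the bookkeeping for approach 3, under stated assumptions: the
WikiInfo fields (dbname, shard) are placeholder names for whatever the wiki
list actually provides, the rc_* columns are the usual recentchanges ones, and
the database client is left out entirely.

```typescript
interface WikiInfo {
  dbname: string;
  shard: string; // hypothetical field: which db server hosts this wiki
}

interface PollState {
  lastRcId: number;
}

interface RcRow {
  rc_id: number;
}

// One connection per shard: group wikis by the shard that hosts them.
function groupByShard(wikis: WikiInfo[]): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const w of wikis) {
    let list = out[w.shard];
    if (list === undefined) {
      list = [];
      out[w.shard] = list;
    }
    list.push(w.dbname);
  }
  return out;
}

// "Everything since the last poll", capped by LIMIT to stay rate limited.
function pollQuery(state: PollState, limit: number): { text: string; values: number[] } {
  return {
    text: "SELECT rc_id, rc_timestamp, rc_namespace, rc_title FROM recentchanges" +
      " WHERE rc_id > ? ORDER BY rc_id ASC LIMIT ?",
    values: [state.lastRcId, limit],
  };
}

// Moves the cursor to the highest rc_id seen; unchanged if no rows came back.
function advance(state: PollState, rows: RcRow[]): PollState {
  let last = state.lastRcId;
  for (const row of rows) {
    if (row.rc_id > last) last = row.rc_id;
  }
  return { lastRcId: last };
}
```

Because the cursor only moves as far as the rows actually returned, hitting
the LIMIT does not lose anything: the next poll picks up where this one
stopped.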
