https://bugzilla.wikimedia.org/show_bug.cgi?id=67117

--- Comment #1 from Aaron Schulz <[email protected]> ---
A few things:
* This seems to be missing an ArticleRevisionVisibilitySet handler
* http_post() needs to handle $wgHTTPProxy
* The maximum number of job attempts for the queue (maxTries) will have to be
set very high to avoid losing updates
* NewRevisionFromEditComplete and other hooks trigger before COMMIT, so the
jobs should probably be delayed (using 'jobReleaseTimestamp'). Moving them
post-COMMIT is not an option, since a network partition could cause nothing to
be enqueued (the reverse, a job and no COMMIT, is wasteful but harmless).
* We tend to run lots of jobs for one wiki at a time. http_post() could benefit
from some sort of singleton on the curl handle instead of closing it each time.
See
http://stackoverflow.com/questions/972925/persistent-keepalive-http-with-the-php-curl-library.
* createLastChangesOutput() should use LIMIT+DISTINCT instead of a "break"
statement. Also, I'm not sure how well that works. There can only be one job
in the queue for hitting the URL that returns this result, but it only covers
the last 60 seconds of changes. Also, it selects rc_timestamp but does not use
it now. Is it OK if the hub misses a bunch of changes because of this (e.g. are
the per-Title jobs good enough)?
* It's curious that the hub is supposed to talk back to a special page. Why not
an API page instead?
* The Link headers also go there. What is the use of these? Also, since they'd
take 30 days to apply to all pages (the varnish cache TTL), it would be a pain
to change them. They definitely need to be stable.
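
To illustrate the LIMIT+DISTINCT point above, here is a rough sketch. It uses
Python with sqlite3 purely as a stand-in, not the extension's actual code. The
function name, the choice of rc_namespace/rc_title as the selected columns, and
the toy rows are assumptions made for the example.

```python
import sqlite3


def last_changed_titles(conn, since, limit):
    """Return up to `limit` distinct (namespace, title) pairs changed since `since`.

    The database does the de-duplication and the cut-off, so the caller needs
    no loop with a "break" in it.
    """
    rows = conn.execute(
        "SELECT DISTINCT rc_namespace, rc_title "
        "FROM recentchanges "
        "WHERE rc_timestamp >= ? "
        "LIMIT ?",
        (since, limit),
    )
    return rows.fetchall()


# Toy in-memory table standing in for the real recentchanges table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recentchanges "
    "(rc_namespace INTEGER, rc_title TEXT, rc_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO recentchanges VALUES (?, ?, ?)",
    [
        (0, "Foo", "20140627120000"),
        (0, "Foo", "20140627120010"),  # second edit to the same page
        (0, "Bar", "20140627120020"),
        (1, "Foo", "20140627120030"),  # same title, different namespace
        (0, "Old", "20140627110000"),  # outside the time window
    ],
)

titles = last_changed_titles(conn, "20140627115900", 2)
```

Without an ORDER BY, which rows survive the LIMIT is up to the database, so a
real query would probably want an explicit ordering as well.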

Come to think of it, it seems like we need to send the following events to the
hub:
* New revisions
* Page (un)deletions
* Revision (un)deletions
* Page moves
All of the above leave either edit or log entries in recent changes. Page moves
only leave one at the old title, though rc_params can be inspected to get the
new title. I wonder if, instead of a job per title, there could be a single job
that sends all changes since the "last update time" and updates the
"last update time" on success. The advantages would be:
a) Far fewer jobs needed
b) All updates would be batched
c) Supporting more hubs is easier since only another job and time position is
needed (rather than N jobs for each hub for each title)
Of course I may have missed something.
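
A minimal sketch of the single-job idea, again in Python only for illustration.
The function name, the tuple layout and the fake hub callbacks are all
hypothetical. The point is just that the stored position advances only when the
hub accepts the batch, and that each hub needs nothing more than its own
position.

```python
def send_pending_changes(changes, position, post_to_hub):
    """Post every change newer than `position` as one batch.

    Returns the new position: the timestamp of the newest change sent on
    success, or the old position on failure so the next run retries the
    same window.
    """
    # Caveat: a change that shares the boundary timestamp but shows up later
    # would be skipped; a real job might key on a unique id instead.
    batch = sorted(c for c in changes if c[0] > position)
    if not batch:
        return position
    if not post_to_hub(batch):
        return position
    return batch[-1][0]


# Toy data: (rc_timestamp, event type, title).
changes = [
    ("20140627120000", "edit", "Foo"),
    ("20140627120010", "move", "Bar"),
    ("20140627120020", "delete", "Baz"),
]

sent = []


def fake_hub_ok(batch):
    """Stand-in for a hub that accepts the POST."""
    sent.append(batch)
    return True


def fake_hub_down(batch):
    """Stand-in for a hub that is unreachable."""
    return False


# One stored position per hub; supporting another hub is one more entry.
positions = {"hub-a": "20140627120000", "hub-b": "20140627120000"}
positions["hub-a"] = send_pending_changes(changes, positions["hub-a"], fake_hub_ok)
positions["hub-b"] = send_pending_changes(changes, positions["hub-b"], fake_hub_down)
```

Here hub-a ends up at the newest timestamp, while hub-b stays where it was and
would get the same two changes on the next run.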

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
