https://bugzilla.wikimedia.org/show_bug.cgi?id=67117

--- Comment #2 from Alexander Lehmann 
<[email protected]> ---
Hi Aaron.

Thanks for your notes. My questions about them are inline.

(In reply to Aaron Schulz from comment #1)
> A few things:
> * This seems to be missing an ArticleRevisionVisibilitySet handler
> * http_post() needs to handle $wgHTTPProxy
> * The maximum job attempts for the queue will have to be set very high to
> avoid update losses (maxTries)
> * NewRevisionFromEditComplete and other hooks trigger before COMMIT so the
> jobs should probably be delayed (using 'jobReleaseTimestamp'). Moving them
> post-COMMIT is not an option since the network partition could cause nothing
> to be enqueued (the reverse, a job and no COMMIT, is wasteful but harmless).

I'm not sure I understood the problem correctly. How should the delay period
be chosen? Should the delay adjust dynamically, or should we set a fixed
period of time?
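For concreteness, the simplest reading of the suggestion is a fixed delay, just long enough for the wrapping DB transaction to COMMIT before the job becomes runnable. A minimal sketch in Python (as an analogy; 'jobReleaseTimestamp' is the real MediaWiki job parameter, but the function name and the 10-second figure below are illustrative assumptions, not the extension's code):

```python
import time

# Assumed fixed grace period: long enough for the enclosing transaction
# to COMMIT before the job queue may hand the job to a runner.
COMMIT_GRACE_SECONDS = 10

def make_job_params(title, now=None):
    """Build job parameters whose release time is a few seconds out."""
    if now is None:
        now = time.time()
    return {
        'title': title,
        # The job queue must not run this job before this timestamp.
        'jobReleaseTimestamp': int(now + COMMIT_GRACE_SECONDS),
    }
```

A dynamic delay would only matter if COMMIT latency varies a lot; a small fixed margin keeps the logic trivial.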

> * We tend to run lots of jobs for one wiki at a time. http_post() could
> benefit from some sort of singleton on the curl handle instead of closing
> it each time. See
> http://stackoverflow.com/questions/972925/persistent-keepalive-http-with-the-php-curl-library.
> * createLastChangesOutput() should use LIMIT+DISTINCT instead of a "break"
> statement. Also, I'm not sure how well that works. There can only be one job
> for hitting the URL that returns this result in the queue, but it only does
> the last 60 seconds of changes. Also, it selects rc_timestamp but does not
> use it now. Is it OK if the Hub misses a bunch of changes from this (e.g.
> are the per-Title jobs good enough?)
> * It's curious that the hub is supposed to talk back to a special page; why
> not an API page instead? 

The extension provides data in the MediaWiki XML export format, and I don't
know whether the API is intended for that format. Does it make a real
difference compared to using a special page?
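On the persistent-handle suggestion quoted above, the idea is to cache one keep-alive connection per host and reuse it across posts, instead of opening and closing a handle on every http_post() call. A sketch of the pattern in Python (http.client stands in for PHP curl here; all names are illustrative):

```python
import http.client

# Cache of open connections, keyed by (host, port). Reusing an entry
# keeps the TCP connection alive across successive POSTs.
_connections = {}

def get_connection(host, port=80):
    """Return a cached connection, creating it lazily on first use."""
    key = (host, port)
    if key not in _connections:
        # HTTPConnection does not open the socket until the first request,
        # so constructing the object here costs nothing up front.
        _connections[key] = http.client.HTTPConnection(host, port)
    return _connections[key]
```

In the PHP/curl version this would mean holding the curl handle in a static variable and skipping curl_close() between requests.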

> * The Link headers also go there. What is the use of these? Also, since
> they'd take 30 days to apply to all pages (the varnish cache TTL), it would
> be a pain to change them. They definitely need to be stable.
> 

They need to be there according to the PubSubHubbub specification[1].
But they are stable.
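For context, the PubSubHubbub specification has the publisher advertise the hub and the topic's canonical URL via HTTP Link headers with rel="hub" and rel="self", along the lines of (hostnames and paths below are illustrative, not the deployed values):

```
Link: <https://pubsubhubbub.example.org/hub>; rel="hub"
Link: <https://en.wikipedia.org/wiki/Special:PubSubHubbub/Main_Page>; rel="self"
```

Subscribers discover the hub from these headers, which is why they must stay stable.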

> Come to think of it, it seems like we need to send the following events to
> the hub:
> * New revisions
> * Page (un)deletions
> * Revision (un)deletions
> * Page moves
> All of the above leave either edit or log entries in recent changes. Page
> moves only leave one at the old title...though rc_params can be inspected to
> get the new title. I wonder if, instead of a job per title, there can
> be a single job that sends all changes since the "last update time"
> and updates the "last update time" on success. The advantages would be:
> a) Far fewer jobs needed
> b) All updates would be batched
> c) Supporting more hubs is easier since only another job and time position
> is needed (rather than N jobs for each hub for each title)
> Of course I may have missed something.

Unfortunately, there is no way to identify the hubs' requests. In principle,
anyone can call the PubSubHubbub export interface, so we do not know the
"last update time". The hub cannot provide the time either, because the
resource is identified only by its exact URL.
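For reference, the batching idea in the quoted comment would require the extension itself, not the hub, to keep the watermark: one "last successfully pushed" timestamp per hub, advanced only after a successful POST. A minimal sketch (all names and the in-memory stores are illustrative assumptions):

```python
# watermarks: hub URL -> timestamp of the last successful push.
watermarks = {}
# changes: (rc_timestamp, title) pairs, a stand-in for recentchanges rows.
changes = []

def push_changes(hub, send):
    """Send every change newer than the hub's watermark; advance on success."""
    since = watermarks.get(hub, 0)
    batch = [c for c in changes if c[0] > since]
    if not batch:
        return 0
    # send() models the HTTP POST; a network partition makes it return False.
    if send(hub, batch):
        watermarks[hub] = max(ts for ts, _ in batch)
    return len(batch)
```

On failure the watermark stays put, so the next run resends the same batch; that at-least-once behaviour is also why maxTries would have to be set high.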

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
