Hi,

Could you briefly summarize the added value (or the supported use cases) of switching to PubSubHubbub? The edit stream in Wikidata is so huge that I can hardly think of anyone wanting to be in *real-time* sync with Wikidata. With 20 requests per second, their infrastructure would need to be pretty scalable not to break.

Maybe I am biased by DBpedia, but in some experiments on English Wikipedia we found that the ideal update interval with OAI-PMH was ~5 minutes. OAI aggregates multiple revisions of a page into a single edit, so when we ask "get me the items that changed in the last 5 minutes" we skip the processing of many minor edits. It looks like we lose this option with PubSubHubbub, right?

As we already asked before, does PubSubHubbub support mirroring a Wikidata clone? The OAI-PMH extension has this option.

Best,
Dimitris

On Tue, Jul 8, 2014 at 11:31 AM, Daniel Kinzler <[email protected]> wrote:

> Replying to myself because I forgot to mention an important detail:
>
> On 08.07.2014 10:22, Daniel Kinzler wrote:
> > On 08.07.2014 01:46, Rob Lanphier wrote:
> >> On Fri, Jul 4, 2014 at 7:16 AM, Lydia Pintscher <[email protected]>
> >> ...
> >> Hi Lydia,
> >>
> >> Thanks for providing the basic overview of this. Could you (or someone on the
> >> team) provide an explanation about how you would like this to be configured on
> >> the Wikimedia cluster?
> >
> > We'd like to enable it just on Wikidata at first, but I see no reason not to
> > enable it for all projects if that goes well.
> >
> > The PubSubHubbub (PuSH) extension would be configured to push notifications to
> > the Google hub (two per edit). The hub then notifies any subscribers via their
> > callback URLs.
>
> We need a proxy to be set up to allow the app servers to talk to the Google hub.
> If this is deployed on full scale, we expect in excess of 20 POST requests per
> second (two per edit), plus up to the same number (but probably fewer) of GET
> requests coming back from the hub, asking for the full page content of every
> page changed, as XML export, from a special page interface similar to
> Special:Export. This would probably bypass the web cache.
>
> PubSubHubbub is nice and simple, but it's really designed for news feeds, not
> for versioned content of massive collaborative sites. It works, but it's not as
> efficient as we could wish.
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> _______________________________________________
> Wikidata-tech mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata-tech

--
Kontokostas Dimitris
