Hi,

Could you briefly outline the added value (or the supported use cases) of
switching to PubSubHubbub?
The edit stream in Wikidata is so huge that I can hardly imagine anyone
wanting to be in *real-time* sync with Wikidata.
At 20 pushes per second, a subscriber's infrastructure would have to be
pretty scalable not to break.

Maybe I am biased by DBpedia, but in experiments on English Wikipedia we
found that the ideal OAI-PMH update interval was about every 5 minutes.
OAI-PMH aggregates multiple revisions of a page into a single record,
so when we ask "give me the items that changed in the last 5 minutes" we
skip the processing of many minor edits.
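The 5-minute polling described above could be sketched roughly as follows.
This is only an illustration: the endpoint URL is hypothetical, and the
metadataPrefix value is an assumption (a real harvester should discover it
via the repository's ListMetadataFormats response).

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_listrecords_url(base_url, window_minutes=5, now=None):
    """Build an OAI-PMH ListRecords request for the last few minutes.

    Because OAI-PMH collapses multiple revisions of a page into one
    record per harvesting window, polling like this skips the
    intermediate minor edits that a per-edit push would deliver.
    """
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(minutes=window_minutes)
    params = {
        "verb": "ListRecords",
        # Assumed prefix; check the repository's ListMetadataFormats.
        "metadataPrefix": "mediawiki",
        "from": since.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, just to show the shape of the request:
url = build_listrecords_url("https://example.org/oai",
                            now=datetime(2014, 7, 8, 12, 0,
                                         tzinfo=timezone.utc))
```

A harvester would issue this request every 5 minutes (and follow
resumptionTokens if the response is paged).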

It looks like we lose this option with PubSubHubbub, right?
As we asked before, does PubSubHubbub support mirroring a Wikidata
clone? The OAI-PMH extension has this option.
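For comparison, the subscriber side of PubSubHubbub is a callback URL that
the hub verifies and then pushes to. Below is a minimal sketch of the
intent-verification step only; the topic URL is hypothetical and the
function names are my own, not part of any extension.

```python
# Hypothetical topic URL a mirror would subscribe to:
SUBSCRIBED_TOPICS = {"https://www.wikidata.org/feed"}


def verify_intent(params):
    """Answer a hub's subscription-verification GET request.

    Per the PubSubHubbub spec, the hub calls the subscriber's callback
    with hub.mode, hub.topic and hub.challenge; the subscriber confirms
    by echoing hub.challenge with HTTP 200, or refuses with 404.
    Returns an (http_status, response_body) pair.
    """
    if (params.get("hub.mode") in ("subscribe", "unsubscribe")
            and params.get("hub.topic") in SUBSCRIBED_TOPICS):
        return 200, params.get("hub.challenge", "")
    return 404, ""
```

After verification the hub POSTs a notification to the same callback for
every update, which is exactly the per-edit firehose (no aggregation)
discussed above.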

Best,
Dimitris

On Tue, Jul 8, 2014 at 11:31 AM, Daniel Kinzler <[email protected]
> wrote:

> Replying to myself because I forgot to mention an important detail:
>
> On 08.07.2014 10:22, Daniel Kinzler wrote:
> > On 08.07.2014 01:46, Rob Lanphier wrote:
> >> On Fri, Jul 4, 2014 at 7:16 AM, Lydia Pintscher <
> [email protected]
> > ...
> >> Hi Lydia,
> >>
> >> Thanks for providing the basic overview of this.  Could you (or someone
> >> on the team) provide an explanation about how you would like this to be
> >> configured on the Wikimedia cluster?
> >
> > We'd like to enable it just on Wikidata at first, but I see no reason
> > not to enable it for all projects if that goes well.
> >
> > The PubSubHubbub (PuSH) extension would be configured to push
> > notifications to the google hub (two per edit). The hub then notifies
> > any subscribers via their callback urls.
>
> We need a proxy to be set up to allow the app servers to talk to the
> google hub. If this is deployed on full scale, we expect in excess of
> 20 POST requests per second (two per edit), plus up to the same number
> (but probably fewer) of GET requests coming back from the hub, asking
> for the full page content of every page changed, as XML export, from a
> special page interface similar to Special:Export. This would probably
> bypass the web cache.
>
> PubSubHubbub is nice and simple, but it's really designed for news
> feeds, not for versioned content of massive collaborative sites. It
> works, but it's not as efficient as we could wish.
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> _______________________________________________
> Wikidata-tech mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
>



-- 
Kontokostas Dimitris