Hey Gergo, thanks for the heads up! The big question here is: how does it scale? Sending events to 100 clients may work, but does it work for 100,000?
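For context, the kind of resumable stream described further down (Andrew's "public resumable stream of Wikimedia events") is often built on something SSE-like, where every event carries an id the client can replay from after a disconnect. Here is a minimal sketch of the client side; the field names and the sample payloads are my own illustrative assumptions, not a description of any deployed service:

```python
# Sketch of a resumable event consumer, assuming an SSE-style protocol
# where each event carries an "id:" field the client can resume from.
# Field names and payloads below are illustrative assumptions only.

import json

def parse_sse(lines):
    """Parse raw SSE-style lines into (event_id, payload) tuples."""
    event_id, data = None, []
    for line in lines:
        if line.startswith("id:"):
            event_id = line[len("id:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one event
            if data:
                yield event_id, json.loads("\n".join(data))
            event_id, data = None, []

# Two hypothetical events as they might arrive on the wire:
raw = [
    'id: 41',
    'data: {"wiki": "dewiki", "title": "Berlin"}',
    '',
    'id: 42',
    'data: {"wiki": "wikidatawiki", "title": "Q64"}',
    '',
]
events = list(parse_sse(raw))
last_seen_id = events[-1][0]  # persist this as the resume token

# On reconnect after downtime, the client would send the stored id
# (e.g. a Last-Event-ID header) so the server can replay missed events.
```

This is where the re-sync question bites: resumption only works as long as the server still retains events back to the client's last id, so the retention window bounds how long a subscriber may stay disconnected.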
And then there are several more important details to sort out: What's the granularity of subscription - a wiki? A page? Where does filtering by namespace etc. happen? How big is the latency? How does recovery/re-sync work after a disconnect or downtime?

I have not read the entire conversation, so the answers might already be there - my apologies if they are, just point me there.

Anyway, if anyone has a good solution for sending wiki events to a large number of subscribers, yes, please let us (WMDE/Wikidata) know about it!

On 26.09.2016 at 22:07, Gergo Tisza wrote:
> On Mon, Sep 26, 2016 at 5:57 AM, Andrew Otto <o...@wikimedia.org> wrote:
>
>> A public resumable stream of Wikimedia events would allow folks
>> outside of WMF networks to build realtime stream processing tooling on top
>> of our data. Folks with their own Spark or Flink or Storm clusters (in
>> Amazon or labs or wherever) could consume this and perform complex stream
>> processing (e.g. machine learning algorithms (like ORES), windowed trending
>> aggregations, etc.).
>
> I recall WMDE trying something similar a year ago (via PubSubHubbub) and
> getting vetoed by ops. If they are not aware yet, might be worth contacting
> them and asking if the new streaming service would cover their use cases
> (it was about Wikidata change invalidation on third-party wikis, I think).

--
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l