Hi Denny,

On Fri, Apr 26, 2013 at 5:56 PM, Denny Vrandečić <[email protected]> wrote:
> The third party propagation is not very high on our priority list. Not
> because it is not important, but because there are things that are even
> more important - like getting it to work for Wikipedia :) And this seems to
> be stabilizing.
>
> What we have, for now:
>
> * We have the broadcast of all edits through IRC.
>
> * One could poll recent changes, but with 200-450 edits per minute, this
> might get problematic.
>
> * We do have the OAIRepository extension installed on Wikidata. Did anyone
> try that?
>

Great! I didn't know that. I see it is installed
(http://www.wikidata.org/wiki/Special:OAIRepository), but it is password
protected. Can we (DBpedia) request access?

Cheers,
Dimitris

> Besides that, we are currently moving our dispatches all to Redis, which
> has built-in-support for PubSubHubbub, so we will probably have some
> support for that at some point. I cannot make promises with regards to
> timeline of that, though. It is still in implementation, and needs to be
> fully tested and deployed, and after that it might have some rough edges
> still. So, it *could* be there in two to three months, but I cannot promise
> that.
>
> The other three options are not sufficient?
>
> Cheers,
> Denny
>
>
> 2013/4/26 Yuri Astrakhan <[email protected]>
>
>> Recently I spoke with Wikia, and being able to subscribe to the recent
>> changes feed is a very important feature to them. Apparently polling API's
>> recent changes creates a much higher stress on the system than subscribing.
>>
>> Now, we don't need (from the start) to implement publishing of all the
>> data - just the fact that certain items have changed, and they can later be
>> requested by usual means, but it would be good to implement this system for
>> all of the API, not just wikidata.
>>
>>
>> On Fri, Apr 26, 2013 at 3:13 AM, Dimitris Kontokostas
>> <[email protected]> wrote:
>>
>>> Dear Jeremy, all,
>>>
>>> In addition to what Sebastian said, in DBpedia Live we use the OAI-PMH
>>> protocol to get update feeds for English, German & Dutch Wikipedia.
>>> This OAI-PMH implementation [1] is very convenient for what we need (and
>>> I guess for most people) because it uses the latest modification date for
>>> update publishing.
>>> So when we ask for updates after time X it returns a list of articles
>>> with modification date after X, no matter how many times they were edited
>>> in between.
>>>
>>> This is very easy for you to support (no need for extra hardware, just
>>> an extra table / index) and suited best for most use cases.
>>> What most people need in the end is to know which pages changed since
>>> time X. Fine grained details are for special type of clients.
>>>
>>> Best,
>>> Dimitris
>>>
>>> [1] http://www.mediawiki.org/wiki/Extension:OAIRepository
>>>
>>>
>>> On Fri, Apr 26, 2013 at 9:40 AM, Sebastian Hellmann
>>> <[email protected]> wrote:
>>>
>>>> Dear Jeremy,
>>>> please read email from Daniel Kinzler on this list from 26.03.2013 18:26:
>>>>
>>>>> * A dispatcher needs about 3 seconds to dispatch 1000 changes to a
>>>>> client wiki.
>>>>> * Considering we have ~300 client wikis, this means one dispatcher can
>>>>> handle about 4000 changes per hour.
>>>>> * We currently have two dispatchers running in parallel (on a single
>>>>> box, hume), that makes a capacity of 8000 changes/hour.
>>>>> * We are seeing roughly 17000 changes per hour on wikidata.org - more
>>>>> than twice our dispatch capacity.
>>>>> * I want to try running 6 dispatcher processes; that would give us the
>>>>> capacity to handle 24000 changes per hour (assuming linear scaling).
>>>>
>>>> 1. Somebody needs to run the Hub and it needs to scale.
>>>> Looks like the protocol was intended to save some traffic, not to
>>>> dispatch a massive amount of messages / per day to a large number of
>>>> clients. Again, I am not familiar, how efficient PubSubHubbub is. What
>>>> kind of hardware is needed to run this, effectively? Do you have
>>>> experience with this?
>>>>
>>>> 2. Somebody will still need to run and maintain the Hub and feed all
>>>> clients. I was offering to host one of the hubs for DBpedia users, but I am
>>>> not sure, whether we have that capacity.
>>>>
>>>> So we should use IRC RC + http request to the changed page as fallback?
>>>>
>>>> Sebastian
>>>>
>>>> On 26.04.2013 08:06, Jeremy Baron wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Well, PubSubHubbub is a nice idea. However it clearly depends on two
>>>>>> factors:
>>>>>> 1. whether Wikidata sets up such an infrastructure (I need to check
>>>>>> whether we have capacities, I am not sure atm)
>>>>>>
>>>>> Capacity for what? the infrastructure should not be a problem.
>>>>> (famous last words, can look more closely tomorrow. but I'm really not
>>>>> worried about it) And you don't need any infrastructure at all for
>>>>> development; just use one of google's public instances.
>>>>>
>>>>>> 2. whether performance is good enough to handle high-volume publishers
>>>>>>
>>>>> Again, how do you mean?
>>>>>
>>>>>> Basically, polling to recent changes [1] and then do a http request
>>>>>> to the individual pages should be fine for a start. So I guess this is
>>>>>> what we will implement, if there aren't any better suggestions.
>>>>>> The whole issue is problematic and the DBpedia project would be
>>>>>> happy, if this were discussed and decided right now, so we can plan
>>>>>> development.
>>>>>>
>>>>>> What is the best practice to get updates from Wikipedia at the moment?
>>>>>>
>>>>> I believe just about everyone uses the IRC feed from
>>>>> irc.wikimedia.org.
>>>>> https://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
>>>>>
>>>>> I imagine wikidata will or maybe already does propagate changes to a
>>>>> channel on that server but I can imagine IRC would not be a good
>>>>> method for many Instant data repo users. Some will not be able to
>>>>> sustain a single TCP connection for extended periods, some will not be
>>>>> able to use IRC ports at all, and some may go offline periodically.
>>>>> e.g. a server on a laptop. AIUI, PubSubHubbub has none of those
>>>>> problems and is better than the current IRC solution in just about
>>>>> every way.
>>>>>
>>>>> We could potentially even replace the current cross-DB job queue
>>>>> insert craziness with PubSubHubbub for use on the cluster internally.
>>>>>
>>>>> -Jeremy
>>>>>
>>>>> _______________________________________________
>>>>> Wikidata-l mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>>>
>>>> --
>>>> Dipl. Inf. Sebastian Hellmann
>>>> Department of Computer Science, University of Leipzig
>>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>> Research Group: http://aksw.org
>>>
>>> --
>>> Kontokostas Dimitris
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
> organization by the Finanzamt für Körperschaften I Berlin, tax number
> 27/681/51985.

--
Kontokostas Dimitris
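
P.S. As a sanity check on the dispatcher figures quoted from Daniel Kinzler's mail, here is a rough back-of-the-envelope sketch in Python. The function name and the assumption that one dispatcher serves the client wikis strictly one after another are mine and are not taken from the actual dispatcher code; linear scaling is the assumption Daniel states himself.

```python
def changes_per_hour(seconds_per_batch, batch_size, client_wikis, dispatchers=1):
    """Rough dispatch capacity (hypothetical helper).

    Assumes each dispatcher pushes one batch to every client wiki in turn
    and that adding dispatchers scales capacity linearly.
    """
    # Time for one dispatcher to deliver a single batch to all client wikis.
    seconds_for_all_wikis = seconds_per_batch * client_wikis
    batches_per_hour = 3600 / seconds_for_all_wikis
    return batches_per_hour * batch_size * dispatchers


# Figures from the quoted mail: 3 s per 1000 changes, ~300 client wikis.
one = changes_per_hour(3, 1000, 300)                  # one dispatcher
two = changes_per_hour(3, 1000, 300, dispatchers=2)   # current setup
six = changes_per_hour(3, 1000, 300, dispatchers=6)   # proposed setup
observed = 17000                                      # changes/hour on wikidata.org

print(one, two, six, observed / two)
```

Under these assumptions the quoted 4000, 8000 and 24000 changes per hour all follow from the 3-second figure, and the observed load is a bit more than twice the two-dispatcher capacity.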
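
To make the "latest modification date" behaviour from my earlier mail more concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the idea (one row per page holding its newest modification time, queried with "changed after X"); it is not the OAIRepository extension's code, and the function names and sample pages are hypothetical.

```python
from datetime import datetime


def record_edit(last_modified, title, when):
    """Keep only the newest modification time per page (the 'extra table')."""
    if title not in last_modified or when > last_modified[title]:
        last_modified[title] = when


def changed_since(last_modified, since):
    """Return every page modified after `since` exactly once, oldest first."""
    hits = [(when, title) for title, when in last_modified.items() if when > since]
    return [title for when, title in sorted(hits)]


last_modified = {}
record_edit(last_modified, "Berlin", datetime(2013, 4, 26, 9, 0))
record_edit(last_modified, "Leipzig", datetime(2013, 4, 26, 9, 5))
record_edit(last_modified, "Berlin", datetime(2013, 4, 26, 9, 10))  # second edit
record_edit(last_modified, "Athens", datetime(2013, 4, 26, 8, 0))

# "Berlin" was edited twice after 08:30 but is reported only once.
updates = changed_since(last_modified, datetime(2013, 4, 26, 8, 30))
print(updates)
```

A client that only needs to know which pages changed since time X can poll such a list at its own pace, and the result size depends on the number of changed pages rather than on the number of edits.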
