Hi Denny

On Fri, Apr 26, 2013 at 5:56 PM, Denny Vrandečić <
[email protected]> wrote:

> The third party propagation is not very high on our priority list. Not
> because it is not important, but because there are things that are even
> more important - like getting it to work for Wikipedia :) And this seems to
> be stabilizing.
>
> What we have, for now:
>
> * We have the broadcast of all edits through IRC.
>
> * One could poll recent changes, but with 200-450 edits per minute, this
> might get problematic.
>
> * We do have the OAIRepository extension installed on Wikidata. Did anyone
> try that?
>

Great, I didn't know that. I see it is installed (
http://www.wikidata.org/wiki/Special:OAIRepository), but it is password
protected. Could we (DBpedia) request access?
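
For context, here is a minimal Python sketch of the "changes since time X" request we would make against such an endpoint. It is not our DBpedia Live code. The verb, from, metadataPrefix and resumptionToken arguments come from the OAI-PMH 2.0 spec; the oai_dc prefix, the function names and any identifiers are assumptions for illustration, and I have not checked which prefixes the Wikidata installation exposes.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Endpoint as linked in this mail; it is currently password protected.
BASE_URL = "http://www.wikidata.org/wiki/Special:OAIRepository"


def build_list_records_url(base_url, since=None, metadata_prefix="oai_dc",
                           resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    `since` must be a timezone-aware datetime; it is sent as the OAI-PMH
    `from` argument in UTC.
    """
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb goes with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if since is not None:
            utc = since.astimezone(timezone.utc)
            params["from"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def parse_changed(xml_text):
    """Return ([(identifier, datestamp), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    changed = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", default="",
                                     namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", default="",
                                    namespaces=OAI_NS)
        changed.append((identifier, datestamp))
    token = root.findtext(".//oai:resumptionToken", default="",
                          namespaces=OAI_NS)
    return changed, (token.strip() or None)
```

Fetching the URL (with whatever credentials we are given) and repeating the request while a resumption token comes back is left out. Each record appears once with its latest datestamp, which is why this style of feed suits us.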

Cheers,
Dimitris


>
> Besides that, we are currently moving all our dispatches to Redis, which
> has built-in support for PubSubHubbub, so we will probably have some
> support for that at some point. I cannot make promises with regard to
> the timeline of that, though. It is still in implementation, and needs to be
> fully tested and deployed, and after that it might still have some rough
> edges. So, it *could* be there in two to three months, but I cannot promise
> that.
>
> The other three options are not sufficient?
>
> Cheers,
> Denny
>
>
>
>
> 2013/4/26 Yuri Astrakhan <[email protected]>
>
>> Recently I spoke with Wikia, and being able to subscribe to the recent
>> changes feed is a very important feature to them. Apparently polling the API's
>> recent changes creates much higher stress on the system than subscribing.
>>
>> Now, we don't need (from the start) to implement publishing of all the
>> data - just the fact that certain items have changed, and they can later be
>> requested by the usual means. But it would be good to implement this system
>> for all of the API, not just Wikidata.
>>
>>
>> On Fri, Apr 26, 2013 at 3:13 AM, Dimitris Kontokostas
>> <[email protected]> wrote:
>>
>>> Dear Jeremy, all,
>>>
>>> In addition to what Sebastian said, in DBpedia Live we use the OAI-PMH
>>> protocol to get update feeds for English, German & Dutch Wikipedia.
>>> This OAI-PMH implementation [1] is very convenient for what we need (and
>>> I guess for most people) because it uses the latest modification date for
>>> update publishing.
>>> So when we ask for updates after time X, it returns a list of articles
>>> with a modification date after X, no matter how many times they were edited
>>> in between.
>>>
>>> This is very easy for you to support (no need for extra hardware, just
>>> an extra table / index) and is best suited for most use cases.
>>> What most people need in the end is to know which pages changed since
>>> time X. Fine-grained details are for special types of clients.
>>>
>>> Best,
>>> Dimitris
>>>
>>> [1] http://www.mediawiki.org/wiki/Extension:OAIRepository
>>>
>>>
>>> On Fri, Apr 26, 2013 at 9:40 AM, Sebastian Hellmann <
>>> [email protected]> wrote:
>>>
>>>> Dear Jeremy,
>>>> please read the email from Daniel Kinzler on this list from 26.03.2013
>>>> 18:26:
>>>>
>>>>> * A dispatcher needs about 3 seconds to dispatch 1000 changes to a
>>>>> client wiki.
>>>>> * Considering we have ~300 client wikis, this means one dispatcher can
>>>>> handle about 4000 changes per hour.
>>>>> * We currently have two dispatchers running in parallel (on a single
>>>>> box, hume), that makes a capacity of 8000 changes/hour.
>>>>> * We are seeing roughly 17000 changes per hour on wikidata.org - more
>>>>> than twice our dispatch capacity.
>>>>> * I want to try running 6 dispatcher processes; that would give us the
>>>>> capacity to handle 24000 changes per hour (assuming linear scaling).
>>>>>
>>>>
>>>> 1. Somebody needs to run the Hub and it needs to scale. It looks like the
>>>> protocol was intended to save some traffic, not to dispatch a massive
>>>> number of messages per day to a large number of clients. Again, I am not
>>>> familiar with how efficient PubSubHubbub is. What kind of hardware is
>>>> needed to run this effectively? Do you have experience with this?
>>>>
>>>> 2. Somebody will still need to run and maintain the Hub and feed all
>>>> clients. I was offering to host one of the hubs for DBpedia users, but I am
>>>> not sure whether we have that capacity.
>>>>
>>>> So should we use IRC RC plus an HTTP request to the changed page as a
>>>> fallback?
>>>>
>>>> Sebastian
>>>>
>>>> On 26.04.2013 08:06, Jeremy Baron wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Well, PubSubHubbub is a nice idea. However it clearly depends on two
>>>>>> factors:
>>>>>> 1. whether Wikidata sets up such an infrastructure (I need to check
>>>>>> whether we have capacities, I am not sure atm)
>>>>>>
>>>>> Capacity for what? The infrastructure should not be a problem.
>>>>> (Famous last words; I can look more closely tomorrow, but I'm really not
>>>>> worried about it.) And you don't need any infrastructure at all for
>>>>> development; just use one of Google's public instances.
>>>>>
>>>>>> 2. whether performance is good enough to handle high-volume publishers
>>>>>>
>>>>> Again, how do you mean?
>>>>>
>>>>>> Basically, polling recent changes [1] and then doing an HTTP request
>>>>>> to the individual pages should be fine for a start. So I guess this is
>>>>>> what we will implement, if there aren't any better suggestions.
>>>>>> The whole issue is problematic, and the DBpedia project would be
>>>>>> happy if this were discussed and decided right now, so we can plan
>>>>>> development.
>>>>>>
>>>>>> What is the best practice to get updates from Wikipedia at the moment?
>>>>>>
>>>>> I believe just about everyone uses the IRC feed from
>>>>> irc.wikimedia.org.
>>>>> https://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
>>>>>
>>>>> I imagine wikidata will or maybe already does propagate changes to a
>>>>> channel on that server but I can imagine IRC would not be a good
>>>>> method for many Instant data repo users. Some will not be able to
>>>>> sustain a single TCP connection for extended periods, some will not be
>>>>> able to use IRC ports at all, and some may go offline periodically.
>>>>> e.g. a server on a laptop. AIUI, PubSubHubbub has none of those
>>>>> problems and is better than the current IRC solution in just about
>>>>> every way.
>>>>>
>>>>> We could potentially even replace the current cross-DB job queue
>>>>> insert craziness with PubSubHubbub for use on the cluster internally.
>>>>>
>>>>> -Jeremy
>>>>>
>>>>> _______________________________________________
>>>>> Wikidata-l mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>>>>
>>>>>
>>>>
>>>> --
>>>> Dipl. Inf. Sebastian Hellmann
>>>> Department of Computer Science, University of Leipzig
>>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>> Research Group: http://aksw.org
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Kontokostas Dimitris
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> (Society for the Promotion of Free Knowledge). Registered in the register of
> associations of the Berlin-Charlottenburg district court under number
> 23855 B. Recognized as a non-profit by the tax office for corporations
> (Finanzamt für Körperschaften) I Berlin, tax number 27/681/51985.
>
>
>


-- 
Kontokostas Dimitris
