I mean, in simple words:

Your idea: when the data on wikidata is changed, the new content is pushed to all local wikis / somewhere.

My idea: local wikis retrieve data from the wikidata db directly; no need to push anything on change.

On Mon, Apr 23, 2012 at 4:07 PM, Petr Bena <[email protected]> wrote:
> I think it would be much better if the local wikis that are supposed to
> access this had some sort of client extension which would allow them to
> render the content using the db of wikidata. That would be much simpler
> and faster.
>
> On Mon, Apr 23, 2012 at 2:45 PM, Daniel Kinzler <[email protected]> wrote:
>> Hi all!
>>
>> The wikidata team has been discussing how to best make data from wikidata
>> available on local wikis. Fetching the data via HTTP whenever a page is
>> re-rendered doesn't seem prudent, so we (mainly Jeroen) came up with a
>> push-based architecture.
>>
>> The proposal is at
>> <http://meta.wikimedia.org/wiki/Wikidata/Notes/Caching_investigation#Proposal:_HTTP_push_to_local_db_storage>,
>> I have copied it below too.
>>
>> Please have a look and let us know if you think this is viable, and which
>> of the two variants you deem better!
>>
>> Thanks,
>> -- daniel
>>
>> PS: Please keep the discussion on wikitech-l, so we have it all in one
>> place.
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> == Proposal: HTTP push to local db storage ==
>>
>> * Every time an item on Wikidata is changed, an HTTP push is issued to all
>> subscribing clients (wikis).
>> ** Initially, "subscriptions" are just entries in an array in the
>> configuration.
>> ** Pushes can be done via the job queue.
>> ** Pushing is done via the MediaWiki API, but other protocols such as
>> PubSubHubbub / AtomPub can easily be added to support 3rd parties.
>> ** Pushes need to be authenticated, so we don't get malicious crap. Pushes
>> should be done using a special user with a special user right.
>> ** The push may contain either the full set of information for the item,
>> or just a delta (diff) + hash for an integrity check (in case an update
>> was missed).
>>
>> * When the client receives a push, it does two things:
>> *# write the fresh data into a local database table (the local wikidata
>> cache)
>> *# invalidate the (parser) cache for all pages that use the respective
>> item (for now we can assume that we know this from the language links)
>> *#* If we only update language links, the page doesn't even need to be
>> re-parsed: we just update the language links in the cached ParserOutput
>> object. [a rough client-side sketch of this step follows the quoted
>> proposal below]
>>
>> * When a page is rendered, interlanguage links and other info are taken
>> from the local wikidata cache. No queries are made to wikidata during
>> parsing/rendering.
>>
>> * In case an update is missed, we need a mechanism to allow requesting a
>> full purge and re-fetch of all data on the client side, rather than just
>> waiting for the next push, which might very well take a very long time to
>> happen.
>> ** There needs to be a manual option for when someone detects this; maybe
>> action=purge can be made to do this. Simple cache invalidation, however,
>> shouldn't pull info from wikidata.
>> ** A time-to-live could be added to the local copy of the data so that it
>> is updated by a periodic pull and does not stay stale indefinitely after a
>> failed push.
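
To make the receive step in the quoted proposal above more concrete, here is a minimal, purely illustrative sketch of what a client-side push handler could do: apply either a full item or a diff, check the integrity hash, write the result into the local wikidata cache table, and report which pages need their parser cache purged. This is not Wikibase code (the real handler would be a PHP API module reachable only by the authenticated push user); the table names wikidata_cache and wikidata_usage and the payload fields are assumptions made up for the example.

    # Illustrative sketch only; all table and field names are assumed.
    import hashlib
    import json


    def item_hash(data):
        # Stable hash of an item's data, used for the integrity check.
        return hashlib.sha1(json.dumps(data, sort_keys=True).encode()).hexdigest()


    def handle_push(db, payload):
        """Apply one pushed update and return the local pages to purge."""
        item_id = payload["item"]

        if payload.get("full"):
            new_data = payload["data"]  # the full item was pushed
        else:
            row = db.execute(
                "SELECT data FROM wikidata_cache WHERE item_id = ?", (item_id,)
            ).fetchone()
            base = json.loads(row[0]) if row else {}
            base.update(payload["diff"])  # naive delta: changed keys only
            new_data = base

        # Integrity check: a mismatch means an earlier push was missed, so the
        # client should schedule a full re-fetch instead of trusting the diff.
        if item_hash(new_data) != payload["hash"]:
            raise ValueError("hash mismatch for %s, full re-fetch needed" % item_id)

        db.execute(
            "REPLACE INTO wikidata_cache (item_id, data, touched) "
            "VALUES (?, ?, strftime('%s','now'))",
            (item_id, json.dumps(new_data)),
        )
        db.commit()

        # Pages on this wiki that use the item (known via language links for
        # now); their parser cache entries must be invalidated.
        rows = db.execute(
            "SELECT page_title FROM wikidata_usage WHERE item_id = ?", (item_id,)
        ).fetchall()
        return [r[0] for r in rows]

On the push side, the repo would simply POST such a payload to each subscribed client's API (via the job queue, under the special push user), and the returned page list would drive the parser-cache invalidation.
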
>>
>> === Variation: shared database tables ===
>>
>> Instead of having a local wikidata cache on each wiki (which may grow big -
>> a first guesstimate of Jeroen and Reedy is up to 1TB total, for all wikis),
>> all client wikis could access the same central database table(s) managed by
>> the wikidata wiki.
>>
>> * This is similar to the way the GlobalUsage extension tracks the usage of
>> Commons images.
>> * Whenever a page is re-rendered, the local wiki would query the table in
>> the wikidata db. This means a cross-cluster db query whenever a page is
>> rendered, instead of a local query. [sketched below]
>> * The HTTP push mechanism described above would still be needed to purge
>> the parser cache when needed. But the push requests would not need to
>> contain the updated data; they may just be requests to purge the cache.
>> * The ability to do full HTTP pushes (using the MediaWiki API or some other
>> interface) would still be desirable for 3rd-party integration.
>>
>> * This approach greatly lowers the amount of space used in the database.
>> * It doesn't change the number of HTTP requests made.
>> ** It does, however, reduce the amount of data transferred via HTTP (but
>> not by much, at least not compared to pushing diffs).
>> * It doesn't change the number of database requests, but it introduces
>> cross-cluster requests.
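
For contrast, a similarly hedged sketch of the shared-table variation: the client keeps no copy of the data, queries the central table on the wikidata cluster while rendering, and a push shrinks to a bare purge notification. The table name wb_items and the connection handling are assumptions for illustration, not the actual Wikibase schema.

    # Illustrative sketch of the shared-table variation; names are assumed.
    import json


    def get_item_for_render(central_db, item_id):
        """Cross-cluster lookup of an item while a page is being rendered.

        central_db stands in for a connection to the table(s) managed by the
        wikidata wiki; in production this is the cross-cluster query the
        proposal mentions.
        """
        row = central_db.execute(
            "SELECT data FROM wb_items WHERE item_id = ?", (item_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


    def handle_purge_push(item_id, pages_using_item):
        # Under this variation a push carries no item data: it only says which
        # item changed, so the client purges the parser cache of the affected
        # pages and the next render re-queries the central table.
        for title in pages_using_item:
            print("purging parser cache for %s (item %s changed)" % (title, item_id))
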
