Thanks Jérémie, we are definitely aiming for a more official announcement. The reason for the soft launch is that, after experimenting with the DataHub for a few months, we are still reporting issues to the developers that need to be addressed before a broader announcement. The CKAN data browser, for example, is quite rudimentary; there is limited support for batch file upload; data citation support is not keeping up with standards/best practices in the field, etc. If anyone on these lists is interested in crash-testing the repository, I'd be happy to follow up off-list.
Despite these issues, CKAN remains our engine of choice: it's open source, actively maintained by OKFN (an organization whose mission is aligned with Wikimedia's) and is currently used by large orgs and governments to run institutional repositories (like http://data.gov.uk). The long-term vision is that of an actual "data/API hub" built on top of a naked repository, to facilitate the discovery/reuse of various data sources. I copy below a note I posted some weeks ago to wikitech-l on this topic.

Dario

Begin forwarded message:

> From: Dario Taraborelli <[email protected]>
> Subject: Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link
> to the footer of Wikimedia wikis
> Date: September 25, 2012 10:55:47 AM PDT
>
> I am very excited to see this proposal and happy to help in my spare time,
> thanks for starting the thread. In fact, I started brainstorming a while ago
> with a number of colleagues and community members on what an ideal Wikimedia
> developer hub might look like.
>
> My thoughts:
>
> (1) the hub should be focused on documenting reuse of Wikimedia's data
> sources (the API, the XML dumps, the IRC streams), not just the MediaWiki
> codebase. We are investing quite a lot of outreach effort in the MediaWiki
> developer community; this hub should be broader in scope and support the
> development of third-party apps/services building on these data sources. A
> consultation we ran last year indicates that a large number of
> developers/researchers interested in building services/mashups on top of
> Wikipedia don't have a clue about what data/APIs we make available besides the
> XML dumps or where to find this data: this is the audience we should build
> the developer hub for.
>
> (2) the hub should host simple recipes on how to use existing data sources
> for building applications and list existing libraries for data
> crunching/manipulation.
> My initial attempt at listing Wikimedia/Wikipedia
> apps, mashups and data wrangling libraries is this spreadsheet, contributions
> are welcome [1]
>
> (3) on top of documenting data sources/APIs we should showcase the best
> applications that use them and incentivize more developers to play with our
> data, like Flickr does with its app garden. WMF designer Vibha Bamba created
> these two mockups [2] [3], loosely inspired by
> http://selection.datavisualization.ch, for a visual directory that we could
> initially host on Labs.
>
> Dario
>
> [1]
> https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Ams-fyukCIlMdDVrNHQ5RmJtZmNNQ01UbF9qeUV2aGc#gid=0
>
> [2] http://commons.wikimedia.org/wiki/File:Wikipedia_DataViz-01.png
> [3] http://commons.wikimedia.org/wiki/File:Wikipedia_DataViz-02.png

On Oct 22, 2012, at 7:00 PM, Jérémie Roquet <[email protected]> wrote:

> cc-ed xmldatadumps-l
>
> Hi,
>
> 2012/10/23 Dario Taraborelli <[email protected]>:
>> 2012/10/23 James Forrester <[email protected]>:
>>> On 22 October 2012 16:03, Hydriz Wikipedia <[email protected]> wrote:
>>>> I have long been wanting to say this, but is it possible for the team
>>>> behind compiling such datasets to put future (and if possible, current)
>>>> datasets into dumps.wikimedia.org so that it is easier for everyone to
>>>> find stuff and not be all over the place? Thanks for that!
>>>
>>> Many one-off and regular datasets, from query results to data dumps
>>> and similar, are now indexed[0] on The Data Hub (formerly CKAN) run by
>>> the Open Knowledge Foundation for precisely this reason - so that data
>>> researchers can easily find data about Wikimedia, and see when it's
>>> updated.
>>>
>>> [0] - http://thedatahub.org/en/group/wikimedia
>>
>> The dumps server was never meant to become a permanent open data repository,
>> but it started being used as an ad-hoc solution to host all sorts of datasets
>> published by WMF on top of the actual XML dumps: that's the problem we're
>> trying to fix.
>>
>> Regardless of where the data is physically hosted, your go-to point to
>> discover WMF datasets from now on is the DataHub. Think of it as a data
>> registry: the registry is all you need to know in order to find where the
>> data is hosted and to extract the appropriate metadata/documentation.
>
> That's fine for me but I think more communication about this would be
> welcome. I've added a link to meta:Data_dumps¹ and I'll communicate
> about this on the French Wikipedia, but a link on the dumps' page for
> other downloads² would be great.
>
> Most people I've helped to find data on the Wikimedia projects now
> know about dumps.wikimedia.org, but AFAIK none of them is reading
> wiki-research-l.
>
> Best regards,
>
> ¹ https://meta.wikimedia.org/wiki/Data_dumps
> ² http://dumps.wikimedia.org/other/
>
> --
> Jérémie
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
