Thanks Jérémie,

we are definitely aiming for a more official announcement. The reason for the
soft launch is that, after experimenting with the DataHub for a few months, we
are still reporting issues to the developers that need to be addressed before a
broader announcement. The CKAN data browser, for example, is quite rudimentary;
there is limited support for batch file upload; and data citation support is not
keeping up with standards and best practices in the field. If anyone on these
lists is interested in crash-testing the repository, I'd be happy to follow up
off-list.

Despite these issues, CKAN remains our engine of choice: it's open source,
actively maintained by OKFN (an organization whose mission is aligned with
Wikimedia's) and is currently used by large organizations and governments to
run institutional repositories (like http://data.gov.uk).

The long-term vision is that of an actual "data/API hub" built on top of a 
naked repository, to facilitate the discovery/reuse of various data sources. I 
copy below a note I posted some weeks ago to wikitech-l on this topic.

Dario

Begin forwarded message:

> From: Dario Taraborelli <[email protected]>
> Subject: Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link 
> to the footer of Wikimedia wikis
> Date: September 25, 2012 10:55:47 AM PDT
> 
> I am very excited to see this proposal and happy to help in my spare time;
> thanks for starting the thread. In fact, I started brainstorming a while ago
> with a number of colleagues and community members on what an ideal Wikimedia
> developer hub might look like.
> 
> My thoughts:
> 
> (1) the hub should be focused on documenting reuse of Wikimedia's data
> sources (the API, the XML dumps, the IRC streams), not just the MediaWiki
> codebase. We are already investing quite a lot of outreach effort in the
> MediaWiki developer community; this hub should be broader in scope and
> support the development of third-party apps/services building on these data
> sources. A consultation we ran last year indicates that a large number of
> developers/researchers interested in building services/mashups on top of
> Wikipedia don't have a clue about what data/APIs we make available besides
> the XML dumps, or where to find this data: this is the audience we should
> build the developer hub for.
> 
> (2) the hub should host simple recipes on how to use existing data sources
> for building applications and list existing libraries for data
> crunching/manipulation. My initial attempt at listing Wikimedia/Wikipedia
> apps, mashups and data wrangling libraries is this spreadsheet [1];
> contributions are welcome.
> 
> (3) on top of documenting data sources/APIs we should showcase the best
> applications that use them and incentivize more developers to play with our
> data, like Flickr does with its app garden. WMF designer Vibha Bamba created
> these two mockups [2] [3], loosely inspired by
> http://selection.datavisualization.ch, for a visual directory that we could
> initially host on Labs.
> 
> Dario
> 
> [1] https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Ams-fyukCIlMdDVrNHQ5RmJtZmNNQ01UbF9qeUV2aGc#gid=0
> [2] http://commons.wikimedia.org/wiki/File:Wikipedia_DataViz-01.png
> [3] http://commons.wikimedia.org/wiki/File:Wikipedia_DataViz-02.png 
> 



On Oct 22, 2012, at 7:00 PM, Jérémie Roquet <[email protected]> wrote:

> cc-ed xmldatadumps-l
> 
> Hi,
> 
> 2012/10/23 Dario Taraborelli <[email protected]>:
>> 2012/10/23 James Forrester <[email protected]>:
>>> On 22 October 2012 16:03, Hydriz Wikipedia <[email protected]> wrote:
>>>> I have long been wanting to say this, but is it possible for the team behind
>>>> compiling such datasets to put future (and if possible, current) datasets
>>>> into dumps.wikimedia.org so that it is easier for everyone to find stuff and
>>>> not be all over the place? Thanks for that!
>>> 
>>> Many one-off and regular datasets, from query results to data dumps
>>> and similar, are now indexed[0] on The Data Hub (formerly CKAN) run by
>>> the Open Knowledge Foundation for precisely this reason - so that data
>>> researchers can easily find data about Wikimedia, and see when it's
>>> updated.
>>> 
>>> [0] - http://thedatahub.org/en/group/wikimedia
>> 
>> The dumps server was never meant to become a permanent open data repository, 
>> but it started being used as an ad-hoc solution to host all sorts of datasets 
>> published by WMF on top of the actual XML dumps: that's the problem we're 
>> trying to fix.
>> 
>> Regardless of where the data is physically hosted, your go-to point to 
>> discover WMF datasets from now on is the DataHub. Think of it as a data 
>> registry: the registry is all you need to know in order to find where the 
>> data is hosted and to extract the appropriate metadata/documentation.
> 
> That's fine for me but I think more communication about this would be
> welcome. I've added a link to meta:Data_dumps¹ and I'll communicate
> about this on the French Wikipedia, but a link on the dumps' page for
> other downloads² would be great.
> 
> Most people I've helped to find data on the Wikimedia projects now
> know about dumps.wikimedia.org, but AFAIK none of them is reading
> wiki-research-l.
> 
> Best regards,
> 
> ¹ https://meta.wikimedia.org/wiki/Data_dumps
> ² http://dumps.wikimedia.org/other/
> 
> -- 
> Jérémie
> 
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

