Re: [Wikidata-l] Impact of Wikidata phase I

2012-06-26 Thread Denny Vrandečić
Jona, yes, that would be useful. If it is not much trouble. Then I would extract the double links from your dataset and provide that as a comparison. Cheers, Denny 2012/6/25 Jona Christopher Sahnwaldt : > Hi Denny, > > we extract the inter-language links for DBpedia. Only for five or six > lang

Re: [Wikidata-l] Impact of Wikidata phase I

2012-06-26 Thread Denny Vrandečić
dia.org/wiki/Help:Interlanguage_links#Bots_and_links_to_and_from_a_section > > > > - Original Message - > > From: denny.vrande...@wikimedia.de > > Sent: 06/25/12 02:39 PM > > To: Discussion list for the Wikidata project. > > Subject: Re: [Wikidata-l] Impact of Wikidata pha

Re: [Wikidata-l] Impact of Wikidata phase I

2012-06-25 Thread Snaevar
kimedia.de Sent: 06/25/12 02:39 PM To: Discussion list for the Wikidata project. Subject: Re: [Wikidata-l] Impact of Wikidata phase I I am surprised by the amount of commented out language links -- there seem to be plenty of them, and I do not fully unde

Re: [Wikidata-l] Impact of Wikidata phase I

2012-06-25 Thread Jona Christopher Sahnwaldt
Hi Denny, we extract the inter-language links for DBpedia. Only for five or six languages so far, but I could easily run the extractor for all 111 languages with more than 1 'good articles'. Shouldn't take more than a few hours. I would use dumps from late May / early June. We perform a full

Re: [Wikidata-l] Impact of Wikidata phase I

2012-06-25 Thread Denny Vrandečić
I'll maybe... I shouldn't... other stuff to do... gnah... Let's see. I may well do a new run in the next few days... (you do realize that some of them wikis are pretty big, right?) :) Cheers, Denny Am 25. Juni 2012 17:22 schrieb Daniel Kinzler : > On 25.06.2012 16:39, Denny Vrandečić wrote: >>

Re: [Wikidata-l] Impact of Wikidata phase I

2012-06-25 Thread Daniel Kinzler
On 25.06.2012 16:39, Denny Vrandečić wrote: > A full parse would have been too expensive to perform. I will update > the explanatory text to reflect that. Thank you for finding this > issue! A full parse is out of the question, but stripping comments should be simple enough: //s -- daniel -- Da
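
The substitution expression in the message above appears truncated in this archive view. As a rough illustration of the idea only (not the script used for the analysis), comment stripping ahead of a regex-based language-link scan might look like the Python sketch below. The names COMMENT_RE, LANGLINK_RE, strip_comments and language_links are hypothetical, and the language-code pattern is a simplifying assumption that will not cover every interwiki prefix.

```python
import re

# Assumption: HTML-style comments <!-- ... --> may span several lines,
# so DOTALL lets "." match newlines; "*?" keeps each match minimal.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Assumption: a language link looks like [[xx:Title]] with a short
# lowercase language code, optionally with hyphenated suffixes.
LANGLINK_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")


def strip_comments(wikitext: str) -> str:
    """Remove comments so that commented-out links are not counted."""
    return COMMENT_RE.sub("", wikitext)


def language_links(wikitext: str) -> list[tuple[str, str]]:
    """Return (language code, title) pairs found outside comments."""
    return LANGLINK_RE.findall(strip_comments(wikitext))


if __name__ == "__main__":
    sample = (
        "Article text\n"
        "[[en:1924 in radio]]\n"
        "<!-- [[fr:Old title]]\n"
        "     [[it:Another old title]] -->\n"
    )
    print(language_links(sample))
```

This is far cheaper than a full parse, but it is still a heuristic: it ignores nowiki sections and templates that emit links.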

Re: [Wikidata-l] Impact of Wikidata phase I

2012-06-25 Thread Denny Vrandečić
Hello Björn, yes, you are completely correct, it is based on an ad-hoc reg-exp, and the problem in the examples you mention is indeed due to language links that are commented out. I am surprised by the amount of commented out language links -- there seem to be plenty of them, and I do not fully un

Re: [Wikidata-l] Impact of Wikidata phase I

2012-06-25 Thread Bjoern Hoehrmann
* denny.vrande...@wikimedia.de wrote: >The full data set is available here: http://simia.net/languagelinks/doublelinks/doublelinks.de.html seems to have some errors, for example, it lists "Rundfunkjahr 1924 to [[en:1924 in radio]]" but as far as I can tell, it's t

[Wikidata-l] Impact of Wikidata phase I

2012-06-25 Thread Denny Vrandečić
Hi all, the first phase of Wikidata will help to centralize many of the Wikipedia language links. We did a small analysis to figure out the possible impact of this step. Here are a few highlights: * there are more than 240 million language links in the Wikipedias * they are responsible for about