Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-26 Thread Denny Vrandečić
Ziko,

it does not jeopardize the Wikidata goal -- the current language link
system won't be switched off and can continue to be used. Everything that
is working currently will still be possible afterwards. Wikidata can
still be used to represent the 99.2% of language links that are simple
-- this would still be a huge improvement over the current state.

As soon as these are out of the way, we can think about if and how to
extend the system in order to deal with the rest.

Cheers,
Denny

2012/6/25 Ziko van Dijk vand...@wmnederland.nl:
 Hello,

 So may I guess that double links are usually the result of a
 Wikipedian who was not sure which language link to set, so in doubt,
 he simply put in the language links for two different articles?

 And in general, is it imaginable that different languages divide
 knowledge in different ways, which could jeopardize the whole goal of
 Wikidata unifying the language links?

 Kind regards
 Ziko


 2012/6/25 Delirium delir...@hackish.org:
 Thanks for this list. For the languages I know, I've started going through
 and fixing ones that are clearly wrong. If a number of people do that, that
 should improve the general quality/consistency of interwiki links. I second
 the other comment that it'd be nice if the parsing could be re-run to
 exclude commented-out links, but the list is still useful as is.

 There are some difficult cases, though, when languages make different
 choices on how to group subjects, so the articles aren't actually in 1-to-1
 correspondence. For example, the English article [[en: Móði and Magni]]
 unsurprisingly has two outgoing interwiki links, when linking to languages
 that split them, such as [[da:Magni]] and [[da:Modi]]. It's not clear what
 to do about these cases.

 Best,
 Mark


 On 6/25/12 12:29 PM, Denny Vrandečić wrote:

 Hi all,

 I ran some analysis last week, to get some numbers out of the
 Wikipedia language links. One type of report that was generated was
 the list of all articles in the main namespaces of the Wikipedias that
 link to more than one article in another language edition of Wikipedia
 (so-called double language links). There are not that many of them
 (about 19,000 in total), split by language, all available here:

 http://simia.net/languagelinks/

 Double language links are not errors per se, but they cause a few
 nuisances:
 * they lead to two links in the language links list that just look the
 same (you have to hover over them to see that they link to different
 languages), which is not really optimal from the user experience side
 * they are not saved in the langlinks table and thus are ignored in
 certain reports and also in the respective export

 I am not sure how to reach out to the respective Wikipedia
 communities, or if I should at all. Should I post to their respective
 version of the village pump? Remembering from the time I was active on
 the Croatian Wikipedia, I would have appreciated that list to check
 the entries. I reckoned the wikipedia-l list would be the right place,
 but that list looks rather dead.

 Cheers,
 Denny



 ___
 Wikimedia-l mailing list
 Wikimedia-l@lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l



 --

 ---
 Vereniging Wikimedia Nederland
 dr. Ziko van Dijk, voorzitter
 http://wmnederland.nl/

 Wikimedia Nederland
 Postbus 167
 3500 AD Utrecht
 ---




-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.



Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-26 Thread Denny Vrandečić
Amir,

thank you for the thoughtful reply!

Indeed, our current plan is a kind of staged deployment in the sense
that we will not automatically transfer the links but let the editor
community do it. On our test systems we already see bots being tried
out and rewritten, so we expect that as soon as Wikidata starts, we
will see that transition happening.

But the current language link system will continue to work, so no
article or Wikipedia is forced to switch to the Wikidata system.
Complex language link configurations can still be handled manually --
and maybe even more easily, since conflicts between bots and human
editors should be less likely to happen.

I hope that this is the right path to profit :)

Cheers,
Denny


2012/6/25 Amir E. Aharoni amir.ahar...@mail.huji.ac.il:
 Hi Denny,

 TL;DR: It's a very important question, but don't worry about it too
 much. Just do Wikidata well as it is currently planned.

 Now, the full reply.

 I wrote a bit of an essay about it in 2008:
 https://meta.wikimedia.org/wiki/Tips_for_resolving_interwiki_conflicts

 I also started a page to coordinate the efforts to resolve such conflicts:
 https://meta.wikimedia.org/wiki/Interwiki_synchronization

 It started out nicely, but didn't really scale, so I had no choice but
 to neglect it.

 There are two main reasons that it didn't scale:
 1. Fixing interlanguage link conflicts is an exhausting manual
 process. The Interlanguage extension or Wikidata are supposed to make
 it centralized and easier.

 2. Almost all Wikipedians are very, very reluctant about doing
 anything outside their home projects.

 So, Wikidata is supposed to resolve #1. Once it becomes active, #2
 will kick in again. At this stage, all I can say is our old motto: Be
 Bold. There's a rumor about me, which says that I know a lot of
 languages. I don't; I'm just bold about trying to edit Wikipedias in
 languages that I don't know. Everybody can do it. Most of the time it
 turns out to be correct and people don't complain. Trying to talk to
 people about this on village pumps and using global message delivery
 is not very efficient. In many languages, even in some major ones, the
 village pumps are not as active as in English, and even when they are,
 people very often ignore messages in English.

 Anyway, my proposal is this:
 * As discussed at bug 15607 [1], the best strategy for rolling out
 centralized language links is to enable them in articles without
 conflicts and to leave articles with conflicts without any change at
 first.
 * After initial roll-out, a list of conflicts for every project should
 be created. That is, there should be one list of articles with
 conflicts in the English Wikipedia, another list for the Hebrew
 Wikipedia, another one for Croatian, etc. This will make it relatively
 more accessible for people, because it will look like a problem in
 their project. Most people like solving local problems more than
 global problems.[2]
 * Profit.

 I believe that this crowdsourcing model may work. It won't be
 immediately perfect or very fast. It's just a sensible start.

 [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=15607

 [2] A technical implementation comment about the list of pages with
 conflicts: it will be most efficient if it is implemented as a
 special page in each project. If updating it immediately is too
 burdensome in terms of performance, it can be updated in batches every
 week or so. The reason it should be a special page is that it will
 look like an integrated site feature and that it will be easy to
 localize its interface.


Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-26 Thread Martijn Hoekstra
This number, 99.2%, was also mentioned at the Berlin Hackathon. It
sounds much higher than what my (very scientifically relevant,
obviously) gut feeling tells me. Could you indicate where this number
is coming from?



Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-26 Thread Denny Vrandečić
I got the number from Brent Hecht, a researcher at Northwestern, who
has a number of great papers published on Wikipedia-related topics.

CC-ing him, so he knows I am blam.., er, referencing him :)

Cheers,
Denny




Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-26 Thread Deryck Chan
One major problem with double language links I've encountered before is
that they confuse interwiki bots and therefore break things. Several
articles on the Cantonese Wikipedia (zh-yue.wp) pertaining to local
political and cultural issues in the Cantonese-speaking world have
__NOBOT__ on them simply because they have topic splits that are
significantly different from other Wikipedias, and bot interwiki
manipulation needs to be prevented to maintain topic correspondence between
languages.

In short: double language links are a possible idea, but only if we can
upgrade the interwiki bots first.



Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-26 Thread Brent Hecht
Hi All,
 
Brent Hecht here :-) This has been a really interesting discussion, and I 
wanted to chime in with a few notes.
 
The 99.2% is based on a quick script I wrote that looked at reciprocity among a 
sample of interlanguage links (ILLs) in 25 languages to address some questions 
that Denny and I brainstormed. It did not consider commented links, redirects, 
etc. However, in my lab’s published work that takes much more involved 
approaches to this problem, we also find that complex interlanguage link 
situations are the obvious minority, with simple 1:1 relationships being the 
norm. For instance, only 1% of connected components of the ILL graph (groups of 
articles linked together by ILLs) had more than one article per language 
edition in our 25-language dataset.
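
To make the two measures above concrete, here is a minimal Python sketch. It is not the script or the published method described in this message; the function names, the (language, title) tuple representation, and the sample data are all assumptions made for illustration. It computes link reciprocity over directed ILLs and groups articles into connected components with a simple union-find, then reports the share of components that hold more than one article for some language edition.

```python
from collections import defaultdict

def reciprocity(directed_links):
    """Share of directed links a -> b for which b -> a also exists."""
    link_set = set(directed_links)
    if not link_set:
        return 0.0
    return sum((b, a) in link_set for a, b in link_set) / len(link_set)

def connected_components(links, articles=()):
    """Group (lang, title) nodes joined by ILLs, treating links as undirected.

    `articles` lets callers register articles that have no links at all,
    so that they show up as single-article components.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for article in articles:
        find(article)
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())

def is_one_to_one(component):
    """True if the component has at most one article per language edition."""
    langs = [lang for lang, _title in component]
    return len(langs) == len(set(langs))

def share_non_one_to_one(components):
    """Fraction of components with more than one article in some language."""
    if not components:
        return 0.0
    return sum(not is_one_to_one(c) for c in components) / len(components)

# Hypothetical sample data, reusing the example from earlier in the thread.
sample_links = [
    (("en", "Móði and Magni"), ("da", "Magni")),
    (("en", "Móði and Magni"), ("da", "Modi")),
    (("en", "River"), ("de", "Fluss")),
]
components = connected_components(sample_links, articles=[("hr", "Solo")])
```

Whether single-article components are counted changes the resulting percentage considerably, which is one reason the caveats in the next paragraph matter.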
 
There are some important details to note here. For instance, most connected 
components contain only one article (most concepts are covered by only a single 
language edition) and non-1:1 cases by definition involve more articles than 
1:1 cases (all other things being equal). Also, general concepts of global 
interest are likely disproportionately represented in the non-1:1 situations 
(e.g. river, canal, high school, diplomacy). 
 
That said, given our data, I also think Denny is spot on with the “let’s start 
with the 1:1s” approach to building Wikidata, with solutions for more complex 
situations coming later. These solutions could be fascinating and important, 
but make sense as a second step, IMHO. Given that each language edition will be 
able to pick and choose from statements (last I checked, at least), this might 
provide additional flexibility as well, allowing greater variation to be 
included in the 1:1 model.
 
If interested, I'd encourage folks to check out our CHI 2012 paper [1], as well 
as some excellent work done by Gerard de Melo and Gerhard Weikum that preceded 
us [3]. De Melo and Weikum establish an interesting taxonomy for causes of 
non-1:1 links: conceptual drift, different granularities, and mistakes made by 
editors.
 
In my view, perhaps a greater problem is the one of missing interlanguage 
links, which I hope Wikidata’s popularity will help to solve. We’ve done some 
work to show that missing links can be somewhat substantial between certain 
language editions [2], although that was based on data from 2009.
 
It's important to note, too, that some of the differences in coverage of a 
given concept across articles in different languages are addressed not with 
ILLs, but simply by describing concepts differently in each language edition. 
We call this sub-concept diversity, and it can be substantial [2]. Our CHI 
2012 paper describes a system we built, Omnipedia, that allows folks to browse 
the content about a single concept in 25 language editions. We’re hoping to 
launch the system sometime soon, but we have some practical considerations to 
deal with first (funding, finishing my thesis, etc. :-)).
 
Lastly, I've been digging into the social science of this stuff a bit lately 
and many folks believe that, as Ziko said, different languages divide 
knowledge in different ways (even apart from any effects introduced in the 
Wikipedia context specifically). For instance, the linguist Anna Wierzbicka 
talks about the granularity differences as cultural elaboration and has all 
sorts of fun examples in her book Understanding Cultures Through Their 
Keywords. You can also make arguments about this from a geographic perspective 
(my social science roots), psycholinguistics, and I'm sure other fields as 
well. This stuff perhaps explains some of the 1% of non-1:1 concepts, as well 
as some of the sub-concept diversity, although I am still brainstorming.
 
In any case, hopefully this helps some! Happy to answer questions. Thanks again 
for a great discussion.
 
-   Brent

p.s. Don't forget to support Denny and crew in the Knight News Challenge 
proposal :-) : 
http://newschallenge.tumblr.com/post/25575917516/wikidata-as-a-central-free-repository-of-identifiers

Brent Hecht
Ph.D. Candidate in Computer Science @ Northwestern University
Asst. Prof. of Comp. Sci @ Univ. Minnesota beginning 2013
w: http://www.brenthecht.com
e: br...@u.northwestern.edu
t: @bhecht
 
[1] Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M. and Gergle, D. 
2012. Omnipedia: Bridging the Wikipedia Language Gap. CHI ’12: 30th 
International Conference on Human Factors in Computing Systems (2012).
[2] Hecht, B. and Gergle, D. 2010. The Tower of Babel Meets Web 2.0: 
User-Generated Content and Its Applications in a Multilingual Context. CHI 
’10: 28th International Conference on Human Factors in Computing Systems 
(Atlanta, GA, 2010), 291–300.
[3] de Melo, G. and Weikum, G. 2010. Untangling the Cross-Lingual Link 
Structure of Wikipedia. ACL ’10: 48th Annual Meeting of the Association for 
Computational Linguistics (Uppsala, Sweden, 2010).










[Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-25 Thread Denny Vrandečić
Hi all,

I ran some analysis last week, to get some numbers out of the
Wikipedia language links. One type of report that was generated was
the list of all articles in the main namespaces of the Wikipedias that
link to more than one article in another language edition of Wikipedia
(so-called double language links). There are not that many of them
(about 19,000 in total), split by language, all available here:

http://simia.net/languagelinks/

Double language links are not errors per se, but they cause a few nuisances:
* they lead to two links in the language links list that just look the
same (you have to hover over them to see that they link to different
articles), which is not really optimal from the user experience side
* they are not saved in the langlinks table and thus are ignored in
certain reports and also in the respective export

I am not sure how to reach out to the respective Wikipedia
communities, or if I should at all. Should I post to their respective
version of the village pump? Remembering from the time I was active on
the Croatian Wikipedia, I would have appreciated that list to check
the entries. I reckoned the wikipedia-l list would be the right place,
but that list looks rather dead.

Cheers,
Denny

-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.



Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-25 Thread Bence Damokos
Hi Denny,

This is a really interesting list.
Looking at the Hungarian list, I find that in many instances the duplicate
interwiki link is actually commented out (in the form of <!-- Source:
[[en: something]] --> or <!-- wrong interwikis: [[en: ..]] [[fr: ..]] -->),
and not real duplicate links. (In some cases there are indeed duplicate
links, where one concept covers two concepts in other languages.)

Maybe you could refine your search algorithm to exclude commented-out
links, and improve your listing page by including not only the second
interwiki link found for a given language, but also the first one, so it is
easier to assess without having to check the article pages or source codes?
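
As a rough illustration of both suggestions, the Python sketch below strips HTML comments from an article's wikitext before looking for language links, and reports every target found for a language rather than only the second one. It is not the script behind the published lists; the function name, the regular expressions, and the `known_langs` parameter are assumptions, and the sketch ignores other cases (for example `<nowiki>` sections or links produced by templates).

```python
import re
from collections import defaultdict

# HTML comments in wikitext, possibly spanning several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# [[xx:Title]] style links. A leading colon, as in [[:en:Title]], makes an
# ordinary inline link rather than a language link, so it is not matched.
LINK_RE = re.compile(r"\[\[\s*([a-z][a-z-]*)\s*:\s*([^\]|]+?)\s*\]\]")

def double_language_links(wikitext, known_langs):
    """Return {lang: [titles]} for languages linked more than once.

    `known_langs` is the set of language codes to treat as language
    links; any other prefix (namespaces, other interwikis) is skipped.
    """
    visible = COMMENT_RE.sub("", wikitext)
    targets = defaultdict(list)
    for lang, title in LINK_RE.findall(visible):
        if lang in known_langs and title not in targets[lang]:
            targets[lang].append(title)
    return {lang: titles for lang, titles in targets.items() if len(titles) > 1}

# Hypothetical article text: the commented-out en link is ignored,
# while the two da links are both reported.
example = (
    "Some article text.\n"
    "<!-- Source: [[en:Something]] -->\n"
    "[[en:Móði and Magni]]\n"
    "[[da:Magni]]\n"
    "[[da:Modi]]\n"
)
print(double_language_links(example, {"en", "da"}))
# prints {'da': ['Magni', 'Modi']}
```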

In any case, the village pumps might be a good place to post a link to the
lists. The Global message delivery system might help you in that:
http://meta.wikimedia.org/wiki/Global_message_delivery

Best regards,
Bence



Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-25 Thread Amir E. Aharoni
Hi Denny,

TL;DR: It's a very important question, but don't worry about it too
much. Just do Wikidata well as it is currently planned.

Now, the full reply.

I wrote a bit of an essay about it in 2008:
https://meta.wikimedia.org/wiki/Tips_for_resolving_interwiki_conflicts

I also started a page to coordinate the efforts to resolve such conflicts:
https://meta.wikimedia.org/wiki/Interwiki_synchronization

It started out nicely, but didn't really scale, so I had no choice but
to neglect it.

There are two main reasons that it didn't scale:
1. Fixing interlanguage link conflicts is an exhausting manual
process. The Interlanguage extension or Wikidata are supposed to make
it centralized and easier.

2. Almost all Wikipedians are very, very reluctant to do
anything outside their home projects.

So, Wikidata is supposed to resolve #1. Once it becomes active, #2
will kick in again. At this stage, all I can say is our old motto: Be
Bold. There's a rumor about me, which says that I know a lot of
languages. I don't; I'm just bold about trying to edit Wikipedias in
languages that I don't know. Everybody can do it. Most of the time it
turns out to be correct and people don't complain. Trying to talk to
people about this on village pumps and using global message delivery
is not very efficient. In many languages, even in some major ones, the
village pumps are not as active as in English, and even when they are,
people very often ignore messages in English.

Anyway, my proposal is this:
* As discussed in bug 15607 [1], the best strategy for rolling out
centralized language links is to enable them in articles without
conflicts and to leave articles with conflicts unchanged at first.
* After the initial roll-out, a list of conflicts should be created for
every project. That is, there should be one list of articles with
conflicts in the English Wikipedia, another list for the Hebrew
Wikipedia, another one for Croatian, etc. This will make the problem
more accessible to people, because it will look like a problem in
their project. Most people like solving local problems more than
global problems.[2]
* Profit.

I believe that this crowdsourcing model may work. It won't be
immediately perfect or very fast. It's just a sensible start.
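
The two-step roll-out proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `plan_rollout` and the input shape are hypothetical, and a "conflict" is simplified here to mean a page that links to more than one article in the same language. Real interwiki conflicts can also involve inconsistent link graphs across several projects, which this sketch does not detect.

```python
from collections import defaultdict


def plan_rollout(langlinks):
    """Split pages into those safe to migrate and per-project conflict lists.

    `langlinks` is a hypothetical input: it maps (project, title) to a dict
    of {language code: [target titles]} collected from that page.
    """
    migrate = []                   # pages without conflicts: migrate first
    conflicts = defaultdict(list)  # project -> pages left unchanged for now
    for (project, title), links in sorted(langlinks.items()):
        # Simplifying assumption: a conflict is more than one distinct
        # target in the same language.
        if any(len(set(targets)) > 1 for targets in links.values()):
            conflicts[project].append(title)
        else:
            migrate.append((project, title))
    return migrate, dict(conflicts)


sample = {
    ("en", "Móði and Magni"): {"da": ["Magni", "Modi"]},
    ("en", "Berlin"): {"de": ["Berlin"], "hr": ["Berlin"]},
    ("hr", "Zagreb"): {"en": ["Zagreb"]},
}
migrate, conflicts = plan_rollout(sample)
```

Keeping one list per project matches the idea that each community works through its own list, rather than one global list.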

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=15607

[2] A technical implementation comment about the list of pages with
conflicts: it would be most efficient to implement it as a
special page in each project. If updating it immediately is too
burdensome in terms of performance, it can be updated in batches every
week or so. The reason it should be a special page is that it will
look like an integrated site feature and its interface will be easy
to localize.



Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-25 Thread Delirium
Thanks for this list. For the languages I know, I've started going 
through and fixing ones that are clearly wrong. If a number of people do 
that, that should improve the general quality/consistency of interwiki 
links. I second the other comment that it'd be nice if the parsing could 
be re-run to exclude commented-out links, but the list is still useful 
as is.


There are some difficult cases, though, when languages make different
choices on how to group subjects, so the articles aren't actually in
1-to-1 correspondence. For example, the English article
[[en:Móði and Magni]] unsurprisingly has two outgoing interwiki links
when linking to languages that split them, such as [[da:Magni]] and
[[da:Modi]]. It's not clear what to do about these cases.
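
A minimal sketch of how such pages could be detected while skipping commented-out links, as suggested above. It is not the script that produced the published list; the function name, the regular expressions, and the small `LANGUAGE_CODES` set are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical subset of language codes; a real run would need the full
# list of Wikipedia language editions.
LANGUAGE_CODES = {"en", "da", "de", "hr", "he", "nl"}

# HTML comments in wikitext, possibly spanning several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Matches [[xx:Title]] but not [[:xx:Title]]; a leading colon makes an
# ordinary inline link rather than a language link.
LANGLINK_RE = re.compile(r"\[\[\s*([a-z-]+)\s*:\s*([^\]|]+?)\s*\]\]")


def double_language_links(wikitext):
    """Return {language: [titles]} for languages linked more than once."""
    text = COMMENT_RE.sub("", wikitext)  # drop commented-out links first
    targets = defaultdict(list)
    for lang, title in LANGLINK_RE.findall(text):
        if lang in LANGUAGE_CODES and title not in targets[lang]:
            targets[lang].append(title)
    return {lang: titles for lang, titles in targets.items() if len(titles) > 1}


page = (
    "Article text.\n"
    "[[da:Magni]]\n"
    "[[da:Modi]]\n"
    "[[de:Magni und Modi]]\n"
    "<!-- [[de:Modi]] -->\n"
)
result = double_language_links(page)
```

For the sample page only Danish is reported, because the second German link sits inside a comment and is removed before matching.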


Best,
Mark



Re: [Wikimedia-l] Language links and double language links on the Wikipedias

2012-06-25 Thread Ziko van Dijk
Hello,

So may I guess that double links are usually the result of a
Wikipedian who was not sure which language link to set and so, in
doubt, simply put in the language links for two different articles?

And in general, is it imaginable that different languages divide
knowledge in different ways, which could jeopardize the whole goal of
Wikidata unifying the language links?

Kind regards
Ziko




-- 

---
Vereniging Wikimedia Nederland
dr. Ziko van Dijk, chair
http://wmnederland.nl/

Wikimedia Nederland
Postbus 167
3500 AD Utrecht
---
