Re: [Wikimedia-l] Fwd: The most controversial topics in Wikipedia: A multilingual and geographical analysis

2013-07-23 Thread Taha Yasseri
On a different track and back to Tilman's concern, we managed to get the
following sentence published in the Washington Post:

Among the 3.2 million articles Yasseri’s group studied last year, fewer
than 100 appeared to be on a definite trajectory toward perpetual
disagreement. That’s an excellent record for Wikipedia. Insofar as the free
encyclopedia provides a model for intellectual collaboration through social
media, those results are also encouraging for the pursuit of knowledge in
general.
The full article is available here:
http://www.washingtonpost.com/blogs/wonkblog/wp/2013/07/23/the-science-of-wikipedia-flamewars/

bests,
Taha


On Tue, Jul 23, 2013 at 7:42 AM, Balázs Viczián wrote:

> When I started editing in 2006 it was already the norm; ever since, people
> have been encouraging each other to post their questions about a given
> article on the village pump or a project page rather than on the article's
> talk page, reasoning that there is more traffic there, which generates even
> more traffic on those pages and makes article talk pages even more sparse :)
>
> I guess only socio-cultural research could answer the question of why it is
> like that on huwiki. Maybe one day in the bright (and hopefully not so
> distant) future Wikimedia Hungary will commission such research, so you can
> use it later on in your own research ;)
>
> Üdv,
> Balázs


2013-07-22 Thread Balázs Viczián
When I started editing in 2006 it was already the norm; ever since, people
have been encouraging each other to post their questions about a given
article on the village pump or a project page rather than on the article's
talk page, reasoning that there is more traffic there, which generates even
more traffic on those pages and makes article talk pages even more sparse :)

I guess only socio-cultural research could answer the question of why it is
like that on huwiki. Maybe one day in the bright (and hopefully not so
distant) future Wikimedia Hungary will commission such research, so you can
use it later on in your own research ;)

Üdv,
Balázs


2013/7/22 Taha Yasseri 

> That's very interesting to know. Thanks for telling me. We were quite
> surprised to see very sparse talk pages on the Hungarian Wikipedia.
> I'm sure you know better than I do that article talk pages serve different
> purposes than user talk pages and the village pump. Still, it is interesting
> that the Hungarian Wikipedia prefers to take discussions to places other
> than article talk pages.
>
> szervusz
> Taha.


2013-07-22 Thread Taha Yasseri
That's very interesting to know. Thanks for telling me. We were quite
surprised to see very sparse talk pages on the Hungarian Wikipedia.
I'm sure you know better than I do that article talk pages serve different
purposes than user talk pages and the village pump. Still, it is interesting
that the Hungarian Wikipedia prefers to take discussions to places other
than article talk pages.

szervusz
Taha.

On Mon, Jul 22, 2013 at 9:32 PM, Balázs Viczián wrote:

> As a Hungarian, it is really interesting to read something specific
> about the Hungarian Wikipedia :)
>
> I read somewhere (correct me if I'm wrong) that you found little to no
> discussions on article talk pages on the Hungarian Wikipedia,
> indicating that users barely discuss the content (or anything at all
> about the given article).
>
> Actually, these discussions either move quickly to the village
> pump after 1-2 comments or happen there entirely. Most commonly,
> users discuss things on their user talk pages by directly
> messaging each other about the changes they made or the content, creating
> 2-4 parallel threads on each other's user talk pages. For this reason,
> article talk pages are generally considered "deserted lands" on huwiki that
> almost nobody reads.
>
> Cheers,
> Balázs


2013-07-22 Thread Balázs Viczián
As a Hungarian, it is really interesting to read something specific
about the Hungarian Wikipedia :)

I read somewhere (correct me if I'm wrong) that you found little to no
discussions on article talk pages on the Hungarian Wikipedia,
indicating that users barely discuss the content (or anything at all
about the given article).

Actually, these discussions either move quickly to the village
pump after 1-2 comments or happen there entirely. Most commonly,
users discuss things on their user talk pages by directly
messaging each other about the changes they made or the content, creating
2-4 parallel threads on each other's user talk pages. For this reason,
article talk pages are generally considered "deserted lands" on huwiki that
almost nobody reads.

Cheers,
Balázs

2013/7/22 Taha Yasseri 
>
> Anders,
> I really like your idea of "universal" articles, given that
> translation and communication across languages is not a very difficult task
> these days.
>
> By the way, in a blog post I have released some more data on languages such as
> Japanese, Chinese, and Portuguese, in case anyone's interested:
> http://tahayasseri.wordpress.com/2013/05/27/wikipedia-modern-platform-ancient-debates-on-land-and-gods/
>
> bests,
> Taha


2013-07-22 Thread Anders Wennersten
I find the differences between the language versions the most interesting
part, along with getting some insight into the Arabic version, which I have
not had before.

On a "small version" like sv:wp we are very used to "stealing with pride"
content from other versions, primarily en:wp but also de:wp and others, and
we do this especially for controversial subjects that are not specific to a
country or culture. But are en:wp and other big versions doing the same? It
is very refreshing for a deadlocked discussion to start over with an almost
entirely new text version.

I also wonder about articles like Homeopathy
http://en.wikipedia.org/wiki/Homeopathy, which seems to be near the top of
the controversies. Would it be an idea to compile a universal article with
help from the different versions? That is, do we really utilize the power of
having many versions and many experts?


Anders



Osmar Valdebenito wrote on 2013-07-22 16:13:

I was interviewed a few days ago by a Chilean newspaper because of this
paper. For those interested who can read Spanish, here is the full article:
http://www.latercera.com/noticia/tendencias/2013/07/659-533645-9-estudio-dice-que-chile-es-el-articulo-de-wikipedia-mas-editado-en-espanol.shtml

I read the paper in full and I have to admit it takes very interesting
approaches to removing the "vandalism" effect. It probably won't be perfect,
especially on a platform where it is impossible to have an exact,
quantitative measure of quality or neutrality. Is there a measure of
controversiality? I will consider controversial those articles that I
usually edit, and I will probably overlook several others that are more
controversial, and so on...

But besides the particular issue of which article is the most controversial,
I'm more interested in the trends within each Wikipedia. They
seem consistent and I think there are a lot of things that we can learn from
them.

*Osmar Valdebenito G.*
Director Ejecutivo
A. C. Wikimedia Argentina


2013/7/22 Taha Yasseri 


Thanks Tilman.

Especially for your effort to resolve the misunderstandings, most of which
I suppose are due to a shallow reading: "I had a bit of free time last
night waiting for trains and I skimmed through the study and its
findings."

We had two strategies for getting rid of vandalism, as you mentioned:
considering only mutual reverts and weighting editors by their maturity. I
suppose a vandal could not have a large maturity score, by definition.

As for the data, this study was carried out in 2011, and we worked on
the latest dump available at the time. Anyone experienced in academic
research, especially at this scale, knows well that it really takes time to
get the analysis done, write the reports, get them reviewed, etc.,
especially given that we published 7-8 other papers during the same period.
I see no problem in this as long as the metadata and the information about
the methods and the data under study are mentioned in the manuscript, which
is clearly the case here. I have seen many Wikipedia studies without any
mention of the dump they used!

Back to your concern about the general impression that the news media give
of Wikipedia being a battlefield: I'd like to mention that I have
emphasised the small number of controversial articles compared to the total
number of articles in every single media response I gave. Again, as you
mentioned, we gave the percentages explicitly in our previous work.
But of course, for obvious reasons, journalists are not happy to highlight
this. They like to report on controversies and wars! It is not our fault
if what they report is misleading, as long as we have tried our best
to avoid it. An interview of mine with BBC Radio Scotland, in which at 04:00 I
clearly say that there are millions and thousands of articles on Wikipedia
that are not controversial, is available here:
https://www.dropbox.com/s/8whovkmipbqdzlv/bbc_radio_Scotland.mp3 . I have
done the same in all the others.

Finally, I hope that the public media coverage of our research, which is
clearly far from perfect, can also give members of the public a
better understanding of how Wikipedia works and how fascinating it is!

Thanks again,

Taha


On 22 Jul 2013 05:58, "Tilman Bayer" wrote:

On Sun, Jul 21, 2013 at 2:32 PM, MZMcBride wrote:

Anders Wennersten wrote:

A most interesting study looking at findings from 10 different language
versions.

Jesus and the Middle East are the most controversial articles worldwide,
but George Bush is the most controversial on en:wp and Chile on es:wp

http://arxiv.org/ftp/arxiv/papers/1305/1305.5566.pdf

FWIW, here is the review by Giovanni Luca Ciampaglia in last month's
Wikimedia Research Newsletter:



https://blog.wikimedia.org/2013/06/28/wikimedia-research-newsletter-june-2013/#.22The_most_controversial_topics_in_Wikipedia:_a_multilingual_and_geographical_analysis.22

(also published in the Signpost, the weekly newsletter on the English
Wikipedia)


Thanks for sharing this.

I had a bit of free time last night waiting for trains and I skimmed
through the stud


2013-07-22 Thread Taha Yasseri
Anders,
I really like your idea of "universal" articles, given that
translation and communication across languages is not a very difficult task
these days.

By the way, in a blog post I have released some more data on languages such as
Japanese, Chinese, and Portuguese, in case anyone's interested:
http://tahayasseri.wordpress.com/2013/05/27/wikipedia-modern-platform-ancient-debates-on-land-and-gods/

bests,
Taha


On Mon, Jul 22, 2013 at 4:17 PM, Anders Wennersten wrote:

> I find the differences between the language versions the most interesting
> part, along with getting some insight into the Arabic version, which I have
> not had before.
>
> On a "small version" like sv:wp we are very used to "stealing with pride"
> content from other versions, primarily en:wp but also de:wp and others, and
> we do this especially for controversial subjects that are not specific to a
> country or culture. But are en:wp and other big versions doing the same? It
> is very refreshing for a deadlocked discussion to start over with an almost
> entirely new text version.
>
> I also wonder about articles like Homeopathy
> http://en.wikipedia.org/wiki/Homeopathy, which seems to be near the top of
> the controversies. Would it be an idea to compile a universal article with
> help from the different versions? That is, do we really utilize the power of
> having many versions and many experts?
>
> Anders


2013-07-22 Thread Osmar Valdebenito
I was interviewed a few days ago by a Chilean newspaper because of this
paper. For those interested who can read Spanish, here is the full article:
http://www.latercera.com/noticia/tendencias/2013/07/659-533645-9-estudio-dice-que-chile-es-el-articulo-de-wikipedia-mas-editado-en-espanol.shtml

I read the paper in full and I have to admit it takes very interesting
approaches to removing the "vandalism" effect. It probably won't be perfect,
especially on a platform where it is impossible to have an exact,
quantitative measure of quality or neutrality. Is there a measure of
controversiality? I will consider controversial those articles that I
usually edit, and I will probably overlook several others that are more
controversial, and so on...

But besides the particular issue of which article is the most controversial,
I'm more interested in the trends within each Wikipedia. They
seem consistent and I think there are a lot of things that we can learn from
them.

*Osmar Valdebenito G.*
Director Ejecutivo
A. C. Wikimedia Argentina


2013/7/22 Taha Yasseri 

> Thanks Tilman.
>
> Especially for your effort to resolve the misunderstandings, which most of
> them I suppose are due to a shallow reading: "I had a bit of free time last
> night waiting for trains and I skimmed  through the study and its
> findings."
>
> We had two strategies to get rid of vandalisms, as you mentioned,
> considering only mutual reverts and waiting editors by their maturity, I
> suppose a vandal could not have a large maturity score by definition.
>
> As for the data, this study was carried out in 2011, and we worked on
> the latest available dump at the time. Anyone experienced in academic
> research, especially at this scale, knows well that it really takes time to
> get the analysis done, write the reports, get them reviewed, etc.,
> especially since we published 7-8 other papers during the same period.
> I see no problem in this as long as the metadata and such information about
> the methods and the data under study are mentioned in the manuscript, which
> is clearly the case here. I have seen many Wikipedia studies without any
> mention of the dump they have used!
>
> Back to your concern about the general impression that the news media give
> of Wikipedia being a battlefield, I'd like to mention that I have
> emphasised the small number of controversial articles compared to the total
> number of articles in every single media response I gave. Again, as you
> mentioned, we gave the percentages explicitly in our previous work.
> But of course, for obvious reasons, journalists are not happy to highlight
> this. They like to report on controversies and wars! It is not our fault
> if what they report is misleading, as long as we have tried our best
> to avoid it. An interview of mine with BBC Radio Scotland, in which at
> 04:00 I clearly say that there are millions and thousands of articles in
> Wikipedia which are not controversial, is available here:
> https://www.dropbox.com/s/8whovkmipbqdzlv/bbc_radio_Scotland.mp3 . I have
> done the same in all the others.
>
> Finally, I hope that the public media coverage of our research, which is
> clearly far from perfect, can also give members of the public a
> better understanding of how Wikipedia works and how fascinating it is!
>
> Thanks again,
>
> Taha
>
>
> On 22 Jul 2013 05:58, "Tilman Bayer"  wrote:
>
> > On Sun, Jul 21, 2013 at 2:32 PM, MZMcBride  wrote:
> > > Anders Wennersten wrote:
> > >>A most interesting study looking at findings from 10 different language
> > >>versions.
> > >>
> > >>Jesus and Middle east are the most controversial articles seen over the
> > >>world, but George Bush on en:wp and Chile on es:wp
> > >>
> > >>http://arxiv.org/ftp/arxiv/papers/1305/1305.5566.pdf
> > >
> > FWIW, here is the review by Giovanni Luca Ciampaglia in last month's
> > Wikimedia Research Newsletter:
> >
> >
> > https://blog.wikimedia.org/2013/06/28/wikimedia-research-newsletter-june-2013/#.22The_most_controversial_topics_in_Wikipedia:_a_multilingual_and_geographical_analysis.22
> > (also published in the Signpost, the weekly newsletter on the English
> > Wikipedia)
> >
> > > Thanks for sharing this.
> > >
> > > I had a bit of free time last night waiting for trains and I skimmed
> > > through the study and its findings. Two points stuck out at me: a
> > > seemingly fatally flawed methodology and the age of data used.
> > >
> > > The methodology used in this study seems to be pretty inherently
> > > flawed.
> > > According to the paper, controversiality was measured by full page
> > > reverts, which are fairly trivial to identify and study in a database
> > > dump
> > > (using cryptographic hashes, as the study did), but I don't think full
> > > reverts give an accurate impression _at all_ of which articles are the
> > > most controversial.
> > >
> > > Pages with many full reverts are indicative of pages that are heavily
> > > vandalized. For example, the "George W. Bush" article is/was heavil

[Wikimedia-l] Fwd: The most controversial topics in Wikipedia: A multilingual and geographical analysis

2013-07-22 Thread Taha Yasseri
Thanks Tilman.

Especially for your effort to resolve the misunderstandings, most of
which, I suppose, are due to a shallow reading: "I had a bit of free time
last night waiting for trains and I skimmed through the study and its
findings."

We had two strategies to filter out vandalism, as you mentioned:
considering only mutual reverts and weighting editors by their maturity. I
suppose a vandal could not have a large maturity score by definition.
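
To make the two filters concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions, not the code or the exact
formula used in the study: the function names are hypothetical, SHA-1 stands
in for whichever hash is used on the dump, and an editor's total edit count
is assumed as a rough proxy for "maturity".

```python
import hashlib


def find_reverts(revisions):
    """Return a set of (reverter, reverted) pairs from one page history.

    `revisions` is a chronological list of (editor, text) tuples.
    A revision whose text hash matches an earlier revision is treated as a
    full revert; the reverted editor is whoever edited right after the
    restored revision.
    """
    last_seen = {}  # text hash -> index of the latest revision with that text
    reverts = set()
    for j, (editor, text) in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        i = last_seen.get(digest)
        # Require at least one revision in between, so a repeated identical
        # save is not counted as a revert.
        if i is not None and i < j - 1:
            reverted = revisions[i + 1][0]
            if reverted != editor:  # ignore self-reverts
                reverts.add((editor, reverted))
        last_seen[digest] = j
    return reverts


def mutual_pairs(reverts):
    """Keep only pairs where A reverted B and B also reverted A."""
    return sorted({tuple(sorted(p)) for p in reverts if (p[1], p[0]) in reverts})


def controversy_score(reverts, edit_counts):
    """Sum, over mutual pairs, the smaller of the two editors' edit counts.

    Assumption: using min(edit count) as the weight is illustrative only.
    """
    return sum(min(edit_counts[a], edit_counts[b])
               for a, b in mutual_pairs(reverts))
```

In a history where one editor restores the page after a blanking by a
throwaway account and then trades reverts with an established editor, only
the second pair is mutual, so the blanking adds nothing to the score; and
even if a vandal did revert back, a tiny edit count would keep the weight
small.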

As for the data, this study was carried out in 2011, and we worked on
the latest available dump at the time. Anyone experienced in academic
research, especially at this scale, knows well that it really takes time to
get the analysis done, write the reports, get them reviewed, etc.,
especially since we published 7-8 other papers during the same period.
I see no problem in this as long as the metadata and such information about
the methods and the data under study are mentioned in the manuscript, which
is clearly the case here. I have seen many Wikipedia studies without any
mention of the dump they have used!

Back to your concern about the general impression that the news media give
of Wikipedia being a battlefield, I'd like to mention that I have
emphasised the small number of controversial articles compared to the total
number of articles in every single media response I gave. Again, as you
mentioned, we gave the percentages explicitly in our previous work.
But of course, for obvious reasons, journalists are not happy to highlight
this. They like to report on controversies and wars! It is not our fault
if what they report is misleading, as long as we have tried our best
to avoid it. An interview of mine with BBC Radio Scotland, in which at
04:00 I clearly say that there are millions and thousands of articles in
Wikipedia which are not controversial, is available here:
https://www.dropbox.com/s/8whovkmipbqdzlv/bbc_radio_Scotland.mp3 . I have
done the same in all the others.

Finally, I hope that the public media coverage of our research, which is
clearly far from perfect, can also give members of the public a
better understanding of how Wikipedia works and how fascinating it is!

Thanks again,

Taha


On 22 Jul 2013 05:58, "Tilman Bayer"  wrote:

> On Sun, Jul 21, 2013 at 2:32 PM, MZMcBride  wrote:
> > Anders Wennersten wrote:
> >>A most interesting study looking at findings from 10 different language
> >>versions.
> >>
> >>Jesus and Middle east are the most controversial articles seen over the
> >>world, but George Bush on en:wp and Chile on es:wp
> >>
> >>http://arxiv.org/ftp/arxiv/papers/1305/1305.5566.pdf
> >
> FWIW, here is the review by Giovanni Luca Ciampaglia in last month's
> Wikimedia Research Newsletter:
>
> https://blog.wikimedia.org/2013/06/28/wikimedia-research-newsletter-june-2013/#.22The_most_controversial_topics_in_Wikipedia:_a_multilingual_and_geographical_analysis.22
> (also published in the Signpost, the weekly newsletter on the English
> Wikipedia)
>
> > Thanks for sharing this.
> >
> > I had a bit of free time last night waiting for trains and I skimmed
> > through the study and its findings. Two points stuck out at me: a
> > seemingly fatally flawed methodology and the age of data used.
> >
> > The methodology used in this study seems to be pretty inherently flawed.
> > According to the paper, controversiality was measured by full page
> > reverts, which are fairly trivial to identify and study in a database
> > dump
> > (using cryptographic hashes, as the study did), but I don't think full
> > reverts give an accurate impression _at all_ of which articles are the
> > most controversial.
> >
> > Pages with many full reverts are indicative of pages that are heavily
> > vandalized. For example, the "George W. Bush" article is/was heavily
> > vandalized for years on the English Wikipedia. Does blanking the article
> > or replacing its contents with the word "penis" mean that it's a very
> > controversial article? Of course not. Measuring only full reverts (as the
> > study seems to have done, though it's certainly possible I've overlooked
> > something) seems to be really misleading and inaccurate.
> They didn't. You may have overlooked the description of the
> methodology on p.5: It's based on "mutual reverts" where user A has
> reverted user B and user B has reverted user A, and gives higher
> weight to disputes between more experienced editors. This should
> exclude most vandalism reverts of the sort you describe. As noted in
> Giovanni's review, this method was proposed in an earlier paper, Sumi
> et al. (
> https://meta.wikimedia.org/wiki/Research:Newsletter/2011/July#Edit_wars_and_conflict_metrics
> ). That paper explains at length how this metric serves to distinguish
> vandalism reverts from edit wars. Of course there are ample
> possibilities to refine it, e.g. taking into account page protection
> logs.
>
> Personally, I'm more concerned that the new paper totally fails to put
> its subject into perspective by statin