Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-28 Thread Steven Walling
On Tue, Nov 26, 2013 at 2:46 AM, Strainu  wrote:

> Hi Steven,
>
> What qualifies as "many"? On ro.wp, Andrebot is creating sections of
> articles about the population of villages/communes in Bulgaria,
> Hungary and Croatia, also creating articles where they do not exist.
> That will probably amount to a few hundred articles by the end of the
> year.
>

This counts, since it seems that in 2012 and more recently in 2013, bot
article creation has exceeded manual creations. See:
http://stats.wikimedia.org/EN/ReportCardTopWikis.htm#lang_ro

Thanks for the info!
___
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-28 Thread Steven Walling
On Wed, Nov 27, 2013 at 6:50 AM, 梁忠明  wrote:

> On Chinese Wikipedia there are no such bots creating large numbers of
> biological articles. Ranyv, one of my colleagues there, said that because
> Chinese does not use the Roman alphabet, it is difficult to generate
> translated names for those species, whereas Vietnamese and other wikis can
> use the binomial name as the article title. Some users do use AWB to create
> articles for places, Taipei Metro stations, etc.
>
> As far as I know, this does happen on Vietnamese Wikipedia. I've heard that
> a Vietnamese bot account (Cheers!-bot) has created many articles about
> biological species, pushing its article count past the Chinese (and
> Portuguese?) one, to over 750,000 articles. That sparked a small-scale
> debate in the Chinese Wikipedia community.
>

This was very helpful, thank you!

Can you give me an example of AWB-created articles?

Steven


Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-28 Thread Steven Walling
On Thu, Nov 28, 2013 at 4:56 AM, Gryllida  wrote:

> Hopefully your research does not conclude that this is a good idea. I have
> been asked to create such bots several times in the past and declined,
> since a need to automate article creation points to inefficiency in manual
> content creation. That inefficiency should be addressed directly, in the
> wiki software.


No, we're not exploring doing bot article creation ourselves. I'm simply
trying to understand differences between projects when it comes to article
creation. There are some smaller communities that have relatively large
levels of daily/monthly article creation, and I want to identify which ones
are running bots.

[Using my personal email to follow up, since it's what I usually use on
mailing lists.]

Steven


Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-28 Thread Delirium

On 11/27/13 2:01 PM, Fæ wrote:

As well as finding out where this has happened, it would be good to
have some cases of where "bots went bad" explained. My main concern
would be leaving a bot to create thousands of articles but in the
process creating a headache for limited numbers of maintainers, such
as article copy-editors, categorizers, illustrators, inter-linkers or
gnomic contributors.



One example I recently ran across, while using the georeference data 
from Wikipedia World, is that bot-imports of villages on the Hindi 
Wikipedia appear to be creating *thousands* of articles with identical 
coordinates. There are about 1300 articles georeferenced to the 
coordinates (25.611, 85.144), for example. I'm not sure if this is an 
error (default value left in a template?), or has some other 
explanation. I could imagine it also being a deliberate imprecision, for 
example using the coordinate for the center of a district for villages 
where the precise coordinate of the village itself isn't known. In any 
case, it produces a bit of a mess; these could all be fixed up pretty 
easily by volunteers checking on OpenStreetMap and the like, but nobody 
has done so, because there are so many of these stubs.


This particular example: 
https://www.google.com/search?q=site:hi.wikipedia.org+25.611,+85.144
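
One way to surface clusters like this is to group articles by their
coordinates and list every coordinate pair shared by suspiciously many
titles. A minimal Python sketch, assuming the georeference data has already
been loaded as (title, latitude, longitude) tuples; the function name, the
threshold and the sample rows are made up for illustration and are not taken
from the Wikipedia World data:

```python
from collections import defaultdict


def find_shared_coordinates(articles, min_count=2, precision=3):
    """Group article titles by rounded (lat, lon); keep only crowded groups."""
    groups = defaultdict(list)
    for title, lat, lon in articles:
        key = (round(lat, precision), round(lon, precision))
        groups[key].append(title)
    return {key: titles for key, titles in groups.items()
            if len(titles) >= min_count}


# Made-up sample rows; real input would come from a georeference dump.
sample = [
    ("Village A", 25.611, 85.144),
    ("Village B", 25.611, 85.144),
    ("Village C", 25.611, 85.144),
    ("Village D", 26.120, 85.390),
]

suspicious = find_shared_coordinates(sample, min_count=3)
for coords, titles in suspicious.items():
    print(coords, len(titles), titles)
```

A report like this would not say whether a shared coordinate is an error or
a deliberate district-level placeholder, only where to start looking.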


-Mark




Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-28 Thread Gryllida
On Tue, 26 Nov 2013, at 8:50, Steven Walling wrote:
> Hi all,
> 
> My team is doing some background research into Wikipedia article creation
> right now.[1] One question I'd like to answer is which Wikipedias are
> currently (i.e. this year) running bots to create many articles.

Hopefully your research does not conclude that this is a good idea. I have
been asked to create such bots several times in the past and declined, since
a need to automate article creation points to inefficiency in manual content
creation. That inefficiency should be addressed directly, in the wiki
software.



Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-27 Thread Anders Wennersten

Fæ skrev 2013-11-27 15:24

Small error rates are a real challenge. My experience on Commons for
large bot work has been long discussions around quality complaints
where the level of error was *well below 1%*.


It is very interesting that you also mention this level of problematic
articles. We have found the same magnitude in two bot projects and in our
Wikidata project, in every case because of problems with the source data.
These problems cannot be found during test rounds, since we do not know in
advance what type of problem will turn up (and the generated articles are
correct with respect to the source), and it can often take months for the
community to spot them (like an erroneous name for a river leaving a lake).


When we entered articles manually, a 0.5% error level would have been more
than excellent, but for semi-automated creation 0.5% can be seen by the
community as problematic (though hardly by Wikipedia readers).


And lowering the share of problematic articles semi-manually from 0.5% to
0.1% can sometimes take as much time as the whole generation effort.
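
To put those percentages into article counts, here is a small Python sketch.
The batch size of 30,000 is an assumed, illustrative figure rather than a
count reported in this message, and the function name is made up:

```python
def expected_problem_articles(total_articles, error_rate):
    """Number of problematic articles expected at a given error rate."""
    return round(total_articles * error_rate)


batch_size = 30_000  # illustrative batch size (assumption)

at_half_percent = expected_problem_articles(batch_size, 0.005)
at_tenth_percent = expected_problem_articles(batch_size, 0.001)

print("at 0.5%:", at_half_percent)
print("at 0.1%:", at_tenth_percent)
print("to be found and fixed by hand:", at_half_percent - at_tenth_percent)
```

With these assumed inputs the counts are 150 and 30, so moving from 0.5% to
0.1% means locating and repairing 120 individual articles.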


We at sv:wp have yet to come to terms with this, but I expect that in the
end we must live with this level of problematic articles (not necessarily
erroneous ones) and solve them case by case. We are a relatively small
community (1/40 of en:wp) and have to be pragmatic if we want to give
our readers a lot of valuable information.


Anders





Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-27 Thread Fæ
On 27 November 2013 13:43, Anders Wennersten  wrote:
...
> And even if this is only relevant for far less than 1% of all generated
> articles, it amounts to around a hundred in total. Many of these cases are
> quite complicated to fix (area of lakes, depths), and there is a debate about
> who should fix them: the bot owner (who generated correctly from the sources)
> or community members (who have trouble finding the relevant base data). Or
> should these be deleted or rewritten from scratch?

Small error rates are a real challenge. My experience on Commons for
large bot work has been long discussions around quality complaints
where the level of error was *well below 1%*. The default stance on
the English Wikipedia and Commons is that if you make the mess, then
you need to clean it up.

On the whole, I don't think this is a bad policy. It does, however, make
bot jobs like this a puzzle to get right. In the case of my Geograph
categorization work, we managed to reduce error levels from below a
known 0.5% to less than a vanishingly small 0.15% (the numbers being
so small that it became hard to measure or estimate them, so this figure
is conservatively pessimistic).
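
As an illustration of why very small error rates are hard to pin down, one
common approach is to quote an upper bound instead of a point estimate: if a
random sample of n items shows no errors at all, the true rate can still be
as high as roughly 3/n at 95% confidence. The Python sketch below computes
the exact version of that bound. It is not necessarily how the Geograph
figure above was derived, and the sample size is hypothetical:

```python
def zero_error_upper_bound(sample_size, confidence=0.95):
    """Largest error rate still consistent with 0 errors seen in the sample."""
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


checked = 2_000  # hypothetical number of randomly checked items
bound = zero_error_upper_bound(checked)
print(f"upper bound on the error rate: {bound:.4%}")
```

For 2,000 checked items with no errors found, the bound comes out just under
0.15%, so a deliberately pessimistic figure of that size is what a clean
sample of a few thousand items can support.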

Giving the community easy ways of reporting failures and seeing them
get quickly corrected is a good option. Even better is to run large
uploads as a project team where the three elements of content experts,
bot experts and enthusiastic volunteer editors/re-users are all
represented. It may take longer, but the bot writer is far more likely
to get praised for good work and any occasional problem just absorbed
as part of the project rather than put on the bot-writer's shoulders.

This is how I structured the Airliners project

there are plenty of other interesting examples on the batch upload
project page.

Fae
-- 
fae...@gmail.com http://j.mp/faewm



Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-27 Thread Anders Wennersten
On sv:wp our main headaches have not been technical problems in the bots
but inconsistencies (errors) in the sources.


For our lakes, the hydrological authorities have in some cases given a
group of nearby and/or connected lakes a plural name, like "Pikelakes",
while the map authorities give them separate names, like East Pikelake
and West Pikelake. And the names in the hydrological authorities'
database have sat there for almost 100 years without anyone scrutinizing
them the way the sv:wp community now does, checking all names and
finding errors.


And even if this is only relevant for far less than 1% of all generated
articles, it amounts to around a hundred in total. Many of these cases
are quite complicated to fix (area of lakes, depths), and there is a
debate about who should fix them: the bot owner (who generated correctly
from the sources) or community members (who have trouble finding the
relevant base data). Or should these be deleted or rewritten from scratch?


There are similar issues with articles generated by Lsjbot, but there it
is easier to fix them manually or semi-automatically.


The generation of 3 lakes has otherwise been a huge success.
* We have gone from having around 500 articles to a 60-fold increase. We
  have correct figures for things like circumference, depth, volume and
  pH level, which are extremely hard to find manually, not to mention the
  extreme challenge that there are cases of some 400 lakes sharing the
  same name, which is very hard to handle manually (which data goes to
  which article).
* Activity related to lakes has gone up steeply: getting photos of them,
  editing things like nearby towns and small islands, and even more so
  adding correct links from all references to a lake in other geographic
  articles.
* We also believe we have attracted new editors, who are happy to enter
  data on local bathing places etc. but would not have had the competence
  to write a lake article from scratch (which IS complicated).
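
The "which data goes to which article" problem for lakes sharing a name can
be handled mechanically by keying every record on a stable identifier and
only then deriving a unique title. A minimal Python sketch, assuming each
source record carries an id, a name and a municipality; these field names,
the title pattern and the sample records are hypothetical and do not
describe how the sv:wp bot actually works:

```python
from collections import Counter


def build_titles(lakes):
    """Map each lake id to a unique article title, qualifying shared names."""
    by_name = Counter(lake["name"] for lake in lakes)
    by_name_place = Counter((lake["name"], lake["municipality"])
                            for lake in lakes)
    titles = {}
    for lake in lakes:
        name, place = lake["name"], lake["municipality"]
        if by_name[name] == 1:
            title = name                      # name is unique, use it as-is
        elif by_name_place[(name, place)] == 1:
            title = f"{name} ({place})"       # municipality disambiguates
        else:
            title = f"{name} ({place}, {lake['id']})"  # fall back to the id
        titles[lake["id"]] = title
    return titles


# Made-up records for illustration only.
lakes = [
    {"id": "L1", "name": "Pikelake", "municipality": "Northtown"},
    {"id": "L2", "name": "Pikelake", "municipality": "Southtown"},
    {"id": "L3", "name": "Pikelake", "municipality": "Southtown"},
    {"id": "L4", "name": "Troutlake", "municipality": "Northtown"},
]

titles = build_titles(lakes)
for lake_id, title in titles.items():
    print(lake_id, "->", title)
```

Because depth, volume and similar data stay attached to the id rather than
to the name, they cannot end up in the wrong article even when hundreds of
lakes share a name.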




Anders




Fæ skrev 2013-11-27 14:01:

As well as finding out where this has happened, it would be good to
have some cases of where "bots went bad" explained. My main concern
would be leaving a bot to create thousands of articles but in the
process creating a headache for limited numbers of maintainers, such
as article copy-editors, categorizers, illustrators, inter-linkers or
gnomic contributors.

For bot-writers like myself, there may be some common patterns to
learn from of which projects this sort of mass creation might be
useful and how to assess a project of this type for potential value
(we might want to fund some volunteer proposed projects along these
lines if they can be measured as effective and with valuable
outcomes).

Fae





Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-27 Thread Fæ
As well as finding out where this has happened, it would be good to
have some cases of where "bots went bad" explained. My main concern
would be leaving a bot to create thousands of articles but in the
process creating a headache for limited numbers of maintainers, such
as article copy-editors, categorizers, illustrators, inter-linkers or
gnomic contributors.

For bot-writers like myself, there may be some common patterns to
learn from of which projects this sort of mass creation might be
useful and how to assess a project of this type for potential value
(we might want to fund some volunteer proposed projects along these
lines if they can be measured as effective and with valuable
outcomes).

Fae
-- 
fae...@gmail.com http://j.mp/faewm



Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-27 Thread 梁忠明
Hi all,

On Chinese Wikipedia there are no such bots creating large numbers of
biological articles. Ranyv, one of my colleagues there, said that because
Chinese does not use the Roman alphabet, it is difficult to generate
translated names for those species, whereas Vietnamese and other wikis can
use the binomial name as the article title. Some users do use AWB to create
articles for places, Taipei Metro stations, etc.

As far as I know, this does happen on Vietnamese Wikipedia. I've heard that
a Vietnamese bot account (Cheers!-bot) has created many articles about
biological species, pushing its article count past the Chinese (and
Portuguese?) one, to over 750,000 articles. That sparked a small-scale
debate in the Chinese Wikipedia community.

Cheers,


Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-26 Thread Strainu
2013/11/25 Steven Walling :
> Hi all,
>
> My team is doing some background research into Wikipedia article creation
> right now.[1] One question I'd like to answer is which Wikipedias are
> currently (i.e. this year) running bots to create many articles.

Hi Steven,

What qualifies as "many"? On ro.wp, Andrebot is creating sections of
articles about the population of villages/communes in Bulgaria,
Hungary and Croatia, also creating articles where they do not exist.
That will probably amount to a few hundred articles by the end of the
year.

Strainu

>
> I know that Lsjbot has run (or is running) on Swedish (sv), Cebuano (ceb),
> and Waray-Waray (war). It seems to me that, by looking at the stats for new
> articles per day,[2] Dutch (nl) and Vietnamese (vi) Wikipedias might have
> also been running bots? Am I wrong?
>
> I'll be posting more about our article creation research work soon. We'll
> need feedback from non-English Wikipedians in particular, since as a team
> we only have extensive experience creating articles on enwiki.
>
> Many thanks,
>
> --
> Steven Walling,
> Product Manager
> https://wikimediafoundation.org/
>
> 1. https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
> 2. http://stats.wikimedia.org/EN/TablesArticlesNewPerDay.htm


Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-26 Thread Craig Franklin
On Irish language Wikipedia, we have had a bot which is creating articles
based on the text of "Fréamh an Eolais", a freely licenced scientific
encyclopaedia.

https://ga.wikipedia.org/wiki/Speisialta:Contributions/HusseyBot

ga.wp is not quite large enough to be included in the automated reporting,
unfortunately.

Cheers,
Craig Franklin


On 26 November 2013 07:50, Steven Walling  wrote:

> Hi all,
>
> My team is doing some background research into Wikipedia article creation
> right now.[1] One question I'd like to answer is which Wikipedias are
> currently (i.e. this year) running bots to create many articles.
>
> I know that Lsjbot has run (or is running) on Swedish (sv), Cebuano (ceb),
> and Waray-Waray (war). It seems to me that, by looking at the stats for new
> articles per day,[2] Dutch (nl) and Vietnamese (vi) Wikipedias might have
> also been running bots? Am I wrong?
>
> I'll be posting more about our article creation research work soon. We'll
> need feedback from non-English Wikipedians in particular, since as a team
> we only have extensive experience creating articles on enwiki.
>
> Many thanks,
>
> --
> Steven Walling,
> Product Manager
> https://wikimediafoundation.org/
>
> 1. https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
> 2. http://stats.wikimedia.org/EN/TablesArticlesNewPerDay.htm


Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-25 Thread Federico Leva (Nemo)

Matthew Flaschen, 26/11/2013 00:21:

On 11/25/2013 05:00 PM, Erik Zachte wrote:

For all time totals per bot there is
http://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm


For the purposes of that chart, how are you defining bots?


For the purposes of all charts, please follow links. ;)
https://www.mediawiki.org/wiki/Analytics/Metric_definitions#Bot

Nemo



Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-25 Thread Josh Lim
The Tagalog Wikipedia did not use bots, but we had editors doing it manually.  
Users like Wikiboost, Booster Gold, etc. added hundreds, if not thousands, of 
articles in this manner, and that is not reflected in the statistics.

Luckily though, this has largely stopped due to community opposition.

Josh
 
JAMES JOSHUA G. LIM
Block I1, AB Political Science
Major in Global Politics, Minor in Chinese Studies
Class of 2013, Ateneo de Manila University
Quezon City, Metro Manila, Philippines

Secretary (2013-2014), Wikimedia Philippines
Member, Ateneo Debate Society
Member, The Assembly
Member, Ateneo Lingua Ars Cultura

jamesjoshua...@yahoo.com | +63 (917) 841-5235
Facebook/Twitter: akiestar | Wikimedia: Sky Harbor
http://akira123323.livejournal.com




On Tuesday, November 26, 2013 8:10 AM, John Vandenberg  wrote:
 
On Tue, Nov 26, 2013 at 4:50 AM, Steven Walling  wrote:
> Hi all,
>
> My team is doing some background research into Wikipedia article creation
> right now.[1] One question I'd like to answer is which Wikipedias are
> currently (i.e. this year) running bots to create many articles.
>
> I know that Lsjbot has run (or is running) on Swedish (sv), Cebuano (ceb),
> and Waray-Waray (war). It seems to me that, by looking at the stats for new
> articles per day,[2] Dutch (nl) and Vietnamese (vi) Wikipedias might have
> also been running bots? Am I wrong?

Hi Steven

The Minangkabau Wikipedia (Minangkabau is a language of Indonesia) has
also been using bots. The project was started in early 2013 and now has
220,800 articles. Unfortunately this project, and other new projects, are
not being included in Erik Zachte's reports.

http://min.wikipedia.org/wiki/Special:Statistics
https://meta.wikimedia.org/wiki/Wikipedias_by_size

The same team are using the same bots to add content to Indonesian
Wikipedia. 100,000 new articles created in October.

http://stats.wikimedia.org/EN/ChartsWikipediaID.htm

-- 
John Vandenberg




Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-25 Thread John Vandenberg
On Tue, Nov 26, 2013 at 4:50 AM, Steven Walling  wrote:
> Hi all,
>
> My team is doing some background research into Wikipedia article creation
> right now.[1] One question I'd like to answer is which Wikipedias are
> currently (i.e. this year) running bots to create many articles.
>
> I know that Lsjbot has run (or is running) on Swedish (sv), Cebuano (ceb),
> and Waray-Waray (war). It seems to me that, by looking at the stats for new
> articles per day,[2] Dutch (nl) and Vietnamese (vi) Wikipedias might have
> also been running bots? Am I wrong?

Hi Steven

The Minangkabau Wikipedia (Minangkabau is a language of Indonesia) has
also been using bots. The project was started in early 2013 and now has
220,800 articles. Unfortunately this project, and other new projects, are
not being included in Erik Zachte's reports.

http://min.wikipedia.org/wiki/Special:Statistics
https://meta.wikimedia.org/wiki/Wikipedias_by_size

The same team are using the same bots to add content to Indonesian
Wikipedia. 100,000 new articles created in October.

http://stats.wikimedia.org/EN/ChartsWikipediaID.htm

-- 
John Vandenberg



Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-25 Thread Matthew Flaschen

On 11/25/2013 05:00 PM, Erik Zachte wrote:

For all time totals per bot there is
http://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm


For the purposes of that chart, how are you defining bots?

Thanks,

Matt Flaschen




Re: [Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-25 Thread Erik Zachte
http://stats.wikimedia.org/EN/ReportCardTopWikis.htm

blue line on second chart per wiki

For all time totals per bot there is
http://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm

Erik

-----Original Message-----
From: wikimedia-l-boun...@lists.wikimedia.org 
[mailto:wikimedia-l-boun...@lists.wikimedia.org] On Behalf Of Steven Walling
Sent: Monday, November 25, 2013 22:50
To: Wikimedia Mailing List
Subject: [Wikimedia-l] Which Wikipedias have had large scale bot creation of 
articles this year?

Hi all,

My team is doing some background research into Wikipedia article creation
right now.[1] One question I'd like to answer is which Wikipedias are currently
(i.e. this year) running bots to create many articles.

I know that Lsjbot has run (or is running) on Swedish (sv), Cebuano (ceb), and 
Waray-Waray (war). It seems to me that, by looking at the stats for new 
articles per day,[2] Dutch (nl) and Vietnamese (vi) Wikipedias might have also 
been running bots? Am I wrong?

I'll be posting more about our article creation research work soon. We'll need 
feedback from non-English Wikipedians in particular, since as a team we only 
have extensive experience creating articles on enwiki.

Many thanks,

--
Steven Walling,
Product Manager
https://wikimediafoundation.org/

1. https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
2. http://stats.wikimedia.org/EN/TablesArticlesNewPerDay.htm

[Wikimedia-l] Which Wikipedias have had large scale bot creation of articles this year?

2013-11-25 Thread Steven Walling
Hi all,

My team is doing some background research into Wikipedia article creation
right now.[1] One question I'd like to answer is which Wikipedias are
currently (i.e. this year) running bots to create many articles.

I know that Lsjbot has run (or is running) on Swedish (sv), Cebuano (ceb),
and Waray-Waray (war). It seems to me that, by looking at the stats for new
articles per day,[2] Dutch (nl) and Vietnamese (vi) Wikipedias might have
also been running bots? Am I wrong?

I'll be posting more about our article creation research work soon. We'll
need feedback from non-English Wikipedians in particular, since as a team
we only have extensive experience creating articles on enwiki.

Many thanks,

-- 
Steven Walling,
Product Manager
https://wikimediafoundation.org/

1. https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
2. http://stats.wikimedia.org/EN/TablesArticlesNewPerDay.htm