The Vietnamese Wikipedia creates 30-40 articles/day manually (with about as many 
active contributors), which would put it on par with several medium/smaller 
projects, which usually have around 200-400K articles.

But among their limited number of active contributors they have a set of very 
clever people creating and running bots. When I analyze the bot results, I find 
they are using first-class input (The Plant List and lists of cities from, for 
example, the Turkish government). I also see that they understand the intricacies 
of managing the generated articles, using (in some cases) templates and the 
discussion page. And of course the code in the bots must be very good to 
manage such a diverse set of articles. And no translation is involved.

For me it confirms my belief that it is possible for most versions to set up 
and run clever bots for mass generation of articles with 100% quality, using 
the best sources.

I hope this experience can serve as an inspiration to other medium/smaller 
projects.

What I have learned from looking into several bot generation efforts is that 
there are three aspects you need to master:
1. The infrastructure of generated articles: special templates, categories, 
discussion pages, the speed at which articles are created (to allow for reviews) 
and how to handle already existing articles. I know this has been learned on 
svwp, itwp and nlwp. Basically it is common sense, but for any newcomer the 
knowledge exists to learn from.
2. The code of the bots. It is complicated but not "rocket science". The three I 
have looked into in detail were all written by people who are not professional 
programmers, using different languages and tools (C, AWB and something resembling 
Basic). You could probably get access to some existing bot code, but in general I 
would expect most communities to be able to find someone who can create this type 
of bot software.
3. The inputs to generate data from. This I have found to be the most 
challenging aspect: which lists to use, how to translate these into article 
texts and how to handle ambiguities/errors. Here I would recommend taking in 
experience from people who have already done this. Official lists of geographic 
entities exist and are used by several projects (it, nl, vi etc.), but why not 
use the same sets, and why not involve Wikidata in these? For species there 
already exist several good inputs. Lsjbot uses Catalogue of Life, but others (nl, 
vi) use other sources.
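
To make aspects 1 and 3 a bit more concrete, here is a minimal Python sketch of 
the general idea. It is not the code of Cheers!-bot or Lsjbot, which I have not 
reproduced here. The record fields, the "Bot-generated" marker template and the 
function names are all hypothetical, and a real bot would add uploading, 
discussion page notes and much more careful error handling.

```python
from dataclasses import dataclass

# Hypothetical marker template; each project would use its own name.
BOT_TEMPLATE = "{{Bot-generated}}"


@dataclass(frozen=True)
class SpeciesRecord:
    """One row from an input list (field names are assumptions)."""
    scientific_name: str
    family: str
    authority: str
    source: str


def render_stub(rec):
    """Turn one input record into wikitext for a short article."""
    return (
        f"{BOT_TEMPLATE}\n"
        f"'''''{rec.scientific_name}''''' is a species in the family "
        f"{rec.family}. It was described by {rec.authority}."
        f"<ref>{rec.source}</ref>\n\n"
        f"[[Category:{rec.family}]]\n"
    )


def plan_articles(records, existing_titles):
    """Decide which articles to create and which records to set aside.

    Returns (to_create, skipped): a dict mapping title to wikitext, and a
    list of (title, reason) pairs left for human review.
    """
    to_create = {}
    skipped = []
    for rec in records:
        title = rec.scientific_name.strip()
        if not title or not rec.family.strip():
            skipped.append((title, "incomplete record"))
        elif title in existing_titles:
            # Never overwrite an article that already exists.
            skipped.append((title, "article already exists"))
        elif title in to_create:
            skipped.append((title, "duplicate in input"))
        else:
            to_create[title] = render_stub(rec)
    return to_create, skipped
```

The point of returning the skipped records with a reason, rather than silently 
dropping them, is that ambiguities and errors in the input end up in a list 
somebody can review.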

And while I have no direct contact with the people at viwp, I would welcome it 
if anyone made contact and made their bot generation knowledge available to 
others.

Anders




Tanweer Morshed wrote 2014-06-24 13:52:
It is great news that the Vietnamese Wikipedia has crossed 1M articles.
What are the significant reasons behind such growth of the Vietnamese Wikipedia?
Is it just the usage of such clever bots (that you have mentioned) or
contributions by the Vietnamese Wikipedians? And how does
Cheers!-bot actually generate articles? Does it translate articles from the English (or
another) Wikipedia? And apart from translating, can it correctly set and maintain
other aspects of wiki syntax and coding?

Tanweer Morshed
Board member
Wikimedia Bangladesh


On Tue, Jun 24, 2014 at 11:38 AM, Anders Wennersten <
m...@anderswennersten.se> wrote:

One of our most interesting projects, the Vietnamese Wikipedia, has now passed
1M articles and is currently growing by almost 100k/month.

They use a clever bot named Cheers!-bot to generate a lot of very good
articles. In some ways it is stronger than Lsjbot (covering more than
species), but I do prefer that Lsjbot marks the generated articles with a
template indicating they are bot-generated.

start page: https://vi.wikipedia.org/wiki/Trang_Ch%C3%ADnh

Cheers!-bot generated articles (just now working on species, like Lsjbot):
https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:%C4%90%C3%B3ng_g%C3%B3p/Cheers!-bot

Statistics up to April: http://stats.wikimedia.org/EN/TablesWikipediaVI.htm
(notice the active generation starting around one year ago)

As I have said many times, I believe it is a weakness that we are not making use
of the many excellent initiatives taking place on less well known versions
(like the Lithuanian one I mentioned some time ago). I am not even sure there
is anyone from viwp active on this list.

Also, I recommend that you look through the content of viwp by using the
Random article feature, Bài viết ngẫu nhiên
<https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:Ng%E1%BA%ABu_nhi%C3%AAn>

Anders


_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/
wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>





