The Vietnamese Wikipedia creates 30-40 articles/day manually (with as many
active contributors), which would put it on par with several medium/smaller
projects, which usually have around 200-400 K articles.
But among their limited number of active contributors they have a set of very
clever people creating and running bots. When I analyze the bot results I find
they are using first-class input (The Plant List and lists of cities from, for
example, the Turkish government). I also see that they understand the
intricacies of how to manage the generated articles, using (in some cases)
templates and the discussion page. And of course the code in the bots must be
very good to manage this diverse set of articles. And there is no translating
involved.
For me it confirms my belief that it is possible for most versions to set up
and run clever bots for mass generation of articles with 100% quality, using
the best sources.
I hope this experience can serve as an inspiration to other medium/smaller
projects.
What I have learned from looking into several bot-generation efforts is that
there are three aspects you need to master:
1. The infrastructure of generated articles: special templates, categories,
discussion pages, the speed at which articles are created (so that reviews can
keep up), and how to handle already existing articles. I know this has been
learned on svwp, itwp and nlwp. Basically it is common sense, but for any
newcomer the knowledge exists to learn from.
2. The code of the bots. It is complicated but not "rocket science". The three I
have looked into in detail were all written by people who are not professional
programmers, using different programming languages (C, AWB and something
resembling Basic). You could probably get access to some existing bot code, but
in general I would expect most communities to be able to find someone who can
create this type of bot software.
3. The inputs to generate data from. This I have found to be the most
challenging aspect: which lists to use, how to translate these into article
text, and how to handle ambiguities/errors. Here I would recommend drawing on
the experience of people who have already done this. Official lists of
geographic entities exist and are used by several projects (it, nl, vi etc.),
but why not use the same sets, and why not involve Wikidata in these? For
species there already exist several good inputs: Lsjbot uses Catalogue of Life,
but others (nl, vi) use other sources.
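
To make aspects 1 and 3 a bit more concrete, here is a minimal Python sketch of
the general idea: take rows from an input list, turn each into article text
marked with a bot template and a category, skip titles that already exist, and
set incomplete rows aside for manual review. This is only an illustration under
my own assumptions and not how Cheers!-bot or Lsjbot actually work. The record
fields, the {{Bot-generated}} template name and the function names are all
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesRecord:
    """One row from an input list (field names are hypothetical)."""
    scientific_name: str
    family: str
    source: str


def render_stub(rec: SpeciesRecord) -> str:
    """Build wikitext for one stub, marked with a bot template and a category."""
    return "\n".join([
        "{{Bot-generated}}",  # hypothetical marker template, not a real viwp one
        f"'''''{rec.scientific_name}''''' is a species in the family {rec.family}.",
        "",
        "== References ==",
        f"* {rec.source}",
        "",
        f"[[Category:{rec.family}]]",
    ])


def plan_articles(records, existing_titles):
    """Split records into pages to create and records set aside for review."""
    to_create, skipped = {}, []
    seen = set(existing_titles)
    for rec in records:
        title = rec.scientific_name.strip()
        # Never overwrite an existing article; incomplete or duplicate
        # rows are set aside instead of being guessed at.
        if not title or not rec.family.strip() or title in seen:
            skipped.append(rec)
            continue
        seen.add(title)
        to_create[title] = render_stub(rec)
    return to_create, skipped


# Example run with made-up input rows.
records = [
    SpeciesRecord("Quercus robur", "Fagaceae", "Example input list"),
    SpeciesRecord("Quercus robur", "Fagaceae", "Example input list"),  # duplicate
    SpeciesRecord("Acer campestre", "", "Example input list"),  # family missing
]
pages, review = plan_articles(records, existing_titles={"Fagus sylvatica"})
```

Nothing is saved to a wiki here; the sketch only prepares the text, so the
batch can be reviewed before any upload step is added.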
And while I have no direct contact with the people at viwp, I would welcome it
if anyone made contact and made their bot-generation knowledge available to
others.
Anders
Tanweer Morshed skrev 2014-06-24 13:52:
It's great news that the Vietnamese Wikipedia has crossed 1M articles.
What are the significant reasons behind the Vietnamese Wikipedia's growth?
Is it just the use of such clever bots (that you have mentioned) or
contributions by the Vietnamese Wikipedians? And how does Cheers!-bot
actually generate articles? Does it translate articles from English (or
other) Wikipedia? And apart from translating, can it set up and maintain
other aspects of wikisyntax and coding correctly?
Tanweer Morshed
Board member
Wikimedia Bangladesh
On Tue, Jun 24, 2014 at 11:38 AM, Anders Wennersten <
[email protected]> wrote:
One of our most interesting projects, Vietnamese Wikipedia, has now passed
1 M articles and is currently growing by almost 100k/month.
They use a clever bot named Cheers!-bot to generate a lot of very good
articles. In some ways it is stronger than Lsjbot (covering more than
species), but I do prefer that Lsjbot marks the generated articles with a
template indicating they are bot-generated.
start page: https://vi.wikipedia.org/wiki/Trang_Ch%C3%ADnh
Cheers!-bot generated articles (just now working on species, like Lsjbot):
https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:%C4%90%C3%B3ng_g%C3%B3p/Cheers!-bot
Statistics up to April: http://stats.wikimedia.org/EN/TablesWikipediaVI.htm
(note that active bot generation started around one year ago)
As I have said many times, I believe it is a weakness that we are not making
use of the many excellent initiatives taking place on less well-known versions
(like the Lithuanian one I mentioned some time ago). I am not even sure there
is anyone from viwp active on this list.
Also I recommend you look through the content of viwp by using the
Random article feature, Bài viết ngẫu nhiên
<https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:Ng%E1%BA%ABu_nhi%C3%AAn>
Anders
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
[email protected]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:[email protected]?subject=unsubscribe>