Re: [Wikimedia-l] Vietnamese wp above 1 M articles and growing

2014-06-29 Thread Minh Nguyen

On 2014-06-24 04:52, Tanweer Morshed wrote:

It's great news that the Vietnamese Wikipedia has crossed 1M articles.
What are the main reasons behind the Vietnamese Wikipedia's growth?
Is it just the use of clever bots (like the one you mentioned), or
contributions by the Vietnamese Wikipedians? And how does Cheers!-bot
actually generate articles? Does it translate articles from the English
(or another) Wikipedia? And apart from translating, can it correctly set
up and maintain other aspects of wiki syntax and coding?


A great many of the Vietnamese Wikipedia's recent articles have been 
created automatically using bots, manually with word processors and mail 
merge, or semi-automatically with machine translators like (presumably) 
Google Translator Toolkit. Nonetheless, Cheers!-bot held a moratorium on 
new articles around the million-article mark, so that day was all about 
writing articles the old-fashioned way.


Predictably, our bot articles are more infobox than prose. On the other 
hand, they do have correct grammar and wiki syntax, which cannot be said 
for most machine-translated articles, comprehensive as they may be. 
Cheers! is one of our most experienced editors and has done an admirable 
job correcting errors, whereas some machine translator users have 
uploaded incomprehensible articles anonymously, giving us no opportunity 
to engage and educate.


I can't say for certain how Cheers!-bot generates species stubs, but its 
earlier U.S. geographic stubs were translated from the Spanish 
Wikipedia's own bot-created stubs. I'm in the process of cleaning them 
up, translating the occasional Spanish place name to Vietnamese. We're 
also integrating our [[vi:Template:Infobox settlement]] with Wikidata, 
to provide more current information with minimal maintenance. For 
example, see the infobox at [[vi:Loveland, Ohio]], which passes only 
three parameters but provides 18 rows of information.


The surge in bot-created stubs has alarmed some members of the 
Vietnamese Wikipedia community. One frequent theme in our village pump 
is that our depth at [[m:List of Wikipedias]] has fallen from over a 
hundred (one of the highest) to just 15 (one of the lowest) in a few 
years. Even taking the depth metric with a grain of salt, I think this 
observation has led us to a newfound appreciation for edits, 
non-articles, and maybe even authentic, hand-made articles.
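
As a rough illustration of why a surge of bot-created stubs pulls the
depth figure down, here is a minimal sketch. It assumes the depth formula
as I understand it from Meta: (edits / articles) x (non-articles /
articles) x (1 - stub ratio), where the stub ratio is articles / total
pages. The function name and all page counts below are invented for
illustration; they are not the Vietnamese Wikipedia's real statistics.

```python
def depth(edits: int, articles: int, total_pages: int) -> float:
    """Depth metric, assuming the definition on Meta:
    (edits / articles) * (non_articles / articles) * (1 - stub_ratio).
    """
    if articles <= 0 or total_pages < articles:
        raise ValueError("need 0 < articles <= total_pages")
    non_articles = total_pages - articles   # talk pages, templates, etc.
    stub_ratio = articles / total_pages     # share of pages that are articles
    return (edits / articles) * (non_articles / articles) * (1 - stub_ratio)


# Hypothetical wiki before a bot run.
before = depth(edits=10_000_000, articles=500_000, total_pages=1_500_000)

# Same wiki after a bot adds 500,000 stubs with one edit each
# and no accompanying non-article pages.
after = depth(edits=10_500_000, articles=1_000_000, total_pages=2_000_000)

print(f"before: {before:.2f}, after: {after:.2f}")
```

Under these assumptions every term of the product shrinks at once, which
is why edits and non-article pages matter so much to the figure.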


More importantly, the million-article milestone has shed light on our 
seemingly low number of active editors. Some have expressed concern that 
the steadily rising article count has disincentivized readers from 
creating articles on their own. So we're discussing some changes to 
our main page and messages to better engage potential contributors. 
We've also integrated tightly with VisualEditor -- the sandbox, no such 
article message, and no search results message all send users to 
VisualEditor by default -- hopefully lowering barriers to entry.


None of the Vietnamese Wikipedia's bot operators are interested in 
inflating our article count for its own sake. We care deeply about the 
future of our wiki and the health of its community, and we welcome 
feedback from the community at large.


--
Minh Nguyen
Administrator [[vi:User:Mxn]] [[m:User:Mxn]]


___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe

Re: [Wikimedia-l] Vietnamese wp above 1 M articles and growing

2014-06-26 Thread Anders Wennersten

The Vietnamese Wikipedia creates 30-40 articles/day manually (with as many 
active contributors), which would put it on par with several medium/smaller 
projects, which usually have around 200-400 K articles.

But among their limited number of active contributors they have a set of very 
clever people creating and running bots. When I analyze the bot results, I find 
they are using first-class input (The Plant List and lists of cities from, for 
example, the Turkish government). I also see that they understand the 
intricacies of managing the generated articles, using (in some cases) templates 
and the discussion page. And of course the code in the bots must be very good 
to manage this diverse set of articles. And no translating.

For me, it confirms my belief that it is possible for most versions to set up 
and run clever bots for mass generation of articles with 100% quality, using 
the best sources.

I hope this experience can serve as an inspiration to other medium/smaller 
projects.

My conclusion from looking into several bot generation efforts is that there 
are three aspects you need to master:
1. The infrastructure of generated articles: special templates, categories, 
discussion pages, the speed at which they are set up (to allow for reviews), 
and how to handle already existing articles. I know this has been learned on 
svwp, itwp and nlwp. Basically it is common sense, but for anyone new, the 
knowledge exists to learn from.
2. The code of the bots. It is complicated but not rocket science. The three I 
have looked into in detail were all written by people who are not professional 
programmers, using different programming languages (C, AWB and something 
resembling Basic). You could probably get access to some existing bot code, but 
in general I would expect most communities to be able to find someone who can 
create this type of bot software.
3. The inputs to generate data from. This I have found to be the most 
challenging aspect: which lists to use, how to translate these into article 
texts, and how to handle ambiguities/errors. Here I would recommend drawing on 
the experience of people who have already done this. Official lists of 
geographic entities exist and are used by several projects (it, nl, vi etc.), 
but why not use the same sets, and why not involve Wikidata in these? For 
species there already exist several good inputs; Lsjbot uses Catalogue of Life, 
but others (nl, vi) use other sources.
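
To make the three aspects concrete, here is a minimal sketch of the general 
shape such a bot could take: read an input list, skip rows that are incomplete 
or ambiguous, leave existing articles alone, and fill a stub template. It is 
not the code of Cheers!-bot, Lsjbot or any other real bot. The template names 
({{Infobox settlement}} parameters, {{Bot-generated}}), the CSV columns and the 
sample rows are all hypothetical, and nothing here talks to a live wiki.

```python
import csv
import io
from string import Template

# Hypothetical stub layout; a real project would use its own templates.
STUB = Template(
    "{{Infobox settlement|name=$name|country=$country|population=$population}}\n"
    "'''$name''' is a settlement in $country.\n"
    "{{Bot-generated}}\n"
    "[[Category:Settlements in $country]]\n"
)
REQUIRED = ("name", "country", "population")


def build_stubs(rows, existing_titles):
    """Return (stubs, skipped): wikitext by title, plus rows left for human review."""
    stubs, skipped = {}, []
    for row in rows:
        name = (row.get("name") or "").strip()
        if any(not (row.get(key) or "").strip() for key in REQUIRED):
            skipped.append((name, "missing field"))
        elif not row["population"].strip().isdigit():
            skipped.append((name, "bad population"))
        elif name in existing_titles or name in stubs:
            # Never overwrite an existing article or an earlier row.
            skipped.append((name, "title already exists"))
        else:
            stubs[name] = STUB.substitute({k: row[k].strip() for k in REQUIRED})
    return stubs, skipped


# Invented input list standing in for an official register.
SAMPLE = """name,country,population
Exampleville,Exampleland,1200
Sampleton,Exampleland,
Exampleville,Exampleland,900
Mainpage,Exampleland,50
"""

stubs, skipped = build_stubs(
    csv.DictReader(io.StringIO(SAMPLE)), existing_titles={"Mainpage"}
)
for title, text in stubs.items():
    print(title, text, sep="\n")
print("needs review:", skipped)
```

Keeping the skipped rows instead of guessing is one way to handle the 
ambiguities/errors mentioned under aspect 3: they go to a human, not into an 
article.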

And while I have no direct contact with the people at viwp, I would welcome it 
if anyone made contact and made their bot generation knowledge available to 
others.

Anders

Re: [Wikimedia-l] Vietnamese wp above 1 M articles and growing

2014-06-24 Thread Tanweer Morshed
It's great news that the Vietnamese Wikipedia has crossed 1M articles.
What are the main reasons behind the Vietnamese Wikipedia's growth?
Is it just the use of clever bots (like the one you mentioned), or
contributions by the Vietnamese Wikipedians? And how does Cheers!-bot
actually generate articles? Does it translate articles from the English
(or another) Wikipedia? And apart from translating, can it correctly set
up and maintain other aspects of wiki syntax and coding?

Tanweer Morshed
Board member
Wikimedia Bangladesh


On Tue, Jun 24, 2014 at 11:38 AM, Anders Wennersten 
m...@anderswennersten.se wrote:

 One of our most interesting projects, the Vietnamese Wikipedia, has now
 passed 1 M articles and is currently growing by almost 100k/month.

 They use a clever bot named Cheers!-bot to generate a lot of very good
 articles. In some ways it is stronger than Lsjbot (covering more than
 species), but I do prefer that Lsjbot marks the generated articles with a
 template indicating they are bot-generated.

 start page: https://vi.wikipedia.org/wiki/Trang_Ch%C3%ADnh

 Cheers!-bot generated articles (currently working on species, like Lsjbot):
 https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:%C4%90%C3%B3ng_g%C3%B3p/Cheers!-bot

 Statistics up to April: http://stats.wikimedia.org/EN/TablesWikipediaVI.htm
 (note that active generation started around one year ago)

 As I have said many times, I believe it is a weakness that we are not making
 use of the many excellent initiatives taking place on less well-known versions
 (like the Lithuanian one I mentioned some time ago). I am not even sure there
 is anyone from viwp active on this list.

 I also recommend looking through the content of viwp using the
 Random article feature (Bài viết ngẫu nhiên):
 https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:Ng%E1%BA%ABu_nhi%C3%AAn

 Anders






-- 
Regards -
Tanweer Morshed