Thanks a bunch to everyone who chimed in here. These hints brought us
forward quite a bit!
On Wed, Sep 21, 2016 at 12:50 AM, Giuseppe Profiti
> [forwarding my answer from analytics ml, I forgot to subscribe to this list]
> Hi Robert,
> one solution may be to use a query on Wikidata to retrieve the name
> for the stubs category in all the different languages. Then you could
> use a tool like PetScan to retrieve all the pages in such categories,
> or write your own tool by using either a query on the database or
> the MediaWiki API.
> You can find a sample solution here:
> I wrote that thing while on a train, so it may be messy and/or sub-optimal.
> I would like to thank Alex Monk and Yuvi Panda for their help with SQL
> on paws today.
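
For the archive, here is a minimal sketch of the two steps described above:
ask the Wikidata Query Service for the sitelinks of the stub category item,
then build a MediaWiki API request for the members of that category on each
wiki. This is not Giuseppe's sample solution, only an illustration. The
function names and the User-Agent string are made up, and the Wikidata item
ID of the stub category is deliberately left as a parameter because I have
not verified it here.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_sitelink_query(category_qid):
    """SPARQL query listing every sitelink (wiki URL + local title) of an item.

    category_qid is the Wikidata ID of the stub category item; it has to be
    looked up by hand, no value is assumed here.
    """
    return f"""
SELECT ?site ?title WHERE {{
  ?page schema:about wd:{category_qid} ;
        schema:isPartOf ?site ;
        schema:name ?title .
}}"""


def parse_sitelinks(sparql_json):
    """Map each wiki base URL to the local title of the category."""
    bindings = sparql_json["results"]["bindings"]
    return {b["site"]["value"]: b["title"]["value"] for b in bindings}


def category_members_url(site, category_title, cmcontinue=None):
    """URL for one page of list=categorymembers results on the given wiki."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category_title,
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        # Continuation token taken from the previous response, if any.
        params["cmcontinue"] = cmcontinue
    return site.rstrip("/") + "/w/api.php?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """GET a URL and decode the JSON body (hypothetical User-Agent string)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "stub-finder-sketch/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def stub_category_titles(category_qid):
    """Step 1: local name of the stub category on every wiki that has one."""
    query = build_sitelink_query(category_qid)
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    return parse_sitelinks(fetch_json(url))
```

Calling stub_category_titles() with the right item ID and then fetching
category_members_url(site, title) for each result would list the direct
members only. On many wikis the top-level stub category mostly holds
subcategories, so a real tool would probably need to recurse (or use
PetScan with a depth setting) and filter the sitelinks down to Wikipedia
sites.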
> 2016-09-20 11:26 GMT+02:00 Robert West <w...@cs.stanford.edu>:
>> Hi everyone,
>> Does anyone know if there's a straightforward (ideally language-independent)
>> way of identifying stub articles in Wikipedia?
>> Whatever works is ok, whether it's publicly available data or data
>> accessible only on the WMF cluster.
>> I've found lists for various languages (e.g., Italian or English), but the
>> lists are in different formats, so separate code is required for each
>> language, which doesn't scale.
>> I guess in the worst case, I'll have to grep for the respective stub
>> templates in the respective wikitext dumps, but even this requires knowing
>> for each language what the respective template is. So if anyone could point
>> me to a list of stub templates in different languages, that would also be
>> Up for a little language game? -- http://www.unfun.me
>> Analytics mailing list
> Wiki-research-l mailing list